Review-Yelp-er

The Review Yelp-er (Helper) - A Review Generator that creates a review easily!

A text generation project as part of BT4222 - Mining Web Data for Business Insights.

Minimum word count requirements make it harder to write reviews on User-Generated Content (UGC) platforms such as Yelp, and they have discouraged many satisfied customers from leaving feedback. One of our own team members experienced this while on exchange in the US: he had a positive experience with Yelp and most of its recommendations, but found the minimum word count a chore, whether the review was positive or negative. He ended up not writing reviews for some restaurants just because of the minimum word count. This is an issue for crowd-sourced review platforms, which cannot ensure accurate and updated restaurant reviews when most users do not want to write them.

Hence, we thought that we could use Deep Learning to solve this problem by creating a Review "Yelp-er". It suggests sentences based on an input, helping users like our teammate by giving them suggestions and possibly even writing the whole review.

Consolidated dataset

Our dataset and models are stored on Google Drive, as the files are too big to be uploaded to GitHub. (https://drive.google.com/open?id=1SwM1IzteNN5q229w_2KE1NFL1CSnj-Hn)

General Setup

Run pip install -r requirements.txt
Download the dataset and checkpoints from Google Drive and place them in the locations specified in the sections below
If running locally, ensure that CUDA is installed and you have a capable GPU

System Requirements

Lots of RAM (At least 20GB)
At least 1 GPU
Disk space of at least 20GB (3GB per model checkpoint; the Yelp dataset is 5GB before cleaning)

Data Cleaning

Objective
To preprocess the dataset, which can be found at https://www.yelp.com/dataset

Content

Basic NLP Text cleaning
Convert from json to csv
Inner join business.json and review.json
Filter to keep only current open businesses
Filter to keep review stars 1 and 5
Filter to keep reviews that have more than 1 usefulness rating
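
The join and filter steps above can be sketched with the Python standard library. This is a minimal illustration rather than the project's actual cleaning notebook: the function names are hypothetical, and the field names (business_id, is_open, stars, useful, text) are assumed to match the JSON-lines layout of the Yelp dataset.

```python
import csv
import io
import json

# Columns written to the CSV (an assumed selection, not the project's exact schema).
FIELDNAMES = ["review_id", "business_id", "name", "stars", "useful", "text"]


def load_json_lines(lines):
    """Parse an iterable of JSON-lines strings into a list of dicts."""
    return [json.loads(line) for line in lines if line.strip()]


def join_and_filter(businesses, reviews):
    """Inner-join reviews to businesses, then apply the three filters."""
    # Keep only businesses that are currently open, indexed by id.
    open_businesses = {
        b["business_id"]: b for b in businesses if b.get("is_open") == 1
    }
    rows = []
    for review in reviews:
        business = open_businesses.get(review["business_id"])
        if business is None:
            continue  # no matching open business, so the inner join drops it
        if review["stars"] not in (1, 5):
            continue  # keep only 1-star and 5-star reviews
        if review["useful"] <= 1:
            continue  # keep only reviews with more than 1 useful vote
        rows.append({
            "review_id": review["review_id"],
            "business_id": review["business_id"],
            "name": business["name"],
            "stars": review["stars"],
            "useful": review["useful"],
            "text": review["text"],
        })
    return rows


def to_csv(rows):
    """Serialise the joined rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For the full dataset, each file would be read line by line (for example `load_json_lines(open(path, encoding="utf-8"))`, where the path is wherever the downloaded file was placed); a dataframe library would likely be more practical at that size.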

Exploratory Data Analysis

Objective
To observe the distribution of words between a good review and a bad review.
View scatterplot at https://zh-tan.github.io/review-yelper.github.io/.
It takes about 2-5 minutes to load.

Content

Produce scatterplot using scattertext
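
The scatterplot compares how often each word appears in good reviews versus bad reviews. The sketch below shows only that underlying frequency comparison using the standard library; it is not the scattertext code used in the project, and the function names and the simple regex tokenizer are assumptions made for illustration.

```python
import re
from collections import Counter


def term_frequencies(reviews):
    """Count lowercase word occurrences across a list of review texts."""
    counts = Counter()
    for text in reviews:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def compare_terms(good_reviews, bad_reviews):
    """Map each term to a (count in good reviews, count in bad reviews) pair."""
    good = term_frequencies(good_reviews)
    bad = term_frequencies(bad_reviews)
    # Counter returns 0 for terms missing from one side.
    return {term: (good[term], bad[term]) for term in set(good) | set(bad)}
```

Each (good, bad) pair corresponds roughly to a point's position on the two axes of such a plot.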

LSTM

Objective
Train 2 LSTM models, one on 1 star reviews and one on 5 star reviews. Note that this is scoped to food-related reviews only.

Setup

Ensure that reviews1_cleaned.txt and reviews5_cleaned.txt are in the datafiles folder (lstm_final/datafiles)
The code needs to be run on TensorFlow version 1
The code has to be run on Colab with 25GB RAM (more than 12GB will be used), and the runtime has to be either GPU or TPU

Content

Train on food reviews (1 star)
Train on food reviews (5 stars)

GPT-2

Objective
This is the training phase of GPT-2. As elaborated in our report and presentation, we decided to train models on both the general and the food datasets.

Setup

Ensure that the gpt-2-simple package is installed and you have sufficient RAM and GPU to train the large model
Make sure that the dataset is in the same folder as the notebook (GPT-2/)
This was trained on Colab with GPU and High RAM (30GB RAM) and on Google Cloud Platform (1x P100 GPU on an n1-standard-4 machine with 15GB RAM)
Ensure you have at least 10GB of disk space

Content
Train GPT-2 Large models on the Food and General datasets, 2 per dataset (one for each review star rating).

Trained on General (Review star 1)
Trained on General (Review star 5)
Trained on Food (Review star 1)
Trained on Food (Review star 5)

Autoregressive Property
To demonstrate the autoregressive property of GPT-2, we created Sankey diagrams that show GPT-2's predictions for the next token. This is important as it helps us understand how the model predicts for 1 star vs 5 star reviews.
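
As a rough illustration of autoregressive generation, the sketch below repeatedly picks the most likely next token and appends it to the sequence. The probability table is entirely hypothetical, and it conditions only on the last token, whereas GPT-2 conditions on the whole preceding sequence; it is a toy stand-in, not the trained model.

```python
# Hypothetical next-token probabilities, keyed by the previous token.
# A real model computes these from the entire preceding context.
NEXT_TOKEN_PROBS = {
    "the": {"food": 0.6, "service": 0.4},
    "food": {"was": 0.9, "is": 0.1},
    "service": {"was": 1.0},
    "was": {"terrible": 0.7, "cold": 0.3},
    "terrible": {"<end>": 1.0},
    "cold": {"<end>": 1.0},
}


def generate_greedy(prompt_tokens, max_new_tokens=10):
    """Extend the prompt one token at a time, feeding each output back in."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not candidates:
            break  # no known continuation for this token
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens
```

Each branch in a Sankey diagram corresponds to one of the candidate next tokens at a step, with its width reflecting the predicted probability.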

Review Star 1

Review Star 5

PCA on Tensorboard

Demo

Objective
As a Proof-of-Concept using the trained GPT-2 models, we created an app that simulates an autocomplete feature. Instead of suggesting just a word, it suggests 3 sentences based on the input given.

Setup

Ensure that the checkpoint files are in the checkpoint folder (Demo/checkpoint). Download the files from the Google Drive link provided above
Change into the directory containing the py files and run main.py

Instructions

There are dropdowns for you to select the type of model to use (1 star or 5 star)
There is a "complexity" scale, which uses temperature scaling to enable the model to generate more complex outputs. A higher number results in more flamboyant language.
Key in text
Click submit
Repeat
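
The effect of the "complexity" scale can be sketched with a temperature-scaled softmax. This is a generic illustration of temperature scaling, not the demo's code; the function name and the example logits are assumptions.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.0]
low = softmax_with_temperature(logits, temperature=0.5)   # sharper distribution
high = softmax_with_temperature(logits, temperature=2.0)  # flatter distribution
```

A higher temperature flattens the distribution, so less likely tokens are sampled more often, which is why the output reads as more varied.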

Example

You may key in any input you like. Suppose the fish and chips were bad but you do not know how to describe them: key in "The fish and chips was"
Click on submit and the model will generate 3 suggestions for you to pick from
Click on any of the suggestions and the text will autocomplete your input
You may edit the model's suggestion and click on submit once the description is fitting
Repeat until you are satisfied with the length and quality of the review generated

Contributions

Lucas - Data cleaning, Exploratory Data Analysis
Adrian - Data cleaning, LSTM modelling
Zhe Hao - Data cleaning, Scattertext visualisation, GPT-2 modelling, Sankey Diagrams
Ryan - Demo, Field Study
Kai Le - Demo, Field Study
