Restaurant quality prediction based on yelp reviews and spatial location
Word cloud of four types of reviews
In this section, I have divided the words in the comments into four types. They are reviews of high-scoring restaurants, reviews of low-scoring restaurants, words of high-scoring reviews, and words of low-scoring reviews. On this basis, I selected the 40 words with the highest frequency in the four categories, and used the wordcloud library to make them into a word cloud according to the frequency of occurrence.
High-star reviews’ most comment words
High-star restaurants’ most comment words
Low-star reviews’ most comment words
Low-star restaurants’ most comment words
Analyzing these word clouds, it is not difficult to see that there are many repeated words in various comments, such as food, place, really, etc. But in the comments with high scores, we can see words of praise such as love and best, while in restaurants with low scores, we can see words of dissatisfaction such as never and told.
Sentiment Analysis of reviews
In this section, I performed a Sentiment Analysis on the words in the comments. Among them, I mainly paid attention to the two attributes of polarity and subjectivity, and used seaborn.boxplot to analyze them and the star rating of the comments (as shown in the figure below).


We can clearly find that there is an obvious positive correlation between the polarity of reviews and the star rating of reviews. The higher the Polarity value, the higher the score, which indicates a strong positive. But the value of Subjectivity has no obvious relationship with star rating.
On this basis, I selected 20,000 comments and visualized their Polarity values and star ratings through pyplot.scatter. Visualizing these words at the same time, we can see that the result is correct. At the same time, the average score of reviews is about 3.5, and the average Polarity is slightly greater than 0.2. This is also in line with our expectations.

Analysis of four types of restaurant above
In this part, I continue to analyze the four restaurants identified above, and use hvplot.barh to list the review counts, polarity, stars, subjectivity, days since opening of the four restaurants respectively.
We can significantly find that the ratings of restaurants near the park are slightly higher, and the evaluation polarity of high-scoring restaurants is significantly greater than that of low-scoring restaurants. High-scoring restaurants are new restaurants that have been open for a relatively short period of time. New restaurants may have brought new operating methods, dishes and service concepts. Restaurants close to parks and high-scoring restaurants have significantly more reviews and obviously have more foot traffic. The subjectivity of high-scoring restaurant reviews is also slightly higher.