This project used a dataset of Amazon reviews in the "Tools" product category to examine the role of paid reviews in the Amazon Vine program, with the aim of identifying any bias toward favorable reviews from Vine members. PySpark was used for the ETL process: extracting the dataset, transforming the data, connecting to an AWS RDS instance, and loading the transformed data into a PostgreSQL database accessed through pgAdmin.
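
The ETL flow described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the column names (`review_id`, `star_rating`, `helpful_votes`, `total_votes`, `vine`), the tab-separated input format, the table name `vine_table`, and the helper names `jdbc_url` and `run_etl` are all assumptions made for illustration. It also assumes the PostgreSQL JDBC driver is available on the Spark classpath.

```python
def jdbc_url(host: str, port: int, database: str) -> str:
    """Build a PostgreSQL JDBC URL for the RDS instance (hypothetical helper)."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def run_etl(source_path: str, url: str, user: str, password: str) -> None:
    """Extract the review file, keep the rating columns, and load them into PostgreSQL."""
    # Imported lazily so jdbc_url() can be used without PySpark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("vine-review-etl").getOrCreate()

    # Extract: read the review file (assumed to be tab-separated with a header row).
    raw = spark.read.csv(source_path, sep="\t", header=True)

    # Transform: keep only the assumed columns needed for the Vine comparison
    # and cast the numeric fields, dropping rows without an id or a rating.
    vine = raw.select(
        "review_id",
        F.col("star_rating").cast("int").alias("star_rating"),
        F.col("helpful_votes").cast("int").alias("helpful_votes"),
        F.col("total_votes").cast("int").alias("total_votes"),
        "vine",
    ).dropna(subset=["review_id", "star_rating"])

    # Load: append the rows to an assumed table in the RDS-hosted database.
    vine.write.jdbc(
        url=url,
        table="vine_table",
        mode="append",
        properties={
            "user": user,
            "password": password,
            "driver": "org.postgresql.Driver",
        },
    )
```

The source path, host, and credentials are left as parameters, since the actual values depend on the environment.
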
The overall purpose was to understand whether it is worthwhile for a company to participate in the Vine program, as the fee associated with paid reviews must be weighed against the value such reviews might represent. Paying for reviews might be expected to secure positive ratings, but customers who are genuinely satisfied with a product may write positive reviews without having to be paid to do so.
- In the dataset for the "Tools" product category, there were 285 paid reviews and 31,545 unpaid reviews.
- Of the 285 paid reviews, 163 were 5-star. Of the 31,545 unpaid reviews, 14,614 were 5-star.
- 57% of the paid reviews had 5 stars, which seems low, and indicates that there is no guarantee that paying for reviews will produce consistently positive results.
- 46% of unpaid reviews had 5 stars, which is lower than the rate for paid reviews, though not substantially so.
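
The percentages in the list above follow directly from the reported counts. A short sketch of the arithmetic, where the function name `five_star_rate` and the dictionary layout are illustrative choices rather than part of the original analysis:

```python
def five_star_rate(five_star: int, total: int) -> float:
    """Return the share of 5-star reviews as a percentage."""
    return 100.0 * five_star / total


# (5-star count, total count) for each group, taken from the figures above.
counts = {"paid": (163, 285), "unpaid": (14_614, 31_545)}

rates = {group: five_star_rate(f, t) for group, (f, t) in counts.items()}
gap = rates["paid"] - rates["unpaid"]

for group, rate in rates.items():
    print(f"{group}: {rate:.1f}% five-star")
print(f"difference: {gap:.1f} percentage points")
```
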
This project confirmed a bias toward favorable ratings in paid reviews in the "Tools" product category, which is not surprising. However, paid reviewers did not uniformly offer high praise in the form of 5-star reviews, and a substantial percentage of unpaid reviewers gave 5 stars without needing to be compensated.
Additional analysis would explore what kinds of guarantees Vine members agreed to in exchange for being paid, and whether there are other aspects of the reviews that offer additional value, such as the written length, inclusion of product images, or helpful insights. If a company is seeking these sorts of positive feedback, the relatively low rate of 5-star ratings may be offset somewhat.
Notably, the number of paid reviews in this product category was less than 1% of the overall number of reviews, so they could easily be missed among the large percentage that were unpaid. Nearly half of the unpaid reviews offered 5 stars, meaning that these 14,614 high assessments dwarfed the 163 5-star reviews that were paid, and for that matter the 285 paid reviews that produced any level of rating. This substantial imbalance in the number of paid vs unpaid reviews suggests that at these rates the Vine program is unlikely to produce positive results for companies in this product category.
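
The imbalance described above can be checked from the same counts. The following is a small sketch; the variable names are illustrative and not taken from the original analysis:

```python
# Counts reported for the "Tools" category.
paid_total, unpaid_total = 285, 31_545
paid_five_star, unpaid_five_star = 163, 14_614

# Paid reviews as a share of all reviews in the category.
paid_share = 100.0 * paid_total / (paid_total + unpaid_total)

# Paid 5-star reviews as a share of all 5-star reviews.
paid_five_star_share = 100.0 * paid_five_star / (paid_five_star + unpaid_five_star)

print(f"paid share of all reviews: {paid_share:.2f}%")
print(f"paid share of 5-star reviews: {paid_five_star_share:.2f}%")
```

Both shares come out at roughly 1%, which is the sense in which paid reviews could easily be missed among the unpaid ones.
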
It may be worth paying for reviews only in cases where they can be made to stand out more prominently, whether by making up a larger percentage of the total, or by tilting the scales by being overwhelmingly positive among unpaid reviews that offer more negative assessments. But depending on the tone in which they are presented, such a difference might be too obvious and lead the consumer to view the paid reviews as untrustworthy. Further analysis might examine in more detail the ways that the reviews themselves are judged, such as by looking for correlations with the "helpful votes" as assessed by fellow customers.