Computing Applications Virtual extension

Overcoming the J-Shaped Distribution of Product Reviews

By Nan Hu, Jie Zhang, and Paul A. Pavlou

Posted Oct 1 2009

Introduction

While product review systems that collect and disseminate opinions about products from recent buyers (Table 1) are valuable forms of word-of-mouth communication,^2,6,12,a evidence suggests that the reviews they collect are overwhelmingly positive. Kadet⁹ notes that most products receive almost five stars. Chevalier and Mayzlin⁴ also show that book reviews on Amazon and Barnes & Noble are overwhelmingly positive. Is this because all products are simply outstanding? A graphical representation of product reviews suggests otherwise: it reveals a J-shaped distribution (Figure 1) with mostly 5-star ratings, some 1-star ratings, and hardly any ratings in between. What explains this J-shaped distribution? If products are indeed outstanding, why do we also see many 1-star ratings? Why are there so few product ratings in between? Is it because there are no “average” products? Or is it because there are biases in product review systems? If so, how can we overcome them?

The J-shaped distribution also creates some fundamental statistical problems. Conventional wisdom assumes that the average of the product ratings is a sufficient proxy for product quality and product sales. Many studies^2,3,7,8,10,11 used the average of product ratings to predict sales.^b However, these studies showed inconsistent results: some found product reviews to influence product sales, while others did not. The average is statistically meaningful only when it is based on a unimodal distribution or a symmetric bimodal distribution. Since product reviews have an asymmetric bimodal (J-shaped) distribution, the average is a poor proxy for product quality.
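To illustrate why the average alone can mislead, here is a minimal sketch comparing two rating histograms. Both histograms are hypothetical numbers chosen for illustration, not data from the study, and the helper names `mean_rating` and `share` are our own.

```python
# Hypothetical rating histograms (stars -> number of reviews); not data from the study.
j_shaped = {1: 15, 2: 5, 3: 5, 4: 15, 5: 60}
unimodal = {1: 0, 2: 5, 3: 15, 4: 55, 5: 25}


def mean_rating(counts):
    """Average star rating of a histogram mapping stars to review counts."""
    total = sum(counts.values())
    return sum(star * n for star, n in counts.items()) / total


def share(counts, star):
    """Fraction of all reviews that gave exactly `star` stars."""
    return counts[star] / sum(counts.values())


# Both histograms have the same average, yet very different 1-star shares.
for name, counts in (("J-shaped", j_shaped), ("unimodal", unimodal)):
    print(name, "mean:", mean_rating(counts), "1-star share:", share(counts, 1))
```

Both histograms average 4.0 stars, but 15% of the reviewers in the J-shaped one gave a single star, while none did in the unimodal one. The average by itself cannot tell the two apart.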

This report aims to first demonstrate the existence of a J-shaped distribution, second to identify the sources of bias that cause the J-shaped distribution, third to propose ways to overcome these biases, and finally to show that overcoming these biases helps product review systems better predict future product sales.

We tested the distribution of product ratings for three product categories (books, DVDs, and videos) with data collected from Amazon from February to July 2005: 78%, 73%, and 72% of the product ratings for books, DVDs, and videos, respectively, are greater than or equal to four stars (Figure 1), confirming our proposition that product reviews are overwhelmingly positive.

Figure 1 (left graph) shows a J-shaped distribution of all products. This contradicts the law of “large numbers” that would imply a normal distribution. Figure 1 (middle graph) shows the distribution of three randomly selected products in each category with over 2,000 reviews. These reviews still have a J-shaped distribution, implying that the J-shaped distribution is not due to a “small number” problem. Figure 1 (right graph) shows that even products with a middling average rating (around 3 stars) follow the same pattern.

What Causes the J-Shaped Distribution?

To investigate the causes of the J-shaped distribution of product reviews, we performed a controlled experiment in which everyone reviewed a randomly selected product – a music CD titled “Mr. A-Z.” The experimental results showed a unimodal distribution, while the corresponding distribution for the very same product with Amazon’s empirical data was J-shaped (Figure 2). Also, the average of the experiment’s product reviews was significantly lower than the average of Amazon’s product reviews.

The experimental results indicate that the J-shaped distribution is driven by two self-selection biases (Figure 3): purchasing bias and under-reporting bias.

First, people with higher product valuations are more likely to purchase a product; those with lower valuations are less likely to purchase it and therefore will not write a (negative) product review (purchasing bias). Purchasing bias skews the distribution of product reviews toward positive ratings and inflates the average.

Second, among people who purchased a product, those with extreme ratings (5-star or 1-star) are more likely to express their views to “brag or moan” than those with moderate views (under-reporting bias).

People tend to write reviews only when they are either extremely satisfied or extremely dissatisfied. People who feel the product is average may not bother to write a review.

Taken together, these two biases produce a J-shaped distribution. It arises because people who purchase a product are more likely to hold positive views (purchasing bias), and because people with moderate views are less motivated to spend the time and effort to report their ratings than those who want to “brag or moan” (under-reporting bias).
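The combined effect of the two biases can be sketched with a small expected-value calculation. Everything numeric here is an assumption made for illustration: the population histogram, the purchase probabilities, and the reporting probabilities are hypothetical, and the calculation is a simplification rather than the analytical model discussed in this article.

```python
# Hypothetical population of 1,000 people whose true opinions are unimodal.
population = {1: 100, 2: 200, 3: 400, 4: 200, 5: 100}

# Assumed purchasing bias: the higher the valuation, the likelier a purchase.
purchase_prob = {1: 0.2, 2: 0.3, 3: 0.4, 4: 0.7, 5: 0.9}

# Assumed under-reporting bias: extreme opinions are reported far more often.
report_prob = {1: 0.6, 2: 0.05, 3: 0.05, 4: 0.05, 5: 0.6}


def expected_reviews(population, purchase_prob, report_prob):
    """Expected number of posted reviews per star level after both filters."""
    return {
        star: population[star] * purchase_prob[star] * report_prob[star]
        for star in population
    }


def mean_rating(counts):
    """Average star rating of a histogram mapping stars to counts."""
    total = sum(counts.values())
    return sum(star * n for star, n in counts.items()) / total


reviews = expected_reviews(population, purchase_prob, report_prob)
for star in sorted(reviews):
    print(star, "stars:", round(reviews[star], 1))
print("population mean:", mean_rating(population))
print("review mean:", round(mean_rating(reviews), 2))
```

Under these assumptions the posted reviews are dominated by 5-star ratings, followed by 1-star ratings, with few ratings in between, and their average is well above the population average even though the underlying opinions peak at 3 stars.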

The experimental results also show that product reviews follow a unimodal distribution, implying that most people have moderate tastes. In the experiment, only 3% of the respondents gave the music CD a 5-star review and 7% gave it a 1-star review. These results invalidate both rival explanations of extreme tastes and overconfidence.^c

Unbiased Estimator of Product Quality

Rational people account for these two biases when reading product reviews. Specifically, people pay more attention to extreme reviews than to moderate ones, and they pay particular attention to extremely negative reviews to learn what is wrong with the product.^2,4 Therefore, the average rating is not representative of true product quality, and the entire distribution of product reviews must be observed. This is consistent with the recent update of Amazon’s product review system (Table 2), which presents the distribution of product ratings and not just the average.

Predicting Future Product Sales

To test the superiority of our proposed “brag or moan” model with the proposed additional parameters, we compared its ability to predict future product sales against the simple average, three weighted average scores, and the percentage of 1-star and 5-star reviews. Future product sales are a good proxy for product quality, since people purchase products they believe will create value for them, and product value predicts future sales. Through our brag-and-moan analytical model, we showed that generating an unbiased estimator of product quality requires the following parameters in addition to the average of product reviews: the standard deviation (to control for divergence of opinions, which raises uncertainty and makes true product quality harder to infer), the two modes of the bimodal distribution of product reviews (to overcome under-reporting bias), and product price (to overcome purchasing bias).
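As a rough illustration of how these inputs could be derived from a rating histogram, consider the sketch below. The function name `review_features`, the example histogram, and the price are hypothetical. How the two modes are operationalized is not specified here, so the sketch assumes a mode is any rating level with more reviews than each of its neighbors; the standard deviation is the population form.

```python
import math


def review_features(counts, price):
    """Summarize a rating histogram (stars -> review count) plus product price."""
    total = sum(counts.values())
    mean = sum(star * n for star, n in counts.items()) / total
    variance = sum(n * (star - mean) ** 2 for star, n in counts.items()) / total
    # Assumption: a rating level is a mode if it has strictly more reviews
    # than each adjacent level (a missing neighbor counts as -1 reviews).
    modes = [
        star
        for star in sorted(counts)
        if counts[star] > counts.get(star - 1, -1)
        and counts[star] > counts.get(star + 1, -1)
    ]
    return {"mean": mean, "std": math.sqrt(variance), "modes": modes, "price": price}


# Hypothetical J-shaped histogram and price, for illustration only.
features = review_features({1: 15, 2: 5, 3: 5, 4: 15, 5: 60}, price=12.99)
print(features)
```

For this example the modes come out as the 1-star and 5-star levels. Features of this kind could then serve as predictors in a sales regression alongside controls such as previous sales and the number of reviews.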

Controlling for previous product sales and the total number of product reviews, our model explains a large share of the variance in future product sales (adjusted R² = 77.29%), which is significantly higher than all five competing models. All proposed variables (the standard deviation, the two modes of the bimodal distribution of product reviews, and product price) were significant predictors of future product sales in addition to the simple average.

Conclusion

To overcome the two sources of bias in product review systems (purchasing and under-reporting), we propose that people should not rely solely on the readily available simple average, but should also consider other variables: the standard deviation and the two modes of the online product reviews, as well as product price. Therefore, product review systems must provide additional information besides the average, namely the standard deviation and the two modes of the product reviews.

Figures

Figure 1. The Distribution of Product Reviews on Amazon

Figure 2. Distribution of Experimental versus Amazon’s Ratings for a Music CD

Figure 3.

Tables

Table 1. Examples of Product Review Systems

Table 2. Amazon’s Product Review System for 4 Products


Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DOI

10.1145/1562764.1562800

October 2009 Issue

Published: October 1, 2009

Vol. 52 No. 10

Pages: 144-147
