Product ratings can be incredibly helpful to users. During our research studies we’ve observed how the test subjects rely on ratings to gauge a product’s quality or value – especially in verticals where they lack domain knowledge or have little prior product experience.
However, for users to be able to use product ratings this way, two key pieces of information must be present: the average rating score (obviously) and the number of ratings that average is based on. Unfortunately, some sites leave out the latter, much to the detriment of their users.
In this article we’ll present both our qualitative and quantitative findings on users’ perception of product ratings. In particular, we’ll investigate how and why most users will show a bias towards slightly poorer ratings if they are based on a higher number of reviews.
During our usability studies on category pages (1), e-commerce search (2), mobile e-commerce sites (3), and most recently product lists in general (4), we have time and again observed test subjects rely heavily on the number of user reviews when evaluating product ratings.
The reason is simple: when users don’t know how many ratings an average is based on, they can’t tell if a perfectly rated product simply has a single 5-star rating, or if its rating average is actually based on hundreds of reviews.
The flipside of this is important to be mindful of: users won’t necessarily consider the product with the highest rating average the best-rated one. Indeed, during our 1:1 usability tests, the subjects often show greater disposition towards some products with 4.5-star averages than some with perfect 5-star ratings due to the number of votes these averages are based on.
For instance, most subjects would pick a sleeping bag with a 4.5-star rating average based on 50 reviews over other sleeping bags with perfect 5-star ratings that were only based on a few reviews – they simply didn’t find the latter to be trustworthy.
So when did the subjects begin finding rating averages trustworthy? During our 1:1 “think aloud” usability tests the number seemed to be around 5 reviews. However, we wanted a better idea of whether this behavior was representative of the average e-commerce customer and whether there indeed is a general tipping point for the typical user’s perception of product ratings.
Obviously 1:1 usability studies aren’t good at verifying this sort of thing because the dataset is much too small – what you want to do is take these types of qualitative findings and further test or verify them using quantitative methods. Which is exactly what we did.
We tested three different rating averages against 3,501 people to get a better idea of where the scales begin to tip with regard to the number of reviews a rating average should be based on for the typical user to find it trustworthy.
The quantitative survey results confirm the qualitative findings. For two otherwise identical products, where one product has a 5-star average based on 2 ratings, and the other has a 4.5-star average based on 12 ratings, 70% would pick the one with the higher number of ratings despite its lower average. This confirms the test observations that when a perfect average was based on only a few ratings the subjects would often pick other options with a slightly lower average but a higher number of ratings.
The survey also found this to be just as true when a higher number of ratings was used. In the second survey, where users were asked to pick between a 5-star average based on 4 ratings and a 4.5-star average based on 57 ratings, almost the same percentage (74%) picked the option with the higher number of ratings.
Interestingly, there are significant differences in the bias across different demographics – more specifically age. Younger people (18 - 44) tend to place more faith in averages based on more ratings while older (45+) people show less inclination towards this bias.
In essence, depending on the typical age of a site’s audience, user perceptions of what constitutes a “highly rated product” will differ. Notably, young audiences will show a strong bias towards good-but-not-perfect product ratings that are based on numerous reviews.
Product ratings essentially function as a type of social proof for users, letting them tap into the “wisdom of the crowd” and use good ratings as a proxy for “high quality” or “value for money.” The thinking goes that if a lot of other users are happy with a product, it must be a bargain or of high quality – or both. (This is also why users lacking domain knowledge or experience with the product find product ratings particularly useful: the ratings allow them to rely on the domain knowledge and product experience of other customers.)
Displaying the number of ratings an average is based on also seems to be close to a “best practice” among e-commerce sites, with 68% of the 50 top grossing US e-commerce sites getting this right in their product list design. Meanwhile 14% of sites neglect to display the number of reviews next to their rating averages, and 10% don’t show ratings in their product list at all despite collecting them. (The last 8% don’t allow / collect user ratings in the first place.)
It is therefore strongly recommended to include this extra piece of information in the product list – especially considering the negligible amount of space it takes up. Without the number of ratings, users – especially young ones – lack essential information about the rating average and cannot determine whether they find the rating trustworthy. This in turn impedes their ability to gauge product quality and value in verticals where they have little knowledge or experience.
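As a rough illustration of the recommendation above, the sketch below builds the text a product list item might show so that the rating average never appears without the number of ratings behind it. The function name `format_rating_label`, the "out of 5" wording, and the "No ratings yet" fallback are assumptions made for this example, not something taken from the tested sites.

```python
def format_rating_label(average: float, count: int) -> str:
    """Return list-item text pairing a rating average with its rating count.

    Hypothetical helper for illustration; the wording is an assumption.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    if count == 0:
        # An average is meaningless without ratings, so show neither.
        return "No ratings yet"
    noun = "rating" if count == 1 else "ratings"
    # One decimal keeps 4.5 and 5.0 visually comparable in a list.
    return f"{average:.1f} out of 5 ({count} {noun})"


print(format_rating_label(4.5, 57))  # 4.5 out of 5 (57 ratings)
print(format_rating_label(5.0, 2))   # 5.0 out of 5 (2 ratings)
```

With both labels side by side, a user can see at a glance that the perfect average rests on only two ratings, which is the comparison the survey respondents were making.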
Authored by Jamie Appleseed on March 25, 2015