Users’ Perception of Product Ratings (Qualitative & Quantitative Research Findings)

This is the 2nd in a series of 9 articles based on research findings from our e-commerce product list usability study.

Product ratings can be incredibly helpful to users. During our research studies we’ve observed how the test subjects rely on ratings to gauge a product’s quality or value – especially in verticals where they lack domain knowledge or have little prior product experience.

However, for users to be able to use product ratings this way, two key pieces of information must be present: the average rating score (obviously) and the number of ratings that average is based on. Unfortunately, some sites leave out the latter, much to the detriment of their users.

“Okay, it’s only two reviews,” this subject exclaimed after opening one of the product pages, immediately clicking back to the product list, adding: “I think we can drop those ratings. You know, it’s only two people who answered.”

In this article we’ll present both our qualitative and quantitative findings on users’ perception of product ratings. In particular, we’ll investigate how and why most users will show a bias towards slightly poorer ratings if they are based on a higher number of reviews.

Usability Test Observations

During our usability studies on category pages (1), e-commerce search (2), mobile e-commerce sites (3), and most recently product lists in general (4), we have time and again observed test subjects rely heavily on the number of user reviews when evaluating product ratings.

The reason is simple: when users don’t know how many ratings an average is based on, they can’t tell if a perfectly rated product simply has a single 5-star rating, or if its rating average is actually based on hundreds of reviews.
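The ambiguity can be shown with a few lines of Python. This is only an illustrative sketch: the two rating lists are made-up data, with 300 standing in for "hundreds of reviews".

```python
from statistics import mean

# Hypothetical data: one product with a single 5-star rating,
# another with 300 five-star ratings.
single_review = [5]
many_reviews = [5] * 300

for ratings in (single_review, many_reviews):
    # The averages are identical; only the count tells the two apart.
    print(f"average={mean(ratings):.1f}, count={len(ratings)}")
```

Both products show the same average, so a list item that omits the count gives users no way to distinguish them.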

After selecting a Star Wars figure at Toys’R’Us, this test subject was very disappointed when he discovered the number of ratings the average was based on: “But then again I can see it’s only a single review. That’s of course not so.. so.. this could be fake. It could just as well be the manufacturer who was in here and posted a good review.”

Compare to the above Toys’R’Us example. By including the number of reviews next to each rating average, users are able to easily tell the sample size the average is based on and determine if they find that sufficient or not.

It is important to be mindful of the flipside of this: users won’t necessarily consider the product with the highest rating average the best-rated one. Indeed, during our 1:1 usability tests, the subjects often showed a greater disposition towards some products with 4.5-star averages than towards some with perfect 5-star ratings, due to the number of votes these averages were based on.

For instance, most subjects would pick a sleeping bag with a 4.5-star rating average based on 50 reviews over other sleeping bags with perfect 5-star ratings that were only based on a few reviews – they simply didn’t find the latter to be trustworthy.

Quantitative Test of Users’ Rating Bias

So when did the subjects begin finding rating averages trustworthy? During our 1:1 “think aloud” usability tests the number seemed to be around 5 reviews. However, we wanted a better idea of whether this behavior was representative of the average e-commerce customer and whether there indeed is a general tipping point for the typical user’s perception of product ratings.

Obviously 1:1 usability studies aren’t good at verifying this sort of thing because the dataset is much too small – what you want to do is take these types of qualitative findings and further test or verify them using quantitative methods. Which is exactly what we did.

We tested three different rating averages with 3,501 people to get a better idea of where the scales begin to tip with regard to the number of reviews a rating average should be based on for the typical user to find it trustworthy.

Methodology: In total, three surveys were conducted with a total of 3,501 respondents (split roughly evenly across the three surveys), testing different rating averages versus number of votes. Each survey showed the respondents two list items (shown in the result graphs) and asked them to pick which one they would purchase. Price and product description were kept identical – the only difference between the two list items was the combination of user rating average and number of votes. To avoid sequencing bias, the display sequence of the answer options was randomized for each respondent.
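The per-respondent randomization could be sketched roughly as follows. This is a minimal illustration in Python, not the survey tool that was actually used; the `OPTIONS` structure, the labels, the function name `randomized_options`, and the rating values are all placeholders chosen for the example.

```python
import random

# Hypothetical answer options for one survey:
# a rating average paired with the number of ratings it is based on.
OPTIONS = [
    {"label": "A", "average": 5.0, "ratings": 2},
    {"label": "B", "average": 4.5, "ratings": 12},
]

def randomized_options(options, rng=random):
    """Return a new list with the options in random order for one respondent."""
    shuffled = list(options)  # copy so the canonical order stays untouched
    rng.shuffle(shuffled)
    return shuffled

if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    for respondent in range(3):
        order = [o["label"] for o in randomized_options(OPTIONS, rng)]
        print(respondent, order)
```

Shuffling a copy for each respondent means neither option is systematically shown first, so position alone should not favor one answer.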

The quantitative survey results confirm the qualitative findings. For two otherwise identical products, where one product has a 5-star average based on 2 ratings, and the other has a 4.5-star average based on 12 ratings, 70% would pick the one with the higher number of ratings despite its lower average. This confirms the test observations that when a perfect average was based on only a few ratings the subjects would often pick other options with a slightly lower average but a higher number of ratings.

The survey found this to be just as true when a higher number of ratings was used. In the second survey, where users were asked to pick between a 5-star average based on 4 ratings and a 4.5-star average based on 57 ratings, almost the same percentage (74%) would pick the option with the higher number of ratings.
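For a rough sense of the sampling noise behind percentages like these, a normal-approximation margin of error can be computed. The sketch below is not part of the study's reported analysis. It assumes 1,167 respondents per survey (3,501 split evenly across three, since exact per-survey counts are not given) and treats the respondents as a simple random sample.

```python
import math

def margin_of_error(share, n, z=1.96):
    """Half-width of a normal-approximation 95% confidence interval for a proportion."""
    return z * math.sqrt(share * (1 - share) / n)

# Assumed per-survey sample size: 3,501 respondents split evenly over 3 surveys.
N_PER_SURVEY = 3501 // 3

for share in (0.70, 0.74):
    moe = margin_of_error(share, N_PER_SURVEY)
    print(f"{share:.0%} +/- {moe:.1%}")
```

Under these assumptions both shares carry a margin of between 2 and 3 percentage points, so in either survey a clear majority preferred the option with more ratings.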

A demographic breakdown of the responses to the 5 vs 57 ratings survey.

Interestingly, there are significant differences in the bias across different demographics – more specifically age. Younger people (18 - 44) tend to place more faith in averages based on more ratings while older (45+) people show less inclination towards this bias.

In essence, depending on the typical age of a site’s audience, user perceptions of what constitutes a “highly rated product” will differ. Notably, young audiences will show a strong bias towards good-but-not-perfect product ratings that are based on numerous reviews.

Solution: Always Display the Number of Ratings

Product ratings essentially function as a type of social proof for users, letting them tap into the “wisdom of the crowd”, using good ratings as a proxy for “high quality” or “value for money.” The thinking goes that if a lot of other users are happy with a product, it must be a bargain or of high quality – or both. (This is also why users lacking domain knowledge or experience with the product find product ratings particularly useful: ratings allow them to rely on the domain knowledge and product experience of other customers.)

“This one only has a single rating, so that isn’t trustworthy at all,” a subject noted when seeing that some of the rating averages were based on only 1-2 ratings. During testing the subjects would use the number of ratings to determine how reliable they would find the rating average.

Displaying the number of ratings an average is based on also seems to be close to a “best practice” among e-commerce sites, with 68% of the 50 top grossing US e-commerce sites getting this right in their product list design. Meanwhile 14% of sites neglect to display the number of reviews next to their rating averages, and 10% don’t show ratings in their product list at all despite collecting them. (The last 8% don’t allow / collect user ratings in the first place.)

It is therefore strongly recommended to include this extra piece of information in the product list – especially considering the negligible amount of space it takes up. Without the number of ratings, users – especially young ones – lack essential information about the rating average, which renders them unable to determine whether they find the rating trustworthy, impeding their ability to gauge product quality and value in verticals where they have little knowledge or experience.
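In a product list, this amounts to rendering the count right next to the average. A minimal sketch in Python follows; the `rating_label` helper and its plain-text label format are hypothetical and not taken from any particular site.

```python
def rating_label(average, count):
    """Build list-item rating text that always includes the number of ratings."""
    if count == 0:
        return "No ratings yet"
    noun = "rating" if count == 1 else "ratings"
    return f"{average:.1f} out of 5 ({count} {noun})"

print(rating_label(5.0, 2))
print(rating_label(4.5, 57))
```

Keeping the average and the count in a single label makes it hard to ship a list item that shows one without the other.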

Authored by Jamie Holst on March 25, 2015
