
Users' Perception of Product Ratings (New Qualitative & Quantitative Findings)

This is the 2nd in a series of 9 articles based on research findings from our e-commerce product list usability study.

Product ratings can be incredibly helpful to users. During our research studies we’ve observed how test subjects rely on ratings to gauge a product’s quality or value – especially in verticals where they lack domain knowledge or have little prior product experience.

However, for users to be able to use product ratings this way, two key pieces of information must be present: the average rating score (obviously) and the number of ratings that average is based on. Unfortunately, some sites leave out the latter, much to the detriment of their users.

“Okay, it’s only two reviews,” this subject exclaimed after opening one of the product pages, immediately clicking back to the product list, adding: “I think we can drop those ratings. You know, it’s only two people who answered.”

In this article we’ll present both our qualitative and quantitative findings on users’ perception of product ratings. In particular, we’ll investigate how and why most users will show a bias towards slightly poorer ratings if they are based on a higher number of reviews.

Usability Test Observations

During our usability studies on category pages (1), e-commerce search (2), mobile e-commerce sites (3), and most recently product lists in general (4), we have time and again observed test subjects rely heavily on the number of user reviews when evaluating product ratings.

The reason is simple: when users don’t know how many ratings an average is based on, they can’t tell if a perfectly rated product simply has a single 5-star rating, or if its rating average is actually based on hundreds of reviews.

After selecting a Star Wars figure at Toys’R’Us, this test subject was very disappointed when he discovered how few ratings the average was based on: “But then again I can see it’s only a single review. That’s of course not so.. so.. this could be fake. It could just as well be the manufacturer who was in here and posted a good review.”

Compare Diapers.com to the above Toys’R’Us example. By including the number of reviews next to each rating average, users are able to easily tell the sample size the average is based on and determine if they find that sufficient or not.

The flip side of this is important to be mindful of: users won’t necessarily consider the product with the highest rating average to be the best-rated one. Indeed, during our 1:1 usability tests, the subjects often favored products with 4.5-star averages over products with perfect 5-star ratings, due to the number of votes those averages were based on.

For instance, most subjects would pick a sleeping bag with a 4.5-star rating average based on 50 reviews over other sleeping bags with perfect 5-star ratings that were only based on a few reviews – they simply didn’t find the latter to be trustworthy.
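This intuition is essentially what a “weighted” (Bayesian-style) rating average formalizes: averages based on few votes are pulled toward a neutral prior, while averages based on many votes stand on their own. A minimal sketch – the prior mean and prior weight here are illustrative assumptions, not values from the study:

```python
def weighted_rating(avg, count, prior_mean=3.5, prior_weight=10):
    """Shrink a rating average toward a prior mean; the more
    ratings the average is based on, the less it is shrunk."""
    return (prior_weight * prior_mean + count * avg) / (prior_weight + count)

# 4.5 stars from 50 reviews vs. a perfect 5.0 from only 2 reviews:
many = weighted_rating(4.5, 50)   # ~4.33 – barely shrunk
few = weighted_rating(5.0, 2)     # 3.75 – heavily shrunk
```

Under this scheme the 4.5-star sleeping bag with 50 reviews outranks the 5-star one with 2 reviews, matching what the test subjects did by gut feeling.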

Quantitative Test of Users’ Rating Bias

So when did the subjects begin to find rating averages trustworthy? During our 1:1 “think aloud” usability tests the number seemed to be around 5 reviews. However, we wanted a better idea of whether this behavior was representative of the average e-commerce customer, and whether there is indeed a general tipping point in the typical user’s perception of product ratings.

Obviously 1:1 usability studies aren’t good at verifying this sort of thing because the dataset is much too small – what you want to do is take these types of qualitative findings and further test or verify them using quantitative methods. Which is exactly what we did.

We tested three different rating averages with 2,250 respondents to get a better idea of where the scales begin to tip with regard to the number of reviews a rating average must be based on for the typical user to find it trustworthy.

Methodology: In total, three surveys were conducted with 2,250 respondents (split evenly across the three surveys), testing different rating averages against different numbers of votes. Each survey showed the respondents two list items (shown in the result graphs) and asked them to pick the one they would purchase. Price and product description were kept identical – the only difference between the two list items was the combination of user rating average and number of votes. To avoid sequencing bias, the display order of the answer options was randomized for each respondent.
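The randomization step in the methodology above can be sketched as follows – the two list items mirror the first survey, and the per-respondent seeding is just one illustrative way to make each respondent’s order reproducible:

```python
import random

def randomized_options(respondent_seed):
    """Shuffle the two list items independently for each respondent
    so that neither option is systematically shown first."""
    options = [
        {"avg": 5.0, "ratings": 2},
        {"avg": 4.5, "ratings": 12},
    ]
    rng = random.Random(respondent_seed)  # reproducible per respondent
    rng.shuffle(options)
    return options
```

Across many respondents, each option then appears in the top position roughly half the time, so position effects cancel out of the aggregate results.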

The quantitative survey results confirm the qualitative findings. For two otherwise identical products, where one product has a 5-star average based on 2 ratings, and the other has a 4.5-star average based on 12 ratings, 62% would pick the one with the higher number of ratings despite its lower average. This confirms the test observations that when a perfect average was based on only a few ratings the subjects would often pick other options with a slightly lower average but a higher number of ratings.

The survey also found this to be just as true when a higher number of ratings was used. In the second survey, where users were asked to pick between a 5-star average based on 4 ratings and a 4.5-star average based on 57 ratings, almost the same percentage (61%) picked the option with the higher number of ratings.

A demographic breakdown of the responses to the 5 vs 57 ratings survey. See detailed age breakdown for: 2 vs 12 ratings, 4 vs 57 ratings, and 5 vs 57 ratings.

Interestingly, there are significant differences in this bias across demographics – more specifically, age. Young people tend to place more faith in averages based on many ratings, while older people are less inclined towards this bias. In fact, people aged 55–64 showed a slight bias towards the 5-star products with few ratings across all three surveys.

In essence, depending on the typical age of a site’s audience, user perceptions of what constitutes a “highly rated product” will differ. Notably, young audiences will show a strong bias towards good-but-not-perfect product ratings that are based on numerous reviews.

Solution: Always Display the Number of Ratings

Product ratings essentially function as a type of social proof for users, letting them tap into the “wisdom of the crowd”, using good ratings as a proxy for “high quality” or “value for money.” The thinking goes that if a lot of other users are happy with a product it means that it must be a bargain or of high quality – or both. (This is also why users lacking domain knowledge or experience with the product find product ratings particularly useful because it allows them to rely on the domain knowledge and product experience of other customers.)

“This one only has a single rating, so that isn’t trustworthy at all,” a subject noted upon seeing that some of the rating averages were based on only 1–2 ratings. During testing, the subjects would use the number of ratings to determine how reliable they found a rating average to be.

Displaying the number of ratings an average is based on also seems to be close to a “best practice” among e-commerce sites, with 68% of the 50 top grossing US e-commerce sites getting this right in their product list design. Meanwhile 14% of sites neglect to display the number of reviews next to their rating averages, and 10% don’t show ratings in their product list at all despite collecting them. (The last 8% don’t allow / collect user ratings in the first place.)

It is therefore strongly recommended to include this extra piece of information in the product list – especially considering the negligible amount of space it takes up. Without the number of ratings, users – especially young ones – lack essential information about the rating average, which leaves them unable to determine whether the rating is trustworthy, impeding their ability to gauge product quality and value in verticals where they have little knowledge or experience.
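As a concrete illustration of how little space the recommendation costs, the count can simply be appended to the average in the list item label – the exact wording here is an assumption for illustration, not a prescribed format:

```python
def rating_label(avg, count):
    """Render a rating average together with the sample size it is
    based on, e.g. '4.5 stars (50 ratings)'."""
    noun = "rating" if count == 1 else "ratings"
    return f"{avg:.1f} stars ({count} {noun})"

print(rating_label(4.5, 50))  # 4.5 stars (50 ratings)
print(rating_label(5.0, 1))   # 5.0 stars (1 rating)
```

A handful of extra characters is all it takes to let users judge the sample size behind the average for themselves.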


Aaron Bradley March 25, 2015 Reply to this comment

Great research, great article – thanks!

To add fuel to your “strongly recommended” practice of providing the number of ratings upon which the average is based, I’d point out that this isn’t just good for usability, but for search visibility.

That is, Google, at least, requires this number (in structured data parlance, a value for the “ratingCount” property) in order to show an average star rating (“aggregate ratings”) directly in the search results.
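A minimal sketch of what such markup contains, expressed as a Python dict ready to be serialized as JSON-LD – the product name and values are made up; only the AggregateRating type and its ratingValue/ratingCount properties come from the schema.org vocabulary:

```python
import json

# Hypothetical product; "AggregateRating", "ratingValue" and
# "ratingCount" are schema.org-defined names.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Sleeping Bag",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.5",
        "ratingCount": "50",
    },
}
print(json.dumps(product, indent=2))
```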

Jamie, Baymard Institute March 26, 2015 Reply to this comment

Interesting, I did not know this, Aaron – thanks for your addition :)

Mikel Kew March 25, 2015 Reply to this comment

Great article as usual!

I’m curious as to whether there was another influencing factor at play here as well. A larger number of reviews certainly does imply that the average rating is more trustworthy, but it may also give another social proof signal of “popularity”.

An interesting test of this may be to compare choices made between two similarly rated products; one with a decent number of reviews (e.g. 30) and another with many more than that (e.g. 60).

In this case I would not be surprised if, rather than “trustworthiness” being a factor, “popularity” may play a more significant role in the decision.

Jamie, Baymard Institute March 26, 2015 Reply to this comment

That definitely sounds plausible, Mikel – I’m sure this also comes into play. And again, it requires the number of reviews to be visible in the list item design so the user has sufficient information to make such an evaluation.

Hannah April 1, 2015 Reply to this comment

“However, we wanted a better idea of whether this behavior was representative of the average e-commerce customer …

Obviously 1:1 usability studies aren’t good at verifying this sort of thing because the dataset is much too small”

EXACTLY! This just proves you don’t have to know the word “dataset” to understand the principle.

Naturiste April 23, 2015 Reply to this comment

Great research!

There is also the possibility that a visitor who sees a large number of reviews may believe, in some circumstances, that the product has been on the market much longer and could be outdated or obsolete.

