Apparel & Accessories Sites: Always Provide an Aggregate “Fit” Subscore in the Reviews (33% Don’t)

Iva Olah

UX Researcher

Published Sep 24, 2024

Key Takeaways

  • Some users have difficulty gauging the overall “fit” in reviews
  • Users unable to find useful “fit” info in reviews risk selecting an incorrect size or discarding suitable items
  • Yet 33% of sites fail to aggregate “fit” info provided in individual reviews

In Baymard’s large-scale UX testing of Apparel & Accessories, users were observed to predominantly turn to reviews in order to determine how an apparel item fits — for example, whether a pair of pants or a top fits “too small”, “too large”, or “true to size”.

However, several participants in testing had difficulty finding the “fit” info in the user reviews, leading them to incorrectly assess an item’s overall “fit”.

As a result, without a clear way to identify and assess relevant “fit” info in reviews, some users might select the wrong size — or even pass on purchasing an item.

To enable users to efficiently pinpoint an item’s fit, apparel and accessories sites should include an aggregate “fit” subscore in the reviews section.

However, our e-commerce UX benchmark shows that 33% of sites don’t aggregate any “fit” info provided in individual reviews, forcing users to estimate the overall fit themselves, often inaccurately.

This article will discuss our latest Premium research findings on how to present “fit” info in the user reviews:

  • Why users have difficulty finding useful “fit” info

  • How providing an aggregate “fit” subscore helps users efficiently determine an item’s overall fit

  • How providing “fit” info as a structured component for individual reviews helps corroborate the aggregate subscore

Why Users Have Difficulty Finding Useful “Fit” Info

“There’s a lot of reviews here.” This participant at Nike was taken aback after realizing that he’d have to wade through 2,155 reviews to find out sizing information for a pair of joggers. Without any way to specifically scan for “fit” information, he had no choice but to try to read each review to find the sizing info he needed: “Let me see…‘Go size larger than normal’, I could see that. You don’t want it too [tight]. ‘Shrinkage this year is unacceptable’…So sizing may be an issue.” He soon became overwhelmed and gave up on the joggers after reading only a handful of reviews: “The few that I’ve seen have been kind of negative on the sizing…Because I don’t feel confident enough, I’d probably pass on these.”

During testing, several participants who approached the reviews to discover the fit of an apparel item were not able to easily pinpoint any “fit” info provided by reviewers.

Some participants therefore attempted to read every review in granular detail in an effort to draw out any “fit” info within.

Even more problematically, participants had to then mentally gauge the overall consensus about “fit” across all the reviews in order to apply the feedback to their size-selection process — a task difficult to accurately perform on the fly.

In particular, in cases where an item had a large number of reviews — one item had 2,155 — participants couldn’t realistically read every review and therefore based their findings about fit on the handful of reviews that they were able to read.

As a result, given that they based their findings on a subset rather than the entirety of reviews, some participants risked incorrectly gauging the overall fit of items.

During testing, some sites did call out “fit” information in individual reviews; for example, including “Fit: true to size / runs large / runs small” as a structured component of each individual review.

However, this forces users intent on knowing the aggregate “fit” subscore for an item to mentally tally up each individual subscore on an invisible continuum from “too small” to “too big”.

By relying on their overall impression of the individual “fit” subscores (not all of which users will bother to read) — rather than on actual calculations — users’ interpretation of the aggregate score will frequently be inaccurate.
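The tallying that users cannot reliably do in their heads is trivial for a site to compute. A minimal sketch of aggregating structured per-review “fit” labels into percentage shares (the label strings and function name here are hypothetical illustrations, not from Baymard’s research):

```python
from collections import Counter

# Hypothetical per-review "fit" labels, as captured by a structured
# review-form component ("runs small" / "true to size" / "runs large").
FIT_LABELS = ["runs small", "true to size", "runs large"]

def aggregate_fit(reviews):
    """Tally per-review fit labels into a percentage share per label."""
    counts = Counter(r for r in reviews if r in FIT_LABELS)
    total = sum(counts.values())
    if total == 0:
        return {}  # no structured fit data to aggregate
    return {label: round(100 * counts[label] / total, 1) for label in FIT_LABELS}

reviews = ["runs large", "true to size", "runs large", "runs small", "runs large"]
print(aggregate_fit(reviews))
# {'runs small': 20.0, 'true to size': 20.0, 'runs large': 60.0}
```

The point of the sketch is simply that the site, not the user, should perform this calculation, across all reviews rather than the handful a user manages to read.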

How Providing an Aggregate “Fit” Subscore Helps Users Efficiently Determine an Item’s Overall Fit

“I like when they sum it up right here without having to look through all the reviews…This ‘Size’ meter with ‘true to size’ looks like it’s leaning towards ‘Runs Large’, so that’s very important. I think I would order maybe a half size down. Then the ‘Fit’: ‘Runs Narrow’, okay, so I would probably pick a wider size.” This participant at Cole Haan was impressed that “Size” and “Fit” subscore summaries were included at the top of the reviews for a pair of boots, using them to determine that she needed to get a size that was both smaller and wider than her typical size.

“Alright, so it runs a little bit large.” A participant at Columbia Sportswear took the time to carefully interpret the aggregate “fit” subscore for a pair of tights, observing that they ran “a little bit large”.

“[‘Fit’], that’s like right in the middle. I like that they have this, they ask if it ‘Runs Small’ versus ‘Large’. ‘Quality’, like yeah I care, but I’m more interested in the ‘Fit’”. A different participant at Columbia Sportswear, interested in a pullover, was glad that “fit” was included in the aggregate subscores at the top of the reviews.

Therefore, to provide an overview of an apparel product’s fit, it’s important to include an aggregate “fit” subscore in the reviews section.

As one participant indicated, the aggregate “fit” subscore allows one to understand an item’s overall fit at a glance: “Well, I usually would read the reviews if the site didn’t have the thing [aggregate “fit” subscore] that basically said that the customer said it was true to size. So I would usually read the reviews to find out whether they thought they fit true to size.”

An aggregate “fit” subscore saves users the time and effort of having to scan individual reviews for this information — an insurmountable task if there are hundreds of reviews.

During testing, an aggregate subscore represented by a bar chart scale that was rated from “too small” to “too large” (or similar terminology) performed well.
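One way such a bar-chart scale can be driven is by mapping each label onto a numeric continuum and averaging, so the pointer’s position reflects the overall lean of the reviews. A sketch, assuming the three-label scheme above (the mapping values and function name are illustrative assumptions):

```python
# Hypothetical mapping of fit labels onto a 0.0-1.0 continuum, where
# 0.0 = "runs small", 0.5 = "true to size", 1.0 = "runs large".
SCALE = {"runs small": 0.0, "true to size": 0.5, "runs large": 1.0}

def fit_pointer(ratings):
    """Return the average position of the aggregate pointer on the scale,
    or None when no structured fit ratings exist."""
    positions = [SCALE[r] for r in ratings if r in SCALE]
    if not positions:
        return None
    return sum(positions) / len(positions)

# 3 of 4 reviewers say "runs large": the pointer leans right of center.
print(fit_pointer(["runs large", "runs large", "runs large", "true to size"]))
# 0.875
```

A pointer near 0.5 would render at the “true to size” midpoint of the bar chart, matching the at-a-glance reading participants responded well to in testing.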

How Providing “Fit” Info as a Structured Component for Individual Reviews Helps Corroborate the Aggregate Subscore

While it might seem redundant to provide individual “fit” subscores in addition to an aggregate “fit” subscore, testing showed that participants differed in how they liked to digest information provided in reviews.

At one extreme, some participants wanted to quickly get to the point about “fit”; at the other, some enjoyed the granular aspect of “researching” all the reviews by reading through each one.

Moreover, users who take note of an aggregate “fit” subscore might still want more context around why specific reviewers rated the fit of an item the way they did.

Without an aggregate “fit” subscore, users will take longer to figure out the fit of an item.

Likewise, providing only the high-level aggregate “fit” subscore will sit uneasily with some users, who will wonder what evidence the site relied on to calculate its score.

Offering the “fit” subscores as a structured component in individual reviews, as a means to corroborate the aggregate subscore, gives the reviews a level of transparency that users will appreciate, in turn boosting their trust in the site.

Additionally, consider implementing these review-specific “fit” ratings as bar charts, since text summaries may be overlooked by users.

Help Apparel Users Quickly Determine an Item’s Overall Fit from the User Reviews

“So let’s see what people are saying”. Without an aggregate “fit” subscore, or indeed any subscores at all, a participant looking for a polo at Puma tried (and failed) to read each of the 21 individual reviews to reach a consensus about the fit and quality of the top.

As our research has shown, arriving at a useful consensus regarding the feedback about fit in reviews can be a difficult task for users.

By including an aggregate “fit” subscore in the reviews section — and providing individual “fit” subscores in addition — sites can help users efficiently select their correct size.

However, when reviews make it difficult to determine the overall fit at a glance, users are burdened with the unrealistic task of calculating the overall “fit” score on their own.

Yet 33% of sites don’t allow users to easily understand an item’s overall fit — risking that users incorrectly gauge the fit and select the wrong size.

Getting access: Our current Apparel & Accessories research study is ongoing and new Apparel guidelines are published every month in Baymard Premium. The full study is expected to be completed in Fall 2024.

If you want to know how your apparel or accessories desktop site, mobile site, or app performs and compares, then learn more about getting Baymard to conduct an Apparel & Accessories UX Audit of your site or app.

Iva is a UX Researcher at Baymard. Her research areas and specializations include Apparel & Accessories, Product Lists & Filtering, and Ticketing. She has worked in UX research, design, and technical communications since 2015. Iva has a PhD in Early Modern Art History from the University of Chicago.
