E-Commerce Search Study Methodology

This study examines on-site search capabilities and the user experience they create. It specifically focuses on how users search on e-commerce websites, and how sites can improve the user’s ability to find the items they are looking for, better guide users, and generally align site search behavior with search engine logic. This includes areas such as the search field interface itself, auto-complete query suggestions, results pages and filtering, the no-results page, the search query types supported, and search query guidance.

This on-site e-commerce Search usability study is based on two main research components:

  1. Multiple rounds of large-scale usability testing (1:1 think-aloud user testing at 19 leading e-commerce sites) leading to 60 Search usability guidelines described in the E-commerce Search usability report, and
  2. Benchmarking of 50 leading US e-commerce sites, using the 60 Search usability guidelines as the benchmark heuristics and scoring parameters.

The methodology for each of these research components is described in detail below.

To purchase access to the Search Usability Report & Benchmark go to: baymard.com/ecommerce-search

Usability Testing Methodology

One part of this research is based on a large-scale usability study of 19 major e-commerce sites. The usability study tasked real users with finding, evaluating and selecting products matching everyday purchasing tasks such as finding a case for their current camera, an outfit for a party, an interesting movie, etc.

The 1:1 “think aloud” test protocol was used to test the 19 sites: Amazon, Best Buy, Blue Nile, Chemist Direct, Drugstore.com, eBags, Gilt, Go Outdoors, H&M, IKEA, Macy’s, Newegg, Pixmania, Pottery Barn, REI, Tesco, Toys’R’Us, The Entertainer/TheToyShop.com, and Zappos. Each test subject tested 4–8 sites, depending on how quickly they completed the tasks. The duration of each subject’s test session varied between 1 and 1.5 hours, and the subjects were allowed breaks between each site tested.

In order to avoid artificially forcing the subjects to use search on the tested sites, this study was conducted as a combined e-commerce category navigation and search study. This way it was up to the test subjects themselves to decide if they preferred to search or navigate via the categories to find what they were looking for (i.e., they were never asked to use one approach over the other). Furthermore, it allowed the subjects to mix category navigation and search.

During the test sessions, more than 700 usability issues arose specific to e-commerce search. All of these issues have been analyzed and distilled into 60 guidelines on search usability, specifically for an e-commerce context. The observed search issues often proved so severe that 31% of the time the subjects were either unable to find the items they were looking for or became so frustrated that they decided to abandon the task. And 65% of the time, the subjects needed more than one search attempt, with three to four query iterations not being uncommon. This is despite testing major e-commerce sites and the tasks being fairly basic, such as “find a case for your laptop,” “find a sofa set you like,” etc.

For a study following the think-aloud protocol, the binomial probability formula shows that, on average, 95% of all usability problems with an occurrence rate of 14% or higher will be discovered when 20 test subjects are used. Thus, the focus of this report is not to arrive at a statistical conclusion of whether a usability issue will occur for 31% or 32.3% of your users. Instead it describes the search-specific usability issues which are most likely to occur for a large portion of your user base, and the issues which are the most harmful to their search experience. The study examines what users expect as they perform searches on e-commerce sites, what typically goes wrong in the process, why it goes wrong, and exactly what changes it will take to avoid these issues. In short: how to design a high-performing and delightful search experience for your users.
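The discovery-rate claim above follows from the standard binomial formula for problem discovery in usability testing: the probability that a problem affecting a fraction p of users is observed at least once across n independent test subjects is 1 − (1 − p)^n. A minimal sketch (the function name is illustrative, not from the study):

```python
# Probability that a usability problem is observed at least once
# across n independent test subjects, given a per-subject
# occurrence rate p (the standard binomial discovery formula).
def discovery_probability(p: float, n: int) -> float:
    return 1 - (1 - p) ** n

# With 20 subjects, a problem affecting 14% of users is very
# likely to surface at least once during testing:
print(round(discovery_probability(0.14, 20), 2))  # → 0.95
```

This is why adding more test subjects yields diminishing returns: each additional subject only helps surface the increasingly rare problems not already observed.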

Benchmarking Methodology

The other part of this research study is a comprehensive usability benchmark. Using the 60 search usability guidelines from the large-scale usability tests as the review heuristics and scoring parameters, we’ve subsequently benchmarked the search engine’s capability to handle users’ search query types, the search field design, the auto-complete suggestions, the results guidance logic, and the search results capabilities (incl. faceted search filters), at 50 top-grossing US e-commerce sites. This has resulted in a benchmark database with 3,000 search usability parameters reviewed, 1,600 additional examples for the 60 guidelines, and 191 search page examples from top retailers, each annotated with review notes.

The total UX performance score assigned to each benchmarked site is essentially an expression of how good (or bad) an on-site search experience a first-time user will have at the e-commerce site – based on the 60 guidelines documented in the Search Usability report.

The specific score is calculated using a weighted multi-parameter algorithm. Below is a brief description of the main elements in the algorithm:

  • An individual guideline weight: A combination of the Severity of violating a specific guideline (either Harmful (worst), Disruptive, or Interruption, as defined in the usability report), and the Frequency of occurrence of the specific guideline (i.e. how often the test subjects experienced it during the usability study).
  • A Rating describing to which degree a specific site adheres to each guideline (Adhered High, Adhered Low, Neutral, Issue resolved, Violated Low, Violated High, N/A).
  • The scores are summed across all guidelines, and then divided by the total number of applicable guidelines (to ensure that “N/A” ratings do not influence the score).
  • The Highlights marked on the site screenshots are specific examples that the reviewer judged to be of interest to the reader. It’s the site’s overall adherence to or violation of a guideline that is used to calculate the site’s usability score. Thus, you may find a specific Highlight that shows an example of how a site adheres to a guideline, even though that same site is scored as violating the guideline (typically because the site violates the guideline on another page), and vice versa.
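The scoring elements above can be sketched in code. Note that the actual severity weights and numeric rating values used in the benchmark are not published; the mappings and function below are illustrative assumptions only, showing the weighted-average structure (severity × frequency × rating, averaged over applicable guidelines):

```python
# Hypothetical sketch of the weighted scoring algorithm described
# above. The numeric weights and rating values are assumptions,
# not the actual values used in the benchmark.

SEVERITY_WEIGHT = {"Harmful": 3.0, "Disruptive": 2.0, "Interruption": 1.0}
RATING_VALUE = {
    "Adhered High": 1.0, "Adhered Low": 0.5, "Neutral": 0.0,
    "Issue resolved": 0.0, "Violated Low": -0.5, "Violated High": -1.0,
}

def site_score(reviews):
    """reviews: list of (severity, frequency, rating) per guideline.
    "N/A" ratings are excluded so they do not influence the score."""
    applicable = [r for r in reviews if r[2] != "N/A"]
    if not applicable:
        return 0.0
    total = sum(
        SEVERITY_WEIGHT[severity] * frequency * RATING_VALUE[rating]
        for severity, frequency, rating in applicable
    )
    # Divide by applicable guidelines only, per the description above.
    return total / len(applicable)

score = site_score([
    ("Harmful", 0.31, "Violated High"),   # severe, frequent violation
    ("Disruptive", 0.20, "Adhered High"), # guideline adhered to
    ("Interruption", 0.10, "N/A"),        # excluded from the average
])
print(score)
```

The key design point illustrated is the last bullet in the list above: dividing by the count of applicable guidelines means a site is neither rewarded nor penalized for guidelines that do not apply to it.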

All site reviews were conducted by Christian Holst, Jamie Appleseed and Thomas Grønne, from April 9th to May 23rd 2014. A US-based IP address was used. In cases where multiple local or language versions of a site existed, the US version was used for the benchmark.

All reviews were conducted as a new customer would experience them; hence no existing accounts or browsing history were used. The documented and benchmarked designs at each site were the search engine logic and search query capabilities, the search field design, the auto-complete suggestions, the search results page, and the “no results” page. While one specific page from each site is shown in the benchmark, the reviewer investigated 15–30 other pages on each site, which were also used for the benchmark scoring.

Baymard Institute provides this information “as is”. It is based on the reviewers’ subjective judgment of each site at the time of testing and in relation to the documented guidelines. Baymard Institute cannot be held responsible for any use or for the correctness of the provided information.

The screenshots used may contain images and artwork that are both copyright and trademark protected by their respective owners. Baymard Institute does not claim ownership of the artwork that might be featured within these screenshots, and solely captures and stores the website screenshots in order to provide constructive review and feedback within the topic of web design and web usability.

Citations, images, and paraphrasing may only be published elsewhere to a limited extent, and only when crediting “Search Usability study by Baymard Institute, baymard.com/research/ecommerce-search”.