This post is part 1 of a two-part series that explores a structured, time-saving, and efficient smart search strategy, using Machine Learning techniques that we typically deploy in our services. The first part sets the context of how smart search is empowered by content filters on three of the best-known online shopping platforms. In the second part, we will introduce several ways of enhancing the current state of smart search by employing semantic information.

Product research before purchase

The shopping experience nowadays consists of online research, usually done on the phone, followed by the acquisition of the products. Research showed that customers’ preference for in-store shopping has declined in favour of online shopping, with 57% of people preferring to shop on their electronic devices. This rise in online shopping leads to more time spent online researching potential purchases. And as we need to be informed before making an online purchase, we often want to know what other buyers experienced with similar products. That is why we look at competing products, read the feedback provided by other people, or read detailed product descriptions.

Statistics show that 39% of potential buyers make decisions based on what their friends recommended. Others get their information from specialized platforms such as blogs (27%) or comparison websites (15%), or read other people’s opinions on forums (9%).

[Figure: Product research before purchase, statistics]

User behaviour before and after making a purchase shows that people are more interested in collecting information than in providing feedback. Only 27% of the people who made a purchase will contribute to the user-generated reviews. But even with this small percentage, the amount of available information increases faster than ever. Thus, it becomes difficult to keep up with recently published information and process everything found online.

[Figure: User behaviour before and after online purchases]

Facilitating product research for users

Most of the time, people are only interested in specific features of the product they want to buy. To reach that information, they skim the text looking for specific details. The easiest way to automate this strategy is to perform a naive keyword-based search. But more often than not, the product’s description and user-generated reviews do not include the keyword you are looking for. It may also be the case that the information is conveyed using similar words. So you still end up reading all the reviews, which could lead to shopping cart abandonment.
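The limitation of a naive keyword-based search can be seen in a minimal Python sketch. The reviews and the helper name `keyword_search` are invented for this illustration:

```python
import re

def keyword_search(reviews, keyword):
    """Return the reviews that contain `keyword` as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [review for review in reviews if pattern.search(review)]

# Hypothetical reviews, made up for this example.
reviews = [
    "The battery lasts two full days.",
    "I only need to charge it every other day.",
    "Great screen, but it is a bit heavy.",
]

matches = keyword_search(reviews, "battery")
print(matches)
```

Only the first review is returned, although the second one also talks about battery life without using the word "battery". This is the kind of gap a keyword-only search leaves open.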

What could help in such a case is a smart search strategy. The smart search strategy has two components: phrase search and text filtering enabled by content filters. While phrase search is supported by most content-based platforms, with varying accuracy, the adoption of content filters is still at an early stage.

 

What are content filters and who can benefit from them?

Content filters add superpowers to traditional search techniques, enabling users to search for key phrases rather than keywords. The technology behind these features relies on Text Mining and Natural Language Processing. With the help of such technologies, the words and multi-word expressions used to filter the reviews are selected based on their frequency and importance.

The content filters are generated using statistical measures such as word frequency, together with grammatical analysis, to identify the concepts and the key phrases. They are not predefined topics or categories. They are continuously generated based on the available texts, reviews or product descriptions. Such content filters are referred to as Popular mentions in Tripadvisor or Themes in Amazon.
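To make the frequency-based idea concrete, here is a minimal Python sketch that proposes filters by counting in how many reviews each word or two-word phrase appears. It is only an illustration under simplifying assumptions: the stop-word list, the thresholds, the function names and the sample reviews are all invented, and the platforms mentioned above do not document how their filters are actually built.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (an assumption for this example).
STOPWORDS = {"a", "an", "and", "the", "was", "is", "it", "in", "of", "to",
             "we", "very", "but", "for", "with", "our", "had"}

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def candidate_phrases(text, max_len=2):
    """Unique phrases of 1..max_len words that do not start or end with a stop word."""
    tokens = tokenize(text)
    phrases = set()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrases.add(" ".join(gram))
    return phrases

def generate_filters(reviews, top_k=5, min_reviews=2):
    """Keep phrases mentioned in at least `min_reviews` reviews, most frequent first."""
    counts = Counter()
    for review in reviews:
        counts.update(candidate_phrases(review))  # each review counts once per phrase
    frequent = [(p, c) for p, c in counts.items() if c >= min_reviews]
    # Sort by frequency, then prefer longer phrases, then alphabetically.
    frequent.sort(key=lambda pc: (-pc[1], -len(pc[0].split()), pc[0]))
    return [phrase for phrase, _ in frequent[:top_k]]

# Hypothetical reviews of a guided tour.
reviews = [
    "The tour guide was friendly and the museum was huge.",
    "Our tour guide spoke perfect English.",
    "Huge museum, and the guide knew a lot.",
]

filters = generate_filters(reviews)
print(filters)
```

Because the filters are recomputed from whatever reviews are passed in, a different product with different reviews would produce a different list, which matches the "not predefined" property described above.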

The most popular use cases for content filters are automatic text tagging and text summarisation. They can be found in platforms that allow the users to post content in any text format. Blogging platforms, online shopping websites, websites offering services, to name a few, can benefit from the employment of content filters.

With content filters, a user can quickly get to the information that is most relevant to them. For example, among the content filters below, some describe the dimensions of the product while others describe its quality or functionality.

[Figure: Amazon's take on content filters]

The content filters assist you in reaching the relevant information in a structured, time-saving and efficient way. As the infographic shows, assisted by content filters, a user only needs to go through 8% to 20% of the information in order to find what is relevant. The content filters optimize the search experience by summarizing the information and displaying only what is relevant to the user, in the form of frequent keywords or relevant concepts and key phrases. Not only does this allow for text structuring and indexing, but it also efficiently informs the user about the most important information in a specific text.

[Figure: Content filters significantly reduce the time-to-info]

For most users, they enable a structured, time-saving, and efficient filtering strategy. Content filters are a useful tool for users who are not familiar with a product and want to learn more about it while being presented with key information. They also optimise the search experience for users who know exactly what they are looking for, allowing them to select specific characteristics. For example, when users want to upgrade a device, they may be interested in specific improvements or features based on their experience with the current device.

Content filters are also a great tool for any user who wants to browse the highlights of a product or service.

Content filters vs tags

It is important to distinguish between automatically generated content filters and manually added tags. The table below shows the main differences between them. Being automatically generated, content filters display the exact information from the available content, providing a faster text tagging solution.

On the other hand, tags that are added manually may be biased by subjective judgment. The people who add the tags may include information that is not present in the content they tag, or may even use tags that are unrelated to the content they are supposed to describe.

 

How do content filters work?

Various platforms that aggregate reviews about products or services have taken a step in this direction and implemented ways to help users identify the information they are after in a faster and more structured way. They even provide helpful insights that suggest what users found and wanted everyone to know.

One benefit of content filters is that they are usually not predefined.

For example, Booking.com enables users to filter the reviews based on several criteria (location, room, bar, shower, just to name a few), but these are predefined. That means that when you want to search for the number of beds in the room, you need to select the room filter. This displays only the reviews that match the selected filter. But it does not allow you to select a more specific filter. That is why you need to read all the retrieved reviews to find those that mention the number of beds, if any. While this reduces the amount of information to go through, it is not a significant improvement and not a great UX solution. There is a smarter solution than this.

[Figure: Booking's take on content filters]

Platforms such as Amazon or TripAdvisor generate filters in real time by using content filters. Each time the user selects a product or a service, new content filters are generated based on the information available for that specific product. We can also assume that no two products share exactly the same filters, because the user-generated reviews for each product are unique.

The second benefit of content filters is that they indicate, in a semi-structured format, what each user found most relevant about the products. Take, for instance, the reviews of a service providing a guided tour. The users mention that the museum was huge, the tour lasted three hours, and the guide was informed and spoke perfect English. And this information was gathered from only four content filters.

[Figure: Summary generation with content filters]

This illustrates the power of content filters to generate an automatic summary of a product from its user-generated reviews. Just by looking at the content filters, you can get an idea of whether you want to continue researching the selected product or service or search for a different one.

 

How to assess the performance of content filters?

While the standard performance measures for information retrieval include precision, recall, F-measure, etc., we’ll get a bit less geeky here :). We’ll focus only on performance criteria from the user’s perspective.

[Figure: Content filters performance assessment]

1. Text match highlight

The content filters enable you to reach the information you are looking for in a structured, time-saving, more accurate, and efficient way. But what happens when, even after you have applied the filter, you still have a lot of text to go through? You may want to find the section that mentions the information you are looking for, and this can be achieved by highlighting the filter content in the full text, helping you spot it more easily. Many platforms provide this functionality.
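Highlighting a selected filter in the full text can be sketched in a few lines of Python. The `highlight` helper and the `**` markers are assumptions for this example; a real page would more likely wrap the match in an HTML element:

```python
import re

def highlight(text, phrase, start="**", end="**"):
    """Wrap every case-insensitive occurrence of `phrase` in start/end markers."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    # Reuse the matched text so the original capitalisation is preserved.
    return pattern.sub(lambda m: f"{start}{m.group(0)}{end}", text)

review = "The tour guide was great. Ask for the same Tour Guide if you can."
highlighted = highlight(review, "tour guide")
print(highlighted)
```

The match is case-insensitive, but the replacement keeps the capitalisation used by the reviewer, so the highlighted text still reads exactly as it was written.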

2. Ranking the retrieved information

The order in which the results retrieved by a content filter are displayed can be influenced by various factors, such as the quality of the reviews, the date when they were published, or the satisfaction of the user. Since it is difficult to compare two texts based solely on a phrase they have in common, a purely text-based ranking would be challenging to propose. The date when the review was posted is an easier relevance factor to use, and so is the overall satisfaction of the user with the product (expressed via rating stars, for example).
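A ranking along these lines could be sketched as follows. The `Review` fields, the use of helpful votes as a stand-in for review quality, and the order of the tie-breakers are all assumptions made for this example, not the behaviour of any of the platforms discussed here:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Review:
    text: str
    posted: date        # publication date
    stars: int          # overall satisfaction, 1 to 5
    helpful_votes: int  # hypothetical proxy for review quality

def rank_reviews(reviews):
    """Most helpful first; ties are broken by newest date, then by highest rating."""
    return sorted(
        reviews,
        key=lambda r: (r.helpful_votes, r.posted, r.stars),
        reverse=True,
    )

# Hypothetical reviews that all matched the same content filter.
matching = [
    Review("Guide was fine.", date(2020, 1, 5), 4, 2),
    Review("Loved our guide!", date(2021, 3, 1), 5, 2),
    Review("The guide knew everything.", date(2019, 7, 9), 3, 10),
]

ranked = rank_reviews(matching)
print([r.text for r in ranked])
```

Changing the order of the fields in the sort key changes which factor dominates, so the same sketch can be used to try out a date-first or rating-first ordering.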

3. Relevance of the retrieved information

The quality of the generated information (keywords, key phrases, and concepts) is usually high. Issues occur when the generated information is used as content filters. We would expect that the retrieved content matches the content filter, but this may not always be the case. In the case of multi-word content filters, partial matches may be allowed.

Take the following review from Tripadvisor, for example. The popular mentions include “three rooms”, and selecting it retrieves 5 pages of reviews. The first page already displays anomalies. It is not a big issue that reviews mentioning only the word “rooms” are retrieved. The real issue occurs when reviews matching the word “room” are retrieved and the meaning of “room” is not the expected one. While the filter’s meaning is “area within an apartment”, the second meaning is “space for movement”.

[Figure: Mismatch between expected and retrieved information]
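The effect of allowing partial matches for a multi-word filter can be reproduced with a small Python sketch. The reviews, the crude plural stripping and the two helper functions are assumptions for illustration; they are not how Tripadvisor matches its popular mentions:

```python
import re

def tokens(text):
    """Lowercase words, with a crude plural normalisation (drop a trailing "s")."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]

def partial_match(text, phrase):
    """True if ANY word of the filter appears in the text."""
    words = set(tokens(text))
    return any(q in words for q in tokens(phrase))

def phrase_match(text, phrase):
    """True only if ALL words of the filter appear next to each other, in order."""
    words, query = tokens(text), tokens(phrase)
    n = len(query)
    return any(words[i:i + n] == query for i in range(len(words) - n + 1))

# Hypothetical reviews.
reviews = [
    "The apartment has three rooms and a balcony.",
    "There was plenty of room for our luggage.",
    "Breakfast was excellent.",
]

loose = [r for r in reviews if partial_match(r, "three rooms")]
strict = [r for r in reviews if phrase_match(r, "three rooms")]
print(loose)
print(strict)
```

The partial match also returns the luggage review, where "room" means space rather than part of an apartment. Requiring the whole phrase removes it in this example, but it does not resolve word senses in general; that needs semantic information.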

4. Search flexibility

The traditional keyword-based search usually works by retrieving text that is an exact match. But when using content filters, the search query may contain more than one word, so the matching behaviour may need to change. The order of the words in the search phrase and in the text may not be the same, so it is important that the search functionality allows for flexibility.

One simple example is the case where the user searches for “mobile development agency” and the text contains “agency for mobile development”. That is why it is important that the concepts extracted as filters are normalised and aggregated in such a way that all texts matching the content filters are retrieved.
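One simple way to get this flexibility is to compare sets of content words instead of exact strings. The sketch below is a minimal illustration with an invented stop-word list and helper names; it ignores word order entirely, which is only one of several possible normalisation choices:

```python
import re

# Small hand-picked stop-word list (an assumption for this example).
STOPWORDS = {"a", "an", "the", "for", "of", "in", "and", "to"}

def content_words(text):
    """Lowercased words of the text, without stop words, as a set."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def flexible_match(text, query):
    """True if every content word of the query occurs somewhere in the text."""
    return content_words(query) <= content_words(text)

query = "mobile development agency"
hit = flexible_match("We hired an agency for mobile development last year.", query)
miss = flexible_match("A mobile phone repair agency.", query)
print(hit, miss)
```

Because proximity is ignored, a long text that happens to contain all three words in unrelated sentences would also match, so in practice this check would likely be combined with a window or sentence limit.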

5. Spell check suggestions

When a search query does not retrieve answers, it may be because either the information does not exist or the text was misspelled. While little can be done in case of missing information, for misspelling, some solutions can be identified. For example, before generating the content filters a preprocessing step handling spellcheck could be performed. Also, when the users are allowed to enter text of their choice for filtering, an autocorrect functionality could be employed.

Take the mobile keyboard autocorrect, auto-complete and predictive text functionalities. They help users write correctly and faster. Not to mention Google’s spell check corrections and search predictions, which assist users in finding the exact information.
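For free-text filtering, a basic "did you mean" suggestion can be sketched with `difflib.get_close_matches` from the Python standard library. The list of known filters, the `suggest_filter` helper and the similarity cutoff are assumptions chosen for this example:

```python
from difflib import get_close_matches

def suggest_filter(query, known_filters, cutoff=0.8):
    """Return the known filter most similar to the query, or None if nothing is close."""
    matches = get_close_matches(query.lower(), known_filters, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical filters already generated for a product or service.
known_filters = ["tour guide", "audio guide", "skip the line", "three rooms"]

suggestion = suggest_filter("tour giude", known_filters)
no_suggestion = suggest_filter("breakfast", known_filters)
print(suggestion, no_suggestion)
```

A misspelled query is mapped to the closest existing filter, while a query about missing information returns nothing, which mirrors the two failure cases described above. The cutoff controls how aggressive the correction is and would need tuning on real queries.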

6. Language support

The online shopping platforms are usually international and they support reviews in almost any language. But they sometimes lack the ability to perform a linguistic analysis unless the content is in English. It would be useful if text-based platforms allowed users to reach the information they require, no matter the language. Linguistic analysis is generally known to perform best for English. But platforms such as Booking have started allowing reviews written in Spanish, Chinese, German or French to contribute to the generation of content filters.

 

Performance analysis for reviews platforms

The performance of the content filters is displayed in the table below, considering both the quality of the retrieved information and the number of features used for defining the search. We’ve analyzed three of the most common reviews platforms: TripAdvisor, Booking.com, and Amazon. We rate the performance on a scale from 0 to 5 stars based on our own experience with these platforms.

[Figure: TripAdvisor vs Booking.com vs Amazon]

Conclusion

With the transition from in-store to online shopping, we, as potential buyers, are more interested in finding information about our future purchases and we usually do this on our electronic devices.

We constantly need to be informed in a structured fashion, so search and filter functionalities become mandatory on any platform. That is why, when we do our initial research, we would like to be assisted in finding the relevant information as accurately and as quickly as possible.

While the status of the features of the reviews platforms is in continuous change, our next post will focus on presenting how the review platforms could enhance the user experience by exploring semantic information.

***

Need smart search solutions? Our AI team can surely help.

Ioana Bărbănțan Machine Learning Engineer

Ioana Bărbănțan is a Machine Learning Engineer @Tapptitude. She has a passion for data, structure, and visual representations. Ioana got her Ph.D. in Computer Science and specialised in Machine Learning and Natural Language Processing. Her work @Tapptitude focuses on helping clients automate and optimize their data and processes and make their products smart.