These days, we all shop online, probably more so than in person, and e-commerce continues to boom.
An integral part of every e-commerce business is its recommender system (RS) – an inbuilt filtering mechanism that uses ranking and other methods to generate the most suitable results, providing shoppers with personalized suggestions based on their shopping history.
This is crucial, because it allows e-marketplaces to boost their revenues by up-selling and cross-selling to their clients, while also countering what’s often called choice overload – the paralysis shoppers feel when confronted with too many options.
Having an RS that displays shopping items based on each customer’s preferences is how e-commerce platforms retain their customers. Ultimately, this is about establishing long-term relationships between e-stores and e-shoppers. Accenture’s research, for example, points to personalization as a key driver of repeat purchases and brand loyalty.
Concurrently, the global market for AI in retail is growing rapidly.
Every RS has an ML model at its core that allows it to carry out its functions, and every ML model, in turn, requires a high-quality labeled dataset to train on. As a result, a solid RS – and hence a successful e-commerce store – depends on a source of function-specific labeled data.
This labeled data can be obtained in a number of ways, of which crowdsourcing is one of the fastest and most affordable. Fed into ML models, data shaped by human insight has a direct effect on recommender systems and, ultimately, on e-commerce performance.
E-commerce companies can use this data to train their ML models (both brand-new models and pre-trained foundation models) and tailor them to specific tasks that bolster RS performance. This is known as model fine-tuning for downstream applications.
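To make that concrete, here is a minimal sketch of what such fine-tuning might look like in PyTorch: a small scoring head is trained on top of frozen, pre-computed user and item embeddings using crowd-collected relevance labels. The embedding source, dimensions, and data are illustrative assumptions rather than any particular platform’s pipeline.

```python
# Minimal sketch: fine-tune a small scoring head on top of frozen,
# pre-computed user/item embeddings using crowd-sourced relevance labels.
# All dimensions and data below are illustrative placeholders.
import torch
import torch.nn as nn

EMB_DIM = 64  # dimensionality of the (assumed) pre-trained embeddings

class ScoringHead(nn.Module):
    """Maps a concatenated (user, item) embedding pair to a relevance logit."""
    def __init__(self, emb_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * emb_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, user_vec, item_vec):
        return self.net(torch.cat([user_vec, item_vec], dim=-1)).squeeze(-1)

def fine_tune(user_embs, item_embs, labels, epochs=5, lr=1e-3):
    """user_embs, item_embs: float tensors of shape (N, EMB_DIM);
    labels: float tensor of 0/1 crowd judgments ("relevant" or not)."""
    head = ScoringHead(EMB_DIM)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        logits = head(user_embs, item_embs)
        loss = loss_fn(logits, labels)
        loss.backward()
        opt.step()
    return head

if __name__ == "__main__":
    # Toy data standing in for real embeddings and crowd labels.
    n = 256
    users = torch.randn(n, EMB_DIM)
    items = torch.randn(n, EMB_DIM)
    labels = torch.randint(0, 2, (n,)).float()
    model = fine_tune(users, items, labels)
```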
Another way e-commerce businesses put annotated data to work is commonly referred to as human-in-the-loop monitoring, which is essentially performance evaluation by real people. Crowd contributors gauge recommender systems post-deployment and detect potential problems (for instance, data drift), so that these issues can be nipped in the bud before they take a toll on the business.
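As a rough illustration of what that monitoring can look like, the sketch below keeps a rolling window of crowd judgments (“was this recommendation relevant?”) and raises a flag when the relevance rate drops below a chosen threshold. The window size and threshold are arbitrary assumptions, not recommended values.

```python
# Minimal sketch: post-deployment monitoring from crowd judgments.
# Each judgment is 1 if the annotator marked a recommendation as relevant,
# 0 otherwise; the window size and threshold are illustrative assumptions.
from collections import deque

class RelevanceMonitor:
    def __init__(self, window_size: int = 500, alert_threshold: float = 0.70):
        self.window = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def add_judgment(self, is_relevant: int) -> None:
        self.window.append(is_relevant)

    def relevance_rate(self) -> float:
        return sum(self.window) / len(self.window) if self.window else 1.0

    def drift_suspected(self) -> bool:
        # Flag possible data drift once the window is full and the rolling
        # relevance rate has dropped below the agreed threshold.
        return (len(self.window) == self.window.maxlen
                and self.relevance_rate() < self.alert_threshold)

monitor = RelevanceMonitor()
for judgment in [1, 1, 0, 1, 0, 1]:  # stream of crowd votes (toy data)
    monitor.add_judgment(judgment)
print(monitor.relevance_rate())
```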
Human annotators can evaluate RS performance directly – by rating predictions provided by the model. They can also do it indirectly – by completing the same tasks the model faced and producing “ground truths” that can later be used to judge the model’s responses. The first approach is great for evaluation, while the second works for both evaluation and further RS fine-tuning, because the data annotators provide in the second case doesn’t just flag when something is wrong but also supplies the right answers.
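For the indirect track, a common offline setup is to score the model’s ranked recommendations against the item sets annotators produced for the same task. Here is a minimal example using precision@k; the item names are made up for illustration.

```python
# Minimal sketch of the "indirect" track: annotators produce ground-truth
# item sets for a task, and the model's ranked recommendations are scored
# against them with precision@k (other ranking metrics work similarly).
def precision_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Model output vs. the items an annotator selected for the same task.
model_ranking = ["case_a1", "charger_x", "headset_z", "sticker_q"]
annotator_ground_truth = {"case_a1", "headset_z", "screen_protector_b"}
print(precision_at_k(model_ranking, annotator_ground_truth, k=3))  # ~0.667
```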
Now, let’s look at three curious data-labeling tasks that crowd contributors carry out to help fine-tune and evaluate recommender systems.
Matching items to shopper profiles
A vital feature in any effective RS is the engine’s ability to offer personalized products based on each customer’s shopping preferences (i.e., matching items to shoppers’ profiles). When an RS understands what to offer each customer, the result is immediate purchases as well as long-term marketplace loyalty. That loyalty stems from the fact that a customer will be reluctant to look for new shopping options elsewhere if their habits and needs are already understood and met by their favorite e-store. To achieve this, crowd contributors act as e-customers with individual shopping histories and annotate training data by rating their “degree of interest” in the items offered to them by an RS.
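In practice, several contributors usually rate the same (profile, item) pair, and their “degree of interest” ratings are aggregated into a single training label. The sketch below shows one simple way to do that; the field names and the 1-to-5 scale are assumptions for illustration, not a fixed schema.

```python
# Minimal sketch: turning "degree of interest" ratings (assumed 1-5 scale)
# from several crowd contributors into one training label per (profile, item).
from collections import defaultdict
from statistics import mean

raw_ratings = [
    {"profile_id": "u_042", "item_id": "sneaker_17", "rating": 5},
    {"profile_id": "u_042", "item_id": "sneaker_17", "rating": 4},
    {"profile_id": "u_042", "item_id": "blender_03", "rating": 1},
]

grouped = defaultdict(list)
for r in raw_ratings:
    grouped[(r["profile_id"], r["item_id"])].append(r["rating"])

# Average the overlapping judgments and rescale to [0, 1] for model training.
training_labels = {key: (mean(vals) - 1) / 4 for key, vals in grouped.items()}
print(training_labels)
```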
Complementary item discovery
The goal here is to improve an RS so that it offers accurate recommendations of relevant accessories and complementary items. A good example would be a phone case that matches the exact model of a smartphone recently purchased or carted by the e-shopper. Automated solutions meant to do this often draw complaints about their inaccuracy. So, to provide an ML model with the right data to improve an RS, crowd contributors carry out a series of pairwise comparisons. In this form of data labeling, human annotators match different items that can potentially be grouped together. After fine-tuning on this labeled data, RS accuracy for complementary item discovery has been reported to climb to over 90%.
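One straightforward way to turn those pairwise comparisons into training data is to aggregate the annotators’ votes per item pair and keep the pairs where a clear majority agreed the items belong together. The snippet below is a minimal sketch of that aggregation; the vote format is illustrative, not an actual platform schema.

```python
# Minimal sketch: aggregating pairwise "do these items go together?" votes
# into a set of complementary-item pairs by simple majority vote.
from collections import Counter

votes = [
    (("phone_x200", "case_x200"), "yes"),
    (("phone_x200", "case_x200"), "yes"),
    (("phone_x200", "case_x200"), "no"),
    (("phone_x200", "blender_03"), "no"),
    (("phone_x200", "blender_03"), "no"),
]

yes_counts = Counter()
totals = Counter()
for pair, answer in votes:
    totals[pair] += 1
    if answer == "yes":
        yes_counts[pair] += 1

# Keep pairs where a clear majority of annotators agreed they belong together.
complementary_pairs = [
    pair for pair in totals if yes_counts[pair] / totals[pair] > 0.5
]
print(complementary_pairs)  # [('phone_x200', 'case_x200')]
```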
Serendipitous item discovery
Another useful data-labeling task that’s used to train, fine-tune, evaluate, and ultimately improve recommender systems is what’s known as serendipitous item discovery. It means that a well-trained ML algorithm at the core of an e-commerce engine will recommend new or surprising products that aren’t related to other goods bought by the same customer. Sometimes, serendipitous items may be linked to shopping behavior, but more often it’s about their overall “coolness” – novelty rather than usability.
Serendipitous item discovery is crucial in e-commerce, because there’s only so much a well-tuned RS can do when it comes to suggesting complementary items; after all, there’s a finite number of products that can be used with other products. So, to get an RS to offer more options to e-shoppers, crowd contributors are asked to annotate more data and determine which products can be seen as unusual or novelty items. This is done via text and image classification tasks, with up to 20,000 items processed in a single day.
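A batch of such classification tasks might be prepared along the lines of the sketch below, where each product gets a “usual vs. novelty” question and is shown to several annotators. The task layout, labels, and overlap value are illustrative assumptions rather than a specific platform’s API.

```python
# Minimal sketch: preparing a batch of text-classification tasks that ask
# annotators whether a product reads as a "usual" or "novelty" item.
# The dictionary layout is a generic illustration, not a real API schema.
LABELS = ["usual_item", "novelty_item"]

catalog = [
    {"item_id": "mug_901", "title": "Self-stirring coffee mug"},
    {"item_id": "sock_112", "title": "Plain cotton socks, 3-pack"},
]

def build_tasks(items, labels=LABELS):
    return [
        {
            "input": {"item_id": it["item_id"], "text": it["title"]},
            "question": "Would most shoppers see this product as a novelty?",
            "allowed_labels": labels,
            "overlap": 3,  # how many annotators judge each item (assumed)
        }
        for it in items
    ]

for task in build_tasks(catalog):
    print(task)
```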
Today, recommender systems are fundamental to success in e-commerce, and they need to operate effectively to give e-marketplaces an edge in a highly competitive environment. Since recommender systems rely on ML models to function, high-quality labeled data is needed both to train and fine-tune those models and to test them after deployment, which is known as performance evaluation or human-in-the-loop monitoring.
Annotated data for RS improvement has to be store-specific, and it has to be delivered to e-platforms quickly, continuously, and at an affordable rate. Crowdsourcing offers a viable alternative to lengthy and expensive in-house labeling, with platforms like Toloka serving international e-commerce clients through a large pool of global crowd contributors and ML engineers.