In this post I'll describe a way to personalize Elasticsearch queries by integrating Elasticsearch with Amazon Personalize. The main use case is an Elasticsearch index of products that serves e-commerce searches. Amazon Personalize, as the name implies, is a service that provides "personalization" to users. In summary, Amazon Personalize can return a list of products recommended for a given user, and it can also rank that list for the user.
Elasticsearch allows queries to contain weights and boosts. It uses these numbers as multiplying factors when computing the score of each document and, consequently, the ranking of the search results.
Amazon Personalize has the notion of "item". In most cases an "item" is a product. You will need to decide if it is a base product or a variant/SKU, but I won't elaborate on this topic in this post. In the architecture described, we'll have one instance of Amazon Personalize where an item is a product, and another instance where an item is a category. The reason for the second instance is that e-commerce catalogues are often "sparse", in the sense that there are many products with a considerable rate of product renewal, and shoppers don't have many orders in their order history, so there is little interaction data per product. While the code only references one additional Personalize instance, for categories, it can easily be extended to add another instance for brands.
The code is in Python and depends on boto3. It intercepts an Elasticsearch query and injects product ids and category ids with weights and boosts.
We begin with a high-level function. For categories and for products, we retrieve recommendations, we rank these recommendations and we inject the weights and boosts into the Elasticsearch query:
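A minimal sketch of such a high-level function follows. It is not the original code: the name `personalize_query` and the four callables passed in (`recommend`, `rank`, `weigh`, `inject`) are hypothetical placeholders for the steps described in the rest of the post.

```python
from typing import Callable, Dict, List


def personalize_query(
    query: dict,
    user_id: str,
    recommend: Callable[[str, str], List[str]],
    rank: Callable[[str, str, List[str]], List[str]],
    weigh: Callable[[List[str]], Dict[str, float]],
    inject: Callable[[dict, str, Dict[str, float]], dict],
) -> dict:
    """Recommend, rank, weigh and inject, once per kind of item.

    All four callables are hypothetical stand-ins for the steps in the post.
    """
    for kind in ("category", "product"):
        item_ids = recommend(kind, user_id)
        if not item_ids:
            continue  # no recommendations: leave the query untouched for this kind
        ranked_ids = rank(kind, user_id, item_ids)
        query = inject(query, kind, weigh(ranked_ids))
    return query
```

Passing the steps in as callables keeps the sketch independent of boto3 and Elasticsearch, so it can be exercised with stubs.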
How do we know that the recommended products returned by Amazon Personalize will be related to the query? Well, we don't know. We can keep in mind the principle of generality of the query. If the query is general (searching for "electronics"), then personalization can have more "influence". Conversely, if the query is specific (searching for "an iPod with 32 GB of memory"), then personalization should not have much "influence". The code presented here could be extended so that, when the query includes facets, the retrieval of product recommendations from Personalize is skipped, while the calls that retrieve recommendations for categories and brands are kept.
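The facet check suggested above could be sketched as follows. This is an assumption about how facets show up in the query body: here a facet selection is taken to be either a top-level `post_filter` or a non-empty `filter` clause inside a `bool` query. The function names are hypothetical, and a real application may encode facets differently.

```python
from typing import List


def has_facet_filters(body: dict) -> bool:
    """Guess whether the shopper narrowed the search with facets.

    Assumption: facets arrive as `post_filter` or as `bool.filter` clauses.
    """
    if body.get("post_filter"):
        return True
    bool_query = body.get("query", {}).get("bool", {})
    return bool(bool_query.get("filter"))


def kinds_to_personalize(body: dict) -> List[str]:
    """Skip product recommendations for specific (faceted) queries."""
    if has_facet_filters(body):
        return ["category"]  # a brand instance, if present, would stay as well
    return ["category", "product"]
```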
These are the basic functions that retrieve recommendations from Personalize:
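As an illustration (not the original listing), a retrieval function could look like the sketch below. It uses the `get_recommendations` operation of the boto3 `personalize-runtime` client; the helper names, the default of 25 results and the campaign ARN passed in are assumptions.

```python
from typing import Any, List


def make_client() -> Any:
    """Create the Personalize runtime client (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without boto3

    return boto3.client("personalize-runtime")


def get_recommended_ids(
    client: Any, campaign_arn: str, user_id: str, num_results: int = 25
) -> List[str]:
    """Return the item ids recommended for a user by a recommendation campaign."""
    response = client.get_recommendations(
        campaignArn=campaign_arn,
        userId=user_id,
        numResults=num_results,
    )
    # Each entry of itemList carries an itemId; keep the order returned.
    return [item["itemId"] for item in response.get("itemList", [])]
```

The same function serves both products and categories: only the campaign ARN changes.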
These are our functions that retrieve the ranking from Personalize:
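A ranking function might be sketched like this, assuming the `get_personalized_ranking` operation of the boto3 `personalize-runtime` client and a campaign ARN supplied by configuration. The function name is hypothetical.

```python
from typing import Any, List


def get_ranked_ids(
    client: Any, campaign_arn: str, user_id: str, item_ids: List[str]
) -> List[str]:
    """Re-order item ids for a user with a personalized-ranking campaign."""
    if not item_ids:
        return []  # nothing to rank, so skip the service call
    response = client.get_personalized_ranking(
        campaignArn=campaign_arn,
        userId=user_id,
        inputList=list(item_ids),
    )
    # personalizedRanking lists the input items, most relevant first.
    return [item["itemId"] for item in response.get("personalizedRanking", [])]
```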
Another detail of the architecture is that there are two campaigns for products and two campaigns for categories. The first campaign has recipe "hrnn" (for recommendations) and the second campaign has recipe "rank" (for ranking).
The code assigns weights to products in descending order, based on the ranking returned by Personalize. The initial weight and the "step" can be tuned according to the data.
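The descending assignment can be sketched as below. The default values and the `min_weight` floor are assumptions added for illustration; the original code may not clamp the weights at all.

```python
from typing import Dict, List


def assign_weights(
    ranked_ids: List[str],
    initial_weight: float = 10.0,
    step: float = 0.5,
    min_weight: float = 1.0,
) -> Dict[str, float]:
    """Give the first-ranked item the highest weight, decreasing by `step`."""
    weights = {}
    for position, item_id in enumerate(ranked_ids):
        # Clamp so that long lists never produce zero or negative weights.
        weights[item_id] = max(initial_weight - position * step, min_weight)
    return weights
```

For example, `assign_weights(["a", "b", "c"], initial_weight=3.0, step=0.5)` gives `a` a weight of 3.0, `b` 2.5 and `c` 2.0.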
The most important function is the one that injects the boost and weight values into the Elasticsearch query.
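One possible shape for this function is sketched below; it is not the original implementation. It wraps the incoming query in a `bool` query and adds one optional `should` clause per recommended id, each a `constant_score` query whose `boost` is the assigned weight. Note that this adds the weight to the score of matching documents rather than multiplying it. The field names `product_id` and `category_id` are assumptions about the index mapping.

```python
import copy
from typing import Dict


def inject_boosts(body: dict, field: str, weights: Dict[str, float]) -> dict:
    """Return a copy of the search body that favours the weighted ids.

    `field` is the (assumed) keyword field holding the id, e.g. "product_id".
    """
    boosted = copy.deepcopy(body)  # never mutate the intercepted query
    original = boosted.get("query", {"match_all": {}})
    should = [
        {"constant_score": {"filter": {"term": {field: item_id}}, "boost": weight}}
        for item_id, weight in weights.items()
    ]
    # `must` keeps the original matching rules; `should` only affects scoring.
    boosted["query"] = {"bool": {"must": [original], "should": should}}
    return boosted
```

Because the original query stays in `must`, the set of matching documents does not change; only their order does. Calling the function once with `product_id` and again with `category_id` nests the two sets of boosts.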
This is the complete source code with sample configuration:
Disclaimers:
The code above is provided as-is. The author assumes no responsibility for the misuse of the code.
The code above was created by the author for Pivotree, while an employee of Pivotree. The blog post and code fragments are shared here publicly with permission.