It’s remarkable how so many things are made better with great searches. Google made it easy for normal folks to find whatever they needed online, no matter how obscure. IDEA’s fuzzy matching and symbol search helped programmers forget the directory structure of their code bases.
AirTag added advanced location tracking to my cat. A well-crafted discovery feature can help add that “wow” factor that iconic, habit-forming products have.
In Search of Meaning
Semantic Search is a search method for surfacing highly relevant results based on the meaning of the query, context, and content. It goes beyond simple keyword indexing or filtering.
It allows users to find things more naturally, and it handles nuance better than highly sophisticated but rigid traditional relevancy methods.
In practice, it feels like the difference between asking a real person and talking to a machine.
Tech companies from all over the world are racing to add these capabilities to their existing products. Instacart published an extensive article on how they added semantic deduplication to their search experience.
Examples of companies implementing some form of semantic search include eBay, Shopee, Ikea, Walmart, and many others.
The reason for this rush towards semantic search is simple: more relevant results = happier customers = more money. Discovery, relevancy, and trustworthiness are some of the hardest problems to solve in e-commerce, and an entire ecosystem exists to help companies solve them.
Vectors to the Rescue
There is an emerging group of highly capable semantic search SaaS offerings. A prime example is Algolia’s NeuralSearch - if you want a top-notch, batteries-included system that will take care of most of the complications of implementing search right, this is a great place to look.
Sadly, you are going to pay - a lot. This might be OK for a low to medium-traffic site or a POC, but do your math before you fully commit to them.
Don’t worry though: you can still create an awesome semantic search experience even if you have a more down-to-earth budget. It will just require a bit of doing.
Many companies working on semantic search today are using document embeddings - a way of representing meaning as vectors.
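To make "meaning as vectors" a bit more concrete, here is a minimal sketch of how closeness between embeddings is commonly measured with cosine similarity. The three-dimensional vectors are made-up values for illustration only; a real encoder produces hundreds or thousands of dimensions, and the product names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# Hand-made "embeddings" (assumed values, not output from a real model).
embeddings = {
    "running shoes": [0.9, 0.1, 0.0],
    "jogging sneakers": [0.8, 0.2, 0.1],
    "cast iron skillet": [0.0, 0.1, 0.9],
}

# Rank every item by how close it is to the query vector.
query = embeddings["running shoes"]
ranked = sorted(
    embeddings,
    key=lambda name: cosine_similarity(query, embeddings[name]),
    reverse=True,
)
print(ranked)
```

Items with related meaning end up pointing in similar directions, so "jogging sneakers" ranks right behind "running shoes" even though the two share no keywords.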
Since semantic search alone may not be able to provide enough relevant hits, traditional full-text indexing is used as a backup method. A feedback loop is added to track user interactions and use them to provide highly relevant results through re-ranking.
This is what the architecture looks like:
This system has three key processes: indexing, querying, and tracking.
Indexing is done by converting a document’s content to an embedding vector through a text-to-vector encoder (ex: OpenAI’s Embeddings API) and inserting it into a Vector Database (ex: Qdrant, Milvus, Pinecone, etc.).
Documents are also indexed in a traditional full-text search engine (ex: Elasticsearch). This combination is usually referred to as “hybrid search.”
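The indexing step can be sketched with in-memory stand-ins, so it runs without any external service. Everything here is an assumption for illustration: `toy_encode` is a hashing trick standing in for a real embedding model, and `HybridIndex` uses plain dictionaries in place of a vector database and a full-text engine.

```python
import hashlib
import math
import re
from collections import defaultdict

DIMENSIONS = 16  # assumption: tiny vector size, chosen only for readability


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_encode(text):
    """Stand-in encoder: hash tokens into a fixed-size, L2-normalized vector.

    A real system would call an embedding model here instead.
    """
    vector = [0.0] * DIMENSIONS
    for token in tokenize(text):
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIMENSIONS
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


class HybridIndex:
    """Hypothetical helper that writes each document to both indexes."""

    def __init__(self):
        self.vectors = {}                 # doc_id -> embedding (vector DB stand-in)
        self.postings = defaultdict(set)  # token -> doc_ids (full-text stand-in)

    def index(self, doc_id, text):
        self.vectors[doc_id] = toy_encode(text)
        for token in tokenize(text):
            self.postings[token].add(doc_id)


index = HybridIndex()
index.index("sku-1", "Lightweight running shoes")
index.index("sku-2", "Cast iron skillet")
```

The point of the sketch is the double write: each document lands in both stores at indexing time, so either one can answer a query later.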
Querying relies on encoding incoming queries into vectors (preferably using the same encoder as the previous step) and querying the vector database using them. These results are then combined with traditional full-text results and re-ranked for relevancy.
Search re-ranking is usually a complex problem, and often relies on a mix of machine learning and heuristics.
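Before any heavier re-ranking, the two result lists need to be merged somehow. One common and simple option is reciprocal rank fusion (RRF), sketched below. It is an example technique rather than a prescribed one, and the document IDs and the `k=60` constant are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of doc IDs; items ranked high in several lists win.

    Each list contributes 1 / (k + rank) per document, with rank starting at 1.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score, best first; break ties by doc ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical hits, best first, from each backend.
vector_hits = ["sku-2", "sku-1", "sku-4"]
fulltext_hits = ["sku-1", "sku-3", "sku-2"]

merged = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(merged)
```

RRF only looks at positions, not raw scores, which sidesteps the problem that vector similarity and full-text scores live on different scales.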
Tracking involves capturing important user interactions - ex: clicking on results, liking items, etc. - and using these events to update the machine learning models involved in re-ranking.
This provides a feedback loop that uses user input to continuously improve relevancy. Snowplow is an example of a capable tracking system.
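As a rough illustration of the feedback loop, the sketch below counts impressions and clicks, then nudges relevance scores by a smoothed click-through rate. This is a simple heuristic standing in for the machine learning models mentioned above; `InteractionTracker`, `rerank`, the smoothing priors, and the sample scores are all hypothetical.

```python
from collections import Counter


class InteractionTracker:
    """Hypothetical in-memory event store; a real setup would use a tracking pipeline."""

    def __init__(self):
        self.impressions = Counter()
        self.clicks = Counter()

    def record_impression(self, doc_id):
        self.impressions[doc_id] += 1

    def record_click(self, doc_id):
        self.clicks[doc_id] += 1

    def click_through_rate(self, doc_id, prior_clicks=1, prior_impressions=10):
        """Smoothed CTR, so items with very little data are not judged too early."""
        return (self.clicks[doc_id] + prior_clicks) / (
            self.impressions[doc_id] + prior_impressions
        )


def rerank(scored_hits, tracker, weight=1.0):
    """Take (doc_id, relevance_score) pairs and return doc IDs, best first."""
    def adjusted(hit):
        doc_id, score = hit
        return score * (1.0 + weight * tracker.click_through_rate(doc_id))

    return [doc_id for doc_id, _ in sorted(scored_hits, key=adjusted, reverse=True)]


tracker = InteractionTracker()
for _ in range(20):
    tracker.record_impression("sku-1")
    tracker.record_impression("sku-2")
for _ in range(12):
    tracker.record_click("sku-2")  # users keep choosing the second result

hits = [("sku-1", 0.80), ("sku-2", 0.70)]
reranked = rerank(hits, tracker)
print(reranked)
```

Even though "sku-1" starts with the higher relevance score, the click history moves "sku-2" to the top, which is the feedback loop in miniature.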
What’s Next?
If you have the budget for a SaaS solution, then congratulations: you are well on your way to impressing your users with a spanking new search function. If, like most of us, you are not made of money, then it’s time to roll up your sleeves.
Implementation can be a daunting challenge. If you need a hand, I can help you get started. In either case, you should seriously consider whether your users could benefit from semantic search.
It’s a hard problem to solve, but the upside is definitely there and users are getting more used to this raised bar every day. Happy searching!