By Beatrice GambaBeatrice Gamba

3 years ago

If you work in SEO, you have been reading about Google and Bing becoming semantic search engines but, what does Semantic Search really mean for users, and how things work under the hood? Semantic Search helps you surface the most relevant results for your users based on search intent and not just keywords. Semantic (or […]

If you work in SEO, you have been reading about Google and Bing becoming semantic search engines but, what does Semantic Search really mean for users, and how things work under the hood?

Semantic Search helps you surface the most relevant results for your users based on search intent and not just keywords.

Semantic (or Neural) Search uses state of the art deep learning models to provide contextual and relevant results to user queries. When we use semantic search we can immediately understand the intent behind our customers and provide significantly improved search results that can drive deeper customer engagement. This can be essential in many different sectors but – here at WordLift – we are particularly interested in applying these technologies to: travel brands, e-commerce and online publishers.  

Information is often unstructured and available in different silos, using semantic search our goal is to use machine learning techniques to make sense of content and to create a context. When moving from syntax (for example how often a term appears on a webpage) to semantics, we have to create a layer of metadata that can help machines grasp the concepts behind each word. Google defines this ability to connect words to concepts as “Neural Matching” or *super synonyms* that help better match user queries with web content. Technically speaking this is achieved by using neural embeddings that transform words (or other types of content like images, video or audio clips) to fuzzier representations of the underlying concepts.

As part of the R&D work that we’re doing, in the context of the EU-cofounded project called WordLift Next Generation, I have built the prototype using an emerging open-source framework called Jina AI and the beautiful photographic material published by Salzburgerland Tourismus (also a partner in the Eurostars research project) and Österreich Werbung 🇦🇹 (Austrian National Tourist Office).

I have created this first prototype:

  • ☝️ to understand how modern search engines work;
  • ✌️ to re-use the same #SEO data that @wordliftit publishes as structured *linked* data for internal search.

How does Semantic Search work?

Bringing structure to information, is what WordLift does by analyzing textual information using NLP and named entity recognition, and now also images using deep learning models.

With semantic search, these capabilities are combined to let users find exactly what they need naturally.

In Jina, Flows are high-level concepts that define a sequence of steps to accomplish a task. Indexing and querying are two separate Flows; inside each flow, we run parallel Pods to analyze the content. A Pod is a basic processing unit in a Flow that can run as a dockerized application.

This is strategic as it allows us to distribute the load efficiently. In this demo, Pods are programmed to create neural embeddings: one pod to processes text and one for images. Pods can also run in parallel and the results (embeddings from the caption and embeddings from the image) are combined into one single document.

This ability to work with different content types is called multi-modality.

The user uses a text in the query to retrieve an image or vice-versa; the user uses an image, in the query, to retrieve its description.

See in the example below; I make a search using natural language at the beginning and right after, I send an image (from the results of the first search) as query to find its description 👇

Are you ready to innovate your content marketing strategy with AI? Let’s talk!

What is Jina AI?

Han Xiao, Jina AI’s CEO, calls Jina the “TensorFlow” for search 🤩. Besides the fact that I love this definition, Jina is completely open source, and designed to help you build neural (or semantic) search on the cloud. Believe me it is truly impressive. To learn more about Jina, watch Han’s latest video on YouTubeJina 101: Basic concepts in Jina“.

How can we optimize content for Semantic Search?

Here is what I learned from this experiment:

  1. When creating content, we shall focus on concepts (also referred to as entities) and search intents rather than keywords. An entity is a broader concept that groups different queries. The search intent (or user intent) is the user’s goal when making the query to the search engine. This intent can be expressed using different queries. The search engines interpret and disambiguate the meaning behind these queries by using the metadata that we provide.
  2. Information Architecture shall be designed once we understand the search intent. We are used to thinking in terms of 1 page = 1 keyword, but in reality, as we transition from keywords to entities (or concepts), we can cover the same topic across multiple documents. After crawling the pages, the search engine will work with a holistic representation of our content even when it has been written across various pages (or even different media types).
  3. Adding structured data for text, images, and videos adds precious data points that will be taken into account by the search engine. The more we provide high-quality metadata, the more we help the semantic search engine improve the matching between content and user intent.
  4. Becoming an entity in Google’s Knowledge Graph also greatly helps Google understand who we are and what we write about. It can have an immediate impact across multiple queries that refer to the entity. Read this post to learn more how to create an entity in Google’s graph

Working with Semantic Search Engines like Google and Bing, require an update of your content strategy and a deep understanding of the principles of Semantic SEO and machine learning.

Are you ready for the next SEO?
Try WordLift today!