What are embeddings?

In natural language processing, the goal is to have machines understand human language. Unfortunately, machine learning and deep learning algorithms only work with numbers, so how can we convert the meaning of a word into numbers?

This is what embeddings are for: they teach language to computers by translating meanings into mathematical vectors (series of numbers).

Word embeddings

In word embeddings, the vectors of semantically similar terms are close to each other. In other words, terms with a similar meaning end up close together in a multi-dimensional vector space.

Here is a classic example: “king is to queen as man is to woman” is encoded in the vector space. In the same way, relationships such as verb tense or the link between countries and their capitals are encoded in a low-dimensional space that preserves these semantic relationships.
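As a quick illustration, here is a minimal sketch that reproduces the classic analogy with pretrained GloVe vectors loaded through the gensim library (the specific model name and the exact scores are assumptions; any pretrained word-embedding model behaves the same way):

```python
# A minimal sketch: "king - man + woman ≈ queen" with pretrained GloVe vectors
# loaded through gensim (assumes gensim is installed and internet access).
import gensim.downloader as api

# Download a small pretrained model; larger models work the same way.
model = api.load("glove-wiki-gigaword-50")

# Vector arithmetic: start from "king", remove the "man" direction,
# add the "woman" direction, and look for the nearest word.
result = model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', <similarity score>)]
```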

Knowledge graph embeddings

We can apply the same technique used for words to the nodes (entities) and edges (relationships) of a knowledge graph. By doing so, we can encode the meaning contained in the graph in a format (numerical vectors) that we can use for machine learning applications.
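To give a concrete, deliberately simplified idea of how this works, the sketch below scores a (head, relation, tail) triple the way translation-based models such as TransE do: a triple is plausible when head + relation lands close to tail. The entity and relation vectors here are random placeholders; in a real setting they would be learned from the graph (for example with a knowledge graph embedding library).

```python
# A simplified sketch of how translation-based knowledge graph embeddings
# (e.g. TransE) score a triple: head + relation should land near tail.
# Vectors are random placeholders here; in practice they are learned.
import numpy as np

rng = np.random.default_rng(42)
dim = 50

# Hypothetical entity and relation vectors (stand-ins for learned embeddings).
entity = {"Rome": rng.normal(size=dim), "Italy": rng.normal(size=dim)}
relation = {"capital_of": rng.normal(size=dim)}

def transe_score(head, rel, tail):
    """Lower is better: distance between (head + relation) and tail."""
    return np.linalg.norm(entity[head] + relation[rel] - entity[tail])

print(transe_score("Rome", "capital_of", "Italy"))
```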

You can create graph embeddings from the Knowledge Graph that WordLift creates by reading the article above.

In the following presentation, I introduce the concept of multidimensional meanings using a song by The Notorious B.I.G., undoubtedly one of the greatest rappers of all time. The song is called “What’s Beef?”

In the lyrics of the song, there is a play on the homophones “I see you” and “ICU”, the acronym for intensive care unit. Most interestingly, the word “beef” assumes a different meaning in each sentence. As we can see, meanings change based on the surrounding words in each sentence. The idea that the meaning of a word can be derived from the analysis of its closest words was introduced by John Rupert Firth, an English linguist and a leading figure in British linguistics during the 1950s.

Firth is known as the father of distributional semantics, a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items, based on their distributional properties in large samples of language data.

It is within this exact framework (studying semantic similarities between terms inside a given context window) that we can train a machine to understand the meaning of a word from the words that surround it.
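As a rough illustration of that framework, the snippet below trains a tiny word2vec model with gensim, where the window parameter defines the context window used to learn each word’s vector. The toy corpus and parameter values are only for illustration, and on such a small corpus the resulting similarities are noisy, but the mechanics are the same as on real text.

```python
# A toy sketch: learning word vectors from context windows with gensim's
# word2vec implementation (corpus and parameters are illustrative only).
from gensim.models import Word2Vec

sentences = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["man", "walks", "in", "the", "city"],
    ["woman", "walks", "in", "the", "city"],
]

# window=2 means each word is learned from the 2 words on either side of it.
model = Word2Vec(sentences, vector_size=20, window=2, min_count=1, epochs=200, seed=1)

# Words that appear in similar contexts tend to end up with similar vectors.
print(model.wv.similarity("king", "queen"))
print(model.wv.similarity("king", "city"))
```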

Cosine similarity

When we want to analyze the semantic similarity of two documents (or two queries), and we have turned them into mathematical vectors, we can use the cosine of the angle between their respective vectors.

The real advantage is that two similar documents might still be far apart when calculating the Euclidean distance, for example because one is much longer than the other or because they use different words with similar meanings. We might have the term ‘soccer’ appearing fifty times in one document and only ten times in another. Still, the two will be considered similar when we compare the orientation of their respective vectors within the same multidimensional space.

The reason is that even if the term frequencies or the terms themselves differ, as long as the meaning is similar, the orientation of the vectors will also be similar. In other words, a smaller angle between two vectors represents a higher degree of similarity.
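The sketch below makes this concrete with the ‘soccer’ example: cosine similarity is the dot product of the two vectors divided by the product of their norms, so when one vector counts the term fifty times and the other ten times, the Euclidean distance is large while the cosine similarity stays at its maximum, because the vectors point in the same direction. The two-dimensional count vectors are, of course, a simplification.

```python
# A minimal sketch of cosine similarity vs. Euclidean distance,
# using simplified two-dimensional term-count vectors.
import numpy as np

doc_a = np.array([50.0, 5.0])   # e.g. "soccer" appears 50 times
doc_b = np.array([10.0, 1.0])   # e.g. "soccer" appears 10 times

cosine = doc_a @ doc_b / (np.linalg.norm(doc_a) * np.linalg.norm(doc_b))
euclidean = np.linalg.norm(doc_a - doc_b)

print(cosine)     # 1.0: same orientation, so the documents are considered similar
print(euclidean)  # ~40.2: large distance despite the similar content
```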

Embeddings are one of several techniques we can use to analyze and cluster queries. See our web story on keyword research using AI to find out more.
