What are embeddings?

In natural language processing, the goal is to have machines understand human language. Unfortunately, machine learning and deep learning algorithms only work with numbers, so how can we convert the meaning of a word into a number?

This is what embeddings are for: they teach language to computers by translating meanings into mathematical vectors (series of numbers).

Word embeddings

In word embeddings, the vectors of semantically similar terms are close to each other. In other words, words with similar meanings will lie close together in a multi-dimensional vector space.

Here is a classic example: “king is to queen as man is to woman” is encoded in the vector space, and relationships such as verb tense, or countries and their capitals, are likewise encoded in a low-dimensional space that preserves the semantic relationships.
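We can sketch the “king is to queen as man is to woman” analogy with vector arithmetic. The numbers below are made up purely for illustration (real embeddings have hundreds of dimensions and are learned from large corpora), but the mechanics are the same: subtracting “man” from “king” and adding “woman” lands near “queen”.

```python
import numpy as np

# Toy 2-dimensional word vectors (hypothetical values for illustration only).
vectors = {
    "king":  np.array([0.95, 0.90]),
    "queen": np.array([0.95, 0.10]),
    "man":   np.array([0.10, 0.90]),
    "woman": np.array([0.10, 0.10]),
}

# "king is to queen as man is to woman":
# king - man + woman should land near queen.
result = vectors["king"] - vectors["man"] + vectors["woman"]

def closest(vec, exclude):
    # Return the stored word nearest to vec by cosine similarity.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vectors if w not in exclude),
               key=lambda w: cos(vectors[w], vec))

print(closest(result, exclude={"king", "man", "woman"}))  # queen
```

Libraries such as gensim expose the same operation on real, trained embeddings.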

Knowledge graph embeddings

We can apply the same technique used for words to the nodes (entities) and edges (relationships) of a knowledge graph. By doing so, we encode the meaning captured in the graph as numerical vectors that we can use in machine learning applications.
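As a concrete sketch, here is the scoring idea behind TransE, one popular knowledge graph embedding method (not necessarily the one WordLift uses): each entity and relation gets a vector, and a triple (head, relation, tail) is considered plausible when head + relation lands close to tail. The entities and the untrained random vectors below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Hypothetical entities and one relation; in practice these come from the graph.
entities = {e: rng.normal(size=dim) for e in ["Rome", "Italy", "Paris", "France"]}
relations = {"capital_of": rng.normal(size=dim)}

def score(head, rel, tail):
    # TransE-style score: lower distance = more plausible triple.
    return np.linalg.norm(entities[head] + relations[rel] - entities[tail])

s = score("Rome", "capital_of", "Italy")
```

Before training these scores are meaningless; a training loop would nudge the vectors so that true triples score lower than corrupted ones.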

You can create graph embeddings from the Knowledge Graph that WordLift creates by reading the article above.

In the following presentation, I introduce the concept of multidimensional meanings using a song by The Notorious B.I.G., undoubtedly one of the greatest rappers of all time. The song is called “What’s Beef?”.

In the lyrics, there is a play on the homophones “I see you” and “ICU”, the acronym for intensive care unit. Most interestingly, the word “beef” takes on a different meaning in every sentence: meanings change based on the surrounding words. The idea that a word’s meaning can be derived by analyzing its closest words was introduced by J. R. Firth, an English linguist and a leading figure in British linguistics during the 1950s.

Firth is known as the father of distributional semantics, a research area that develops and studies theories and methods for quantifying and categorizing semantic similarities between linguistic items based on their distributional properties in large samples of language data.

It is by using this exact framework (studying semantic similarities between terms inside a given context window) that we can train a machine to understand the meaning of a word.
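The context-window idea can be sketched in a few lines: characterize a word by counting which words co-occur with it inside a small window. The fragment below is a simplified transcription of the song’s hook, used only for illustration; real systems run this over millions of sentences.

```python
from collections import Counter

def cooccurrences(tokens, window=2):
    # Count (word, neighbor) pairs within `window` positions of each other.
    counts = Counter()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                counts[(word, tokens[j])] += 1
    return counts

tokens = "beef is when you need two gats to go to sleep".split()
counts = cooccurrences(tokens)

# The words that share a window with "beef" hint at its meaning in this context.
neighbors = {b: c for (a, b), c in counts.items() if a == "beef"}
```

Methods like word2vec learn vectors from exactly these context-window statistics, so words appearing in similar contexts end up with similar vectors.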

Cosine similarity

When we want to analyze the semantic similarity of two documents (or two queries) that we have turned into mathematical vectors, we can use the cosine of the angle between their respective vectors.

The real advantage over Euclidean distance is that two similar documents can end up far apart in Euclidean terms simply because of differences in word frequency. We might have, for example, the term ‘soccer’ appearing fifty times in one document and only ten times in another. The Euclidean distance between the two vectors will be large, yet the documents will still be considered similar when we compare the orientation of their vectors within the same multidimensional space.

The reason is that even if the terms used are different, as long as their meaning is similar, the orientation of their vectors will also be similar. In other words, a smaller angle between two vectors represents a higher degree of similarity.
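The ‘soccer’ example can be verified numerically. The toy two-term vocabulary and the counts below are assumptions for illustration: one document mentions the terms five times more often than the other, so the Euclidean distance is large, but the vectors point in the same direction and the cosine similarity is 1.

```python
import numpy as np

# Toy 2-term vocabulary: [soccer, football] counts per document (hypothetical).
doc_a = np.array([50.0, 25.0])
doc_b = np.array([10.0, 5.0])

# Euclidean distance is driven by magnitude (raw word frequency).
euclidean = np.linalg.norm(doc_a - doc_b)

# Cosine similarity depends only on orientation (relative term proportions).
cosine = doc_a @ doc_b / (np.linalg.norm(doc_a) * np.linalg.norm(doc_b))

print(euclidean)  # large: the frequency difference dominates
print(cosine)     # 1.0: the vectors point the same way
```

Because doc_a is an exact multiple of doc_b, the angle between them is zero and the cosine is exactly 1, even though the Euclidean distance exceeds 40.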

Embeddings are one of several techniques we can use to analyze and cluster queries. See our web story on keyword research using AI to find out more.