Let’s start with the end. In the experiment I am sharing today we measured the impact of a specific improvement to the structured data of a website that references 500+ local businesses (more specifically, the site promotes lodging businesses such as hotels and villas for rent). Before diving into the solution, let’s have a look at the results that we obtained using a Causal Impact analysis. If you are a marketing person or an SEO, you constantly struggle to measure the impact of your actions in the most precise and irrefutable way; Causal Impact, a methodology originally introduced by Google, helps you with exactly this. It’s a statistical analysis that builds a Bayesian structural time series model to help you isolate the impact of a single change made on a digital platform.
In a week, after improving the existing markup, we could see an increase of +5.09% in clicks coming from Google Search – this improvement is statistically significant: the probability of obtaining this effect by chance is very small 🔥🔥
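To make the idea of a relative lift concrete, here is a minimal sketch of the arithmetic behind a figure like the one above: the observed clicks are compared with a counterfactual (what we would have expected without the change). The counterfactual below is a deliberately naive one, scaled from a control series, and is not the Bayesian structural time series model that Causal Impact builds; the control series, the function names and all the numbers are assumptions made up for illustration.

```python
from statistics import mean


def naive_counterfactual(control_pre, treated_pre, control_post):
    """Scale the control series by the pre-period ratio treated/control."""
    ratio = mean(treated_pre) / mean(control_pre)
    return [clicks * ratio for clicks in control_post]


def relative_effect(observed_post, predicted_post):
    """Cumulative relative lift of the observed clicks over the counterfactual."""
    return (sum(observed_post) - sum(predicted_post)) / sum(predicted_post)


# Hypothetical daily clicks, invented for illustration only.
control_pre = [100, 110, 90, 100]    # pages without the change, before
treated_pre = [200, 220, 180, 200]   # pages with the change, before
control_post = [100, 100, 100]       # pages without the change, after
observed_post = [206, 204, 202]      # pages with the change, after

predicted_post = naive_counterfactual(control_pre, treated_pre, control_post)
lift = relative_effect(observed_post, predicted_post)
print(f"relative effect: {lift:+.2%}")
```

A real analysis would also report a credible interval around this number, which is what tells you whether the effect could be due to chance.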
We did two major improvements to the markup of these local businesses:
Improve the quality of NAP (Name, Address and Phone number) by reconciling the entities with entities in Google My Business (via Google Maps APIs) and by making sure we had the same data Google has or better;
Adding, for all the reconciled entities, the hasMap property with a direct link to the Google CID Number (Customer ID Number), this is an important identifier that business owners and webmasters should know – it helps Google match entities found by crawling structured data with entities in GMB.
Google My Business is indeed the simplest and most effective way for a local business to enter the Google Knowledge Graph. If your site operates in the travel sector or provides users with immediate access to hundreds of local businesses, what should you do to market your pages using schema markup against fierce competition from the businesses themselves or from large brands such as booking.com and tripadvisor.com?
How can you be more relevant for both travelers abroad searching for their dream holiday in another country and for locals trying to escape from large urban areas?
The approach, in most of our projects, is the same regardless of the vertical we work for: knowledge completion and entity reconciliation; these really are two essential building blocks of our SEO strategy.
By providing more precise information in the form of structured linked data we are helping search engines find the searchers we’re looking for, at the best time of their customer journey.
Another important aspect is that, while we’re keen on automating SEO (and data curation in general), we understand the importance of the continuous feedback loop between humans and machines: domain experts need to be able to validate the output and to correct any inaccurate predictions that the machine might produce.
There is no way out – tools like WordLift need to facilitate the process and bring it to web scale, but they cannot replace human knowledge and human validation (not yet at least).
LocalBusiness markup works for different types of businesses from a retail shop to a luxury hotel or a shopping center and it comes with sub-types (here is the full list of the different variants from the schema.org website).
All the sub-types, when it comes to SEO and Google in particular, shall contain the following set of information:
Name, Address and Phone number (and here consistency plays a big role and we want to ensure that the same entity on Yelp shows the same data on Apple Maps, Google, Bing and all the other directories that clients might use)
Reference to the official website (this becomes particularly relevant if the publisher does not coincide with the business owner)
Reference to the Google My Business entity using the hasMap property (the 5% lift we have seen above is indeed related to this specific piece of information)
Location data (and here, as you might imagine, we can do a lot more than just adding the address as a string of text)
In order to improve the markup and to add the hasMap property on hundreds of pages we’ve added a new functionality in WordLift’s WordPress plugin (that also works already for non-WordPress websites) that helps editors:
Trigger the reconciliation using Google Maps APIs
Review/Approve the suggestions
Improve structured data markup for Local Business
From the screen below the editor can either “Accept” or “Discard” the provided suggestions.
WordLift reconciles an entity with a loose match with the name of the business, the address and/or the phone number.
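To give an idea of what a loose match can look like, here is a minimal sketch using only the Python standard library. It is not WordLift's actual matching logic, which is not described here: the field names, the 0.8 threshold and the rule (same phone digits, or similar name and similar address) are all assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize_phone(phone):
    """Keep digits only so that formatting differences do not matter."""
    return re.sub(r"\D", "", phone or "")


def similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_loose_match(site, candidate, threshold=0.8):
    """Accept identical phone digits, or a similar name AND a similar address."""
    site_phone = normalize_phone(site.get("telephone"))
    if site_phone and site_phone == normalize_phone(candidate.get("telephone")):
        return True
    return (
        similarity(site["name"], candidate["name"]) >= threshold
        and similarity(site["address"], candidate["address"]) >= threshold
    )


# Hypothetical records: one from the website, one returned by a places lookup.
site = {"name": "Villa Rosa", "address": "Via Roma 1, Rome",
        "telephone": "+39 06 1234567"}
candidate = {"name": "Villa Rosa B&B", "address": "Via Roma 1, 00100 Roma",
             "telephone": "+39 (06) 123-4567"}
print(is_loose_match(site, candidate))
```

Whatever the rule, the result is only a suggestion: the editor still accepts or discards it.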
Adding location markup using containedInPlace/containsPlace and linked data
As seen in the JSON-LD above we have added, in a previous iteration (and independently from the testing that was done this time), two important properties:
the containedInPlace property (on the pages of the local businesses) and its inverse property containsPlace (on the pages related to villages and regions) to help search engines clearly understand the location of the local businesses.
This data is also very helpful to compose the breadcrumbs as it will help the searcher understand and confirm the location of a business. Most of us still make searches like “WordLift, Rome” to find a local business, and we are more likely to click on results where we can confirm that, yes, the WordLift office is indeed located in Italy > Lazio > Rome.
To extract this information along with the sameAs links to Wikidata and GeoNames (one of the largest geographical databases with more than 11 million locations) we used our linked data stack and an extension called WordLift Geo to automatically populate the knowledge graph and the JSON-LD with the containedInPlace and containsPlace properties.
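Here is a minimal sketch of how containedInPlace can be nested in the JSON-LD and then walked to compose a breadcrumb trail. The business name is made up and the sameAs URL is a placeholder to replace with the real Wikidata or GeoNames identifier; this shows the shape of the data, not the output of WordLift Geo.

```python
import json

# Hypothetical place hierarchy; the sameAs URL is a placeholder.
rome = {
    "@type": "City",
    "name": "Rome",
    "sameAs": ["https://www.wikidata.org/entity/YOUR_WIKIDATA_ID"],
    "containedInPlace": {
        "@type": "AdministrativeArea",
        "name": "Lazio",
        "containedInPlace": {"@type": "Country", "name": "Italy"},
    },
}

business = {
    "@context": "https://schema.org",
    "@type": "LodgingBusiness",
    "name": "Example Villa",
    "containedInPlace": rome,
}


def breadcrumb_trail(entity):
    """Walk containedInPlace upwards and return names, widest place first."""
    names = []
    place = entity.get("containedInPlace")
    while place:
        names.append(place["name"])
        place = place.get("containedInPlace")
    return list(reversed(names))


print(" > ".join(breadcrumb_trail(business)))
print(json.dumps(business, indent=2))
```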
We have seen a +5.09% increase in clicks (after only one week) on pages where we added the hasMap property and improved the consistency of NAP (business name, address and phone number) on a travel website listing 500+ local businesses
We did this by interfacing with the Google Maps Places APIs and by providing suggestions for the editor to validate or reject
Using containedInPlace/containsPlace is also a good way to improve the structured data of a local business and you should do this by adding also sameAs links to Wikidata and/or GeoNames to facilitate disambiguation
As most of the searches for local businesses (at least in travel) are in the form of “[business name] [location where the business is located]”, we have seen in the past an increase in the CTR when the schema Breadcrumb uses this information from containedInPlace/containsPlace (see below 👇)
One key aspect in SEO, if you are a local business (or deal with local businesses), is to have the correct location listed in Google Maps and to link your website with Google My Business. The best way to do that is to properly mark up your Google Maps URL using schema markup.
What is the hasMap property and how should we use it? In 2014 (schema v 1.7) the hasMap property was introduced to link a web page of a place with the URL of a map. In order to facilitate the link between a web page and the corresponding entity on Google Maps we can use the following snippet in the JSON-LD: "hasMap": "https://maps.google.com/maps?cid=YOURCIDNUMBER"
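As a minimal sketch, this is how the hasMap snippet fits into the rest of the markup for a lodging business. The helper function and every value passed to it are hypothetical placeholders; only the schema.org property names and the URL pattern come from the text above.

```python
import json


def lodging_jsonld(name, telephone, street, locality, country, url, cid):
    """Return a schema.org LodgingBusiness dict with NAP data and hasMap."""
    return {
        "@context": "https://schema.org",
        "@type": "LodgingBusiness",
        "name": name,
        "telephone": telephone,
        "url": url,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": locality,
            "addressCountry": country,
        },
        # Link to the corresponding entity on Google Maps.
        "hasMap": f"https://maps.google.com/maps?cid={cid}",
    }


# Placeholder values for illustration only.
markup = lodging_jsonld(
    name="Example Villa",
    telephone="+39 06 1234567",
    street="Via Roma 1",
    locality="Rome",
    country="IT",
    url="https://example.com/example-villa",
    cid="1234567890",
)
print(json.dumps(markup, indent=2))
```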
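What is the Google CID number? In this context the CID is the unique number that identifies a business listing on Google Maps; it appears as the cid parameter in Maps URLs and should not be confused with the customer ID that identifies a Google Ads account. This number can be used to link a website with the corresponding entity in Google My Business.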
How can I find the Google CID number using Google Maps?
Search the business in Google Maps using the business name
View the source code (use view-source: followed by the url in your browser)
Click CTRL+F and search the source code for “ludocid”
The CID will be the string of numbers after “ludocid\\u003d” and before #lrd
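If you need to repeat the manual lookup above on many pages, the last two steps can be sketched with a regular expression. This assumes you have already saved the page source as text; the pattern accepts one or more backslashes before u003d, as well as a plain ludocid= parameter, and the sample strings are invented.

```python
import re

# Matches the escaped form found in page source (ludocid, backslash(es),
# u003d, digits) as well as a plain "ludocid=<digits>" query parameter.
CID_PATTERN = re.compile(r"ludocid(?:\\+u003d|=)(\d+)")


def extract_cid(page_source):
    """Return the first CID found in the page source, or None."""
    match = CID_PATTERN.search(page_source)
    return match.group(1) if match else None


# Invented sample, written as a raw string so the backslash is kept as-is.
sample = r'href="/search?q=example\u0026ludocid\u003d1234567890#lrd=0x0:0x1"'
print(extract_cid(sample))
```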
SERP analysis is an essential step in the process of content optimization to outrank the competition on Google. In this blog post I will share a new way to run SERP analysis using machine learning and a simple python program that you can run on Google Colab.
SERP (Search Engine Result Page) analysis is part of keyword research and helps you understand if the query that you identified is relevant for your business goals. More importantly by analyzing how results are organized we can understand how Google is interpreting a specific query.
What is the intention of the user making that search?
What search intent is Google associating with that particular query?
The investigative work required to analyze the top results provides an answer to these questions and guides us to improve (or create) the content that best fits the searcher.
While there is an abundance of keyword research tools that provide SERP analysis functionalities, my particular interest lies in understanding the semantic data layer that Google uses to rank results and what can be inferred using natural language understanding from the corpus of results behind a query. This might also shed some light on how Google does fact extraction and verification for its own knowledge graph starting from the content we write on webpages.
Falling down the rabbit hole
It all started when Jason Barnard and I started to chat about E-A-T and what technique marketers could use to “read and visualize” Brand SERPs. Jason is a brilliant mind and has a profound understanding of Google’s algorithms; he has been studying, tracking and analyzing Brand SERPs since 2013. While Brand SERPs are a category of their own, the process of interpreting search results remains the same whether you are comparing the personal brands of “Andrea Volpini” and “Jason Barnard” or analyzing the different shades of meaning between “making homemade pizza” and “make pizza at home”.
Hands-on with SERP analysis
In this pytude (simple python program), as Peter Norvig would call it, the plan goes as follows:
we will crawl Google’s top (10-15-20) results and extract the text behind each webpage,
we will look at the terms and the concepts of the corpus of text resulting from the download, parsing, and scraping of web page data (main body text) of all the results together,
we will then compare two queries “Jason Barnard” and “Andrea Volpini” in our example and we will visualize the most frequent terms for each query within the same semantic space,
after that we will focus on “Jason Barnard” in order to understand the terms that make the top 3 results unique from all the other results,
then, using a sequence-to-sequence model, we will summarize all the top results for Jason in a featured-snippet-like text (this is indeed impressive),
finally we will build a question-answering model on top of the corpus of text related to “Jason Barnard” to see what facts we can extract from these pages that can extend or validate information in Google’s knowledge graph.
Text mining Google’s SERP
Our text data (Web corpus) is the result of two queries made on Google.com (you can change this parameter in the Notebook) and of the extraction of all the text behind these webpages. Depending on the website we might or might not be able to collect the text. The two queries I worked with are “Jason Barnard” and “Andrea Volpini” but you can, of course, query whatever you like.
One of the most crucial tasks in the text mining field, once the Web corpus has been created, is to present data visually. Using natural language processing (NLP) we can explore these SERPs from different angles and levels of detail. Using Scattertext we’re immediately able to see what terms (from the combination of the two queries) differentiate the corpus from a general English corpus: in other words, the most characteristic keywords of the corpus.
And you can see here, besides the names (volpini, jasonbarnard, cyberandy), other relevant terms that characterize both Jason and myself. Boowa, a blue dog, and Kwala, a yellow koala, will guide us throughout this investigation so let me first introduce them: they are two cartoon characters that Jason and his wife created back in the nineties. They are still prominent as they appear in Jason’s Wikipedia article as part of his career as a cartoon maker.
Visualizing term associations in two Brand SERPs
In the scatter plot below we have on the y-axis the category “Jason Barnard” (our first query), and on the x-axis the category for “Andrea Volpini”. On the top right corner of the chart we can see the most frequent terms on both SERPs – the semantic junctions between Jason and myself according to Google.
Not surprisingly there you will find terms like: Google, Knowledge, Twitter and SEO. On the top left side we can spot Boowa and Kwala for Jason and on the bottom right corner AI, WordLift and knowledge graph for myself.
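To show what sits behind such a chart, here is a minimal sketch that gives every term a pair of coordinates: how often it appears in one SERP and how often in the other. It uses raw counts from the standard library rather than Scattertext's own scoring, and the two tiny corpora are invented for illustration.

```python
import re
from collections import Counter


def term_counts(documents):
    """Count lowercase word tokens across a list of texts."""
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def scatter_coordinates(docs_y, docs_x):
    """Map each term to (count in the x category, count in the y category)."""
    y_counts, x_counts = term_counts(docs_y), term_counts(docs_x)
    return {t: (x_counts[t], y_counts[t]) for t in set(y_counts) | set(x_counts)}


# Invented snippets standing in for the text scraped from each SERP.
jason_docs = ["Jason Barnard created Boowa and Kwala", "Jason talks SEO on Twitter"]
andrea_docs = ["Andrea Volpini builds WordLift", "Andrea talks SEO and knowledge graph"]

coords = scatter_coordinates(jason_docs, andrea_docs)
shared = sorted(t for t, (x, y) in coords.items() if x and y)
print(shared)
```

Terms with both coordinates above zero are the ones that would drift towards the top right corner; terms with a zero on one axis stay close to the other.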
Comparing the terms that make the top 3 results unique
When analyzing the SERP our goal is to understand how Google is interpreting the intent of the user and what terms Google considers relevant for that query. To do so, in the experiment, we split the corpus of the results related to Jason between the content that ranks in position 1, 2 and 3 and everything else.
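The split can be sketched as follows: rank the terms found in the top 3 results by how much more frequent they are there than in everything else. The smoothed log-ratio used here is my own simple choice for illustration, not necessarily the measure used in the notebook, and the ranked documents are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def distinctive_terms(ranked_docs, top_n=3, k=5):
    """Rank top-result terms by a smoothed log-ratio of relative frequencies."""
    top = Counter(t for doc in ranked_docs[:top_n] for t in tokenize(doc))
    rest = Counter(t for doc in ranked_docs[top_n:] for t in tokenize(doc))
    top_total, rest_total = sum(top.values()), sum(rest.values())
    vocab = len(set(top) | set(rest))

    def score(term):
        # Add-one smoothing keeps unseen terms from dividing by zero.
        p_top = (top[term] + 1) / (top_total + vocab)
        p_rest = (rest[term] + 1) / (rest_total + vocab)
        return math.log(p_top / p_rest)

    return sorted(top, key=lambda t: (-score(t), t))[:k]


# Invented texts, ordered by ranking position (position 1 first).
ranked = [
    "boowa kwala cartoon",
    "boowa kwala music",
    "boowa seo",
    "seo marketing",
    "seo podcast marketing",
]
print(distinctive_terms(ranked, top_n=3, k=2))
```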
Summarizing Google’s Search Results
When creating well-optimized content, professional SEOs analyze the top results in order to understand the search intent and to get an overview of the competition. As Gianluca Fiorelli, whom I personally admire a lot, would say, it is vital to look at it directly.
Since we now have the web corpus of all the results I decided to let the AI do the hard work in order to “read” all the content related to Jason and to create an easy to read summary. I’ve experimented quite a lot lately with both extractive and abstractive summarization techniques and I found that, when dealing with a heterogeneous multi-genre corpus like the one we get from scraping web results, BART (a sequence-to-sequence text model) does an excellent job in understanding the text and generating abstractive summaries (for English).
Let’s see it in action on Jason’s results. Here is where the fun begins. Since I was working with Jason Barnard, a.k.a. the Brand SERP Guy, Jason was able to update his own Brand SERP as if Google was his own CMS 😜 and we could immediately see from the summary how these changes were impacting what Google was indexing.
Here below is the transition from Jason the marketer, musician and cartoon maker to Jason the full-time digital marketer.
Can we reverse-engineer Google’s answer box?
As Jason and I were progressing with the experiment I also decided to see how close a Question Answering System running Google’s pre-trained BERT models could get to Google’s answer box for the Jason-related question below.
Quite impressively, as the web corpus was indeed the same that Google uses, I could get exactly the same result.
This is interesting as it tells us that we can use question-answering systems to validate if the content that we’re producing responds to the question that we’re targeting.
Lessons we learned
We can produce semantically organized knowledge from raw unstructured content much like a modern search engine would do. By reverse engineering the semantic extraction layer using NER from Google’s top results we can “see” the unique terms that make web documents stand out on a given query.
We can also analyze the evolution over time and space (the same query in a different region can have a different set of results) of these terms.
While with keyword research tools we always see a ‘static’ representation of the SERP, by running our own analysis pipeline we realize that these results are constantly changing as new content surfaces in the index and as Google’s neural mind improves its understanding of the world and of the person making the query.
By comparing different queries we can find aspects in common and uniqueness that can help us inform the content strategy (and the content model behind the strategy).
Are you ready to run your first SERP Analysis using Natural Language Processing?
All of this wouldn’t have happened without Jason’s challenge of “visualizing” E-A-T and Brand SERPs, and this work is dedicated to him and to the wonderful community of marketers, SEOs, clients and partners that are supporting WordLift. A big thank you also goes to the open-source technologies used in this experiment:
The Bidirectional Encoder Representations from Transformers (BERT) is an AI developed by Google as a means to help machines understand language in a manner more similar to how humans understand language. Specifically, it’s a pre-trained, unsupervised natural language processing (NLP) model that seeks to understand the nuances and context of human language.
It was released as open source by Google in 2018 and was officially rolled out in Google Search in October 2019. It is now being used in Google searches in all languages, globally, and impacts featured snippets.
What is BERT used for?
BERT is primarily used to provide better query results by using its understanding of language nuance to deliver more useful results. This goes not only for standard snippets, but for featured snippets as well. It’s said that it will impact at least 1 out of every 10 search results going forward.
When BERT uses its understanding of nuance in language, it can understand a user’s intentions through connecting words, such as: and, but, to, from, with, etc. So rather than utilizing only keywords, BERT can understand a user’s query by examining words like “and” or “versus” when delivering SERP results.
An example of how BERT uses NLP to distinguish a user’s search intent.
In an example provided by Google, if you search for “parking on a hill with no curb,” you would get SERP results and a featured snippet detailing what you need to do if you’re parking a vehicle on a hill where there is no curb. Thanks to BERT’s NLP, Google knows that the word “no” means that there is no curb, whereas previously, if you searched for the same query, you would’ve received results on parking on a hill WITH a curb because your query included the keyword “curb” but Google didn’t understand the significance of the word no.
What is BERTSUM?
BERTSUM is a variant of BERT that is used for extractive summarization of content. Essentially, BERTSUM can be used to extract summaries of web pages and content for several different web pages and sites. This has been known to be particularly useful when writing meta descriptions for hundreds or even thousands of webpages on a site, rather than having to write each one individually.
BERT’s effect on RankBrain
RankBrain, Google’s first AI for understanding queries, has been used to understand queries and content since 2015. While it shares some things in common with BERT, they do not perform the same functions and BERT has not replaced RankBrain. RankBrain can do things like understand what a user is looking for even if they misspelled it or used incorrect grammar, whereas BERT seeks to understand the nuances of the language used in a search query.
Therefore, while they both share a lot in common and both perform NLP functions for the Google SERP, they are not the same.
One of the most fascinating features of deep neural networks applied to NLP is that, provided with enough examples of human language, they can generate text and help us discover many of the subtle variations in meanings. In a recent blog post by Google research scientist Brian Strope and engineering director Ray Kurzweil we read:
“The content of language is deeply hierarchical, reflected in the structure of language itself, going from letters to words to phrases to sentences to paragraphs to sections to chapters to books to authors to libraries, etc.”
Following this hierarchical structure, new computational language models aim at simplifying the way we communicate and have silently entered our daily lives; from Gmail’s “Smart Reply” feature to the keyboard in our smartphones, recurrent neural networks and character- and word-level prediction using LSTM (Long Short Term Memory) have paved the way for a new generation of agentive applications.
From keyword research to keyword generation
As usual with my AI-powered SEO experiments, I started with a concrete use-case. One of our strongest publishers in the tech sector was asking us for new, unexplored search intents to invest in with articles and how-to guides. Search marketers, copywriters and SEOs have, in the last 20 years, been scouting for the right keyword to connect with their audience. While there is a large number of available tools for doing keyword research I thought, wouldn’t it be better if our client could have a smart auto-complete to generate any number of keywords in their semantic domain, instead of keyword data generated by us?
The way a search intent (or query) can be generated, I also thought, is quite similar to the way a title could be suggested during the editing phase of an article. And titles (or SEO titles), with a trained language model that takes into account what people search, could help us find the audience we’re looking for in a simpler way.
What makes RNNs “more intelligent” when compared to feed-forward networks is that, rather than working on a fixed number of steps, they compute sequences of vectors. They are not limited to processing only the current input, but also take into account everything that they have perceived previously in time.
This characteristic makes them particularly efficient in processing human language (a sequence of letters, words, sentences, and paragraphs) as well as music (a sequence of notes, measures, and phrases) or videos (a sequence of images).
Here above you can see the difference between a recurrent neural network and a feed-forward neural network. Basically, RNNs have a short-term memory that allows them to store the information processed in the previous steps. The hidden state is looped back as part of the input. LSTMs are an extension of RNNs whose goal is to “prolong” or “extend” this internal memory, hence allowing them to remember previous words, previous sentences or any other value from the beginning of a long sequence.
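To see the loop in action, here is a minimal sketch of a recurrent step with a single scalar hidden state. The weights are hand-picked for illustration, not learned, and a real network would use vectors and matrices instead of single numbers.

```python
import math


def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: the new hidden state mixes the input and the old state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(inputs, h0=0.0):
    """Feed a sequence through the cell, looping the hidden state back each time."""
    h, states = h0, []
    for x in inputs:
        h = rnn_step(x, h)
        states.append(h)
    return states


# One signal followed by silence: the state fades but does not vanish at once.
states = run_rnn([1.0, 0.0, 0.0])
print([round(h, 3) for h in states])
```

A feed-forward layer with the same input weight would output zero for the second and third inputs; the recurrent one still carries a trace of the first, and that trace gets weaker at every step, which is the limitation LSTMs address.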
The LSTM cell where each gate works like a perceptron.
Imagine a long article where I explained that I am Italian at the beginning of it and then this information is followed by another, let’s say, 2,000 words. An LSTM is designed in such a way that it can “recall” that piece of information while processing the last sentence of the article and use it to infer, for example, that I speak Italian. A common LSTM cell is made of an input gate, an output gate and a forget gate. The cell remembers values over a time interval and the three gates regulate the flow of information into and out of the cell much like a mini neural network. In this way, LSTMs can mitigate the vanishing gradient problem of traditional RNNs.
If you want to learn more about the mathematics behind recurrent neural networks and LSTMs, go ahead and read this article by Christopher Olah.
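Here is a minimal sketch of a single LSTM cell with scalar states, to show how the three gates interact. The parameters are hand-picked so that the forget gate stays almost fully open and the input gate almost closed; in a trained network these values are learned, and states are vectors rather than single numbers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step. params maps a gate name to (w_x, w_h, bias)."""
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to let in
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # the new information itself
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Hand-picked parameters (an assumption for illustration, not trained values).
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 10.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0  # the cell starts out "remembering" the value 1.0
for _ in range(2000):  # 2,000 steps with no new input
    h, c = lstm_step(0.0, h, c, params)
print(round(c, 3))
```

After 2,000 empty steps most of the stored value is still in the cell, which is the scalar version of recalling a fact stated at the beginning of a long article.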
Let’s get started: “Io sono un compleanno!”
After reading Andrej Karpathy’s blog post I found a terrific Python library called textgenrnn by Max Woolf. This library is developed on top of TensorFlow and makes it super easy to experiment with Recurrent Neural Networks for text generation.
Before looking at generating keywords for our client I decided to learn text generation and how to tune the hyperparameters in textgenrnn by doing a few experiments.
AI is interdisciplinary by definition: the goal of every project is to bridge the gap between computer science and human intelligence.
I started my tests by throwing into the process a large text file in English that I found on Peter Norvig’s website (https://norvig.com/big.txt) and I ended up, thanks to the help of Priscilla (a clever content writer collaborating with us), “resurrecting” David Foster Wallace with his monumental Infinite Jest (provided in Italian from Priscilla’s ebook library and spiced up with some of her random writings).
At the beginning of the training process – in a character by character configuration – you can see exactly what the network sees: a nonsensical sequence of characters that, a few epochs (training iteration cycles) later, will transform into proper words.
As I became more accustomed to the training process I was able to generate the following phrase:
“Io sono un compleanno. Io non voglio temere niente? Come no, ancora per Lenz.”
“I’m a birthday. I don’t want to fear anything? And, of course, still for Lenz.”
David Foster Wallace
Unquestionably a great piece of literature that gave me the confidence to move ahead in creating a smart keyword suggest tool for our tech magazine.
The dataset used to train the model
As soon as I was confident enough to get things working (this means basically being able to find a configuration that – with the given dataset – could produce a language model with a loss value equal to or below 1.0), I asked Doreid, our SEO expert, to work on WooRank’s API and to prepare a list of 100,000 search queries that could be relevant for the website.
To scale up the number we began by querying Wikidata to get a list of software for Windows that our readers might be interested to read about. As for any ML project, data is the most strategic asset. So while we want to be able to generate never-seen-before queries we also want to train the machine on something that is unquestionably good from the start.
The best way to connect words to concepts is to define a context for these words. In our specific use case, the context is primarily represented by software applications that run on the Microsoft Windows operating system. We began by slicing the Wikidata graph with a simple query that provided us with the list of 3,780+ software apps that run on Windows and 470+ related software categories. By expanding this list of keywords and categories, Doreid came up with a CSV file containing the training dataset for our generator.
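A query of this kind can be sketched as follows. This is not the exact query we ran: the identifiers are the ones I believe Wikidata uses (P306 for “operating system”, Q1406 for Microsoft Windows, P31 for “instance of”), so please verify them on wikidata.org before relying on them. The code only builds the request URL for the public SPARQL endpoint; it does not send it.

```python
from urllib.parse import urlencode

# Assumed identifiers, to be verified on wikidata.org:
# P306 = "operating system", Q1406 = "Microsoft Windows", P31 = "instance of".
SPARQL = """
SELECT ?software ?softwareLabel ?categoryLabel WHERE {
  ?software wdt:P306 wd:Q1406 .
  OPTIONAL { ?software wdt:P31 ?category . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def wikidata_query_url(query, endpoint="https://query.wikidata.org/sparql"):
    """Build the GET URL for a SPARQL query against the Wikidata endpoint."""
    return endpoint + "?" + urlencode({"query": query.strip(), "format": "json"})


url = wikidata_query_url(SPARQL)
print(url[:80] + "...")
```

Fetching that URL returns JSON bindings with one row per application and category, which is the kind of list that can then be expanded into training queries.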
The first rows in the training dataset.
After several iterations, I was able to define the top performing configuration by applying the values below. I moved from character-level to word-level and this greatly increased the speed of the training. As you can see I have 6 layers with 128 cells on each layer and I am running the training for 100 epochs. This is indeed limited, depending on the size of the dataset, by the fact that Google Colab after 4 hours of training stops the session (this is also a gentle reminder that it might be the right time to move from Google Colab to Cloud Datalab – the paid version in Google Cloud).
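Expressed as code, the configuration described above could look like the sketch below. The parameter names are the ones I recall from the textgenrnn README (rnn_layers, rnn_size, num_epochs, word_level, new_model), so check them against the version you have installed; the model name and the CSV path are hypothetical. The library is imported inside the function so that the configuration can be inspected without having textgenrnn installed.

```python
# Values from the text: word-level model, 6 layers, 128 cells, 100 epochs.
TRAINING_CONFIG = {
    "new_model": True,    # train from scratch instead of fine-tuning
    "word_level": True,   # word-level rather than character-level
    "rnn_layers": 6,
    "rnn_size": 128,
    "num_epochs": 100,
}


def train(csv_path, config=TRAINING_CONFIG):
    """Train a textgenrnn model on a file with one query per line."""
    from textgenrnn import textgenrnn  # third-party library, imported lazily

    model = textgenrnn(name="keyword_generator")  # hypothetical model name
    model.train_from_file(csv_path, **config)
    return model


# Usage (requires textgenrnn and a training file):
# model = train("training_queries.csv")
```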
Here we see the initial keywords being generated while training the model
Rock & Roll, the fun part
After a few hours of training, the model was ready to generate our never-seen-before search intents with a simple python script containing the following lines.
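A minimal sketch of such a script, assuming the textgenrnn API as I recall it (a generate method accepting n, temperature, prefix and return_as_list) and hypothetical file names for the trained weights, vocabulary and configuration:

```python
def load_model(name="keyword_generator"):
    """Load a trained model from the files saved during training."""
    from textgenrnn import textgenrnn  # third-party library, imported lazily

    # Hypothetical file names, following the <name>_* pattern.
    return textgenrnn(
        weights_path=f"{name}_weights.hdf5",
        vocab_path=f"{name}_vocab.json",
        config_path=f"{name}_config.json",
    )


def generate_queries(model, n=5, temperature=0.5, prefix=None):
    """Ask the model for n new queries and return them as a list of strings."""
    return model.generate(
        n=n, temperature=temperature, prefix=prefix, return_as_list=True
    )


# Usage (requires textgenrnn and the trained model files):
# for query in generate_queries(load_model(), n=10, prefix="how to"):
#     print(query)
```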
Here are a few examples of generated queries:
where to find google drive downloads
where to find my bookmarks on google chrome
how to change your turn on google chrome
how to remove invalid server certificate error in google chrome
how to delete a google account from chrome
how to remove google chrome from windows 8 mode
how to completely remove google chrome from windows 7
how do i remove google chrome from my laptop
You can play with temperatures to improve the creativity of the results or provide a prefix to indicate the first words of the keyword that you might have in mind and let the generator figure out the rest.
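To see why temperature changes the “creativity” of the output, here is a minimal sketch of what temperature does to the probabilities of the next word. The three probabilities are invented; the rescaling itself (dividing the log-probabilities by the temperature and normalizing again) is the standard technique, not something specific to our model.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a probability distribution; a higher temperature flattens it."""
    logits = [math.log(p) / temperature for p in probs]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Invented probabilities for three candidate next words.
probs = [0.7, 0.2, 0.1]
cold = apply_temperature(probs, 0.2)  # conservative: the top word dominates
hot = apply_temperature(probs, 1.5)   # adventurous: rare words get a chance
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

With a low temperature the generator almost always picks the most likely word and tends to repeat what it has seen; with a high one it explores less likely continuations, at the cost of more nonsense.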
Takeaways and future work
“Smart Reply” suggestions can be applied to keyword research work, and it is worth assessing in a systematic way the quality of these suggestions in terms of:
validity – is this meaningful or not? Does it make sense for a human?
relevance – is this query really hitting on the target audience the website has? Or is it off-topic? and
impact – is this keyword well-balanced in terms of competitiveness and volume considering the website we are working for?
The initial results are promising: all of the initial 200+ generated queries were different from the ones in the training set and, by increasing the temperature, we could explore new angles on an existing topic (e.g. “where is area 51 on google earth?”) or even evaluate new topics (e.g. “how to watch android photos in Dropbox” or “advertising plugin for google chrome”).
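Checking that generated queries are really new is straightforward. Here is a minimal sketch, with a normalization step (lowercasing and collapsing whitespace) that is my own assumption about what should count as “the same query”; the training queries are invented, and the generated ones include two from the list above.

```python
def normalize(query):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(query.lower().split())


def novel_queries(generated, training):
    """Return the generated queries that do not appear in the training set."""
    seen = {normalize(q) for q in training}
    return [q for q in generated if normalize(q) not in seen]


# Invented training set for illustration.
training = ["how to install google chrome", "where to find my downloads"]
generated = [
    "How to install  Google Chrome",
    "where to find my bookmarks on google chrome",
    "how to delete a google account from chrome",
]
print(novel_queries(generated, training))
```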
It would be simply terrific to implement – with a Generative Adversarial Network (or using Reinforcement Learning) – a way to help the generator produce only valuable keywords (keywords that – given the website – are valid, relevant and impactful in terms of competitiveness and reach). Once again, it is crucial to define the right mix of keywords we need to train our model (can we source them from a graph as we did in this case? shall we only use the top ranking keywords from our best competitors? Should we mainly focus on long tail, conversational queries and leave out the rest?).
One thing that emerged very clearly is that experiments like this one (combining LSTMs and data sourcing using public knowledge graphs such as Wikidata) are a great way to shed some light on how Google might be working on improving the evaluation of search queries using neural nets. What is now called “Neural Matching” might most probably be just a sexy PR expression but, behind the recently announced capability of analyzing long documents and evaluating search queries, it is fair to expect that Google is using RNN architectures, contextual word embeddings, and semantic similarity. As deep learning and AI in general become more accessible (frameworks are open source and there is a healthy open knowledge sharing in the ML/DL community), it becomes evident that Google leads the industry with the amount of data they have access to and the computational resources they control.
This experiment would not have been possible without textgenrnn by Max Woolf and TensorFlow. I am also deeply thankful to all of our VIP clients engaging in our SEO management services, our terrific VIP team: Laura, Doreid, Nevine and everyone else constantly “lifting” our startup, Theodora Petkova for challenging my robotic mind and my beautiful family for sustaining my work.
Making sense of data using AI is becoming crucial to our daily lives and has significantly shaped my professional career in the last 5 years.
When I began working on the Web it was in the mid-nineties and Amazon was still a bookseller with a primitive website.
At that time it became extremely clear that the world was about to change and every single aspect of our society in both cultural and economic terms was going to be radically transformed by the information society. I was in my twenties, eager to make a revolution and the Internet became my natural playground. I dropped out of school and worked day and night contributing to the Web of today.
Twenty years later I am again witnessing a similar – if not even more radical – transformation of our society as we race for the so-called AI transformation. This basically means applying machine learning, ontologies and knowledge graphs to optimize every process of our daily lives.
At the personal level I am back in my twenties (sort of) and I wake up at night to train a new model, to read the latest research paper on recurrent neural networks or to test how deep learning can be used to perform tasks on knowledge graphs.
The beauty of it is that I have the same feeling of building the plane as we’re flying it that I had in the mid-nineties when I started with TCP/IP, HTML and websites!
Wevolver: an image I took at SXSW
AI transformation for search engine optimization
In practical terms, the AI transformation here at WordLift (our SEO startup) works this way: we look at how we help companies improve traffic coming from search engines. We analyze complex tasks and break them down into small chunks of work and we try to automate them using narrow AI techniques (in some cases we simply tap at the top of the AI pyramid and start using ready-made APIs, in some other cases we develop/train our own models). We tend to focus (in this phase at least) on trivial repetitive tasks that can bring a concrete and measurable impact on the SEO of a website (e.g. more visits from Google, more engaged users, …) such as:
We test these approaches on a selected number of terrific clients that literally fuel this process, we keep on iterating and improving the tooling we use until we feel ready to add it back into our product to make it available to hundreds of other users.
We take on a small handful of clients projects each year to help them boost their qualified traffic via our SEO Management Service.
All along the journey, I’ve learned the following lessons:
1. The AI stack is constantly evolving
AI introduces a completely new paradigm: from teaching computers what to do, to providing the data required for computers to learn what to do.
In this pivotal change, we still lack the infrastructure required to address fundamental problems (e.g. How do I debug a model? How can I prevent or detect a bias in the system? How can I predict an event in a context where the future is not a mere projection of the past?). This basically means that new programming languages will emerge and new stacks will be designed to address these issues right from the beginning. In this continuously evolving scenario, libraries like TensorFlow Hub are a concrete and valuable example of how reusable parts can be consumed in AI and machine learning. This approach also makes these technologies far more accessible to a growing number of people outside the AI community.
2. Semantic data is king
AI depends on data, and any business that wants to implement AI inevitably ends up revamping and/or building a data pipeline: the way in which the data is sourced, collected, cleaned, processed, stored, secured and managed. In machine learning, we no longer use if-then-else rules to instruct the computer; instead we let the computer learn the rules by providing a training set of data. This approach, while extremely effective, poses several issues because it is hard to explain why a computer has learned a specific behavior from the training data. In Semantic AI, knowledge graphs are used to collect and manage the training data, and this allows us to check the consistency of this data and to understand, more easily, how the network is behaving and where we might have a margin for improvement. Real-world entities and the relationships between them are becoming essential building blocks in the third era of computing. Knowledge graphs are also great at “translating” insights and wisdom from domain experts into a computable form that machines can understand.
3. You need the help of subject-matter experts
Knowledge becomes a business asset when it is properly collected, encoded, enriched and managed. Any AI project you might have in mind always starts with a domain expert providing the right keys to address the problem. In a way, AI is the most human-dependent technology of all time. For example, let’s say that you want to improve the SEO of the images on your website. You will start by looking at the best practices and direct experience of professional SEOs who have been dealing with this issue for years. It is only by analyzing the methods this expert community uses that you can tackle the problem and implement your AI strategy. Domain experts know, well in advance, what can be automated and what results to expect from this automation. A data analyst or an ML developer might think that we can train an LSTM network to write all the meta descriptions of a website on the fly. A domain expert would tell you that Google only uses meta descriptions as search snippets 33% of the time and that, if these texts are not revised by a qualified human, they will produce little or no results in terms of actual clicks (we can provide a decent summary with NLP and automatic text summarization, but enticing a click is a different challenge).
4. Always link data with other data
External data linked with internal data helps you improve how the computer will learn about the world you live in. An organization rarely controls all the data that an ML algorithm would need to become useful and to have a concrete business impact. By building on top of the Semantic Web and Linked Data, and by connecting internal with external data, we can help machines get smarter. When we started designing WordLift’s new dashboard, whose goal is to help editors improve their editorial plan by looking at how their content ranks on Google, it immediately became clear that our entity-centric world would benefit from query and ranking data gathered by our partner WooRank. The combination of these two pieces of information helped us create the basis for training an agent that will recommend to editors what is worth writing about and tell them whether they are connecting with the right audience over organic search.
To shape your AI strategy and improve both technical and organizational measures, you need to study the business requirements carefully with the support of a domain expert. Remember that narrow AI helps us build agentive systems that do things for end users (like, say, tagging images automatically or building a knowledge graph from your blog posts) as long as we always keep the user at the center of the process.
In this article, we explore how to evaluate the correspondence between title tags and the keywords that people use on Google to reach the content they need. We will share the results of the analysis (and the code behind it) using a TensorFlow model for encoding sentences into embedding vectors. The result is a list of titles on your website that can be improved.
“A title tag is an HTML element that defines the title of the page. Titles are one of the most important on-page factors for SEO. […]
They are used, combined with meta descriptions, by search engines to create the search snippet displayed in search results.”
Every search engine’s most fundamental goal is to match the intent of the searcher by analyzing the query to find the best content on the web on that specific topic. In the quest for relevancy a good title influences search engines only partially (it takes a lot more than just matching the title with the keyword to rank on Google), but it does have an impact, especially on top ranking positions (1st and 2nd, according to a study conducted a few years ago by Cognitive SEO). This is also because a searcher is more inclined to click when they find good semantic correspondence between the keyword used on Google and the title (along with the meta description) displayed in the search snippet of the SERP.
What is semantic similarity in text mining?
Semantic similarity defines the distance between terms (or documents) by analyzing their semantic meanings as opposed to looking at their syntactic form.
“Apple” and “apple” are the same word: if I compare them syntactically using an algorithm like Levenshtein distance they look identical (apart from the capital letter). By analyzing the context of the phrase where the word apple is used, on the other hand, I can “read” the true semantic meaning and find out whether the word refers to the world-famous tech company headquartered in Cupertino or to the sweet forbidden fruit of Adam and Eve.
A search engine like Google uses NLP and machine learning to find the right semantic match between the intent and the content. This means that search engines are no longer looking at keywords as strings of text; they are reading the true meaning that each keyword has for the searcher. As SEOs and marketers, we can now also use AI-powered tools to create the most authoritative content for a given query.
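To make the syntactic side of this comparison concrete, here is a minimal sketch of the Levenshtein distance in plain Python. The function name and the sample sentences are made up for illustration; the point is only that the two uses of "apple" are indistinguishable at the string level.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


# Same surface form, two different meanings (sentences are invented).
company_sentence = "apple released a new phone"
fruit_sentence = "apple pie needs ripe fruit"

word_in_company = company_sentence.split()[0]
word_in_fruit = fruit_sentence.split()[0]

# The string comparison cannot tell the two senses apart.
same_word_distance = levenshtein(word_in_company, word_in_fruit)

# Only the capital letter differs here, so a single edit is needed.
case_distance = levenshtein("Apple", "apple")
```

A distance of zero says nothing about meaning, which is why the surrounding context has to be analyzed.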
There are two main ways to compute the semantic similarity using NLP:
we can compute the distance between two terms using semantic graphs and ontologies, by looking at the distance between the nodes (this is how our tool WordLift is capable of discerning whether apple, in a given sentence, is the company founded by Steve Jobs or the sweet fruit). A very simple but interesting example is to build a “semantic tree” (or, better, a directed graph) by using the Wikidata P279 property (subclass of).
we can alternatively use a statistical approach and train a deep neural network to build, from a text corpus (a collection of documents), a vector space model that helps us transform the terms into numbers to analyze their semantic similarity and run other NLP tasks (e.g. classification).
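The first, graph-based approach can be sketched with a toy example. The edges below are written by hand in the spirit of "subclass of" links; they are not fetched from Wikidata, the labels are simplified, and the function names are hypothetical. The idea is simply to pick the sense of "apple" that sits closest in the graph to another term found in the sentence.

```python
from collections import deque

# Hand-written "subclass of"-style edges, for illustration only.
# They are NOT pulled from Wikidata and the labels are simplified.
SUBCLASS_OF = {
    "apple (fruit)": ["fruit"],
    "pear": ["fruit"],
    "fruit": ["food"],
    "Apple Inc.": ["technology company"],
    "technology company": ["company"],
    "company": ["organization"],
}


def neighbours(node):
    """Nodes one hop away, following edges in either direction."""
    up = SUBCLASS_OF.get(node, [])
    down = [child for child, parents in SUBCLASS_OF.items() if node in parents]
    return up + down


def graph_distance(start, goal):
    """Number of hops on the shortest path, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


def disambiguate(context_term, candidates):
    """Pick the candidate sense closest to a term found in the sentence."""
    scored = [(graph_distance(c, context_term), c) for c in candidates]
    reachable = [(d, c) for d, c in scored if d is not None]
    return min(reachable)[1] if reachable else None


senses = ["apple (fruit)", "Apple Inc."]
sense_near_pear = disambiguate("pear", senses)
sense_near_company = disambiguate("company", senses)
```

A production system would work on a much larger graph and weigh many context terms at once; this sketch only shows the distance-between-nodes idea.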
There is a crucial debate behind these two approaches. The essential question is: is there a path by which our machines can possess any true understanding? Our best AI efforts, after all, only create an illusion of understanding. Both rule-based ontologies and statistical models are far from producing real thought as it is known in cognitive studies of the human brain. I am not going to expand here but, if you are in the mood, read this blog post on the Noam Chomsky / Peter Norvig debate.
Text embeddings in SEO
Word embeddings (or text embeddings) are a type of algebraic representation of words that allows words with similar meanings to have similar mathematical representations. Each word (or text) is mapped to a vector, that is, an array of numbers of a fixed dimension. We calculate how close or distant two words are by measuring the distance between these vectors.
In this article, we’re going to extract embeddings using the TensorFlow Hub Universal Sentence Encoder, a pre-trained deep neural network designed to convert text into high-dimensional vectors for natural language tasks. We want to analyze the semantic similarity between hundreds of combinations of titles and keywords from one of the clients of our SEO management services. We are going to focus our attention on only one keyword per URL, the keyword with the highest ranking (of course we can also analyze multiple combinations). While a page might attract traffic from hundreds of keywords, we typically expect to see most of the traffic coming from the keyword with the highest position on Google.
We are going to start from the original code developed by the TensorFlow Hub team and we are going to use Google Colab (a free cloud service with GPU support for machine learning work). You can copy the code I worked on and run it on your own instance.
Our starting point is a CSV file containing Keyword, Position (the actual ranking on Google) and Title. You can generate this CSV from the GSC or use any keyword tracking tool like WooRank, Moz or Semrush. You will need to upload the file to the session storage of Colab (there is an option you can click in the left tray) and you will need to update the file name on the line that starts with:
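The "one keyword per URL" selection can be sketched in a few lines of plain Python. The rows below are invented, and the `url` field is an assumption added for illustration; the idea is just to keep, for each page, the keyword with the best (lowest) position.

```python
# Made-up rows standing in for a rank-tracking export; the "url" field
# and all the values are assumptions for illustration.
rows = [
    {"url": "/villas/a", "keyword": "villa rental", "position": 4},
    {"url": "/villas/a", "keyword": "luxury villa", "position": 2},
    {"url": "/villas/b", "keyword": "beach house", "position": 7},
    {"url": "/villas/b", "keyword": "beach villa", "position": 9},
]


def best_keyword_per_url(rows):
    """Keep, for each URL, the row with the lowest (best) position."""
    best = {}
    for row in rows:
        current = best.get(row["url"])
        if current is None or row["position"] < current["position"]:
            best[row["url"]] = row
    return best


top = best_keyword_per_url(rows)
for url, row in top.items():
    print(url, row["keyword"], row["position"])
```

Analyzing several keywords per page would only require dropping this step and scoring every row.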
df = pd.read_csv( … )
Here is the output.
Let’s get into action. The pre-trained model comes in two flavors: one trained with a Transformer encoder and another trained with a Deep Averaging Network (DAN). The first one is more accurate but has higher computational resource requirements. I used the Transformer version, since I only worked with a few hundred combinations.
In the code below we initiate the module, open the session (it takes some time, so the same session will be used for all the extractions), get the embeddings, compute the semantic distance and store the results. I ran some tests in which I removed the site name. This helped me see things differently but, in the end, I preferred to keep whatever a search engine would see.
The semantic similarity (the degree to which the title and the keyword carry the same meaning) is calculated as the inner product of the two vectors.
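As a minimal sketch of that calculation, the snippet below computes the inner product of toy vectors in plain Python. The three-dimensional vectors are invented stand-ins for real sentence embeddings, and the sketch assumes the vectors are scaled to unit length, in which case the inner product equals the cosine of the angle between them.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so the inner product equals the cosine."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def inner_product(a, b):
    """Sum of the element-wise products of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy 3-dimensional vectors; real sentence embeddings have far more
# dimensions and come from the encoder, not from hand-picked numbers.
title_vec = normalize([0.9, 0.1, 0.3])
keyword_vec = normalize([0.8, 0.2, 0.4])
unrelated_vec = normalize([-0.2, 0.9, -0.1])

similar = inner_product(title_vec, keyword_vec)
dissimilar = inner_product(title_vec, unrelated_vec)

print(f"title vs keyword:   {similar:.3f}")
print(f"title vs unrelated: {dissimilar:.3f}")
```

Values close to 1 mean the two texts point in nearly the same direction in the embedding space; values near 0 or below mean they have little in common.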
An interesting aspect of using word embeddings from this model is that – for English content – I can easily calculate the semantic similarity of both short and long text. This is particularly helpful when looking at a dataset that might contain very short keywords and very long titles.
The result is a table of the combinations ranking between positions 1 and 5 that have the least semantic similarity (Corr).
It is interesting to see that, for this specific website, it can help to add the location to the title (e.g. Costa Rica, Anguilla, Barbados, …).
With well-structured data markup we are already helping the search engine disambiguate these terms by specifying the geographical location, but for the user making the search it might be beneficial to see at a glance, in the search snippet, the name of the location they are searching for. We can achieve this by revising the title or by bringing more structure into the search snippets using schema.org breadcrumb markup (BreadcrumbList) to present the hierarchy of the places (e.g. Italy > Lake Como > …).
In this scatter plot we can also see that, for this specific website, higher semantic similarity between titles and keywords is associated with higher rankings.
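A table like this can be produced with a simple filter and sort. The sketch below uses invented rows (keywords, titles and scores are not the client's data) and a hypothetical helper name; `corr` stands for the title/keyword similarity.

```python
# Made-up scored pairs; keywords, titles and scores are illustrative only.
scored = [
    {"keyword": "anguilla villas", "title": "Luxury Villa Rentals",
     "position": 2, "corr": 0.38},
    {"keyword": "lake como villa", "title": "Lake Como Villa Rentals",
     "position": 1, "corr": 0.91},
    {"keyword": "barbados beach house", "title": "Beachfront Homes",
     "position": 4, "corr": 0.45},
    {"keyword": "costa rica villas", "title": "Family Villas",
     "position": 12, "corr": 0.30},
]


def weakest_matches(rows, max_position=5, limit=10):
    """Rows ranking within max_position, least similar title first."""
    top_ranked = [row for row in rows if 1 <= row["position"] <= max_position]
    return sorted(top_ranked, key=lambda row: row["corr"])[:limit]


candidates = weakest_matches(scored)
for row in candidates:
    print(f'{row["corr"]:.2f}  {row["position"]:>2}  '
          f'{row["keyword"]}  |  {row["title"]}')
```

The titles at the top of this list are the first candidates for a rewrite, since they already rank well but match the query poorly.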
Semantic Similarity between keywords and titles visualized
Start running your semantic content audit
Crawling your website using natural language processing and machine learning to extract and analyze the main entities greatly helps you improve the findability of your content. Adding semantically rich structured data to your web pages helps search engines match your content with the right audience. Thanks to NLP and deep learning I could see that, to reduce the gap between what people search for and the existing titles, it was important for this website to add the Breadcrumbs markup with the geographical location of the villas. Once again AI, while still incapable of true understanding, helps us become more relevant for our audience (and it does so at web scale, on hundreds of web pages).
Solutions like the TF-Hub Universal Sentence Encoder put in the hands of SEO professionals and marketers the same kind of AI machinery that modern search engines like Google use to compute the relevancy of content. Unfortunately, this specific model is limited to English only.
Are you ready to run your first semantic content audit?