The Bidirectional Encoder Representations from Transformers (BERT) is an AI developed by Google as a means to help machines understand language in a manner more similar to how humans understand language. Specifically, it’s pre-trained, unsupervised natural language processing (NLP) model that seeks to understand the nuances and context of human language.
It was released as an open-source program by Google in 2018 but had an official launch in November 2019. It is now being used in Google searches in all languages, globally and impacts featured snippets.
What is BERT used for?
BERT is primarily used to provide better query results by using its understanding of language nuance to deliver more useful results. This goes not only for standard snippets, but for featured snippets as well. It’s said that it will impact at least 1 out of every 10 search results going forward.
When BERT uses its understanding of nuance in language, it can understand a user’s intentions through connecting words, such as: and, but, to, from, with, etc. So rather than utilizing only keywords, BERT can understand a user’s query request by examining words like “and” or “verses” in delivering SERP results.
An example of how BERT uses NLP to distinguish a user’s search intent.
In an example provided by Google, if you search for “parking on a hill with no curb,” you would get SERP results and a featured snippet detailing what you need to do if you’re parking a vehicle on a hill where there is no curb. Thanks to BERT’s NLP, Google knows that the word “no” means that there is no curb, whereas previously, if you searched for the same query, you would’ve received results on parking on a hill WITH a curb because your query included the keyword “curb” but Google didn’t understand the significance of the word no.
What is BERTSUM?
BERTSUM is a variant of BERT that is used for extractive summarization of content. Essentially, BERTSUM can be used to extract summaries of web pages and content for several different web pages and sites. This has been known to be particularly useful when writing meta descriptions for hundreds or even thousands of webpages on a site, rather than having to write each one individually.
BERT’s effect on RankBrain
RankBrain, being Google’s first AI used to understand queries, has been used to understand queries and content since 2015. While it shares some things in common with BERT, they do not perform the same functions and BERT has not replaced RankBrain. RankBrain can do things like, understand what a user is looking for even if they misspelled it or used incorrect grammar whereas BERT seeks to understand the nuances of the language used in a search query.
Therefore, while they both share a lot in common and both perform NLP functions for the Google SERP, they are not the same.
One of the most fascinating features of deep neural networks applied to NLP is that, provided with enough examples of human language, they can generate text and help us discover many of the subtle variations in meanings. In a recent blog post by Google research scientist Brian Strope and engineering director Ray Kurzweil we read:
“The content of language is deeply hierarchical, reflected in the structure of language itself, going from letters to words to phrases to sentences to paragraphs to sections to chapters to books to authors to libraries, etc.”
Following this hierarchical structure, new computational language models, aim at simplifying the way we communicate and have silently entered our daily lives; from Gmail “Smart Reply” feature to the keyboard in our smartphones, recurrent neural network, and character-word level prediction using LSTM (Long Short Term Memory) have paved the way for a new generation of agentive applications.
From keyword research to keyword generation
As usual with my AI-powered SEO experiments, I started with a concrete use-case. One of our strongest publishers in the tech sector was asking us new unexplored search intents to invest on with articles and how to guides. Search marketers, copywriters and SEOs, in the last 20 years have been scouting for the right keyword to connect with their audience. While there is a large number of available tools for doing keyword research I thought, wouldn’t it be better if our client could have a smart auto-complete to generate any number of keywords in their semantic domain, instead than keyword data generated by us?The way a search intent (or query) can be generated, I also thought, is also quite similar to the way a title could be suggested during the editing phase of an article. And titles (or SEO titles), with a trained language model that takes into account what people search, could help us find the audience we’re looking for in a simpler way.
What makes an RNNs “more intelligent” when compared to feed-forward networks, is that rather than working on a fixed number of steps they compute sequences of vectors. They are not limited to process only the current input, but also everything that they have perceived previously in time.
This characteristic makes them particularly efficient in processing human language (a sequence of letters, words, sentences, and paragraphs) as well as music (a sequence of notes, measures, and phrases) or videos (a sequence of images).
Here above you can see the difference between a recurrent neural network and a feed-forward neural network. Basically, RNNs have a short-memory that allow them to store the information processed by the previous layers. The hidden state is looped back as part of the input. LSTMs are an extension of RNNs whose goal is to “prolong” or “extend” this internal memory – hence allowing them to remember previous words, previous sentences or any other value from the beginning of a long sequence.
The LSTM cell where each gate works like a perceptron.
Imagine a long article where I explained that I am Italian at the beginning of it and then this information is followed by other let’s say 2.000 words. An LSTM is designed in such a way that it can “recall” that piece of information while processing the last sentence of the article and use it to infer, for example, that I speak Italian. A common LSTM cell is made of an input gate, an output gate and a forget gate. The cell remembers values over a time interval and the three gates regulate the flow of information into and out of the cell much like a mini neural network. In this way, LSTMs can overcome the vanishing gradient problem of traditional RNNs.
If you want to learn more in-depth on the mathematics behind recurrent neural networks and LSTMs, go ahead and read this article by Christopher Olah.
Let’s get started: “Io sono un compleanno!”
After reading Andrej Karpathy’s blog post I found a terrific Python library called textgenrnn by Max Woolf. This library is developed on top of TensorFlow and makes it super easy to experiment with Recurrent Neural Network for text generation.
Before looking at generating keywords for our client I decided to learn text generation and how to tune the hyperparameters in textgenrnn by doing a few experiments.
AI is interdisciplinary by definition, the goal of every project is to bridge the gap between computer science and human intelligence.
I started my tests by throwing in the process a large text file in English that I found on Peter Norvig’s website (https://norvig.com/big.txt) and I end up, thanks to the help of Priscilla (a clever content writer collaborating with us), “resurrecting” David Foster Wallace with its monumental Infinite Jest (provided in Italian from Priscilla’s ebook library and spiced up with some of her random writings).
At the beginning of the training process – in a character by character configuration – you can see exactly what the network sees: a nonsensical sequence of characters that few epochs (training iteration cycles) after will transform into proper words.
As I became more accustomed to the training process I was able to generate the following phrase:
“Io sono un compleanno. Io non voglio temere niente? Come no, ancora per Lenz.”
“I’m a birthday. I don’t want to fear anything? And, of course, still for Lenz.”
David Foster Wallace
Unquestionably a great piece of literature ?that gave me the confidence to move ahead in creating a smart keyword suggest tool for our tech magazine.
The dataset used to train the model
As soon as I was confident enough to get things working (this means basically being able to find a configuration that – with the given dataset – could produce a language model with a loss value equal or below 1.0), I asked Doreid, our SEO expert to work on WooRank’s API and to prepare a list of 100.000 search queries that could be relevant for the website.
To scale up the number we began by querying Wikidata to get a list of software for Windows that our readers might be interested to read about. As for any ML, project data is the most strategic asset. So while we want to be able to generate never-seen-before queries we also want to train the machine on something that is unquestionably good from the start.
The best way to connect words to concepts is to define a context for these words. In our specific use case, the context is primarily represented by software applications that run on the Microsoft Windows operating system. We began by slicing the Wikidata graph with a simple query that provided us with the list of 3.780+ software apps that runs on Windows and 470+ related software categories. By expanding this list of keywords and categories, Doreid came up with a CSV file containing the training dataset for our generator.
The first rows in the training dataset.
After several iterations, I was able to define the top performing configuration by applying the values below. I moved from character-level to word-level and this greatly increased the speed of the training. As you can see I have 6 layers with 128 cells on each layer and I am running the training for 100 epochs. This is indeed limited, depending on the size of the dataset, by the fact that Google Colab after 4 hours of training stops the session (this is also a gentle reminder that it might be the right time to move from Google Colab to Cloud Datalab – the paid version in Google Cloud).
Here we see the initial keywords being generated while training the model
Rock & Roll, the fun part
After a few hours of training, the model was ready to generate our never-seen-before search intents with a simple python script containing the following lines.
Here a few examples of generated queries:
where to find google drive downloads
where to find my bookmarks on google chrome
how to change your turn on google chrome
how to remove invalid server certificate error in google chrome
how to delete a google account from chrome
how to remove google chrome from windows 8 mode
how to completely remove google chrome from windows 7
how do i remove google chrome from my laptop
You can play with temperatures to improve the creativity of the results or provide a prefix to indicate the first words of the keyword that you might have in mind and let the generator figure out the rest.
Takeaways and future work
“Smart Reply” suggestions can be applied to keyword research workand is worth assessing in a systematic way the quality of these suggestions in terms of:
validity – is this meaningful or not? Does it make sense for a human?
relevance – is this query really hitting on the target audience the website has? Or is it off-topic? and
impact – is this keyword well-balanced in terms of competitiveness and volume considering the website we are working for?
The initial results are promising, all of the initial 200+ generated queries were different from the ones in the training set and, by increasing the temperature, we could explore new angles on an existing topic (i.e. “where is area 51 on google earth?”) or even evaluate new topics (ie. “how to watch android photos in Dropbox” or “advertising plugin for google chrome”).
It would be simply terrific to implement – with a Generative Adversarial Network (or using Reinforcement Learning) – a way to help the generator produce only valuable keywords (keywords that – given the website – are valid, relevant and impactful in terms of competitiveness and reach). Once again, it is crucial to define the right mix of keywords we need to train our model (can we source them from a graph as we did in this case? shall we only use the top ranking keywords from our best competitors? Should we mainly focus on long tail, conversational queries and leave out the rest?).
One thing that emerged very clearly is that: experiments like this one (combining LSTMs and data sourcing using public knowledge graphs such as Wikidata) are a great way to shed some light on how Google might be working in improving the evaluation of search queries using neural nets. What is now called “Neural Matching” might most probably be just a sexy PR expression but, behind the recently announced capability of analyzing long documents and evaluating search queries, it is fair to expect that Google is using RNNs architectures, contextual word embeddings, and semantic similarity. As deep learning and AI, in general, becomes more accessible (frameworks are open source and there is a healthy open knowledge sharing in the ML/DL community) it becomes evident that Google leads the industry with the amount of data they have access to and the computational resources they control.
This experiment would not have been possible without textgenrnn by Max Woolf and TensorFlow. I am also deeply thankful to all of our VIP clients engaging in our SEO management services, our terrific VIP team: Laura, Doreid, Nevine and everyone else constantly “lifting” our startup, Theodora Petkova for challenging my robotic mind ?and my beautiful family for sustaining my work.
Making sense of data using AI is becoming crucial to our daily lives and has significantly shaped my professional career in the last 5 years.
When I began working on the Web it was in the mid-nineties and Amazon was still a bookseller with a primitive website.
At that time it became extremely clear that the world was about to change and every single aspect of our society in both cultural and economic terms was going to be radically transformed by the information society. I was in my twenties, eager to make a revolution and the Internet became my natural playground. I dropped out of school and worked day and night contributing to the Web of today.
Twenty years after I am witnessing again to a similar – if not even more radical – transformation of our society as we race for the so-called AI transformation. This basically means applying machine learning, ontologies and knowledge graphs to optimize every process of our daily lives.
At the personal level I am back in my twenties ? (sort of) and I wake up at night to train a new model, to read the latest research paper on recurrent neural networks or to test how deep learning can be used to perform tasks on knowledge graphs.
The beauty of it is that I have the same feeling of building the plane as we’re flying it that I had in the mid-nineties when I started with TCP/IP, HTML and websites!
Wevolver: an image I took at SXSW
AI transformation for search engine optimization
In practical terms, the AI transformation here at WordLift (our SEO startup) works this way: we look at how we help companies improve traffic coming from search engines. We analyze complex tasks and break them down into small chunks of work and we try to automate them using narrow AI techniques (in some cases we simply tap at the top of the AI pyramid and start using ready-made APIs, in some other cases we develop/train our own models). We tend to focus (in this phase at least) to trivial repetitive tasks that can bring a concrete and measurable impact on the SEO of a website (i.e. more visits from Google, more engaged users, …) such as:
We test these approaches on a selected number of terrific clients that literally fuel this process, we keep on iterating and improving the tooling we use until we feel ready to add it back into our product to make it available to hundreds of other users.
We take on a small handful of clients projects each year to help them boost their qualified traffic via our SEO Management Service.
All along the journey, I’ve learned the following lessons:
1. The AI stack is constantly evolving
AI introduces a completely new paradigm: from teaching computers what to do, to providing the data required for computers to learn what to do.
In this pivotal change, we still lack the infrastructure required to address fundamental problems (i.e. How do I debug a model? How can I prevent/detect a bias in the system? How can I predict an event in the context in which the future is not a mere projection of the past?). This basically means that new programming languages will emerge and new stacks shall be designed to address these issues right from the beginning. In this continuing evolving scenario libraries like TensorFlow Hub represent a concrete and valuable example of how the consumption of reusable parts in AI and machine learning can be achieved. This approach also greatly improves the accessibility of these technologies by a growing number of people outside the AI community.
2. Semantic data is king
AI depends on data and any business that wants to implement AI inevitably ends up re-vamping and/or building a data pipeline: the way in which the data is sourced, collected, cleaned, processed, stored, secured and managed. In machine learning, we no longer use if-then-else rules to instruct the computer but we instead let the computer learn the rules by providing a training set of data. This approach, while extremely effective, poses several issues as there is no way to explain why a computer has learned a specific behavior from the training data. In Semantic AI, knowledge graphs are used to collect and manage the training data, and this allows us to check the consistency of this data and to understand, more easily, how the network is behaving and where we might have a margin for improvement. Real-world entities and the relationships between them are becoming essential building blocks in the third era of computing. Knowledge graphs are also great in “translating” insights and wisdom from domain experts in a computable form that machine can understand.
3. You need the help of subject-matter experts
Knowledge becomes a business asset when it is properly collected, encoded, enriched and managed. Any AI project you might have in mind always starts with a domain expert providing the right keys to address the problem. In a way, AI is the most human-dependent technology of all times. For example, let’s say that you want to improve your SEO for images on your website. You will start by looking at best practices and direct experiences of professional SEOs that have been dealing with this issue for years. It is only through the analysis of the methods that this expert community would use that you can tackle the problem and implement your AI strategy. Domain experts know, clearly in advance, what can be automated and what are the expected results from this automation. A data analyst or an ML developer would think that we can train an LSTM network to write all the meta-descriptions of a website on-the-fly. A domain expert would tell you that Google only uses meta descriptions 33% of the times as search snippets and that, if these texts are not revised by a qualified human, they will produce little or no results in terms of actual clicks (we can provide a decent summary with NLP and automatic text summarization but enticing a click is a different challenge).
4. Always link data with other data
External data linked with internal data helps you improve how the computer will learn about the world you live in. Rarely an organization controls all the data that an ML algorithm would need to become useful and to have a concrete business impact. By building on top of the Semantic Web and Linked Data, and by connecting internal with external data we can help machines get smarter. When we started designing the new WordLift’s dashboard whose goal is to help editors improve their editorial plan by looking at how their content ranks on Google, it immediately became clear that our entity-centric world would have benefited from query and ranking data gathered by our partner WooRank. The combination of these two pieces of information helped us create the basis for training an agent that will recommend editors what is good to write and if they are connecting with the right audience over organic search.
To shape your AI strategy and improve both technical and organizational measures we need to study carefully the business requirements with the support of a domain expert and remember that, narrow AI helps us build agentive systems that do things for end-users (like, say, tagging images automatically or building a knowledge graph from your blog posts) as long as we always keep the user at the center of the process.
In this article, we explore how to evaluate the correspondence between title tags and the keywords that people use on Google to reach the content they need. We will share the results of the analysis (and the code behind) using a TensorFlow model for encoding sentences into embedding vectors. The result is a list of titles that can be improved on your website.
“A title tag is an HTML element that defines the title of the page. Titles are one of the most important on-page factors for SEO. […]
They are used, combined with meta descriptions, by search engines to create the search snippet displayed in search results.”
Every search engine’s most fundamental goal is to match the intent of the searcher by analyzing the query to find the best content on the web on that specific topic. In the quest for relevancy a good title influence search engines only partially (it takes a lot more than just matching the title with the keyword to rank on Google) but it does have an impact especially on top ranking positions (1st and 2nd according to a study conducted a few years ago by Cognitive SEO). This is also due to the fact that a searcher is likely inclined to click when they find good semantic correspondence between the keyword used on Google and the title (along with the meta description) displayed in the search snippet of the SERP.
What is semantic similarity in text mining?
Semantic similarity defines the distance between terms (or documents) by analyzing their semantic meanings as opposed to looking at their syntactic form.
“Apple” and “apple” are the same word and if I compute the difference syntactically using an algorithm like Levenshtein they will look identical, on the other hand, by analyzing the context of the phrase where the word apple is used I can “read” the true semantic meaning and find out if the word is referencing the world-famous tech company headquartered in Cupertino or the sweet forbidden fruit of Adam and Eve.
A search engine like Google uses NLP and machine learning to find the right semantic match between the intent and the content. This means the search engines are no longer looking at keywords as strings of text but they are reading the true meaning that each keyword has for the searcher. As SEO and marketers, we can also now use AI-powered tools to create the most authoritative content for a given query.
There are two main ways to compute the semantic similarity using NLP:
we can compute the distance of two terms using semantic graphsand ontologies by looking at the distance between the nodes (this is how our tool WordLift is capable of discerning if apple – in a given sentence – is the company founded by Steve Jobs or the sweet fruit). A very trivial, but interesting example is to, build a “semantic tree” (or better we should say a directed graph) by using the Wikidata P279-property (subclass of).
we can alternatively use a statistical approach and train a deep neural network to build – from a text corpus (a collection of documents), a vector space model that will help us transform the terms in numbers to analyze their semantic similarity and run other NLP tasks (i.e. classification).
There is a crucial and essential debate behind these two approaches. The essential question being: is there a path by which our machines can possess any true understanding? Our best AI efforts after all only create an illusion of an understanding. Both rule-based ontologies and statistical models are far from producing a real thought as it is known in cognitive studies of the human brain. I am not going to expand here but, if you are in the mood, read this blog post on the Noam Chomsky / Peter Norvig debate.
Text embeddings in SEO
Word embeddings (or text embeddings) are a type of algebraic representation of words that allows words with similar meaning to have similar mathematical representation. A vector is an array of numbers of a particular dimension. We calculate how close or distant two words are by measuring the distance between these vectors.
In this article, we’re going to extract embedding using the tf.Hub Universal Sentence Encoder, a pre-trained deep neural network designed to convert text into high dimensional vectors for natural language tasks. We want to analyze the semantic similarity between hundreds of combinations of Titles and Keywords from one of the clients of our SEO management services. We are going to focus our attention on only one keyword per URL, the keyword with the highest ranking (of course we can also analyze multiple combinations). While a page might attract traffic on hundreds of keywords we typically expect to see most of the traffic coming from the keyword with the highest position on Google.
We are going to start from the original code developed by the TensorFlow Hub team and we are going to use Google Colab (a free cloud service with GPU supports to work with machine learning). You can copy the code I worked on and run it on your own instance.
Our starting point is a CSV file containing Keyword, Position (the actual ranking on Google) and Title. You can generate this CSV from the GSC or use any keyword tracking tool like Woorank, MOZ or Semrush. You will need to upload the file to the session storage of Colab (there is an option you can click in the left tray) and you will need to update the file name on the line that starts with:
df = pd.read_csv( … )
Here is the output.
Let’s get into action. The pre-trained model comes with two flavors: one trained with a Transformer encoder and another trained with a Deep Averaging Network (DAN). The first one is more accurate but has higher computational resource requirements. I used the transformer considering the fact that I only worked with a few hundreds of combinations.
In the code below we initiate the module, open the session (it takes some time so the same session will be used for all the extractions), get the embeddings, compute the semantic distance and store the results. I did some tests in which I removed the site name, this helped me see things differently but in the end, I preferred to keep whatever a search engine would see.
The semantic similarity – the degree to which the title and the keyword carry the same meaning – is calculated, as the inner products of the two vectors.
An interesting aspect of using word embeddings from this model is that – for English content – I can easily calculate the semantic similarity of both short and long text. This is particularly helpful when looking at a dataset that might contain very short keywords and very long titles.
The result is a table of combinations from rankings between 1 and 5 that have the least semantic similarity (Corr).
It is interesting to see that it can help, for this specific website, to add to the title the location (i.e. Costa Rica, Anguilla, Barbados, …).
With a well-structured data markup we are already helping the search engine disambiguate these terms by specifying the geographical location, but for the user making the search, it might be beneficial to see at a glance the name of the location he/she is searching for in the search snippet. We can achieve this by revising the title or by bringing more structure in the search snippets using schema:breadcrumbs to present the hierarchy of the places (i.e. Italy > Lake Como > …).
In this scatter plot we can also see that the highest semantic similarity between titles and keywords has an impact on high rankings for this specific website.
Semantic Similarity between keywords and titles visualized
Start running your semantic content audit
Crawling your website using natural language processing and machine learning to extract and analyze the main entities, greatly helps you improve the findability of your content. Adding semantic rich structured data in your web pages helps search engines match your content with the right audience. Thanks to NLP and deep learning I could see that to reduce the gap – between what people search and the existing titles – it was important – for this website – to add the Breadcrumbs markup with the geographical location of the villas. Once again AI, while still incapable of true understanding, helps us become more relevant for our audience (and it does it at web scele on hundreds of web pages).
Solutions like the TF-Hub Universal Encoder bring, in the hands of SEO professionals and marketers, the same AI-machinery that modern search engines like Google use to compute the relevancy of content. Unfortunately, this specific model is limited to English only.
Are you ready to run your first semantic content audit?
Machine learning (ML) is a subfield of Artificial Intelligence for the studying of algorithms that computer systems can use to derive knowledge from data.
Machine learning algorithms are used to build mathematical models of sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task.
There are three different types of machine learning algorithms:
Supervised Learning. The data is labeled with the expected outcome in a “training dataset” that will help the system train itself to predict the outcome on new (previously unseen) data samples.
Unsupervised Learning. Here the machine has no inputs in terms of expected outcomes and labels but it simply gets the features as numerical attributes and will find the hidden structure of the dataset.
Reinforcement Learning. It helps with decision-making tasks. The system gets a reward when it is capable of making measurable progress on a given action without knowing how to get to the end. A typical example is the chess game. The system learns by evaluating the results of a single action (i.e. moving of one square the horse).
In this post, I’ll walk through the analysis of Google Search Console data combined with a machine learning clustering technique to provide an indication on what pages can be optimized to improve the organic traffic of a company website. I will also highlight the lessons I learned while using machine learning for an SEO task.
Interestingly, website owners when I propose to use their data are usually very relieved that AI can take care of the mundane, repetitive SEO work like analyzing GSC data; this allows the clients of our SEO management service and our own team, to focus on more complex, value-adding work such as content writing, content enrichment, and monetization.
Machine learning is fun
This experiment is designed for anyone: no specific coding skill is required. A good grip on Google Sheets is more than enough to get you started in the data mining of your website’s GSC data.
We will use Orange, an open source tool built on top of Python, for data mining and analysis that uses a visual programming front-end (a graphical user interface that lets you do what a developer would do using a Jupyter notebook environment and Python, yay!).
You can install Orange from Anaconda, a popular Python data science platform, or simply download it and install it from their website. We will also use data from a web crawler to extract information about the length of the title and the length of the meta description. This can be done using a WooRank account, Sitebulb or any other web crawler of your choosing.
Stand on the shoulder of giants
Dealing with machine learning is indeed a paradigm shift. The basic idea is that we provide highly curated data to a machine and the machine will learn from this data, it will program itself and it will help us in the analysis by either grouping data points, making predictions or extracting relevant patterns from our dataset. Choosing the data points and curating the dataset, in machine learning, is as strategic as writing the computer program in traditional computer science. By deciding the type of data you will feed the machine you are transferring the knowledge required to train the machine. To do so, you need the so-called domain experts and when I started with this experiment I came across a tweet from Bill Slawski that indicated me the importance of comparing search impressions to clicks on a page as the most valuable piece of data from the Google Search Console.
A3 One of my favorite GSC Data points is comparing search impressions to click on a page, if they vary widely is a sign that rewriting titles or snippets would be a good idea – less of a sign of success of an ongoing campaign, but helpful to know about. #SEOChat
As we learned from Aleyda another important aspect to winning the game is to focus only on pages that have already a strong position (between 3 and 5 she says). This is extremely important, as it will speed up the process and bring almost immediate results. Of course, the bracket might be different for smaller websites (in some cases working with pages with a position between 3 and 10 might also be valuable).
How do I get the data from Google Search Console into Google Sheet?
Luckily GSC provides fast and reliable access to your data via APIs, and you can use a Google Sheet Add On called searchanalyticsforsheets.com that automatically retrieve the data and stores it in Google Sheet without writing a line of code. It is free, super simple to use and well documented (kudos for the developing team ?).
If you are more familiar with Python you can also use this script by Stephan Solomonidis on GitHub that would do pretty much the same work with only a few lines of code.
In my dataset, I wanted to have both queries and pages in the same file. A page usually ranks for multiple intents and it is important to see what is the main query we want to optimize for.
How can I merge two datasets in one?
Aggregating data from the crawler with data from GSC can be done directly in Orange using the merge data widget that horizontally combines two datasets by using the page as a matching attribute. I used, instead, Google Sheets with a combination of ARRAYFORMULA (it will run the function on an entire column) and VLOOKUP (this does the actual match and brings both title length and meta description length in the same table).
index (the columns from the crawler dataset that we want to import for the length of the title and of the meta description)
is_sorted (typically set to FALSE since the two tables we’re merging don’t follow the same order)
Prepare data with loving care
Data curation is essential to obtain any valid results with artificial intelligence. Data preparation also is different for each algorithm. Each machine learning algorithm requires data to be formatted in a very specific way and before finding the right combination of column and yield useful insights I did several iterations. Missing data and wrong formatting (when migrating data in Orange in our case) have been issues to deal with. Generally speaking for missing data there are two options, either remove the data points or fill it up with average values (there are a lot more options to consider but this is basically what I did in the various iterations). Formatting is quite straightforward, we simply want Orange to properly see each informative feature as a number (and not as a string).
The dataset we’re working with is made of 15784 rows each one containing a specific combination of page and query. We have 3 informative features in the dataset (clicks, impression, and position) and 5 labels (page, query, CTR, title and meta description length). Page and query are categorical labels (we can group the data by the same query or by the same page). CTR is a formula that calculates clicks/impression * 100 and for this reason is not an informative feature. Labels or calculated values are not informative: they don’t help the algorithm in clustering the data. At the same time, they are extremely useful to help us understand and read the patterns in the data.
Dataset configuration in Orange
Introducing k-Means for clustering search queries
When looking at thousands of combination of queries across hundreds of web pages selecting the pages that have the highest potential in terms of SEO optimization is an intimidating task. This is particularly true when you have never done such analysis before or when you are approaching a website that you don’t know (as we do – in most cases – when we start a project with one new client that is using our technology).
We want to be able to group the combination of pages that can be more easily improved by updating the title and the snippet that describes the article. We also want to learn something new from the data that we collected to improve the overall quality of the content that we will produce in the future. Clustering is a good approach as it will break down the opportunities in a limited number of groups and it will unveil the underlying patterns in the data.
A cluster refers to a collection of data points aggregated together by a certain degree of similarity.
What is k-Means Clustering?
K-Means clustering is one ofthe simplest and most popular unsupervised machine learning algorithms. It will make inferences using only input features (data points like the numbers of impressions or the number of clicks) without requiring any labeled outcome.
K-Means will average the data by identifying a centroid for each group and by grouping all records in a limited number of cluster. A centroid is the imaginary center of each cluster.
The pipeline in Orange
Here is how the flow looks like in Orange. We’re importing the CSV data that we have created using the File widget, we’re quickly analyzing the data using the Distribution Widget. We have the k-Means Widget at the center of the workflow that receives data from the Select Rows Widget (this is a simple filter to work only on records that are positioned in SERP between 3 and 10) and sends the output to a Scatter Plot that will help us visualize the clusters and understand the underlying patterns. On another end, the k-Means sends the data to a Data Table widget that will produce the final report with the list of pages we need to work on and their respective queries. Here we also use a Select Rows widget to bring in our final report only the most relevant cluster.
The data analysis pipeline in Orange
Here is how the distribution of rankings looks like.
The silhouettescore in k-Means helps us understand how similar each combination is to its own cluster (cohesion) compared to other clusters (separation).
The silhouette score ranges from 0 to 1 (a high value indicates that the object is well matched to its own cluster). By using this value the algorithm can define how many clusters we need (unless we specify otherwise) and the level of cohesion of each group. In our case 3 cluster represent the best way to organize our data and to prioritize our work. From the initial 15784 samples (the rows in our dataset) we have now selected 1010 instances (these are all the combination with pages in position 3-10) that have been grouped by k-Means.
k-Means configuration parameters
SEO Data Analysis: What is the data telling us
We will use Orange’s intelligent data visualization to find informative projections. In this way, we can see how the data has been clustered. The projections are a list of attribute pairs by average classification accuracy score that shows us the underlying patterns in our dataset. Here are the top 4 I have chosen to evaluate.
1. Focus on high impressions and low CTR and here is the list of pages to optimize
Scatter Plot #1 – CTR vs Impressions (the size of the symbols indicates the CTR)
There is no point in working on cluster C1, either there are very little impressions or the CTR is already high. Where it hurts the most is on C3 and following we have C2 cluster.
We have now a total of 56 combinations of pages and queries that really deserve our attention (C2 and C3). Out of this batch, there are 18 instances in C3 (the most relevant group to address) and this basically means working on 16 pages (2 pages are getting traffic from 2 queries each).
The final report with the pages to work on
This is the list for our content team to optimize. New titles and improved meta description will yield better results in a few weeks.
2. Positions don’t matter as much as impressions
Scatter Plot #2 – Positions vs Impressions
Our three clusters are well distributed across all selected positions. We might prefer – unless there are strategic reasons to do otherwise – to improve the CTR of a page with a lower position but a strong exposure rather than improving the clicks on a higher ranking result on a low volume keyword.
3. Write titles with a length between 40 and 80 characters
Google usually displays the first 50–60 characters of a title tag. MOZ research suggests that you can expect about 90% of your titles to display properly when contained under the 60 characters. From the data we gathered we could see that, while the vast majority is working under 60 characters we can still get a healthy CTR with titles up to 78 characters and not shorter than 38 characters.
Scatter Plot #3 – CTR vs Title Length
4. Write Meta Description with a length between 140 and 160 characters
At the beginning of May last year, the length of meta description on Google has been shortened after the last update in December 2017, when the length was extended up to 290 characters. In other words, Google is still testing various length and if on a desktop it displays 920 pixels (158 characters) on mobile you will see up to 120 characters in most cases.
Meta description length in 2019 according to blog.spotibo.com
This means that the correct length is also dependent on the percentage of mobile users currently accessing the website. Once again we can ask the data what should be the preferred number of characters by looking at clusters C2 and C3. Here we can immediately see that the winning length is between 140 and 160 chars (highest CTR = bigger size of the shapes).
Scatter Plot #4 – CTR vs Meta Description Length
Make Your Website Smarter with AI-Powered SEO: just focus on your content and let the AI grow the traffic on your website!
These are really the first steps towards a future where SEOs and marketers have instant access to insights provided by machine learning that can drive a stronger and sustainable growth of web traffic without requiring a massive amount of time in sifting through spreadsheets and web metrics.
While it took a few weeks to set up the initial environment, to test the right combination of features and to share this blog post with you, now to process hundreds of thousands of combinations anyone can do it out in just a few minutes! This is also the beauty of using a tool like Orange that, after the initial setup, requires no coding skills.
We will continue to improve the methodology while working for our VIP clients, validating the results from these type of analysis and eventually improve our product to bring these results to an increasing number of people (all the clients of our semantic technology).
Keep following us and drop me a line to learn more about AI for SEO!