How to write meta descriptions using BERT

If you are confused about meta descriptions in SEO, why they are important and how to nail them with the help of artificial intelligence, this article is for you. If you are eager to start experimenting with an AI writer, read the full article: at the end, I will give you a script to help you write meta descriptions at scale using BERT, Google’s pre-trained, unsupervised language model that has recently gained great momentum in the SEO community after both Google and Bing announced that they use it to provide more useful results.

I used to underestimate the importance of meta descriptions myself: after all, Google will use them in only 35.9% of cases (according to a Moz analysis from last year by the illustrious @dr_pete). In reality, these brief snippets of text greatly help entice more users to your website and, indirectly, might even influence your rankings thanks to a higher click-through rate (CTR).

While Google can overrule the meta descriptions added to the HTML of your pages, if you properly align:
  1. the main intent of the user (the query you are targeting), 
  2. the title of the page and
  3. the meta description 
then you have many opportunities to improve your CTR on Google’s result pages. In the course of this article we will investigate the different aspects of this workflow and, since it’s a long article, feel free to jump to the section that interests you the most (the code is available at the end).

What are meta descriptions?

As usual, I like to start by “asking” the experts online for a definition, and with a simple query on Google we can get this one from our friends at WooRank: “Meta descriptions are HTML tags that appear in the head section of a web page. The content within the tag provides a description of what the page and its content are about. In the context of SEO, meta descriptions should be around 160 characters long.” Here’s an example of what a meta description usually looks like (taken from that same article).

How long should your meta description be?

We want to be, as with any other content on our site, authentic, conversational and user-friendly. Having said that, in 2020 you will want to stick to the 155-160 character limit (this corresponds to roughly 920 pixels). We also want to keep in mind that the “optimal” length might change based on the query of the user. This means that you should really do your best in the first 120 characters and think in terms of creating a meaningful chain by linking the query, the title tag and the meta description. In some cases, it is also very important to consider the role of breadcrumbs within this chain. In the example above from WooRank, I can quickly see that the definition comes from an educational page of their site: this fits very well with my information request.
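
If you want to sanity-check candidate descriptions programmatically, here is a minimal sketch based on the character budget discussed above; the helper name is mine, and pixel width (which is what Google actually truncates on) is not measured here.

```python
def trim_meta_description(text: str, limit: int = 155) -> str:
    """Trim a candidate meta description at a word boundary and add an
    ellipsis when something was cut off. Illustrative helper, not part of
    the script linked to this article."""
    text = " ".join(text.split())          # normalize whitespace
    if len(text) <= limit:
        return text
    cut = text[:limit].rsplit(" ", 1)[0]   # avoid breaking mid-word
    return cut.rstrip(" ,;:") + "…"
```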

What meta descriptions should we focus on?

SEO is a process: we need to set our goals, analyze the data we’re starting with, improve our content, and measure the results. There is no point in looking at a large website and saying “I need to write a gazillion meta descriptions since they are all missing”. It would simply be a waste of time, besides the fact that in some cases we might decide not to add a meta description at all. For example, when a page covers different queries and the text is already well structured, we might leave it to Google to craft the best snippet for each query (they are very good at it). We need to look at the critical pages we have: let’s not forget that writing a good meta description is just like writing ad copy, and driving clicks is not a trivial game. As a rule of thumb I prefer to focus my attention on:
  • Pages that are already ranking on Google (position > 0); adding a meta description to a page that is not ranking will not make a difference.
  • Pages that are not in the top 3 positions: if they are already highly ranked, I prefer to leave them as they are unless I can see some real opportunities.
  • Pages that have a business value: on the WordLift website (the company I work for), there is no point in adding meta descriptions to landing pages that have no organic potential; I would rather focus on content from our blog. This varies of course, but it is very important to understand what type of pages you want to focus on.
These criteria are especially useful if you plan to programmatically crawl your website and choose where to focus your attention using crawl data. Keep on reading and we’ll get there, I promise; in the meantime, the sketch below gives a rough idea of what such a filter could look like.
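
A hedged sketch of how the criteria above could be expressed on top of a crawl export loaded with Pandas; the file name and the column names (url, position, meta_description) are assumptions about your CSV, not the actual fields of the WooRank export.

```python
import pandas as pd

# Hypothetical crawl export: one row per URL with ranking and on-page data.
crawl = pd.read_csv("crawl_export.csv")

candidates = crawl[
    (crawl["position"] > 0)                      # 1. already ranking on Google
    & (crawl["position"] > 3)                    # 2. ...but not yet in the top 3
    & (crawl["meta_description"].isna())         #    and the meta description is missing
    & (crawl["url"].str.contains("/blog/"))      # 3. pages with business value (here: the blog)
]
print(len(candidates), "pages worth a curated (or BERT-assisted) meta description")
```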

A quick introduction to single-document text summarization

Automatic text summarization is the challenging NLP task of providing a short and accurate summary of a long text. With the growing amount of online content, the need for understanding and summarizing content is very high; in purely technological terms, though, the challenge of creating well-formed summaries is huge and the results are, most of the time, still far from perfect (or human-level). The first research work on automatic text summarization goes back more than 50 years, and since then various techniques have been used to extract relevant content from unstructured text. “The different dimensions of text summarization can be generally categorized based on its input type (single or multi document), purpose (generic, domain specific, or query-based) and output type (extractive or abstractive).” A Review on Automatic Text Summarization Approaches, 2016.

Extractive vs Abstractive

Let’s have a quick look at the different methods we have for compressing a web page. “Extractive summarization methods work by identifying important sections of the text and generating them verbatim; […] abstractive summarization methods aim at producing important material in a new way. In other words, they interpret and examine the text using advanced natural language techniques in order to generate a new shorter text that conveys the most critical information from the original text” Text Summarization Techniques: A Brief Survey, 2017.

In simple words, with extractive summarization we use an algorithm to select and combine the most relevant sentences in a document. With abstractive summarization, we use sophisticated NLP techniques (i.e. deep neural networks) to read and understand a document in order to generate novel sentences. In extractive methods a document can be seen as a graph where each sentence is a node and the relationships between sentences are weighted edges. These edges can be computed by analyzing the similarity between the word sets of each sentence. We can then use an algorithm like PageRank (called TextRank in this context) to extract the most central sentences in our document-graph.
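
As a rough illustration of this graph-based approach (not the BERT method used later in the article), here is a minimal TextRank-style sketch built on scikit-learn and networkx; the sentence splitting is a naive regex and the function name is mine.

```python
import re

import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def textrank_summary(text: str, n_sentences: int = 2) -> str:
    # 1. Split the document into sentences: the nodes of our graph.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= n_sentences:
        return " ".join(sentences)
    # 2. Weight the edges with the cosine similarity between TF-IDF sentence vectors.
    similarity = cosine_similarity(TfidfVectorizer().fit_transform(sentences))
    # 3. Run PageRank on the sentence graph and keep the most central sentences,
    #    preserving their original order in the document.
    scores = nx.pagerank(nx.from_numpy_array(similarity))
    top = sorted(sorted(scores, key=scores.get, reverse=True)[:n_sentences])
    return " ".join(sentences[i] for i in top)
```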

The carbon footprint of NLP and why I prefer extractive methods to create meta descriptions

In a recent study, researchers at the University of Massachusetts, Amherst, performed a life-cycle assessment for training several common large AI models, with a focus on language models and NLP tasks. They found that training a complex language model can emit five times the lifetime emissions of the average American car (including whatever is required to manufacture the car itself!). While automation is key, we don’t want to contribute to the pollution of our planet by misusing the technology we have. In principle, abstractive methods and deep learning techniques offer a higher degree of control when compressing articles into 30-60 word paragraphs but, considering our end goal (enticing more clicks from organic search), we can probably find a good compromise without spending too many computational (and environmental) resources. I know it sounds a bit naïve, but it is not: we want to be sustainable and efficient in everything we do.

What is BERT?

BERT: The Mighty Transformer 

Now, given that a significant amount of energy has already been spent to train BERT (1,507 kWh according to the paper mentioned above), I decided it was worth testing it for extractive summarization. I also have to admit that it has been quite some time since I entertained myself with automatic text summarization of online content, and I have experimented with a lot of different methods before getting to BERT. BERT is a pre-trained, unsupervised natural language processing model created by Google and released as open source (yay!) that does magic on 11 of the most common NLP tasks. BERTSUM is a variant of BERT designed for extractive summarization that is now state-of-the-art (here you can find the paper behind it). Derek Miller, leveraging this progress, has done terrific work bringing this technology to the masses (myself included) by creating a super sleek and easy-to-use Python library that we can use to experiment with BERT-powered extractive text summarization at scale. A big thank you also goes to the HuggingFace team, since Derek’s tool uses their PyTorch Transformers library.
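
For reference, here is a minimal sketch of how Derek Miller’s library is typically used (install it with pip install bert-extractive-summarizer); the sample text is made up, and you should check the library’s README for the exact options available in your installed version.

```python
from summarizer import Summarizer

text = (
    "WordLift is a plugin that helps editors organize posts and pages by adding "
    "semantic annotations. It translates content into structured data that search "
    "engines can understand. The plugin also builds an internal vocabulary of entities."
)

model = Summarizer()               # downloads a pre-trained BERT model on first use
summary = model(text, ratio=0.4)   # keep roughly 40% of the sentences (assumed option)
print(summary)
```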

Long live AI, let’s scale the generation of meta descriptions with our adorable robot [CODE IS HERE]

So here is how everything works in the code linked to this article.
  1. We start with a CSV that I generated using WooRank’s crawler (you can tweak the code and use any CSV that helps you detect where on the site meta descriptions are missing and where it would be useful to add them); the file used in the code has been made available on Google Drive (this way we can always look at the data before running the script).
  2. We analyze the data from the crawler and build a dataframe using Pandas.
  3. We then choose which URLs are most critical: in the code provided I work on the analysis of the wordlift.io website and focus only on content from the English blog that already has a ranking position. Feel free to play with the Pandas filters and to infuse your own SEO knowledge and experience into the script.
  4. We then crawl each page (here you might want to define the CSS class that the site uses in the HTML to detect the body of the article, which prevents you from analyzing menus and other unnecessary elements on the page).
  5. We ask BERT (with a vanilla configuration that you can fine-tune) to generate a summary for each page and write it to a CSV file.
  6. With the resulting CSV we can head back to our beloved CMS and find the best way to import the data (you might want to curate BERT’s suggestions before actually going live with them; once again, in most cases we can do better than the machine).
Super easy, not too intensive in computational terms and environmentally friendly. Have fun playing with it! Always remember, it is a robot friend and not a real replacement for your precious work: BERT can do the heavy lifting of reading the page and highlighting what matters most, but it might still fail at getting the right length or at adding a proper CTA (i.e. “read more to find …”).
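
To tie the six steps together, here is a hedged end-to-end sketch rather than the exact script linked to this article: the CSV column names and the CSS class of the article body are placeholders you will need to adapt to your own crawl export and theme.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup
from summarizer import Summarizer

BODY_CSS_CLASS = "entry-content"   # placeholder: the class wrapping the article body


def fetch_article_text(url: str) -> str:
    """Download a page and keep only the article body, skipping menus and footers."""
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    body = soup.find(class_=BODY_CSS_CLASS)
    return body.get_text(" ", strip=True) if body else ""


# Steps 1-3: load the crawl data and keep only the critical URLs.
crawl = pd.read_csv("crawl_export.csv")
targets = crawl[(crawl["position"] > 0) & (crawl["meta_description"].isna())]

# Steps 4-5: crawl each page and ask BERT for an extractive summary.
model = Summarizer()
rows = []
for url in targets["url"]:
    text = fetch_article_text(url)
    if not text:
        continue
    summary = model(text, ratio=0.15)   # vanilla configuration, tune as needed
    rows.append({"url": url, "suggested_meta_description": summary})

# Step 6: write the suggestions to a CSV to review and import into the CMS.
pd.DataFrame(rows).to_csv("meta_description_suggestions.csv", index=False)
```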

Final thoughts and future work

The beauty of automation, and of what I like to call agentive SEO in general, is that you gain superpowers while still remaining in full control of the process. AI is far from being magic or becoming (at least in this context) a replacement for content writers and SEOs; rather, AI is a smart assistant that can augment our work.

There are some clear limitations with extractive text summarization, related to the fact that we deal with whole sentences: if we have long sentences in our web page, we will end up with a snippet that is far too long to be a perfect meta description. I plan to keep working on fine-tuning the parameters to get the best possible results in terms of expressiveness and length but, so far, only 10-15% of the summaries are good enough as they are and don’t require any extra touch from our natural intelligence. The vast majority of the summaries look good and are substantial, but still go beyond the 160-character limit.

There is, of course, a lot of potential in these summaries beyond the generation of meta descriptions for SEO: we can, for instance, create a “featured snippet” type of experience to provide relevant abstracts to our readers. Moreover, if the tone of the article is conversational enough, the summary might also become a speakable paragraph that we can use to introduce the content on voice-enabled devices (i.e. “what is the latest WordLift article about?”). So, while we can’t let the machine really run the show alone, there is concrete value in using BERT for summarization.
Pagination SEO for WordPress — Boost Session Length and Page Views


Pagination allows website editors to split long content into different pages. This technique really belongs to the ABC of web design and information architecture but, still, pagination SEO best practices are debated. Therefore, dealing with pagination is not as easy as it might seem.

In this article, we are going to guide you through the dos and don’ts of pagination from an SEO standpoint and to present WordLift Pagination, a quick and easy-to-use plugin that applies SEO-friendly pagination to your WordPress articles.

What are the benefits of WordLift Pagination for your editorial content? The impact of the pagination plugin on engagement metrics is terrific.

WordLift Pagination — Engagement Metrics Growth: Pages / Session +104%; Session Duration +70%; Bounce Rate -19%

Source: Google Analytics of Windows Report, on a selection of articles where the Pagination Plugin has been applied

Is pagination good for SEO?

Pagination helps SEO as long as it helps the reader consume content in a simpler way. We measured a 4% increase in rankings on long articles that had been paginated: accessing the content from mobile devices was faster and simpler (the table of contents helps readers jump to specific sections).

Why does our pagination have such a positive impact on the website’s metrics?

For years, we’ve been huge fans of long-form articles, since Google seemed to appreciate the capacity of a piece of content to cover a topic in a detailed, in-depth way.

With the roll-out of the Mobile-First Index, something started to change… again. Obviously, long-form articles can take longer to load because of the presence of multimedia content such as images, audio, and video. That’s why Google started to prefer shorter content for some keywords.

We have noticed that some SERPs are now dominated by lighter content of 800 words or even less, containing few media elements and rendering in less than one second on smartphones and other mobile devices.

So… what happens when you have a long-form article that is outranked by short content? Well, here is where WordLift Pagination comes in very handy, fragmenting the content into short fraggles (if the word fraggle doesn’t sound familiar to you, you definitely have to watch this webinar by Cindy Krum) which make enough sense to answer the searcher’s intent.

Before discussing the functionality and results of our pagination plugin any further, I’d like to give you an overview of the state of pagination SEO. On the editorial and strategic side, the first question you need to ask yourself is…

Article pagination: when should I use it?

Pagination is used to divide lists of articles and products, to provide an easy way to access the multimedia content of a gallery, and to break long-form articles into digestible chunks of information.

Let’s focus on article pagination: when and why should you apply it to your content?

  • When the SERP you are competing for is dominated by short, straight-to-the-point content: in this case, shaving just a second or two off mobile page speed can make a lot of difference in your traffic metrics.
  • When your article serves different specific search intents together with a broader one. In this case, breaking the content into small chunks of information can help your users find immediately what they are looking for.
  • When your article contains many multimedia items that could make the page heavier and harder to access from mobile devices/connections. Dividing the content into different pages allows the browser to download small pieces of content instead of a heavy page crowded with images and videos, which results in higher page speed.

As you can see, in all three cases the UX should be the top concern. Pagination only makes sense when it adds something to the user experience.

Pagination and SEO: a complicated relationship

Pagination has always been quite problematic for SEO. In fact, as Rand Fishkin highlighted almost 10 years ago,

Pagination […] affects two critical elements of search engine accessibility.

  • Crawl Depth: Best practices demand that the search engine spiders reach content-rich pages in as few “clicks” as possible (turns out, users like this, too). This also impacts calculations like Google’s PageRank (or Bing’s StaticRank), which determine the raw popularity of a URL and are an element of the overall algorithmic ranking system.
  • Duplicate Content: Search engines take duplication very seriously and attempt to show only a single URL that contains any given piece of content. When pagination is implemented improperly, it can cause duplicate content problems, both for individual articles and the landing pages that allow browsing access to them.

For years, SEO experts dealt with these issues using rel=“next” and rel=“prev”. These link attributes were used to help search engines understand that the linked pages were included in the context of a pagination series.

Adding more complexity to the matter, this March Google announced that it no longer uses rel=“next” and rel=“prev” as an indexing signal.

As you can imagine, the SEO community reacted to this tweet feeling lost and confused. A few days later, John Mueller specified that Google treats paginated pages as normal pages for its indexing and ranking purposes.

Not all search traffic comes from Google, though, and even if Googlebot is ignoring these link attributes, Bing is not.

So, the problem is still there: how to deal with pagination from an SEO standpoint?

Dos and Don’ts for Pagination SEO

Below, you will find a list of best practices (a small sketch of what the resulting markup can look like follows the lists below). All the technical SEO aspects have already been incorporated into our SEO Pagination Plugin.

  1. Create unique URLs for each paginated page. Each page should have a unique URL to allow Google to crawl and index your content.
  2. Use crawlable links to paginated pages and allow paginated pages to be indexed.
  3. Use the right signals to indicate to Google that paginated pages are canonical URLs and should be indexed.
  4. Put the links to all the paginated pages on each of them in order to reduce click depth.
  5. Create unique and useful content on pagination pages.
  6. Manage pagination keyword cannibalization.

Here are some outdated or ineffective strategies that you should avoid if you don’t want pagination to penalize your website:

  1. Don’t let Google decide how to prioritize your paginated content. Give clear signals to the crawlers to be sure that your content will be interpreted and indexed appropriately.
  2. Don’t create a View All version of your paginated content; keep in mind that you need to serve the UX. If a piece of content is too long for your users, then it doesn’t make sense to create a separate View All version for search engines.
  3. Don’t use the first page as the canonical page for all paginated pages. This would give crawlers a wrong signal, because the content of each page is different.
  4. Don’t add noindex to the paginated pages and don’t use any other technique to discourage or block crawlers.
  5. Don’t use infinite scrolling or “load more”, because certain crawlers might not be able to actually crawl all your content.
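
Purely as an illustration of the markup these recommendations point to, here is a small Python helper that prints the relevant tags for an intermediate paginated page; the /2/, /3/ URL pattern and the tags emitted are assumptions made for the sketch, not what WordLift Pagination actually outputs.

```python
def page_url(base_url: str, page: int) -> str:
    # A unique, crawlable URL for every paginated page (do #1).
    return base_url if page == 1 else f"{base_url}{page}/"


def paginated_head_and_nav(base_url: str, page: int, total: int) -> str:
    head = [
        # Each paginated page is its own canonical URL (do #3, don't #3).
        f'<link rel="canonical" href="{page_url(base_url, page)}">',
        # Keep paginated pages indexable (do #2, don't #4).
        '<meta name="robots" content="index, follow">',
    ]
    # Plain <a> links to every page keep click depth at one hop (do #4, don't #5).
    nav = " ".join(
        f'<a href="{page_url(base_url, p)}">{p}</a>' for p in range(1, total + 1)
    )
    return "\n".join(head) + f"\n<nav>{nav}</nav>"


print(paginated_head_and_nav("https://example.com/long-article/", page=2, total=5))
```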

Meet the WordLift Pagination — the SEO-friendly Pagination Plugin

At WordLift, we want SEO to be as easy as possible, automating tasks so that our users can focus on crafting great, unique content. That’s why we have developed WordLift Pagination, the first SEO-friendly pagination plugin, which helps you add pagination to your content in a snap without even worrying about SEO, because it does it for you.

How does WordLift Pagination impact session length and page views?

The first experiment with WordLift Pagination was conducted with our VIP client Windows Report. We applied the pagination to long-form articles on windowsreport.com. The results on the engagement metrics were unexpectedly positive, even for us.

Pages per Session increased from 1.13 to 2.31 with the WordLift Pagination plugin

Source: Google Analytics 

Splitting single-page content into paginated articles had a positive impact on pages per session, session duration, and even on page rankings (+4%). These results show that WordLift Pagination improved the user experience and triggered the growth of all the engagement metrics.

Pagination and Page Speed

The pagination plugin also has a huge impact on page speed.

In the context of a large website with an average page speed of 2 seconds, 14 of the pages created with the pagination plugin are the fastest pages on the site, according to the new Speed report in Google Search Console.

WordLift Pagination — Page Speed: 22 milliseconds

How does this affect the rankings?

In the case of our client Windows Report, the rankings of the paginated articles went up by 4% on average, which resulted in a meaningful improvement in terms of traffic. Our assumption is that the growth in rankings was a direct consequence of a better mobile UX, which is mainly, but not only, related to the increased page speed.

The improvements in terms of engagement can also be read as a signal of a UX that really works.

What can you do with WordLift Pagination?

Here is what our new stand-alone plugin does for your long-form articles:

  • Splits your articles into different pages on the basis of your headings
  • Adds a Table of Contents linked to the single pages that have been generated, for readers who only need to read specific chunks of the article
  • Adds a set of numbered navigation links at the bottom of each page for readers who want to read the article sequentially.

To have it on your pages, all you have to do is install the plugin and add a flag on the long-form articles that you need to split into different pages. It’s that easy!

Ready to add pagination to your content in a snap? Install our WordLift Pagination now!