In this article, I will share some of the ways natural language processing and the combination of semantic web technologies and machine-learning can help you outsmart your competitors and gain a true SEO advantage.
We hear a lot about AI these days and what it can do to help businesses, social networks and large organizations improve their competitiveness. In this article, I will focus on how AI-powered SEO can help publishers increase the engagement of their readership and boost the findability of their content.
How are search engines using Natural Language Processing?
Search engines are becoming better at understanding searchers' intent, thanks to continuous advancements in their linguistic AI capabilities.
From detecting synonyms to disambiguating previously unseen queries with the help of named entity recognition, part-of-speech tagging, named entity disambiguation and sentiment analysis, natural language processing (the research field that focuses on transforming natural language into machine-computable information) plays a central role in how commercial search engines like Google and Bing, as well as personal digital assistants, process our requests, index websites and find relevant content on the Web.
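To make two of these building blocks concrete, here is a toy sketch of named entity recognition followed by disambiguation. The gazetteer and the context logic are invented for illustration (real engines use statistical models trained on huge corpora), but the Wikidata identifiers are real.

```python
# Toy illustration of named entity recognition (finding mentions) and
# named entity disambiguation (mapping each mention to a unique
# identifier in a knowledge base such as Wikidata).
GAZETTEER = {
    "apple": {"Apple Inc.": "Q312", "apple (fruit)": "Q89"},
    "paris": {"Paris (city)": "Q90"},
}

def recognize_and_link(text, context_hint=None):
    """Return (surface form, entity label, Wikidata id) triples."""
    found = []
    for token in text.lower().replace(",", " ").split():
        candidates = GAZETTEER.get(token)
        if not candidates:
            continue
        # Disambiguation: pick the candidate whose label matches the
        # context hint, else fall back to the first (most common) sense.
        label = next(
            (l for l in candidates if context_hint and context_hint in l),
            next(iter(candidates)),
        )
        found.append((token, label, candidates[label]))
    return found

print(recognize_and_link("Apple opened a store in Paris", context_hint="Inc."))
```

The same sentence without the context hint would link "Apple" to its first sense; real disambiguators weigh the whole surrounding text instead of a single hint.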
While in the past search engines like Google worked with statistical models built around keywords and links, we’re now seeing semantic graphs and machine learning algorithms deeply influencing the quality of the results as well as the way these results are presented to the end-user.
From voice responses to featured snippets, from interactive widgets like the news carousel to zero-result SERPs, the way results are delivered keeps evolving.
Try asking the Google Assistant “What is Semantic SEO?” and you will see the implication of an AI-first ecosystem where machines are trained with semantically-rich data to be able to answer using natural language rather than with a set of blue links.
Now that we’ve looked briefly at the evolution of search engines, let’s move our attention to the other side of the coin: the content being published.
How can NLP help improve SEO?
NLP and semantic annotations help machines understand content. Adding semantic processing to a publishing workflow means using natural language processing to add a layer of semantically structured information that describes your content.
There are a number of ways that NLP is used today to improve SEO and user engagement. I will walk you through a few use cases and some leading examples of advanced SEO strategies.
1. Structured data markup automation

Here at WordLift, we call this method structured data markup automation. You can use tools like the Redlink Semantic Platform, Alchemy from IBM or the Bing APIs to extract entities. These entities, and their unique identifiers, can be used to describe your content to search engines and… yes, this is exactly what WordLift does 😉. You can see it in action on websites with millions of visitors such as thenextweb.com and windowsreport.com, as well as established publishers like Reuters or the BBC.
Is structured data really helping website traffic grow?
Just recently, Google shared three business cases to promote the usage of structured data and to educate webmasters on improving the quality of their schema.org markup. The results can be astonishing. See Eventbrite’s case study: their website experienced 100% growth in organic traffic from Google Search to event listing pages.
“Within two or three weeks we started seeing a visual difference in our event search results on Google,” Allen Jilo, an Eventbrite product manager, says. “The Google Search experience definitely helps drive more eyeballs to event pages. And when those people convert, it translates to incremental ticket sales for our event creators.”
2. Internal link building and content discovery
In-links help users discover content on your website, and they also help search engines evaluate what your content is about and how effective the user experience can be for a first-time visitor to a particular webpage. A strong, logical internal linking structure helps your SEO significantly.
Wanna learn more? I've written a guide on how to conduct an effective internal linking strategy. Have a look!
With NLP and entity extraction algorithms, you can see what concepts can be detected by a machine. These algorithms are trained using machine learning techniques on large semantic databases extracted from Wikipedia or other openly available corpora of text. By looking at the list of extracted entities you, as a writer, might decide that your article deserves contextual background and an introduction to some of the concepts the NLP detected; this will help the reader (as well as search crawlers) understand “things” that they might not otherwise understand in depth.
Just to give you an idea, 10% of the searches we make daily are meant to help us better understand things we don’t know well. These searches are usually directed to Wikipedia or can be answered directly by the search engine with its knowledge graph panels. With NLP you can provide this information to the reader immediately, without having him or her jump somewhere else.
We see a lot of examples where NLP is used to create in-links that matter to the reader. Look, for example, at how The Guardian uses it to connect articles around “Russia” and “Vladimir Putin”.
3. Content recommendation
When content is annotated using natural language processing, the metadata is stored in a machine-readable format like JSON-LD, Microdata or RDF. Machine learning is good at classifying information and predicting, for instance, what the user would like to read next.
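As a minimal sketch of what such machine-readable metadata can look like, the snippet below emits schema.org JSON-LD that links an article to the entities it is about. The headline and helper function are invented for illustration; the `@context`, `Article` type, `about` property and DBpedia URIs are real schema.org and Linked Data conventions.

```python
import json

# Serialize the entities detected by NLP as schema.org JSON-LD,
# pointing each entity to its identifier in a public knowledge graph.
def article_jsonld(headline, entity_uris):
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # "about" links the article to the things it discusses.
        "about": [{"@type": "Thing", "@id": uri} for uri in entity_uris],
    }, indent=2)

markup = article_jsonld(
    "How AI is changing SEO",
    ["http://dbpedia.org/resource/Search_engine_optimization",
     "http://dbpedia.org/resource/Natural_language_processing"],
)
print(markup)
```

The resulting block would be embedded in the page inside a `<script type="application/ld+json">` tag so crawlers can read it.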
Content recommendations greatly improve what SEOs call dwell time: the time a user spends on a website between clicking a search result and returning to the SERP.
The better the recommendations, the longer readers remain engaged with the content.
Adding a semantic layer of metadata to the content greatly improves the machine learning models that we can build to help the user jump from one article to another.
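A simple way to picture how the semantic layer feeds recommendations: compare articles by the entities they share. The sketch below uses invented articles and plain Jaccard overlap; production systems blend this signal with many others via machine learning.

```python
# Entity-based recommendation sketch: articles annotated with entity
# identifiers are compared by how many entities they have in common.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

ARTICLES = {
    "putin-profile":  {"Vladimir_Putin", "Russia", "Kremlin"},
    "russia-economy": {"Russia", "Sanctions", "Kremlin"},
    "uk-election":    {"United_Kingdom", "Election"},
}

def recommend(current, articles):
    """Rank the other articles by entity overlap with the current one."""
    others = [k for k in articles if k != current]
    return sorted(others,
                  key=lambda k: jaccard(articles[current], articles[k]),
                  reverse=True)

print(recommend("putin-profile", ARTICLES))
```

Because both articles about Russia share two entities, the economy piece is recommended first.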
Great work in this area has been done by PoolParty, and here you can find an interesting presentation on their Semantic Classifier and how it can help you create content recommendations that combine semantic enrichments produced by NLP with neural networks. The intersection of NLP, semantic graphs, and machine learning is also referred to as Semantic AI.
4. Smart redirections and 404 handling
This is a fairly narrow and yet very powerful mechanism that allows a website, like Quora, that is built around topics to route the user to the right topic by intercepting all the alternative names a concept might have. You can see it in action by directing your browser to a topic page like this:
You will notice that your browser automatically redirects the request to the topic page for Search Engine Optimization located at the URL:
The web server of Quora has been configured to understand that “SEO” is equivalent to “Search Engine Optimization”, and this is done by de-referencing the entity Search Engine Optimization in public knowledge graphs like DBpedia, where all the synonyms for a given concept are described.
In other words, for every topic page, by de-referencing each concept against the equivalent entity in large linguistic graphs, Quora is able to configure multiple 301 redirects to intercept requests without having to worry about how each user refers to a specific concept. Yes, this configuration can also be easily implemented via WordLift 😉: when entities are detected by the NLP, WordLift de-references them using Wikidata, DBpedia, YAGO and other large semantic graphs.
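A minimal sketch of this mechanism, with hard-coded aliases standing in for labels pulled from a graph like DBpedia (Quora's actual implementation is not public):

```python
# Map every alternative label of a topic to one canonical URL, so any
# variant of the name triggers a 301 redirect to the same page.
CANONICAL = "/topic/Search-Engine-Optimization"
ALIASES = {"seo", "search-engine-optimization", "search-engine-optimisation"}

def resolve(path):
    """Return (HTTP status, location) for an incoming topic request."""
    slug = path.rstrip("/").rsplit("/", 1)[-1].lower()
    if slug in ALIASES and path != CANONICAL:
        return 301, CANONICAL          # redirect the synonym
    return 200, path                   # serve the canonical page

print(resolve("/topic/SEO"))
print(resolve(CANONICAL))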
5. Topic targeting
In recent years, the attention has shifted (at least for some SEO experts) from targeting keywords to targeting topic clusters. As search engines become more capable of understanding the world around us, and disambiguation comes into play, the same results can be presented to the user across multiple searches that share the same intent; the competition is no longer about targeting a specific keyword, but rather about being relevant for a specific topic.
Relevancy, practically speaking, is achieved by expanding a topic in all the directions that might be of interest to our user.
With linguistic AI and word vectors, we can start exploring a concept to see how it is semantically related to other concepts. This can guide us in building the proper context around it; there is a very interesting article on word vectors and their SEO implications that you should read to learn more about this technique.
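The underlying measurement is usually cosine similarity between vectors. Here is a toy example with hand-made three-dimensional vectors (real embeddings have hundreds of dimensions learned from large text corpora):

```python
import math

# Toy word vectors: related concepts point in similar directions,
# so their cosine similarity is high; unrelated ones score low.
VECTORS = {
    "seo":       [0.9, 0.8, 0.1],
    "marketing": [0.8, 0.7, 0.2],
    "cooking":   [0.1, 0.2, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine(VECTORS["seo"], VECTORS["marketing"]))  # high: related topics
print(cosine(VECTORS["seo"], VECTORS["cooking"]))    # low: unrelated topics
```

With real embeddings, the nearest neighbours of a target concept suggest which related topics your content should also cover.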
If you want to start playing with word vectors immediately, I also suggest you spend some time with Google Semantris. You will see what machine learning can do when applied to semantics.
6. SERP Analysis with NLP
When you start analyzing multiple keywords and how they behave over time, you basically look at the top 10 or 20 results for each keyword and how Google ranks the content behind each website. As the number of keywords to track increases, it becomes extremely complex to understand the trends behind all these web pages.
Back in 2013, as we were doing agency work for a Fortune 500 company, we started to use natural language processing across SERPs to get an immediate overview of what “entities” were driving these rankings and how the content was evolving as Google was updating its results on the target keywords.
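A sketch of the idea: run entity extraction over each ranking page and count which entities dominate a SERP snapshot. The URLs and entity lists below are invented; in practice they would come from an NLP API applied to the actual top results.

```python
from collections import Counter

# Count how many ranking pages in a SERP snapshot mention each entity.
# Comparing snapshots over time reveals which concepts are driving
# ranking changes for the tracked keyword.
serp_snapshot = [
    {"url": "https://example.com/a", "entities": ["SEO", "Google", "Schema.org"]},
    {"url": "https://example.com/b", "entities": ["SEO", "Backlinks"]},
    {"url": "https://example.com/c", "entities": ["SEO", "Google", "Content Marketing"]},
]

def entity_frequencies(results):
    """Count ranking pages per entity (each page counted once)."""
    counts = Counter()
    for page in results:
        counts.update(set(page["entities"]))
    return counts

print(entity_frequencies(serp_snapshot).most_common(2))
```

Diffing the counters of two snapshots taken weeks apart shows at a glance which entities gained or lost ground.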
I was very pleased to find a great presentation by Stephan Solomonidis that describes exactly this same process.
NLP and entity extraction, as well as Semantic AI (the use of knowledge graphs and machine learning), are heavily used today by large online properties like The Guardian or TNW, as well as social networks like Quora. With tools like WordLift, these technologies can be immediately used on personal blogs, e-commerce websites and mid-sized content magazines to improve SEO and to boost user engagement.
APIs provided by Microsoft, IBM, Google or open-source technology providers like Redlink can help SEOs read content more effectively, with the help of an AI that can scan pages at the speed of light.
1. Use NLP and large public graphs like Wikidata and DBpedia to improve the structured data markup on your pages
2. Create relevant links and describe the topics that are relevant to your target audience
3. Exploit semantically-rich metadata to improve the quality of the content recommendations on your site
4. Configure smart redirections and 301s by de-referencing entities and expanding the synonyms of a given topic (so that users can always find the page they want on your website)
5. Play with word vectors to find inspiration on concepts that you might want to cover in order to become relevant for a specific topic
6. Analyse your competitors using NLP to quickly track which concepts are driving the changes in Google's SERPs.
24 years ago, before anyone had ever heard of a tweet, a Facebook timeline, or even what it meant to “Google” something, there was France.com. Jean-Noel Frydman began this website as a way for those around the world to get an up-close look at what France has to offer.
The world of technology is constantly changing, but not always in a positive way. Unfortunately, in 2015, the French Ministry of Foreign Affairs filed a lawsuit against France.com, stating that the use of the name “France” was against French law. The laws regarding the online world are not apparent to everyone, but they can be found on ICANN.org.
Despite the unfortunate fate of his website, Frydman stands by his work, and he is now launching his online business consultancy. Check out our interview below!
You’ve been in the digital marketing space since 1994, when you started your first business online, France.com. Can you tell us a little bit more about your background and how you began your path as a digital entrepreneur?
I was working in Los Angeles, in the movie distribution business. I had no knowledge of computers, but when I stumbled upon the Internet, I thought it would be a very exciting place to launch my business. I had no idea how the online world worked, but that was not a disadvantage since nobody knew anything about this nascent industry.
How did you manage to build a successful online business in the travel industry over the years?
By focusing on one theme (in our case France) and trying to cover it as well and as extensively as possible. We were also able to be successful by creating new products for that niche and trying to consistently bring innovation to help people make the best of their trip to France.
You were starting to use WordLift – how much has SEO changed in the course of these years?
SEO has been constantly changing since the dawn of the search engines. However, I would leave the details of how it has changed to the minds at WordLift, who know this much better than me! But in the short time we used WordLift, I was impressed by the results. If only we still had the site to show off your accomplishments!
You also managed to build a good relationship with the French institutions overseas, didn’t you? And then what happened?
Yes, in fact my first employer was the Ministry of Foreign Affairs, and our first client at France.com was that same ministry. I have worked closely with the French Government Tourist Office for the last 24 years. I am unsure of what happened to our relationship. These institutions that backed our activities for 24 years suddenly decided to paint us as ‘digital pirates’. Why the change of heart? The only explanation is a desire from these institutions to profit from France.com without having to indemnify us or purchase it legally.
We have talked with Yves Mulkers, founder of 7wData, to discover his experience as a publisher and how data helped him grow his editorial business. See how he achieved a 60% increase in organic user acquisition.
Most people don’t get excited about data, and very few would think there is something exciting about it, but Yves Mulkers is an exception. A chemistry graduate who later moved into the IT industry, in 2015 he founded the online magazine 7wData, which hosts trending news about the world of data and all its facets. 7wData’s purpose is to help people understand how data can work for them.
In fact, Yves describes his website this way:
“7wData is a blogging platform to foster innovation & matchmaker between people and products, and foremost is here to trigger your data appetite.”
Passionate about music, technology, and data, Yves started to connect the dots between his own interests and career back in his DJ days, trying to organize his sizeable vinyl collection by building his own record management tool and CRM system.
He was always looking for tools and technology that supported his vision and maximized and optimized his skills.
“Organizing, structuring, modeling, and sharing knowledge and stories have been a common thread in everything I do, where I like to inspire people with the things I do,” he said.
How did the idea of 7wData come up?
I needed a place to share all the inspiration I get, and to gather my network around my favorite topic of the moment: data. When I started, we were still talking about business intelligence and data warehouses. It was the early days of the big data hype, which was supposed to solve all our data problems.
7wData helps me keep my network informed and inspired about what you can do with data, how you should do it, and what value it can bring to you. It also pushes me to stay at the forefront of what is happening in the market.
In your experience, what are the aspects that make people more curious about data, AI, and innovation?
For most people, data is not sexy and not tangible; it’s technical and has to do with IT and geeky stuff. But people are slowly starting to understand that with all this data, technology and mathematics, we can be supported in our everyday lives. The time is now: we can try a lot of things, very quickly, without major investments. Try many, fail early, and learn; much like how nature works and how the evolution of species came about.
You are passionate about AI. How do you think these new technologies are changing online publishing?
AI will play an important role in automating typical reporting, like sports facts. Artificial Intelligence will allow us to find relevant information quickly. This was big when search engines started to arise, back in the ’90s.
It is great to be able to find information in such an easy way, at the tip of your fingers. But given the huge volumes of information, it takes us longer than ever to find relevant, meaningful information.
This is where AI will help us be relevant and selective.
How are you using the data you produce with WordLift on your website?
My big vision is to build a one-stop place where you can find all information regarding data: inspiring stories, how-to’s, who-is, which products, what jobs are available and what skills you need. All this to help everybody, from C-level to practitioner, make the best out of data.
WordLift helps me build that knowledge graph, and it also interconnects the content assets with minimal effort. At the same time, it prepares and optimizes my content for search engines and voice search, which allows people to find my content in an even easier way.
WordLift’s unique approach is the semantics it has on board. Most tools work only on the literal terms, whereas WordLift can identify for example A.I in a sentence, and knows it is the same as AI, Artificial Intelligence…
Can you already see WordLift‘s effect on your website?
We are still in the evaluation phase, and remain skeptical, because things that work so well always make us suspicious.
But to call it by the numbers: in the first weeks of implementation we saw a 30% increase in traffic compared to previous periods. To be honest, we never did any SEO optimization before.
Almost all our content is now wordlifted, and we still see a week-over-week increase.
After 3 months using WordLift, organic user acquisition grew by 60% compared to the previous period
If we compare to the same period last year, we see a search traffic increase of 60%.
Looking back ten years from now, we’ll probably say: “it all started with a hair salon reservation.” In fact, what seemed a simple conversation in reality opened up a Pandora’s box, for better or worse. Yet, beyond socio-cultural considerations, this will have an enormous impact on businesses, online and offline.
In many other articles about Google Duplex, several perspectives have been taken into account. In this article, instead, I want to give you a different angle on why, from a business standpoint, it makes sense for Google to move in that direction. When companies like Google, whose most important asset is its users’ data, make a move, I believe it is essential to understand why.
The Turing Test is a thing of the past
When, in the 1950s, Alan Turing was thinking about machine intelligence, he started with what seemed a simple question: “Can machines think?” However, this question carries many hidden philosophical problems. Not least: how would you define thinking? That is why Alan Turing turned the question upside down. Rather than trying to define thinking, he decided to look at the problem from another perspective: “Can machines do what we (as thinking entities) can do?”
Listening to this conversation, would you even guess that it is a conversation between a human and a machine? I didn’t, and I bet you didn’t either. But how did we get here, and what implications does this have for the future?
The digital divide of small localized businesses vs. large tech conglomerates
As Google works on practical applications for its Google Assistant, deciding where to focus its effort is critical. In fact, if we look at the data on small business digitalization, we realize how slowly small businesses are adapting to the modern technological landscape.
In other words, although things like AI and machine learning resonate in the marketing world and are the primary concern of tech giants like Google, Facebook or Amazon, in reality small business owners are not only unconcerned about those topics; they are still in the process of understanding why they need digitalization at all for their businesses. For instance, if you think about a small restaurant or a hair salon, the kinds of businesses used as examples in Google’s Duplex experiment, you realize that it is easier for Google to go offline than for those small businesses to join the online world.
As those local activities mainly rely on word of mouth and traditional media, it would be tough for Google to reach those businesses (although Google has already moved in that direction with Google My Business). What to do then?
If a small business doesn’t go online, Google goes offline
We take Google for granted. Yet it’s easy to forget that Google, as a digital business, monetizes thanks to the data of people who are always online. What about people offline? Google Duplex might be a way to close the digital divide: leveraging people who are always connected to start gathering data about businesses that are offline.
In other words, Google Duplex becomes the middleman that allows Google’s Assistants to collect critical data about offline businesses for voice search.
While technologies pass, data stays, and Google Duplex can be the growth engine toward voice search
AI, machine learning and the plethora of new technological applications springing up thanks to them are at the center of today’s debate. However, although technologies play a crucial role, what truly matters is data. On the one hand, new machine learning models allow the processing of large amounts of data. If in the past the data gathered couldn’t be of much use to companies or governments, because we didn’t have the computing power and intelligence to process it, now this is possible.
On the other hand, we have to keep in mind that data is what matters. When Google and Facebook offer free services to users, they are not volunteering; they are building up a business. As voice search is expected to become a $40 billion market (in the US alone) by 2022, Google Duplex can really become the growth engine that allows Google to gather the most important data, through voice, and take over the market.
“Hey Google,” this is a country for old men!
If you think about it, this might be the most ingenious business strategy. While in the last two decades Google used its users’ data to build up a business that, as of 2017, made over $95 billion from advertising, there was still a disconnect. The gathering of data by Google depended, and still does, on the level of digitalization of its users; in the future, it will not.
If you think about digital assistants like Google Home, those are consumer products ready to be in any home, independent of computer use. In fact, in a few months over six million home speakers were sold.
You might expect, though, that voice search will finally disrupt the usage of computers. Isn’t voice search, in a way, the natural evolution of traditional search? In reality, if we look at the statistics, voice search will not just take some market share from traditional search; it will take over old media:
In short, people were asked what media their Smart Speakers were replacing, and the answers were staggering: among the top seven media replaced by Smart Speakers, four (Radio, TV, Printed Press and Sonos) are traditional media.
Why is this important at all? For a few reasons, I argue. First, Voice Search might disrupt old media once and for all. While the web is still in a race against traditional media (it was only in 2017 that digital ad spending surpassed TV ad spending), Voice Search has the potential to disrupt it, as it will have many practical applications for households worldwide.
Second, the web created a greater divide between generations. This might not be the case for Voice Search. Smart Speakers can be activated with something humans have been using forever: spoken language.
Third, as the Mark Zuckerberg Senate testimony showed, those tech giants’ business models aren’t easy to understand. We saw scenes of struggling adult men and women trying to make sense of Facebook. Many on the web read this as a lack of intelligence on the part of those politicians. In reality, it seems clear to me that companies like Facebook and Google, thanks to their asymmetric business models, make it hard for people to understand how they operate. As Voice Search will make it hard for Google to monetize on ads (imagine the only answer given by the Smart Speaker were an ad: would you trust it?), would they be willing to experiment with alternative, symmetric business models?
In this article, we saw how Google Duplex might be opening new business scenarios for Google. However, we also saw how Google Duplex would help the tech giant from Mountain View to target a few things at once. The critical aspects are:
Close the digital divide between tech giants and small offline businesses
Start collecting critical data by using connected users to collect data from offline small businesses
Although AI and machine learning are critical technologies that allow Google to become more sophisticated, the real asset is data
While the web is still competing with traditional media, Voice Search isn’t only taking market share from the web, but mostly from media like TV, Radio and the Printed Press
While computers are still hard to understand for older or less tech-savvy people, voice search is pretty much a technology that can be used by anyone
As Voice Search will make it harder for Google to monetize with ads alone, would this be an opportunity to experiment with more symmetrical business models?
Those are open questions. It is clear though that the power of Voice Search is its ability to unite digital to non-digital, millennials to baby-boomers, tech-savvy to non-tech savvy. From the business standpoint, it will be all about Voice Search domination!
Anywhere we look, there is an area where Artificial Intelligence is changing the rules of the game: from personal assistants that are changing the way we interact with machines, to self-driving cars and diagnostic systems that can diagnose certain diseases more accurately than doctors. This isn’t only due to media buzz (though that also contributes); it is due to the fact that AI is a vast area that touches several disciplines.
In this article, I want to show you how a field of Artificial Intelligence called Natural Language Processing (NLP) helped Quora become one of the most popular Q&A sites in the world. In fact, NLP has become so critical to Quora that the company has open vacancies for NLP engineers.
As WordLift was born as a university project out of Natural Language Processing research, we always look for best practices to see how the industry is evolving and helping the web become smarter.
Natural Language Processing applications
The main aim of NLP is to help computer programs process large amounts of natural language data by making sense of it. On a platform like Quora, with hundreds of millions of users, keeping the quality of its content high is critical.
Hundreds of millions of people use Quora to discover high-quality answers to questions important to them. The quality of our content and the civility of our community are two important factors that make Quora special. We want to maintain that quality even as billions of people start using Quora.
The most effective way to keep quality standards high while growing the user base is to process that data in a way that makes it more valuable to users. In fact, as explained further:
Such a rich dataset puts us in a unique position to use various Natural Language Processing (NLP) techniques to solve exciting problems critical to our success.
How’s Quora applying NLP? Here are 13 interesting ways.
Quality is critical for any platform’s survival. For a Q&A platform like Quora, this is even more important. Given that users contribute all of Quora’s content, how does it make sure quality stays high? First, we must define quality: Quora looks at things like writing style, readability, completeness, and trustworthiness.
This is a ranking problem. Quora has to look at many variables and rank each answer based on its relevance and helpfulness, so that the most helpful answers show at the top.
To define relevance and helpfulness, Quora looks for five properties. A helpful answer:
Answers the question that was asked.
Provides knowledge that is reusable by anyone interested in the question.
Is supported by rationale.
Demonstrates credibility and is factually correct.
Is clear and easy to read.
Quora also gives an example of how it uses Natural Language Processing to extract relevant data to assess and rank answers:
As you can see, Quora looks at various things, most of them in the form of text. However, text needs to be converted into data. This is where NLP helps: it turns text into machine-readable structured data that can be easily processed by Quora’s algorithms.
This gives Quora the opportunity to be more sophisticated in creating ranking systems by considering things such as author credibility, formatting, upvotes, and many other variables.
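As a rough sketch of how such a ranking could work once features are extracted, consider the snippet below. The features, weights, and answers are all invented for illustration; Quora's real model is far more sophisticated and learned from data rather than hand-tuned.

```python
# Feature-based answer ranking sketch: once NLP has turned each answer
# into numeric features (readability, author credibility, upvotes...),
# ranking becomes a scoring problem over those features.
WEIGHTS = {"readability": 0.3, "credibility": 0.4, "upvotes": 0.3}

def score(features):
    """Weighted sum of normalized feature values (all in [0, 1])."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

answers = {
    "answer-1": {"readability": 0.9, "credibility": 0.8, "upvotes": 0.5},
    "answer-2": {"readability": 0.6, "credibility": 0.4, "upvotes": 0.9},
}

ranked = sorted(answers, key=lambda a: score(answers[a]), reverse=True)
print(ranked)
```

In a production system, the weights would be learned by a machine learning model trained on which answers users actually found helpful.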
The flip side of answer quality is question quality. If you know Quora, chances are you found it through Google Search. In fact, if we look at its marketing mix, more than 80% of traffic comes from search:
That’s because Quora is well positioned on so-called long-tail keywords, which allow it to take over the SERP. Of course, this is also thanks to the fact that Quora can provide quality content for those answers. Yet, if Quora didn’t use a process driven by AI and machine learning, it would have been impossible to leverage such a mass of natural language data.
That is why Quora uses the same ranking system we saw above to assess the relevance of questions as well.
When you type something in Quora’s search box, that box serves several functions:
It isn’t only a Q&A tool that allows anyone to ask something, but also a way to search anything on the platform. You might think that the retrieval of information from that search box is mainly based on keyword matching. However, that is not the case, as the Quora engineering team explains:
We use NLP techniques in this information retrieval problem space to help us better understand user queries and questions, as well as better rank content in the form of questions, answers, topics and user biographies. Unlike regular search engines with simple keyword matching, we can also support searches done with longer queries that are in the form of questions as well.
Other NLP applications that make Quora smarter
When you transform the text into structured data, suddenly that knowledge which before was only accessible to humans becomes easily accessible to machines. Natural language processing helps make that transition, which translates human text in machine-readable data that can be fed to a system to make it more relevant for its users.
In this article, we saw how Quora uses NLP in three key areas. However, that is just the beginning. There are other areas in which NLP is crucial for Quora’s success:
Automatic Grammar Correction
Duplicate Question Detection
Related Question Generation
Topic Biography Quality
Automatic Answer Wikis
Hate Speech & Harassment Detection
Question Edit Quality
At WordLift we also use NLP to automate an important part of the digital marketing strategy: SEO. This article aimed to show you the practical ways in which AI is helping startups build smarter systems that become more useful to their users, beyond the buzz and hype created by media.
If you want to try NLP on your website, book a demo and let’s talk about your project. 🤓