Table of contents:
- What is Natural Language Generation (NLG)?
- What is the difference between Natural Language Processing (NLP) and Natural Language Generation (NLG)?
- How can NLG help improve SEO?
- AI-generated content for SEO: a real-world case
- How to use NLG to create SEO-friendly content
What is Natural Language Generation (NLG)?
Natural Language Generation (NLG) is the use of artificial intelligence (AI) to generate written or spoken narratives from a set of data. NLG is related to human-computer and machine-human interaction, including computational linguistics, natural language processing (NLP), and natural language understanding (NLU). Sophisticated NLG software can be trained on large amounts of data, including numerical data, to recognize patterns and present information in a way that is easy for humans to understand.
What is the difference between Natural Language Processing (NLP) and Natural Language Generation (NLG)?
NLP (Natural Language Processing) uses methods from various disciplines, such as computer science, artificial intelligence, linguistics, and data science, to enable computers to understand human language in both written and verbal forms. NLG (Natural Language Generation) is the process of producing a human language text response based on some data input. So, NLP accurately converts what you say into machine-readable data so that NLG can use that data to generate a response.
Originally, NLG systems used templates to generate text. Based on certain data or a query, an NLG system filled in the blanks of a predefined text. Over time, however, natural language generation systems have evolved to allow more dynamic, real-time text generation through the use of neural networks and transformers.
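To make the template approach concrete, here is a minimal sketch in Python. The template wording and the product record are made up for illustration; a real system would hold many templates and pick among them based on the data.

```python
from string import Template

# Hypothetical template: the "blanks" are the $placeholders.
TEMPLATE = Template("The $name is made of $material and comes in $color.")


def fill_template(record: dict) -> str:
    """Fill the blanks in the template with values from one data record."""
    return TEMPLATE.substitute(record)


sentence = fill_template(
    {"name": "Classic Tote", "material": "leather", "color": "black"}
)
print(sentence)
```

Every output follows the same sentence pattern, which is exactly the rigidity that neural models were later used to overcome.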
How can NLG help improve SEO?
Natural language generation has some incredible implications for your SEO strategy.
By using NLG systems, more content can be created faster. Not only does this save time, but it allows website owners, especially in e-commerce, to create content on a large scale.
However, when working with AI text generation for SEO, we need to be even smarter and more critical than usual. To create tangible and sustainable business value, we need to create workflows where humans collaborate with machines.
This means that we should not (yet) let GPT-3 or other transformer-based language models run free. We need to keep “control” over content quality and semantic accuracy to avoid unwanted biases and prejudices. Within this workflow, the main task for humans is to nurture semantically rich data in Knowledge Graphs.
The human side can be encoded in a Knowledge Graph. When we look at the embeddings of the Knowledge Graph behind our blog for an entity like AI, the closest relationship in our small world leads us to SEO. Over the years, editors have established this relationship between AI and SEO on our blog, and we can now use it to “control” content creation around either of these concepts.
At WordLift, we have developed a sophisticated approach to AI-generated content: we move from initial data collection and enrichment to active learning, in which both the model and the validation pipeline are improved. The workflow consists of four steps, as shown in the figure below.
What Google says about AI-generated content
Google’s Helpful Content Update, released in August of this year, has reignited the discussion around AI-generated content and how Google is addressing it.
Although not explicit, in the update Google confirms its aversion to AI-generated content and talks about content that is high quality and meets the needs of users, content that is written by people for people. Google wants to avoid low-quality content written for search engines and not for users.
The reasons for Google’s aversion to AI-generated content may be many, but the fact is that there are some truths to consider. The first is that AI-generated content performs well in search if it is of high quality. The second is that AI-generated content is still not easy to scale. Training the models is very tedious and there is still not complete confidence in the results obtained.
So what makes the difference? What makes high-quality AI-generated content that does not risk Google (and user) penalties? The answer is simple: data. As Kevin Indig also said: “The biggest differentiator will be what inputs (data) companies can use to create content.” And that’s where structured data comes in.
“The key is to build a large dataset that you can keep adding to and that can provide potential insights. GPT-3 will help you extract valuable insights from it.”
Daniel Ericksson, CEO at Viable
The crucial part of using NLG for AI-generated content in SEO is semantically rich data (which can also benefit your structured data). Language models trained on billions of sentences learn common language patterns and can generate natural-sounding sentences by predicting likely word sequences. However, when generating text from data, we want to produce language that is not only fluent, but also accurately reflects the underlying data.
Structured data makes this possible. It is semantically enriched, so it provides the model with the information and attributes that make content generation more accurate. In the case of GPT-3, for example, any data is converted to text to make it more usable, but without structured data and the information it contains, GPT-3 can go astray even with the best prompts. If you use structured data instead, you can achieve better results and outperform the competition, which is intense in this area, considering how many automated content writing tools are openly available online.
Building a Knowledge Graph is essential for the effective use of NLG systems such as GPT-3. The Knowledge Graph is the dynamic infrastructure behind a website. It represents a network of entities – i.e., objects, events, situations, or concepts – and illustrates the relationships between them. Structured data is used to describe these entities. This way, your content is more easily understood by search engines and ranks better on Google.
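As an illustration of what "structured data describing an entity" can look like, the sketch below builds a schema.org Product description as JSON-LD. The product values and the helper name `product_to_jsonld` are hypothetical; only the schema.org types and properties (`Product`, `Brand`, `name`, `brand`, `color`, `material`) are real vocabulary.

```python
import json


def product_to_jsonld(name: str, brand: str, color: str, material: str) -> dict:
    """Describe one product entity with schema.org vocabulary."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "brand": {"@type": "Brand", "name": brand},
        "color": color,
        "material": material,
    }


# Made-up example values, for illustration only.
markup = product_to_jsonld("Classic Tote", "ExampleBrand", "black", "leather")
print(json.dumps(markup, indent=2))
```

The same attributes that describe the entity for search engines can later be passed to a language model as input for content generation.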
With the Knowledge Graph, you then have the semantically enriched dataset you need to train models and create unique and original AI-generated content that respects brand identity and tone of voice.
AI-generated Content For SEO: A Real-World Case
How We Used NLG For E-Commerce
Here we would like to show you a real-world case of how we used AI to create content that had a positive impact on SEO. In this case, we are talking about what we did for a corporation that includes some of the biggest international brands for accessories. So let us talk about AI-generated content for e-commerce.
This experiment shows that customized AI product descriptions can bring measurable SEO value to product detail pages (PDPs). With AI-generated content, each variant of the same product gets a personalized description, whereas with the standard workflow, all variants share the same description.
The experiment began a year and a half ago. In the first step of the workflow, we collected all the data we needed to better train and enrich the model:
- Product feeds for the products to describe.
- Actual product descriptions or examples of product descriptions with the brand’s tone of voice (ToV).
- Other content with the brand’s tone of voice (e.g., social feeds).
- Brand kits and other materials related to the brand.
Then we selected the attributes. These must be available in the dataset provided and relevant to the product. The list of selected attributes was used to create the prompt.
To customize the model, fine-tuning is the way to go. We fine-tune it on the example product descriptions provided by the client. This way, the content generated by the AI matches the products, brands, and tone of voice of the client. Once the prompt is ready, we can start creating the product description. The completion is regenerated until the product description is correct and has a valid length (between 190 and 350 characters). A native English speaker then reviews and corrects the generated text.
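The prompt-building and regenerate-until-valid steps described above can be sketched as follows. This is not the actual pipeline: the prompt layout, the function names, and the `generate` callable (a stand-in for a call to a fine-tuned model) are assumptions; only the 190 to 350 character limits come from the text.

```python
from typing import Callable, Optional, Sequence

MIN_CHARS, MAX_CHARS = 190, 350  # character limits quoted in the text


def build_prompt(product: dict, attributes: Sequence[str]) -> str:
    """Turn the selected attributes into a plain-text prompt (hypothetical layout)."""
    lines = [f"{attr}: {product[attr]}" for attr in attributes if product.get(attr)]
    return "Write a product description.\n" + "\n".join(lines) + "\nDescription:"


def generate_valid_description(
    prompt: str,
    generate: Callable[[str], str],
    max_attempts: int = 5,
) -> Optional[str]:
    """Request new completions until one fits the length limits, or give up."""
    for _ in range(max_attempts):
        text = generate(prompt).strip()
        if MIN_CHARS <= len(text) <= MAX_CHARS:
            return text
    return None  # no valid completion: hand over to a human editor


# Stand-in for a real model call: returns canned completions in order.
canned = iter(["Too short.", "A soft leather tote. " * 10])
prompt = build_prompt({"color": "black", "material": "leather"}, ["color", "material"])
description = generate_valid_description(prompt, lambda _: next(canned))
```

Capping the number of attempts keeps the loop from running indefinitely when the model cannot satisfy the rule, so those products can be routed to an editor instead.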
The validation process is an important step that automates the verification of the attributes described in the completions. This way we can check whether each attribute in the product description is correct and whether any are missing. Once the completion is validated, we share the output with the internal teams involved in the project, who approve and, where needed, supplement the descriptions before they go online.
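A very simple version of such an attribute check might look like the sketch below. It assumes that an attribute counts as "covered" when its value appears verbatim in the description, which is a deliberate simplification (it would miss synonyms and inflected forms); the function names and sample data are hypothetical.

```python
def check_attributes(description: str, expected: dict) -> dict:
    """Return, per attribute, whether its value is mentioned in the description."""
    text = description.lower()
    return {attr: str(value).lower() in text for attr, value in expected.items()}


def missing_attributes(description: str, expected: dict) -> list:
    """List the attributes whose values do not appear in the description."""
    report = check_attributes(description, expected)
    return [attr for attr, found in report.items() if not found]


# Made-up product data and description, for illustration only.
expected = {"color": "black", "material": "leather", "closure": "zip"}
text = "This black tote is crafted from smooth leather with room for a laptop."
missing = missing_attributes(text, expected)
print(missing)
```

Descriptions with a non-empty list of missing attributes can then be regenerated or flagged for the editorial team.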
An important step was to involve the internal editorial team from the beginning. Our goal is to improve their work, but they are the ones who have the rudder and need to be able to steer the boat in the right direction with the necessary adjustments.
After some testing:
- We determined that we could have a consistent model across the group of sites, with the tone of voice customized for each brand.
- We have improved the level of validation and are becoming more flexible in handling the content rules we receive from the editorial and SEO teams.
- Two key components are improving the quality of the data used to train the model and interacting with the content team.
- As for validation, experiments with Stable Diffusion and DALL-E 2 can help validate the overall quality of the final description.
With the first test conducted on one of the group’s websites, we achieved a 43.73% increase in clicks. Looking at the increase in sales, the team estimated potential double-digit growth in annual revenue.
Another test was performed on another site, comparing a variant group with a control group. In this test, we saw a 6.19% increase in clicks. Here, AI content was added to canonical URLs (which already contained a description).
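For readers who want to reproduce this kind of comparison on their own data, the relative change between a control group and a variant group can be computed as below. The click counts in the example are invented purely to show the arithmetic; they are not the figures from the tests described above.

```python
def relative_change(control_clicks: float, variant_clicks: float) -> float:
    """Percentage change of the variant group relative to the control group."""
    if control_clicks <= 0:
        raise ValueError("control_clicks must be positive")
    return (variant_clicks - control_clicks) / control_clicks * 100


# Invented numbers, for illustration only.
uplift = round(relative_change(1000, 1062), 2)
print(f"{uplift}% change in clicks")
```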
How To Use NLG To Create SEO-Friendly Content
NLG applications use structured data and turn it into written narratives, writing like a human but at the speed of thousands of pages per second. NLG makes data universally understandable and aims to automate the writing of data-driven narratives, such as product descriptions for e-commerce.
Content creation is something that almost all e-commerce store and blog owners need to constantly invest in to produce fresh, high-quality content. Content can take many forms, including blog articles, product descriptions, knowledge bases, landing pages, and many other formats. Creating good content requires planning and a lot of effort. With the help of AI and natural language generation models, it is possible to produce good content while dramatically reducing the time it takes to create it. Remember that human monitoring, refinement, and validation remain critical to delivering excellent AI-generated content with current technology.
Generate E-Commerce Product Descriptions
You can use GPT-3 to generate product descriptions for e-commerce. Based on data provided to it during training, GPT-3 can predict which words are most likely to be used after an initial prompt and produce good human-like sentences and paragraphs. However, this is not an out-of-the-box solution. To adapt GPT-3 to your specific needs and produce good product descriptions for your online store, you will need to go the fine-tuning route.
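Fine-tuning starts with a file of training examples. The sketch below prepares such a file as JSON Lines, assuming the prompt/completion pair layout that GPT-3 fine-tuning has used; the separator (`###`), the ` END` stop marker, and the sample record are illustrative choices, so check the current documentation of your provider before relying on them.

```python
import json


def to_jsonl(examples: list) -> str:
    """Serialize attribute/description pairs as one JSON object per line."""
    lines = []
    for example in examples:
        record = {
            # Separator and stop marker are assumptions, not fixed requirements.
            "prompt": example["attributes"] + "\n\n###\n\n",
            "completion": " " + example["description"] + " END",
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Made-up training example, for illustration only.
examples = [
    {
        "attributes": "color: black\nmaterial: leather",
        "description": "A black leather tote for every day.",
    }
]
jsonl = to_jsonl(examples)
print(jsonl)
```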
Make your e-commerce 404 pages “smart”
You can create a recommendation engine for your e-commerce site that suggests products on the 404 error page. You can design the layout of the page using DALL-E 2 and create the error message using GPT-3. This way, your e-commerce 404 pages, which are among the most visited pages on your website, will help provide a relevant and optimal user experience.
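The recommendation part does not have to start with a language model. As a minimal sketch, the example below suggests products by comparing the slug of the missing URL with the slugs in a catalog, using plain string similarity from Python's standard library rather than GPT-3. The catalog, the URLs, and the function name are hypothetical.

```python
import difflib

# Hypothetical catalog: slug -> live product URL.
CATALOG = {
    "black-leather-tote": "/products/black-leather-tote",
    "brown-leather-belt": "/products/brown-leather-belt",
    "silver-watch": "/products/silver-watch",
}


def suggest_products(missing_path: str, n: int = 3, cutoff: float = 0.4) -> list:
    """Return URLs of products whose slugs resemble the requested (missing) one."""
    slug = missing_path.rstrip("/").split("/")[-1].lower()
    matches = difflib.get_close_matches(slug, list(CATALOG), n=n, cutoff=cutoff)
    return [CATALOG[match] for match in matches]


# A mistyped URL that would otherwise end in a dead 404 page.
suggestions = suggest_products("/products/black-lether-tote")
print(suggestions)
```

A production setup would more likely rank candidates with the Knowledge Graph or with embeddings, but the idea is the same: use what the visitor asked for to propose the closest live pages.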