With artificial intelligence changing the landscape of many industries, GPT-3-like technologies are becoming well-established and powerful web tools. With the introduction of more capable language models, it’s only a matter of time before artificial intelligence writing tools achieve mass adoption. But how do you get a competitive edge when everyone has access to the same language generation tools? How can online businesses leverage the latest technologies to automate at scale without sacrificing the quality of content production?
In this article, I am going to provide you with everything you need to know about the use of GPT-3 to automatically generate product descriptions for an e-commerce store. I’ll give a walk-through of a real-world example of applying natural language generation to auto-generate several product descriptions.
But before that, let us introduce these advanced language models, their benefits, and some of the adoption challenges that arise from using them.
- What is GPT-3?
- What can GPT-3 do?
- Is GPT-3 a trustworthy source?
- But what about the correctness of product descriptions?
- Is GPT-3 a competitive advantage for businesses?
- What can GPT-3 achieve for e-commerce?
- Can GPT-3 generate good product descriptions?
- Test 1: AI-generated descriptions with the pre-trained model (without fine-tuning)
- Test 2: AI-generated product description using the fine-tuned model
- More data isn’t always better
- Maintaining the right tone of voice
- Never underestimate the power of prompt design
The process of content writing is an activity that almost every e-commerce store and blog owner must constantly invest in to produce fresh and high-value content. Content can take various forms such as blog articles, product descriptions, knowledge bases, landing pages, among many other formats. Producing good content involves planning and a lot of effort.
A digital threat and an opportunity
The thought that AI technology can write like humans and thus produce infinite content can be perceived either as a digital threat for some businesses or a miraculous opportunity for others. AI writing technology has made tremendous strides, especially in the past few years, drastically reducing the time required to create good content.
From good to great
While content alone isn’t always enough, it takes a professional content writer to transform AI-generated content from good to great. With the current technology, human supervision, refinement, and validation are crucial to delivering great AI-generated content. Keeping a human in the loop also matches Google’s stance on content generated by machine learning, as stated by Gary Illyes and reported in our web story about automatically generated content in SEO:
Right now our stance on machine generated content is that if it’s without human supervision, then we don’t want it in search. If someone reviews it before putting it up for the public then it’s fine.
More capable models
Data on the rate of progress in AI shows that the technology is moving very fast. In his recent essay, Jeff Dean, SVP at Google AI, outlines the progress and the directions in the field of machine learning over the next few years. Language models are one of the five areas that are expected to have a great impact on the lives of billions of people.
The competition to produce larger machine learning models has been ongoing and, in many cases, has led to significant increases in accuracy for a wide variety of language tasks such as language generation (GPT-3), natural language understanding (T5), and multilingual neural language translation (M4). The graph below, from the Microsoft research blog, shows the exponential growth in terms of the number of parameters for the state-of-the-art language models.
With these notable gains in accuracy have come new challenges. Therefore, there is a real need, now more than ever, to alleviate the complexity of these models and reduce their size. To continue to move the field forward, new breakthroughs are needed to answer the various sustainability aspects:
- From a technical perspective, the next generation of AI architectures is paving the way for general-purpose intelligent systems. Google introduced Pathways, an AI architecture that is multitasking, multi-modal and more efficient. Such architectures promote less complex AI systems and allow these systems to generalize across thousands or millions of tasks.
- Training large-scale language systems is costly, both economically and environmentally. Retrieval-Enhanced Transformer (RETRO), a new language model by DeepMind, tackles the main reasons behind the race towards an ever larger number of parameters. The approach consists of decoupling the reasoning capabilities from the memorization in order to achieve good performance without having to significantly increase computations.
- Large-scale language models raise concerns about their truthfulness when generating answers to questions. These concerns range from simple inaccuracies to wild hallucinations. Evaluating the truthfulness of these models and understanding the potential risks requires quantitative baselines and good measurements. In fact, not all false statements can be eliminated merely by scaling up. TruthfulQA is a first initiative to benchmark the truthfulness of these models with a view to improving them.
What’s not possible today will be possible in the near future with the rate of progress in AI. That’s why everyone and every business needs to get prepared for the adoption of AI-writing tools. If you’re interested in the topic of the rising tide of AI-generated content, The Search Singularity: How to Win in the Era of Infinite Content is a must-read.
GPT-3: What You Need to Know
What is GPT-3?
GPT-3 stands for Generative Pre-trained Transformer 3. It is an auto-regressive language model that uses deep learning to produce human-like text. OpenAI, a Microsoft-funded AI research and deployment company, unveiled this technology with its 175 billion parameters. It is the third-generation language prediction model in the GPT-n series and the successor to GPT-2.
What can GPT-3 do?
There are a plethora of real-world applications that can be built on top of GPT-3. This technology enables building various tools such as:
- text summarizers
- paraphrasing tools
- semantic similarity applications
- open-ended conversational tools
Furthermore, this large language model can produce unique outputs based on prompts, questions, and other inputs. In practice, with GPT-3 it’s possible to create:
- essay outlines
- blog introductions
- email templates
- copy for landing pages
- website taglines
- product descriptions
Here’s a screenshot of many possible applications with GPT-3 as listed on OpenAI’s website.
GPT-3 in action
To keep this article practical, I’ll show an example where GPT-3 returns a completion given a textual input, a prompt. The completion consists of one or more sentences, and typically its length is less than the maximum number of tokens (an API parameter).
Here’s the prompt that I wrote:
Multimodal search engines are systems that can handle more than one modality at a time. For instance, some multimodal search engines handle text and images in a single query.
After I sent this prompt to GPT-3, it returned the following completion.
This test shows that GPT-3 is able to grasp the context and build upon it to generate related content. Quite good! This is what natural language generation (NLG) is about.
As a next step, a human could revise this machine-generated content to verify its factual accuracy before using it. It’s also interesting to use this GPT-3 generated paragraph to draw inspiration. For instance, I’d consider the following subtopics when writing an article about multimodal and multilingual search:
- Find use cases for combining text and audio in multimodal queries
- Talk about documents that contain multimodal elements
- Investigate how multimodal search changes the way search engines operate
- Develop the topic around different languages, for audio and text, within the context of multimodal search
- Address the subject of combining and ranking multimodal results after issuing a multimodal query
It’s important to note that the completion above has been generated using the pre-trained model without any fine-tuning.
Is GPT-3 a trustworthy source?
GPT-3 draws information from multiple sources. This large-scale language model has been trained on a vast amount of written text from the Common Crawl dataset, Wikipedia entries, online books, and so on. While it’s perfectly capable of predicting words, it doesn’t provide the source of information. Consequently, verifying the correctness of the information is beyond the scope of the initial technology.
Verifying the accuracy of the information is one of the missing pieces that are yet to be addressed. Efforts to improve factual accuracy have emerged as a promising solution to meet the challenge. WebGPT, a new prototype that has been developed by copying how humans research answers to questions online, has been trained to cite its sources. In the context of Long-Form Question Answering (LFQA), it’s one step towards achieving a more truthful AI.
But what about the correctness of product descriptions?
The general principle of citing sources could be a perfect fit for open-ended questions, but it doesn’t suit every use case. For instance, generating product descriptions requires a different solution. The product descriptions must precisely describe the product and its attributes. As a matter of fact, the source is already known: it’s the dataset of the e-commerce store that’s used to fine-tune the model. Instead of citations, the e-commerce store would in most cases need a way to validate the attributes’ completeness and accuracy.
GPT-3 for e-commerce
Is GPT-3 a competitive advantage for businesses?
GPT-3 has the potential to transform many businesses across different industries. However, GPT-3 is a research-driven technology with inherent limitations. It shares several pitfalls with other language models, many of which are particularly relevant in business contexts. Nonetheless, AI experts and leading practitioners are aware that businesses can gain a competitive advantage only by integrating AI-generated content into data workflows for curation, validation, and establishing safeguards.
What can GPT-3 achieve for e-commerce?
In 2021, OpenAI reported that over 300 applications are delivering GPT-3–powered search, conversation, text completion, and other advanced AI features through their API. While GPT-3-powered tools, examples, and use cases are constantly growing, I’ll focus on a number of use-cases that are relevant to the context of e-commerce. Here are a few GPT-3 powered applications and ideas for various online businesses:
- Pull useful insights from customer feedback in easy-to-understand summaries
- Answer difficult questions and complex queries using semantic search
- Build a sentiment analysis classifier for social media data or reviews
- Generate a product taxonomy by organizing e-commerce products into categories and tags
- Write product descriptions for e-commerce
In this work, I’ll dive deep into the last use case, which is the generation of product descriptions with GPT-3. I’ll show you how to tackle a real-world example using a public dataset that anyone can use to generate similar e-commerce product descriptions.
Can GPT-3 generate good product descriptions?
By ingesting a vast amount of information from across the web, GPT-3 can predict what words are most likely to come next given an initial prompt. That enables GPT-3 to produce good sentences and write human-like paragraphs. However, this isn’t an out-of-the-box solution for perfectly drafting product descriptions for an online store.
When it comes to customizing the output of GPT-3, fine-tuning is the way to go. There’s no need to train it from scratch. Fine-tuning allows you to customize GPT-3 to fit specific needs. You can read more about customizing GPT-3 (fine-tuning) and learn more about how customizing improves accuracy over prompt design (few-shot learning).
Fine-tuning GPT-3 means providing relevant examples to the pre-trained model. These examples are the ideal descriptions that, at the same time, describe the product, characterize the brand, and set the desired tone of voice. Only then can businesses start seeing real added value when using AI-powered applications to generate product descriptions. To read more about AI text generation for SEO, check this practical example.
The following part of this work will harness the power of GPT-3 to generate product descriptions:
- Using the pre-trained model of GPT-3 without fine-tuning it
- Fine-tuning the pre-trained model with relevant data
From data to prompts
The dataset used in this work contains a list of e-commerce products. Each product can have multiple attributes and a description. The dataset can be downloaded from this Kaggle challenge: Home Depot Product Search Relevance. The challenge seeks to improve customers’ shopping experience by accurately predicting the relevance of search results.
In this work, we’re deviating from the initial aim of the challenge to adapt it to the use case of product description generation using GPT-3. Consequently, pre-processing work must be conducted to discard inapplicable data and extract valuable data.
As mentioned earlier, let’s go through the steps behind selecting a product and creating its associated dataset. From the online dataset, you need to download the CSV file called attributes.csv.
To help you get started, I prepared the various steps required to load the data, choose a specific product category, and extract the list of products within the chosen category with their related attributes. Find the related code in this Google Colab.
In this demo, I chose to generate descriptions for gloves, a product of the clothing category. It’s important to note that it’s possible to choose a different product from the dataset by running the same code and changing a few parameters.
Loading the attributes returns a set of products with their corresponding characteristics and values. Please note that every product has a unique product_uid and can have one or more attributes (column name) and each attribute can have a value (column value).
The next steps consist of a number of operations to:
- clean the data
- choose a category and a specific product
- drop columns that contain a high percentage of empty attribute values
- pivot the joined attributes data frame to display the complete set of attributes of a product on a single row
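The cleaning and pivoting steps above can be sketched without any dependencies. This is only an illustration, not the code from the Colab: the sample rows are made up, and the helper name `pivot_attributes` and the `max_empty_ratio` threshold are assumptions. Only the long format itself (a product_uid, an attribute name, and an attribute value per row) comes from the dataset as described above.

```python
from collections import defaultdict

# Made-up rows in the long format of attributes.csv:
# (product_uid, attribute name, attribute value).
rows = [
    (1001, "Color", "Black"),
    (1001, "Material", "Leather"),
    (1001, "Size", "Large"),
    (1002, "Color", "Red"),
    (1002, "Material", "Nylon"),
    (1003, "Color", "Blue"),
    (1003, "Material", ""),
]


def pivot_attributes(rows, max_empty_ratio=0.5):
    """Pivot long-format rows into one dict per product and drop sparse columns."""
    products = defaultdict(dict)
    for product_uid, name, value in rows:
        # Basic cleaning: treat None as empty and trim whitespace.
        products[product_uid][name] = (value or "").strip()

    # Keep an attribute only if few enough products lack a value for it.
    all_names = {name for attrs in products.values() for name in attrs}
    kept = set()
    for name in all_names:
        empty = sum(1 for attrs in products.values() if not attrs.get(name))
        if empty / len(products) <= max_empty_ratio:
            kept.add(name)

    # One "row" per product, with the same columns for every product.
    return {
        uid: {name: attrs.get(name, "") for name in sorted(kept)}
        for uid, attrs in products.items()
    }


wide = pivot_attributes(rows)
```

With this sample, "Size" is dropped because two of the three products have no value for it, while "Material" is kept even though one product leaves it empty. With a data frame library, the same steps would typically be a pivot followed by a column filter.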
The following image displays the columns for the chosen product. It’s important to note that the columns, which refer to the product attributes, are specific to the product you select. In this case, these columns are related to the attributes that are used to describe gloves. Consequently, these columns won’t be the same for another product.
Similar to the prompt example that I presented earlier, the data from the previous step has to be transformed into sentences. The goal of this step is to describe each product using the available attributes. Each product will have its own prompt.
In this context, prompt design, or prompt engineering, consists of assembling the attributes and their values in sentences. You could have a script that iterates over the list of available products and generates a corresponding prompt for each one. The image below presents a few examples of prompts that describe different pairs of gloves.
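As a rough illustration of that script, the sketch below assembles one prompt from a product’s attributes. The function name `build_prompt`, the sentence template, and the sample attributes are all assumptions made for this example; the prompts used in this work may be phrased differently.

```python
def build_prompt(product_name, attributes):
    """Turn a product's attributes into a short prompt made of plain sentences."""
    sentences = [f"Product: {product_name}."]
    for name, value in attributes.items():
        if not value:
            # Skip attributes without a value rather than writing empty sentences.
            continue
        sentences.append(f"The {name.lower()} is {value}.")
    return " ".join(sentences)


# Made-up attributes for a pair of gloves.
prompt = build_prompt(
    "Winter gloves",
    {"Color": "Black", "Material": "Leather", "Size": ""},
)
```

Iterating over the list of products and calling such a function once per product yields one prompt per product. Keeping the template fixed makes the prompts uniform, which matters later when the same format is used for fine-tuning and for inference.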
Generate product descriptions with GPT-3
Test 1: AI-generated descriptions with the pre-trained model (without fine-tuning)
Once a product is described using a few sentences, GPT-3 can be called to return the related completion. As the intention is to generate product descriptions, the associated endpoint, create completion, is invoked. It’s one of the many endpoints made available by GPT-3’s API. Each time this endpoint receives a well-formatted request with a prompt, it returns a completion.
While it’s possible to directly use the pre-trained model from GPT-3 to create completions, this isn’t recommended for many reasons. In fact, the quality of the completion would certainly be below the expectation in terms of attributes’ correctness, writing style, tone of voice, etc.
Based on the result of the following test, it’s clear that the pre-trained model isn’t capable of returning good completions. As a reminder, completions here refer to the automatically generated product descriptions for the pairs of gloves.
For this test, we set the maximum number of tokens to 200 while keeping the default values for the remaining parameters. Curie, one of the engines powering GPT-3, is used in our settings. As a result, you can find below a number of completions returned by the pre-trained GPT-3 model.
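The settings above can be captured in a small helper that assembles the request parameters without sending anything. This is a hedged sketch: the field names `model`, `prompt`, and `max_tokens`, as well as the model identifier "curie", are assumptions based on how the completions API was commonly called, and they may differ from the current API or from the exact call used in this test. The helper name `build_completion_request` is hypothetical.

```python
import json


def build_completion_request(prompt, max_tokens=200, model="curie"):
    """Assemble the parameters of a completion request (nothing is sent)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Every other parameter is left out so that the API defaults apply.
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


request = build_completion_request("Product: Winter gloves. The color is Black.")
body = json.dumps(request)  # JSON body that a client would send to the API
```

Separating the construction of the request from the network call makes it easy to log and review exactly what is sent for each product before spending any API credits.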
Note that the completions returned by GPT-3 contain incorrect information as well as false statements.
A first sample completion
A second sample completion
Beyond the examples above, some completions were very short as in the following examples:
- Very short completions: “The measurements of these gloves are: length: 9.5cm. These gloves are manufactured in China.”
- Very short and out of context: “Always refer to the actual package for the most accurate information”
And many other completions suffered from a variety of issues (grammatical errors, poor structure, repetitions, etc.) as shown in the following screenshot.
Fortunately, there’s a way to produce good content using GPT-3. We’ll discuss and show this in the next section.
Test 2: AI-generated product description using the fine-tuned model
The good thing about the dataset used in this work is that it also contains descriptions for a large number of products. In the dataset, you’ll find another csv file named product_descriptions.csv.
Let’s load the descriptions in a data frame as depicted in the following image.
As you notice, the product_description data frame and the attribute data frame (from the previous steps) can be joined using the product_uid column. Doing so associates every product_uid with its corresponding attributes and description.
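The join on product_uid can be sketched with plain dictionaries. The sample data and the helper name `join_on_product_uid` are made up for illustration; with data frames, the same result would normally come from a merge on the product_uid column.

```python
# Made-up data keyed by product_uid.
attributes = {
    1001: {"Color": "Black", "Material": "Leather"},
    1002: {"Color": "Red", "Material": "Nylon"},
}
descriptions = {
    1001: "Leather work gloves in black.",
    1003: "A description for a product without attributes.",
}


def join_on_product_uid(attributes, descriptions):
    """Inner join: keep products that have both attributes and a description."""
    return {
        uid: {"attributes": attributes[uid], "description": descriptions[uid]}
        for uid in attributes
        if uid in descriptions
    }


joined = join_on_product_uid(attributes, descriptions)
```

An inner join is used here on purpose: a product that lacks either its attributes or its description can’t serve as a fine-tuning example, so it’s dropped.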
As in any machine learning task, the data is divided into two parts:
- The first part is used to fine-tune GPT-3
- The remaining part is used to run tests during inference
Adding more data, whenever available, is recommended in most cases. Nonetheless, it’s of great importance to have a collection of samples that reflect the reality of the e-commerce store’s products. At a high level, analyzing and reviewing the data means evaluating it by asking and answering questions like:
- Do the product descriptions cover the essential characteristics of the products?
- Are there enough attributes to build unique prompts?
- Is it possible to map the attributes in the prompt with the information available in the product descriptions?
- Do the samples in-hand cover a good diversity of products for the e-store?
Taking the time to review the available information is obviously what will make a difference in fine-tuning a language model. This exercise will always reveal valuable insights that will help you improve the data before running a fine-tuning task—that’s where the real magic happens.
In this work, there are 136 product descriptions available for the gloves. Of these, 110 are used to fine-tune GPT-3 and the remaining 26 are kept for testing. After fine-tuning GPT-3, new completions are generated using the customized model. For the sake of these tests, we made sure that the completions are generated for the same pairs of gloves used in Test 1 and also that these products are in the testing set.
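A minimal sketch of that split follows, together with the conversion of the training part into JSON Lines. It assumes the fine-tuning file holds one JSON object per line with `prompt` and `completion` keys, which is the format commonly documented for GPT-3 fine-tuning; check the current documentation before relying on it. The placeholder examples, the fixed seed, and the helper names are assumptions.

```python
import json
import random


def split_examples(examples, n_train, seed=42):
    """Shuffle a copy of the examples and split it into training and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    return shuffled[:n_train], shuffled[n_train:]


def to_jsonl(examples):
    """Serialize (prompt, completion) pairs as one JSON object per line."""
    return "\n".join(
        json.dumps({"prompt": prompt, "completion": completion})
        for prompt, completion in examples
    )


# Placeholder pairs standing in for the 136 glove prompts and descriptions.
examples = [
    (f"Prompt for glove {i}.", f"Description for glove {i}.") for i in range(136)
]
train, test = split_examples(examples, n_train=110)
jsonl = to_jsonl(train)
```

The test part never goes into the file, so the products used to judge the fine-tuned model are ones it hasn’t seen during fine-tuning.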
In terms of the results of the fine-tuned model, the following two images depict the generated completions.
Another fine-tuned completion
Discover everything you need to know about GPT-3 for product description in our last web story.
The seductive path of good enough when generating completions
As shown in the AI-generated descriptions, GPT-3 can produce convincing sentences. While the general form of the fine-tuned completions looks good, there are some inconsistencies and mistakes related to some attributes. For instance, the first product isn’t water-resistant as per the dataset’s attributes. Additionally, an SEO would need to optimize the generated content for search and ensure that target keywords are present.
As these AI-generated completions show, fine-tuning the model allows us to achieve much better product descriptions. However, it’s equally clear that the limited number of gloves, as well as other elements, are aspects that need to be addressed before an AI-generated description can become great content.
There are many elements that a content writer needs to consider when writing effective product descriptions. Similarly, there are some best practices to consider as well as some pitfalls to avoid when using AI technology to generate automated completions online.
More data isn’t always better
The performance and the accuracy linearly increase with every doubling of the number of examples as per the general best practice provided in the documentation of OpenAI. Ultimately, one could augment the data available in this demo to push further the quality of the generated completions. Keep in mind that more data is not always better unless the data consists of high-quality examples.
Data distribution is an additional element that can make a difference. One could easily explore the data of the gloves dataset from various angles. As shown in the visualization below, it’s possible to group gloves’ data by color, segment it by the gloves’ size, and color it using the gloves’ type. Doing so is a straightforward way to evaluate the distribution of attributes (color, size, type, etc.) across the dataset. With Facets, the tool used to produce the visualization below, it’s extremely simple, and at the same time powerful, to analyze the patterns from large amounts of data. Hence, one could find out whether some attributes have a lot of data while others have less or even no associated data at all.
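For a quick check without a visualization tool, the distribution of a single attribute can also be counted directly. This is a simple stand-in for the kind of inspection described above, not a replacement for Facets; the sample products and the helper name `attribute_distribution` are made up.

```python
from collections import Counter


def attribute_distribution(products, attribute):
    """Count how often each value of an attribute occurs across the products."""
    return Counter(attrs.get(attribute) or "(missing)" for attrs in products)


# Made-up attribute rows for a handful of gloves.
products = [
    {"Color": "Black", "Size": "Large"},
    {"Color": "Black", "Size": "Medium"},
    {"Color": "Red"},
    {"Color": "", "Size": "Large"},
]
color_counts = attribute_distribution(products, "Color")
size_counts = attribute_distribution(products, "Size")
```

Values that are absent or empty are grouped under "(missing)", which makes under-represented attributes easy to spot before fine-tuning.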
Maintaining the right tone of voice
The size of the dataset used in this work is small and consequently the number of examples to fine-tune the model is limited. However, this isn’t the only challenge. Within the fine-tuning dataset there are gloves from various brands and for various target audiences. Some of these gloves are for skiing while others are for heavy-duty tasks. Maintaining the right tone of voice is another critical element when fine-tuning the model. Grouping brands and products, and using additional material such as social media copy when fine-tuning the model, is important and makes a big difference.
Never underestimate the power of prompt design
When choosing a product to generate completions, there are some guidelines to keep in mind:
- To build a good prompt you need relevant attributes. The product needs to have at least a minimum number of attributes. Of course, different products will have different attributes but in general some attributes could be related to the color of the product, its material, its shape, etc.
- Not only are the attributes needed, but you also have to make sure that relevant product descriptions are available. They are essential to fine-tune the model.
- From an SEO perspective, it’s important that your target keywords be present within the data. Otherwise, the chances of having them in the AI-generated descriptions are slim. In fact, these target keywords that searchers include in their queries need to be part of the prompts as well as the descriptions. For example, in this work, this goes back to verifying that keywords like gloves, the type (e.g., skiing, heavy-duty, gardening, cycling, etc.), and the target audience (e.g., professionals, amateurs, etc.) are available in the data in order for these keywords to make their way into the AI-generated data.
- Pro tip: When building the prompt, be sure that the attributes are relevant and that they also appear in the descriptions (that will be used to fine-tune GPT-3). Otherwise, the accuracy of the generated description will drop.
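The keyword guideline above lends itself to a quick automated check. The sketch below measures, for each target keyword, the share of texts (prompts or descriptions) that contain it. The helper name `keyword_coverage`, the sample texts, and the choice of a case-insensitive substring match are assumptions; a real check might need stemming or phrase matching.

```python
def keyword_coverage(keywords, texts):
    """Return, per keyword, the fraction of texts that contain it."""
    if not texts:
        raise ValueError("texts must not be empty")
    lowered = [text.lower() for text in texts]
    return {
        keyword: sum(keyword.lower() in text for text in lowered) / len(lowered)
        for keyword in keywords
    }


# Made-up descriptions standing in for the fine-tuning data.
texts = [
    "Heavy-duty work gloves for professionals.",
    "Warm skiing gloves.",
]
coverage = keyword_coverage(["gloves", "skiing", "gardening"], texts)
```

A keyword with a coverage of zero, like "gardening" in this sample, is unlikely to appear in the generated descriptions, which signals that the data needs to be enriched before fine-tuning.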
Content generation isn’t the final destination
With a dataset, a product feed, or a knowledge graph at hand, false statements and mistakes can be spotted using an automated validation process. The validation is a product-specific process. It’s a critical step of the end-to-end workflow. This is a post-generation step that aims to identify the presence of important attributes as well as the correctness of their values.
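To make the idea concrete, here is a minimal sketch of such a post-generation check. It is not the validation workflow used at WordLift; the function name `validate_description`, the simple substring matching, and the list of claims to watch for are assumptions chosen for illustration.

```python
def validate_description(description, attributes, unsupported_claims=()):
    """Flag attribute values missing from a description and claims the data doesn't support."""
    text = description.lower()
    # Attributes with a known value that the description never mentions.
    missing = [
        name
        for name, value in attributes.items()
        if value and value.lower() not in text
    ]
    # Claims that the product data doesn't back up but the text makes anyway.
    unsupported = [claim for claim in unsupported_claims if claim.lower() in text]
    return {"missing": missing, "unsupported": unsupported}


# Made-up product data and generated text.
report = validate_description(
    "These black gloves are water-resistant and warm.",
    {"Color": "Black", "Material": "Leather"},
    unsupported_claims=("water-resistant",),
)
```

In this example the report flags that the material is never mentioned and that the text claims water resistance even though the product data doesn’t list it. Flagged descriptions can then be routed to a human reviewer.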
At WordLift, we have conceived and implemented a validation workflow that covers the process of automated generation and factual verification from end to end. Scaling this task is possible but, so far, this technology isn’t totally capable of acting on its own without human oversight.
With the long arc of progress in the field of AI, the pendulum will not swing back. To continue to progress, everyone will need to adapt. With more and more AI-writing tools, online businesses and search engines alike need to develop new solutions and appropriate workflows to maintain growth and to provide a high-quality user experience.