
Harnessing LLM and Knowledge Graph for International E-commerce: Creating Product Descriptions in Korean

Learn how ecommerce sites use AI, knowledge graphs, and LLMs to create nuanced, multilingual product descriptions for global markets.

By Beatrice Gamba & Andrea Volpini

July 30, 2024



Introduction: The Synergy of Knowledge Graphs (KGs) and Large Language Models (LLMs).
Why Graphs for Content Generation?
- Labeled Property Graphs and RDF Graphs
- Integrating LLMs with Knowledge Graphs for Content Generation
Demonstrating the Power of KGs as “Jet Fuel” for LLMs
Human-Centric Multilingual Approach
- The SEO angle to AI-generated multilingual product descriptions
Benefits of Using KGs and LLMs for Internationalization
Practical Use Case: Creating Product Descriptions in Korean
- The neuro-symbolic loop to validate product descriptions
Conclusions and references
Frequently Asked Questions

Introduction: The Synergy of Knowledge Graphs (KGs) and Large Language Models (LLMs)

In the world of e-commerce, creating compelling and accurate product descriptions is crucial for attracting and converting customers. This task becomes even more complex when dealing with international markets, such as Korea, due to the need for both linguistic and cultural accuracy.

This article explores how integrating Knowledge Graphs (KGs) with Large Language Models (LLMs) can revolutionize content generation. By combining these technologies, we enhance SEO and user engagement while making the content generation process more efficient and scalable.

We’ll demonstrate how leveraging structured data from KGs helps LLMs produce content that is precise, culturally relevant, and customized for each market, avoiding common pitfalls like content duplication and loss of nuance that often occur with traditional translation methods. This approach represents a significant advancement in global e-commerce.

Why Graphs for Content Generation?

Knowledge Graphs are structured frameworks that connect data points through relationships, offering a comprehensive context for understanding and retrieving information.

A Knowledge Graph is essentially a network of interconnected facts: entities are represented as nodes, the relationships between them as edges, and each fact is a node-edge-node statement. This structure allows for the organization of information in a way that integrates various data models.

A Knowledge Graph not only organizes information but also provides context by linking related data points. As a result, it serves as a versatile tool that goes beyond traditional data models, offering a richer, more interconnected view of information and enhancing the ability to understand and utilize complex datasets.

Labeled Property Graphs and RDF Graphs

At WordLift, we specialize in creating custom AI models to generate content using data from our client’s knowledge graphs. Understanding the differences between Labeled Property Graphs (LPGs) and Resource Description Framework (RDF) graphs is essential to this process.

LPGs, commonly used in databases like Neo4j and Data Graphs, allow you to organize data with nodes and relationships, each carrying labels and properties. This structure is great for capturing detailed information about entities and their characteristics, making it easier to manage and analyze complex data.

On the other hand, RDF graphs, which are built using triples (subject-predicate-object), focus on linking data across various resources identified by URIs. This format is particularly useful for integrating data from different sources and ensuring compatibility with established web standards, like schema.org used in e-commerce.

In our work, RDF graphs are our go-to format because they excel at making data interoperable. This means we can seamlessly connect and use data from different systems, which is crucial for creating cohesive and effective digital marketing strategies for our clients.
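
To make the contrast concrete, here is a minimal sketch in plain Python of the same product data in both shapes. The product, the brand, the example.com URIs, and the helper name are invented for illustration; a real project would rely on a graph database or an RDF library rather than lists and dictionaries.

```python
# Labeled Property Graph shape: nodes and relationships carry labels and properties.
lpg_nodes = {
    "p1": {"labels": ["Product"], "properties": {"name": "Hydrating Cream", "volume_ml": 50}},
    "b1": {"labels": ["Brand"], "properties": {"name": "ExampleBrand"}},
}
lpg_relationships = [
    {"type": "MADE_BY", "start": "p1", "end": "b1", "properties": {"since": 2021}},
]

# RDF shape: every statement is a (subject, predicate, object) triple,
# and resources are identified by URIs.
SCHEMA = "https://schema.org/"
EX = "https://example.com/id/"  # made-up namespace for this sketch
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

rdf_triples = [
    (EX + "p1", RDF_TYPE, SCHEMA + "Product"),
    (EX + "p1", SCHEMA + "name", "Hydrating Cream"),
    (EX + "p1", SCHEMA + "brand", EX + "b1"),
    (EX + "b1", RDF_TYPE, SCHEMA + "Brand"),
    (EX + "b1", SCHEMA + "name", "ExampleBrand"),
]


def objects(triples, subject, predicate):
    """Return every object stored for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


brand_uri = objects(rdf_triples, EX + "p1", SCHEMA + "brand")[0]
brand_name = objects(rdf_triples, brand_uri, SCHEMA + "name")[0]
```

Because the RDF version uses shared URIs for both the vocabulary and the entities, triples coming from another system that uses the same identifiers can simply be appended to the list, which is the interoperability argument in a nutshell.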

Integrating LLMs with Knowledge Graphs for Content Generation

Large Language Models (LLMs) are advanced AI systems capable of understanding and generating human-like text. By combining the structured data of Knowledge Graphs (KGs) with the contextual understanding of LLMs, we can create a powerful tool for content generation that scales across different languages.

Neuro-symbolic AI, which integrates symbolic reasoning with neural networks, further enhances this synergy. It enables more accurate and context-aware content generation by leveraging the strengths of both KGs and LLMs. For example, enriching a KG with performance metrics and using aggregate functions can highlight the keywords most associated with a product. This process requires a solid data foundation: the data must first be imported, for example from Google Search Console, and then be queryable for in-context learning or fine-tuning.
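
As a rough illustration of the aggregation idea, the sketch below sums clicks per query for one product page and keeps the top results. The rows only imitate the general shape of a search performance export; the URLs, queries, and numbers are invented, and in practice this kind of aggregation would more likely run as a query against the graph itself.

```python
from collections import defaultdict

# Hypothetical rows: (page URL, search query, clicks, impressions)
rows = [
    ("https://example.com/p/cream", "hydrating cream", 120, 2000),
    ("https://example.com/p/cream", "moisturizer for dry skin", 45, 1500),
    ("https://example.com/p/cream", "hydrating cream", 30, 500),
    ("https://example.com/p/serum", "vitamin c serum", 80, 900),
]


def top_queries(rows, page, limit=2):
    """Sum clicks per query for one page and return the highest-ranking queries."""
    clicks = defaultdict(int)
    for url, query, n_clicks, _impressions in rows:
        if url == page:
            clicks[query] += n_clicks
    ranked = sorted(clicks.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


best = top_queries(rows, "https://example.com/p/cream")
```

The resulting list of queries can then be passed to the model as part of the prompt, or used to select training examples.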

Demonstrating the Power of KGs as “Jet Fuel” for LLMs

In a podcast curated by Knowledge Graph Insights, Mike Dillinger, who has worked as the Technical Lead for LinkedIn’s Knowledge Graph, describes Knowledge Graphs as “jet fuel” for Large Language Models. We interact with a KG in several ways to boost SEO and marketing performance in combination with LLMs:

Capturing Information: We use LLMs to gather data and structure it into triples, facts with context. This involves extracting relevant information from various sources and organizing it into a Knowledge Graph.
Training the Model: The KG is then used to train a small or large model with the examples captured. This training process helps the model understand the relationships and context within the data. We are currently working with closed-source models, such as GPT-4o and GPT-4o mini, and open-source models, such as Phi-3, Llama 3.1, or Mistral.
In-Context Prompting: We feed the trained model with specific data from the knowledge graph as context. This approach is similar to Graph-RAG (Retrieval-Augmented Generation), where we enhance content generation by using structured data from the knowledge graph. By doing so, we ensure that the content produced is not only accurate but also highly relevant to the context, leveraging the rich information within the knowledge graph.
Validation and Evaluation: The KG also stores rules for validating the final outcome. These rules capture the evaluation of content editors, ensuring that the generated content meets quality standards.
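
The in-context prompting step can be sketched in a few lines: retrieve the facts stored for one entity and format them as a context block that is prepended to the prompt. This is a simplified illustration, not our production pipeline; the triples, the prefixes, and the function names are assumptions made for the example.

```python
# Toy triples; the "ex:" and "schema:" prefixes stand in for full URIs.
triples = [
    ("ex:p1", "schema:name", "Hydrating Cream"),
    ("ex:p1", "schema:brand", "ExampleBrand"),
    ("ex:p1", "ex:skinType", "dry"),
    ("ex:p2", "schema:name", "Vitamin C Serum"),
]


def retrieve_facts(triples, subject):
    """Return the (predicate, object) pairs stored for one entity."""
    return [(p, o) for s, p, o in triples if s == subject]


def build_context(triples, subject):
    """Format the retrieved facts as a context block for the prompt."""
    lines = [f"- {predicate}: {obj}" for predicate, obj in retrieve_facts(triples, subject)]
    return "Use only the following facts:\n" + "\n".join(lines)


context = build_context(triples, "ex:p1")
```

Restricting the context to facts about the requested entity is what keeps details of other products from leaking into the description.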

The graph works end-to-end at every process step, functioning like an expert who educates and guides the language model. This gives us a strong opportunity to reduce hallucinations and keep the content we generate unique.

Because knowledge graph data is mostly language-neutral and modern LLMs are trained on extensive multilingual data, we can efficiently generate content in multiple languages with minimal effort. Additionally, content graphs often include separate sub-graphs or datasets for different languages, which further improves the quality of the content.

Search keywords for the same entity can vary significantly between countries and languages. These variations are captured as triples in the KG. When interacting with an LLM, these details ensure that the generated content meets the specific needs and quality standards of each target market and language, which is our objective.
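
A minimal sketch of how language-specific keywords might be stored and selected is shown below. Each keyword is kept as a statement with a language tag, and only the keywords for the target market are handed to the model. The `ex:searchKeyword` property and the sample keywords are hypothetical.

```python
# (subject, predicate, value, language tag); "ex:searchKeyword" is a made-up property.
keyword_triples = [
    ("ex:p1", "ex:searchKeyword", "hydrating cream", "en"),
    ("ex:p1", "ex:searchKeyword", "moisturizer for dry skin", "en"),
    ("ex:p1", "ex:searchKeyword", "수분 크림", "ko"),
    ("ex:p1", "ex:searchKeyword", "crema idratante", "it"),
]


def keywords_for(triples, subject, lang):
    """Return the keywords stored for one entity in one language."""
    return [
        value
        for s, p, value, tag in triples
        if s == subject and p == "ex:searchKeyword" and tag == lang
    ]


korean_keywords = keywords_for(keyword_triples, "ex:p1", "ko")
```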

Human-Centric Multilingual Approach

When going multilingual, humans remain central to the process. Contrary to common belief, a Knowledge Graph (KG) is not built by engineers working on the infrastructure and code. Instead, it is constructed by linguists and, in our case, content marketers and SEOs. The collaborative nature of this task is what makes it beautiful.

At WordLift, we collaborate with language experts as external consultants. They provide initial feedback on the generated content, helping us refine the tone of voice and the quality of the outcomes. This feedback can impact various areas of the workflow depending on the project:

The rules that the system uses to validate and improve the content.
The underlying data model (e.g., what attributes are needed in the KG, what is the best ontology for describing a particular class of products?).
The best samples that can be used for calibrating the models.
The prompt engineering behind each generation.

This human-centric approach ensures the content is accurate, culturally relevant, and aligned with the brand’s guidelines.

The SEO Angle To AI-Generated Multilingual Product Descriptions

The use of knowledge graphs and LLMs for generating product descriptions can significantly enhance SEO performance by ensuring content is keyword-optimized in the native language of the website and tailored to the search behavior and informational needs of the specific country. Often, multilingual e-commerce websites take the convenient route of translating the same product descriptions from English into other languages, which carries several risks.

A primary concern is the potential loss of nuance and cultural relevance: direct translations may fail to capture the local context, idiomatic expressions, or cultural references, leading to misunderstandings and to a product page that targets the wrong main query.

Especially with the rise of Generative Search Engines, where users type longer queries, translated descriptions are likely to miss the long-tail keywords for the product in a foreign language.

Moreover, there is the risk of search engines perceiving translated content as duplicate content, potentially harming SEO performance.

Maintaining a consistent brand voice across different languages can also be challenging, risking the dilution of brand identity and customer confusion.

Generating product descriptions with AI may seem just one click away, but in addition to creating a product knowledge graph, SEO groundwork must be done beforehand. We develop detailed taxonomies that outline the specific product attributes to be highlighted in the descriptions. These attributes are essential for differentiating products and ensuring that each item’s unique features and selling points are communicated clearly.

To build these taxonomies, we begin by analyzing the client’s industry. This involves examining how competitors and retailers write their product descriptions and assessing the quality of these descriptions on the web.

Next, we identify the key product attributes that should be included. We do this by reviewing both the product data feed from the client and insights from our competitor analysis. We compile a list of characteristics that highlight what makes our client’s products stand out, along with common industry features for similar products. Our goal is to emphasize the unique aspects of our client’s products in their descriptions.

Once we have this list of attributes, we map them to the data in the Knowledge Graph and relevant information from the web. We then incorporate these attributes into the content generation process, guiding the AI model to include them in the product descriptions.
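
The mapping step can be pictured as two small lookup tables: one listing the attributes a category should highlight, and one linking each attribute to a property in the graph. The sketch below is illustrative only; the category, the attribute names, and the `ex:` properties are assumptions, not an actual client taxonomy.

```python
# Hypothetical taxonomy: attributes a description in each category should highlight.
taxonomy = {"skincare": ["name", "skin_type", "key_ingredient", "volume"]}

# Hypothetical mapping from taxonomy attributes to graph properties.
property_map = {
    "name": "schema:name",
    "skin_type": "ex:skinType",
    "key_ingredient": "ex:keyIngredient",
    "volume": "ex:volume",
}

# Data available in the graph for one product.
product = {"schema:name": "Hydrating Cream", "ex:skinType": "dry", "ex:volume": "50 ml"}


def resolve_attributes(product, category):
    """Split the taxonomy attributes into those with data and those still missing."""
    found, missing = {}, []
    for attribute in taxonomy[category]:
        value = product.get(property_map[attribute])
        if value:
            found[attribute] = value
        else:
            missing.append(attribute)
    return found, missing


found, missing = resolve_attributes(product, "skincare")
```

The `missing` list is useful in its own right: it shows which attributes still need to be sourced, from the product feed or from the web, before generation starts.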

This meticulous approach to SEO helps create compelling, informative content optimized for search engines, enhancing visibility and attracting potential customers.

Ready to see how WordLift can boost your content’s visibility and attract more customers? Book a demo today and experience the power of knowledge graphs and LLMs in action!

Benefits of Using KGs and LLMs for Internationalization

Using Knowledge Graphs (KGs) and Large Language Models (LLMs) for content generation and programmatic SEO offers several benefits for internationalization:

Scalability: Easily create content in multiple languages without compromising quality.
Diversity: Country-specific attributes and local search queries help the system differentiate content for different markets and target languages.
SEO Optimization: Improve search rankings and visibility through structured data and context-aware content.
User Engagement: Enhance user experience with accurate and relevant product descriptions.

[Figure: WordLift Content Generation]

Practical Use Case: Creating Product Descriptions in Korean

This section provides a real-world example of creating product descriptions in Korean. We discuss the challenges faced and the solutions implemented to overcome these challenges.

The diagram above illustrates the construction of a simplified Knowledge Graph (KG) using the schema.org vocabulary and a custom namespace (not shown in the chart) with specific product attributes. The quality and granularity of the data directly affect the quality of the generated completion, because they provide the foundation for the content creation process.

High-quality data allows the model to generate starting from precise and reliable information, while granularity helps the model produce text that is not only accurate but also rich in context and nuance.

When the data in the knowledge graph is both detailed and of high quality, the model can incorporate specific facts, subtle distinctions, and relevant context into the generated text. In this example, we initially stored a set of country-specific product highlights in schema:description, taken from the existing website and product reviews.
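
For readers who prefer to see the data, here is a hedged sketch of what such a product node could look like when serialized as JSON-LD from Python. The schema.org terms (`Product`, `Brand`, `name`, `brand`, `description`) are real; the `ex` namespace, the `ex:skinType` property, the identifiers, and all values are placeholders invented for this example.

```python
import json

product = {
    "@context": {
        "@vocab": "https://schema.org/",
        "ex": "https://example.com/vocab/",  # placeholder custom namespace
    },
    "@id": "https://example.com/id/p1",
    "@type": "Product",
    "name": "Hydrating Cream",
    "brand": {"@type": "Brand", "name": "ExampleBrand"},
    # Country-specific highlights stored as language-tagged descriptions.
    "description": [
        {"@value": "A rich cream for dry skin.", "@language": "en"},
        {"@value": "건성 피부를 위한 수분 크림", "@language": "ko"},
    ],
    "ex:skinType": "dry",
}

# ensure_ascii=False keeps the Korean text readable in the serialized output.
serialized = json.dumps(product, ensure_ascii=False, indent=2)
```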

Once all the key attributes are added to the knowledge graph, we can organize and structure this information using a predefined set of rules known as a template language. This template language helps format the data to align with the desired output, ensuring that the generated content is clear and relevant. It’s often best to simplify these instructions when some data points are missing, as this helps maintain the clarity and focus of the resulting content.

In this example, the final prompt template is straightforward. It is written in the target language and mentions the client, a prominent Korean brand. The {% if name != blank %}{% endif %} construct omits any missing attributes from the prompt. Additional details are included in the prompt, as shown in the translated part below:

A word of caution:
Be objective in describing the actual features and benefits of your product.
Don't include hyperbole, unsubstantiated claims, or anything that could be legally problematic.
Limit your description to 60 words, and avoid using redundant words.
Increase your credibility by providing useful information for your customers.
Comply with your country's advertising and consumer protection laws.
Don't include product numbers.

We iterated several times on different samples before finalizing the prompt. We consistently found it relevant to mention the brand at the beginning and include the recommendations above at the end of the prompt.
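
The behavior of the template can be approximated in plain Python as follows. This sketch mirrors the two choices described above, brand first and recommendations last, and skips blank attributes in the same spirit as the conditional in the template. It is written in English for readability, whereas the actual prompt is in Korean; the function names, the opening sentence, and the sample values are assumptions.

```python
def is_blank(value):
    """Treat None, empty strings, and whitespace-only strings as missing."""
    return value is None or not str(value).strip()


def render_prompt(brand, attributes, guidelines):
    """Build a prompt: brand first, available attributes next, guidelines last."""
    lines = [f"Write a product description for {brand}."]
    lines += [f"{label}: {value}" for label, value in attributes.items() if not is_blank(value)]
    lines.append("A word of caution:")
    lines += guidelines
    return "\n".join(lines)


prompt = render_prompt(
    "ExampleBrand",
    {"Name": "Hydrating Cream", "Skin type": "dry", "Key ingredient": ""},
    [
        "Limit your description to 60 words, and avoid using redundant words.",
        "Don't include product numbers.",
    ],
)
```

Here the empty "Key ingredient" attribute never reaches the prompt, so the model is not invited to make up a value for it.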

The Neuro-Symbolic Loop To Validate Product Descriptions

As content is created, it is verified against the factual information in the Knowledge Graph (KG) to ensure accuracy. The KG may also suggest context-based additions if needed.

If the content doesn’t meet the required standards, it is flagged for review by a local editor, following a human-in-the-loop process.

Finally, the verified and corrected descriptions are added back into the knowledge graph to improve the AI model. This step is crucial because it helps the model learn from the validation and corrections, reducing the likelihood of repeating the same errors in future content generation. Over time, this process decreases the need for human intervention.
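
A deliberately simple sketch of this loop is shown below. It applies two rules, a word limit and a check that every number in the text also appears in the facts stored for the product, and routes the description either to publication or to an editor. Real validation rules are richer than this; the rule set, the function names, and the sample facts are assumptions, and splitting on whitespace is only an approximation of word counting, especially for Korean.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def validate(description, facts, max_words=60):
    """Return a list of issues; an empty list means the description passed."""
    issues = []
    if len(description.split()) > max_words:
        issues.append(f"longer than {max_words} words")
    # Every number in the text must also appear in the facts from the graph.
    known = {n for value in facts.values() for n in NUMBER.findall(str(value))}
    for number in NUMBER.findall(description):
        if number not in known:
            issues.append(f"number not found in the graph: {number}")
    return issues


def route(description, facts):
    """Publish clean descriptions; send the rest to a human editor."""
    issues = validate(description, facts)
    return ("publish", issues) if not issues else ("review", issues)


facts = {"name": "Hydrating Cream", "volume": "50 ml"}
ok = route("Hydrating Cream in a 50 ml jar.", facts)
flagged = route("Hydrating Cream in a 75 ml jar.", facts)
```

Descriptions that come back from the editor can then be stored alongside the product as corrected examples, which is how the loop feeds the next round of generation.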

Conclusions And References

In conclusion, leveraging the synergy of Knowledge Graphs and Large Language Models offers a transformative solution for creating high-quality, multilingual product descriptions. This approach not only enhances SEO and user engagement but also streamlines content generation, making it more efficient and scalable. By integrating structured data from Knowledge Graphs with the contextual understanding of LLMs, we can ensure that the content is accurate, culturally relevant, and tailored to the specific needs of each market. This method also mitigates the risks of content duplication and loss of nuance, which are common in traditional translation approaches.

The future of content creation lies in the seamless integration of human expertise and AI capabilities, ensuring that every piece of content not only meets but exceeds the expectations of a global audience.


Frequently Asked Questions

Can I use Screaming Frog and ChatGPT to generate product descriptions at scale?

Yes, you can effectively use Screaming Frog and ChatGPT together to generate product descriptions. This integration allows you to automate and optimize the process of creating product content, leveraging the strengths of both tools. However, this approach has several limitations. The generated content will always need to be reviewed, as it tends to be more generic due to reliance on a commercial LLM that everyone is using. Additionally, it does not account for the specific needs of internationalization, such as cultural nuances and language variations. Furthermore, there is no way of validating the accuracy of the generated content, making the review process very intensive for human editors.

How is WordLift different from Screaming Frog and ChatGPT?

WordLift specializes in integrating Knowledge Graphs with Large Language Models to generate context-aware, SEO-optimized content, particularly for multilingual markets. It offers high scalability, developer-friendly APIs, and a human-centric approach to ensure cultural relevance and alignment with brand guidelines.

Screaming Frog, on the other hand, is a technical SEO tool that crawls websites and identifies SEO issues. While it now includes an integration with ChatGPT for generating product descriptions, this feature lacks validation, personalization, and scalability. It is a good testing option but less effective as the number of items grows.

What unique features does WordLift offer for generating product descriptions?

WordLift offers unique features for generating SEO-optimized product descriptions and enhancing user experience:

Semantic Analysis and Keyword Insights: Analyzes product details to extract key insights and suggest relevant keywords, matching the brand’s tone and highlighting unique selling points.
Dynamic Prompt Generation: Uses data from the knowledge graph and fine-tuned models to create dynamic prompts that maintain consistent tone and adhere to editorial guidelines.
Validation Rules: Allows users to define validation rules to ensure generated descriptions meet accuracy and quality standards. The data in the KG is used to validate the factual accuracy of the generated text.
Content Generation at Scale: Produces large volumes of optimized product descriptions (up to 1000 pieces per minute) efficiently, which is ideal for enterprises.
Product Knowledge Graph: Builds a Product Knowledge Graph from merchant feeds, creating structured data to improve product discoverability and user experience.

In summary, WordLift’s features enable the generation of high-quality, SEO-optimized product descriptions at scale, ensuring consistency with brand guidelines and improving organic visibility for e-commerce businesses.

What SEO benefit can I expect from creating SEO-optimized product descriptions?

According to our own analysis, publishing custom AI-generated product descriptions increased organic clicks by 5.4% for a large e-commerce retailer in the US. This indicates that well-crafted product descriptions can lead to a measurable improvement in user engagement and traffic. For more details, you can refer to the source: “The Human-AI Collaboration: Leveraging Knowledge Graphs, AI, and SEO for Enhanced Content Optimization”.