Why are internal links important for product listing pages (PLPs) on e-commerce websites? How can we help users and Google find category pages more effectively? Can we automate the creation of internal links? What’s the value for SEO?
In this blog post we will focus on automating the creation of internal links for e-commerce category pages. We will create the so-called related search widget for an e-commerce website, a navigational element designed to recommend similar categories, to improve internal links and to boost rankings.
We structure content on websites to let people find what they want. There is always beauty in understanding how things are organized on an e-commerce website. In SEO, when we are involved with user experience, our ultimate goal is to find the truth (the essence of any webpage) and to render the intent. Peter Morville would say that, when organizing content, we create environments for understanding.
Here is the outline for this article. If you prefer to jump right into the code, here is the Colab.
- SEO advantages of related search links
- What a related search widget should look like
- Creating internal links at scale for an e-commerce website
- Accessing the sitemap using Advertools
- Extract textual elements from each page
- Extract queries from GSC (optional)
- Computing semantic similarity
- Preparing the output file
- Scaling the workflow – a better AI lifecycle using NOW
- Conclusions and future work
- Additional Questions
We will create recommended links using a small set of commands in Python and a minimal amount of deep learning. Before anything else, let’s review two essential aspects:
- Categorization is a selective process. We emphasize one aspect and silence many others. When it works, it conveys meaning and helps others find what they need.
- In a connected graph of web pages, a page’s closeness centrality measures how few clicks, on average, separate it from every other page in its network. The more central a page is, the fewer clicks users need to reach what they are looking for.
In layman’s terms, we need a function, a simple system that, when we input an X (let’s say the title of a category), will give us a Y (the set of the top 4 or 5 related categories).
E-commerce category pages tend to have breadcrumbs and hierarchical links (with the entire list of categories and subcategories). Instead, we want to add a navigational element that can traverse the hierarchical tree in a meaningful way.
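To make the closeness centrality idea concrete, here is a minimal sketch in plain Python. The category tree is hypothetical, and the function uses the simple definition "reachable pages divided by the sum of click distances"; it only illustrates why one horizontal link can make a deep page more central.

```python
from collections import deque


def closeness_centrality(graph, start):
    """Return (reachable pages) / (sum of click distances from `start`)."""
    distances = {start: 0}
    queue = deque([start])
    # Breadth-first search gives the minimum number of clicks to each page.
    while queue:
        page = queue.popleft()
        for neighbour in graph.get(page, []):
            if neighbour not in distances:
                distances[neighbour] = distances[page] + 1
                queue.append(neighbour)
    total_clicks = sum(distances.values())
    reachable = len(distances) - 1
    return reachable / total_clicks if total_clicks else 0.0


# Hypothetical site: only hierarchical links (menu down, breadcrumbs up).
tree = {
    "home": ["men", "women"],
    "men": ["home", "men-shoes"],
    "women": ["home", "women-shoes"],
    "men-shoes": ["men"],
    "women-shoes": ["women"],
}

# Same site plus one horizontal "related search" link between the shoe pages.
with_widget = {page: list(links) for page, links in tree.items()}
with_widget["men-shoes"].append("women-shoes")
with_widget["women-shoes"].append("men-shoes")

before = closeness_centrality(tree, "men-shoes")
after = closeness_centrality(with_widget, "men-shoes")
print(f"men-shoes closeness: {before:.2f} -> {after:.2f}")
```

In this toy graph, the shoe page sits four clicks away from its sibling until the related link is added; after that, every page is at most two clicks away.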
SEO Advantages Of Related Search Links
Links on category pages are usually limited to a breadcrumb trail and taxonomy-based filters (the characteristics of the set of products). Recommending links brings the following SEO benefits:
- Skipping ahead. Related links help users traverse the navigational tree of categories and jump where they need to be. They are horizontal and meant to reduce the click-depth.
- Improving rankings. Internal links have a tremendous value in helping search engines understand how categories are organized.
- Distributing PageRank. We want to distribute link equity and ensure that the crawler sees our most relevant pages with the least effort.
- Optimizing the anchor text. We can improve the ranking of a specific query by using it as the clickable text that a user will see.
Moreover, on the business side, having the ability to recommend categories helps the shop owner improve the business relevancy of search by:
- Prioritizing categories for a sales campaign.
- Promoting certain products.
- De-prioritizing categories containing out-of-stock products.
What A Related Search Widget Should Look Like
There are various examples of internal links on e-commerce (and non-e-commerce) websites. Let’s review a few of them:
Amazon uses a block of 6 elements that, as we can see, tend to broaden (keyboard, gaming pc), narrow (gaming keyboard 60 percent), or horizontally expand (gaming monitors, gaming mouse) the initial search.
On Alibaba, the textual relatedness is weaker. The semantic jump between men’s coats and dog coats is extreme. Besides the questionable association between men and dogs, the focal point remains clothing for men. The design is essential.
Kijiji labels it as “popular,” and its generation process cannot detect that lawn mower and lawnmower are synonyms. At least in this example, it tends to narrow the search intent. The terms being used are keywords and not proper category names.
Artsper introduces the concept of search refinement by labeling the block “Refine your search”. The navigation elements help us move in multiple directions without clear sorting criteria. This is not per se a bad thing. Quite the opposite: we perceive a sense of freedom, and we can quickly skim through the terms. Visually, the terms are presented as refinement chips.
Related Search On Google Properties
Here is how things work on Google Search, Google Images and Google Arts and Culture. This is a random exploration of various types of widgets that should give us some ideas on how things can be implemented.
Being primarily an image-centric medium, Google Images helps us with the use of images, in this specific occurrence, to broaden (cake) or to expand the search (meat, pecan).
The UX Of A Related Search Widget
As seen, with this limited selection of the different implementations, we can highlight the following:
- Text relevance is essential and far from being trivial. As seen in the Alibaba example, even advanced websites can fall into the trap of odd matches.
- Most of the implementations are based on a horizontal design. If the complexity (i.e., the number of recommended links) is limited, this is an excellent way to provide options without interfering with the facets typically displayed vertically.
- Refinement (or search) chips are a good design pattern to help users intuitively find what they need. Google uses them a lot across various surfaces.
- Adding visual elements (a featured image for each category) and the number of items behind the category is an intelligent option to facilitate the discovery of different products (this is extremely valuable when products have a solid visual appeal).
Creating Internal Links At Scale For An E-Commerce Website
As a reference website I used fila.com, the site of a sportswear manufacturer originally from Biella, in northern Italy. They are not a client of ours. Here is what we will do:
- Read the sitemap and extract the list of categories
- Parse all the text elements we need
- (Extract queries from Google Search Console – I have the code ready, but it will not run for fila.com as I don’t have access to their search data)
- Run Semantic Search
- Extract top n matches (semantic similarity)
- Re-rank results (additional business logic, if needed, would go here)
- Prepare the output file. This would be a JSON file containing a selection of similar categories for each category page.
The UX of the website is clean and the site lacks a related search widget.
1. Accessing the sitemap using Advertools
We will parse the sitemap and extract the list of category pages by removing any page that ends with “.html” (as this characterizes product pages) and a series of other pages that don’t correspond to product listings (e.g., “news”, “about-” and so on).
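The filtering rule described above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the excluded fragments come from the examples in the text, the sample URLs are hypothetical, and the sitemap address has to be supplied by you. Advertools is imported lazily so that the pure filtering logic can be tried without it.

```python
from urllib.parse import urlparse

# Path fragments that mark non-listing pages (taken from the examples in
# the text; extend this tuple for your own site).
EXCLUDED_FRAGMENTS = ("news", "about-")


def is_category_url(url):
    """Keep a URL unless it is a product page (.html) or an excluded page."""
    path = urlparse(url).path
    if path.endswith(".html"):
        return False
    return not any(fragment in path for fragment in EXCLUDED_FRAGMENTS)


def load_category_urls(sitemap_url):
    """Download a sitemap with Advertools and keep only category pages."""
    import advertools as adv  # third-party: pip install advertools

    sitemap_df = adv.sitemap_to_df(sitemap_url)
    return [url for url in sitemap_df["loc"] if is_category_url(url)]


# Hypothetical URLs, used only to show the filter at work.
sample = [
    "https://www.example.com/men/shoes",
    "https://www.example.com/men/shoes/classic-sneaker-123.html",
    "https://www.example.com/about-us",
]
category_urls = [url for url in sample if is_category_url(url)]
print(category_urls)
```

A substring match is deliberately crude; on a real site it is worth checking the discarded URLs once to make sure no category page is lost.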
2. Extract textual elements from each page
We will then extract from each page a minimum set of information, including the short intro text below the page’s title, the breadcrumbs, the meta description and the page’s title. We do this by running a custom extraction using XPath. If the intro text is missing, we can rely on the other textual elements of the page. We will need to be very careful in removing oddities or other terms that might compromise the search.
Advertools will store the captured data in fl_category_crawl.jl. We might keep this file so that the information can be reused for the next crawl.
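A custom extraction of this kind could look like the sketch below. The XPath expressions and the column names "intro_text" and "breadcrumbs" are assumptions made up for illustration, since the real ones depend on the site's templates; "title" and "meta_desc" are columns that the Advertools crawler produces by default. The fallback helper mirrors the rule that other textual elements are used when the intro text is missing.

```python
# Hypothetical XPath expressions: inspect the target site's HTML to find
# the real ones.
XPATH_SELECTORS = {
    "intro_text": "//div[@class='category-intro']//text()",
    "breadcrumbs": "//nav[@aria-label='breadcrumb']//a/text()",
}


def crawl_categories(urls, output_file="fl_category_crawl.jl"):
    """Crawl the given pages only (no link following) and load the results."""
    import advertools as adv  # third-party
    import pandas as pd  # third-party

    adv.crawl(
        urls,
        output_file,
        follow_links=False,
        xpath_selectors=XPATH_SELECTORS,
    )
    # The crawler writes one JSON object per line.
    return pd.read_json(output_file, lines=True)


def pick_text(page):
    """Return the intro text, falling back to meta description, then title."""
    for field in ("intro_text", "meta_desc", "title"):
        value = page.get(field)
        # Missing values may be None or NaN, so accept non-empty strings only.
        if isinstance(value, str) and value.strip():
            return value.strip()
    return ""


# Hypothetical crawl row with no intro text.
row = {"intro_text": None, "meta_desc": " Shop men's shoes ", "title": "Men's Shoes"}
print(pick_text(row))
```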
Parsing the navigational breadcrumbs
After extracting the title of the page and the short intro text, we will analyze the breadcrumbs and create a data frame. This helps us gain an understanding of the site structure. We might reuse this data frame while composing the final list of suggestions. We might, for example, decide to exclude a link already in the Breadcrumbs for that page. Repeating the same link can be annoying, especially if the related search widget is displayed close to the Breadcrumbs.
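As a small sketch of that last rule, the helper below drops any suggested link that already appears in the page's breadcrumbs. The function name and the sample URLs are hypothetical; trailing slashes are ignored so that the same page is not treated as two different links.

```python
def drop_breadcrumb_links(candidates, breadcrumb_urls):
    """Remove suggested links that the breadcrumb trail already shows."""
    seen = {url.rstrip("/") for url in breadcrumb_urls}
    return [url for url in candidates if url.rstrip("/") not in seen]


# Hypothetical data for a "men's running shoes" category page.
breadcrumbs = ["https://www.example.com/", "https://www.example.com/men/"]
suggestions = [
    "https://www.example.com/men",
    "https://www.example.com/men/sneakers",
    "https://www.example.com/women/running-shoes",
]
filtered = drop_breadcrumb_links(suggestions, breadcrumbs)
print(filtered)
```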
To clean up the captured text, I have used spaCy and a list of site-specific stopwords. We will also remove special characters, numbers, and other oddities. I decided to lemmatize terms; this means reducing each word to its base or dictionary form. We will create embeddings afterward, and I want consistency from the beginning. This helps because we have a limited amount of text available.
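The cleaning step can be sketched in two parts: a regex pass that removes numbers and special characters, and a spaCy pass that lemmatizes and drops stopwords. The stopword list below is a made-up example, the regex assumes English text, and the spaCy pipeline is passed in by the caller (for instance the small English model, if it is installed).

```python
import re

# Hypothetical site-specific stopwords; build the real list from the crawl.
SITE_STOPWORDS = {"fila", "shop", "online"}


def strip_noise(text):
    """Lower-case the text and keep plain letters only."""
    text = text.lower()
    text = re.sub(r"['’]s\b", "", text)  # drop possessive endings
    text = re.sub(r"[^a-z\s]", " ", text)  # numbers, symbols, punctuation
    return re.sub(r"\s+", " ", text).strip()


def clean_text(text, nlp):
    """Lemmatize with a spaCy pipeline and remove generic and site stopwords."""
    doc = nlp(strip_noise(text))
    lemmas = [
        token.lemma_
        for token in doc
        if not token.is_stop and token.lemma_ not in SITE_STOPWORDS
    ]
    return " ".join(lemmas)


example = strip_noise("Men's Casual Sneakers + Athletic Shoes | FILA 2022")
print(example)

# With spaCy and an English model installed, the full step would be:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   clean_text("Men's Casual Sneakers + Athletic Shoes | FILA 2022", nlp)
```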
3. Extract queries from GSC (optional)
Optionally I have also prepared the code to capture data from Google Search Console. You can take advantage of the list of queries behind each page and the number of clicks. Queries can be extremely valuable as you might decide to use them instead of the title of the pages.
Let me give you an example. We might have a long title like “Men’s Casual Sneakers + Athletic Shoes | FILA”; in this case it would be better to display something more compact like “Men’s Sneakers”. You will need to authenticate on GSC to extract the data. The information will be merged with the crawl dataset by running a loop with all the crawled urls.
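One way to turn the Search Console data into anchor texts is to keep, for each page, the query with the most clicks and to fall back to the page title when no query is available. The sketch below assumes the GSC export has already been loaded as rows with "page", "query" and "clicks" keys; the rows and the helper names are hypothetical.

```python
def top_query_per_page(gsc_rows):
    """Map each page URL to its query with the highest number of clicks."""
    best = {}
    for row in gsc_rows:
        page = row["page"]
        if page not in best or row["clicks"] > best[page]["clicks"]:
            best[page] = row
    return {page: row["query"] for page, row in best.items()}


def anchor_text(url, title, top_queries):
    """Prefer the top query as anchor text; otherwise keep the page title."""
    return top_queries.get(url, title)


# Hypothetical rows, shaped like a Search Console export.
rows = [
    {"page": "/men/sneakers", "query": "men's sneakers", "clicks": 120},
    {"page": "/men/sneakers", "query": "casual sneakers men", "clicks": 45},
    {"page": "/women/shoes", "query": "women's shoes", "clicks": 80},
]
top_queries = top_query_per_page(rows)
print(anchor_text("/men/sneakers", "Men's Casual Sneakers + Athletic Shoes | FILA", top_queries))
print(anchor_text("/kids", "Kids | FILA", top_queries))
```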
4. Computing semantic similarity
Here comes the AI bit of this workflow. We are going to use the SentenceTransformers (SBERT) library. This open-source library allows us to replace the underlying model and choose the one that best fits our needs. Models are available in the HuggingFace Model Hub. Later on, we can also train our own model to improve performance further.
We will index the text extracted from each page and use the title as a query. We will use the native semantic search functionality of SBERT.
The idea is simple: we encode the text in the “clean text” column and compare it, within the same vector space, with the embedding of the title.
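A minimal sketch of this step with SentenceTransformers follows. The model name is an assumption (any sentence embedding model from the Hub can be swapped in), and the post-processing helper is hypothetical: it removes the page's match with itself and keeps the top links. The demo at the bottom feeds the helper hand-written hits shaped like the library's output, so it runs without downloading a model.

```python
def search_related(titles, clean_texts, top_k=6, model_name="all-MiniLM-L6-v2"):
    """Use each title as a query against the index of cleaned page texts."""
    from sentence_transformers import SentenceTransformer, util  # third-party

    model = SentenceTransformer(model_name)
    corpus_embeddings = model.encode(clean_texts, convert_to_tensor=True)
    query_embeddings = model.encode(titles, convert_to_tensor=True)
    # One list of {"corpus_id", "score"} dicts per query, best match first.
    return util.semantic_search(query_embeddings, corpus_embeddings, top_k=top_k)


def hits_to_links(hits_per_page, urls, max_links=5):
    """Turn raw hits into related links, skipping each page's self-match."""
    related = {}
    for page_index, hits in enumerate(hits_per_page):
        links = [
            {"url": urls[hit["corpus_id"]], "score": round(float(hit["score"]), 3)}
            for hit in hits
            if hit["corpus_id"] != page_index
        ]
        related[urls[page_index]] = links[:max_links]
    return related


# Hand-written hits for three hypothetical pages.
urls = ["/men/sneakers", "/women/sneakers", "/men/jackets"]
fake_hits = [
    [{"corpus_id": 0, "score": 0.99}, {"corpus_id": 1, "score": 0.71}],
    [{"corpus_id": 1, "score": 0.98}, {"corpus_id": 0, "score": 0.71}],
    [{"corpus_id": 2, "score": 0.97}, {"corpus_id": 0, "score": 0.40}],
]
related = hits_to_links(fake_hits, urls)
print(related["/men/sneakers"])
```

Asking for one more result than needed (top_k=6 for 5 links) leaves room for the self-match that is filtered out afterwards.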
5. Preparing the output file
Once we run the same query on the complete list of category pages, we will get a new data frame that, for each page, will provide a list of recommended links.
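Before publishing, the recommended links can be re-ranked with business rules such as promoting campaign categories or pushing down categories with out-of-stock products. The sketch below is one simple way to do it, by adding or subtracting a fixed amount from the similarity score; the boost and penalty values are arbitrary assumptions, as are the URLs.

```python
def rerank(links, boosted=(), demoted=(), boost=0.2, penalty=0.5):
    """Sort links by similarity score adjusted with business rules."""

    def adjusted(link):
        score = link["score"]
        if link["url"] in boosted:
            score += boost  # e.g. categories in a sales campaign
        if link["url"] in demoted:
            score -= penalty  # e.g. categories with out-of-stock products
        return score

    return sorted(links, key=adjusted, reverse=True)


# Hypothetical recommendations for one category page.
links = [
    {"url": "/men/shoes", "score": 0.80},
    {"url": "/women/shoes", "score": 0.70},
    {"url": "/kids", "score": 0.65},
]
ranked = rerank(links, boosted={"/kids"}, demoted={"/men/shoes"})
print([link["url"] for link in ranked])
```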
Now, depending on the CMS, you can change the output format and get ready to publish it. In our case, we will write the data back into the Knowledge Graph and send it to the CMS using a REST interface (e.g., https://api.wordlift.io/data/https/www.example.com/en-us/category/my-category-page). In the Colab, the data is stored in a JSON file, and you can explore it directly from the notebook.
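As an illustration of what such a JSON file could contain, the sketch below maps every category page to its list of related links with an anchor text. The structure, the key names and the file name are assumptions; the format in the Colab may differ.

```python
import json


def build_output(related, anchors):
    """For every page, list the related links with their anchor text."""
    return {
        page: [
            {"url": link["url"], "anchor": anchors.get(link["url"], link["url"])}
            for link in links
        ]
        for page, links in related.items()
    }


def save_output(payload, path="related_links.json"):
    """Write the payload to disk (hypothetical file name)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, ensure_ascii=False, indent=2)


# Hypothetical input: recommendations per page and the chosen anchor texts.
related = {"/men/sneakers": [{"url": "/women/sneakers", "score": 0.71}]}
anchors = {"/women/sneakers": "Women's Sneakers"}
payload = build_output(related, anchors)
print(json.dumps(payload, indent=2))
```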
The site navigation schema markup
We will import the data into the knowledge graph and present it to search engines using structured data markup. A related search widget is a site navigation element; we can use the schema markup for SiteNavigationElement, a subtype of WebPageElement.
This markup will help search engines understand how things are connected.
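One possible shape for this markup is a JSON-LD graph with one SiteNavigationElement per recommended link, using the standard name and url properties. This is a sketch under that assumption, not necessarily the markup generated in our setup, and the links are hypothetical.

```python
import json


def navigation_markup(links):
    """Build JSON-LD with one SiteNavigationElement per related link."""
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "SiteNavigationElement",
                "name": link["anchor"],
                "url": link["url"],
            }
            for link in links
        ],
    }


# Hypothetical related links for one category page.
links = [
    {"url": "https://www.example.com/women/sneakers", "anchor": "Women's Sneakers"},
    {"url": "https://www.example.com/men/jackets", "anchor": "Men's Jackets"},
]
markup = navigation_markup(links)
print(json.dumps(markup, indent=2))
```

The resulting object would be embedded in the page inside a script tag of type application/ld+json.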
6. Scaling the workflow – a better AI lifecycle using NOW
One of the biggest challenges when adopting AI into SEO workflows is the design of a lifecycle that will scale across sites of different sizes and with other characteristics.
Working on Colab helps me envision how things should work; I can easily experiment with new ideas, but at some point, I will need to run the inference on sites with potentially thousands of category pages. I also need the flexibility to swap the model for a fine-tuned one. Even more importantly, on e-commerce sites, I want to be able to work with multiple modalities (text + images). On large properties like fila.com, the textual content is very well optimized, and I can easily rely on it, but on smaller sites I will need to combine features from text with features extracted from images.
To do that, we partnered with Jina AI. As a quick introduction here, I have added the code to replicate the same neural search provided by SBERT using Jina NOW text-to-text search functionality. As you will see in the code, we will connect to an end-point on the Jina Cloud infrastructure and run queries there. This means having a dedicated pool of machines working on the generation of the embeddings and on running the neural search. The load will be distributed, and we will be able to autoscale resources as we increase the size of the dataset.
Conclusions And Future Work
On the SEO front, there are other essential analyses to be done. For proper link sculpting, we will need to prevent any form of cannibalization and also evaluate how to distribute links equally. Moreover, based on the website, I want to take into account the most representative products for a category and add the support for the analysis of product images and product descriptions.
On the tech side, Jina NOW launched only recently, and we are still working with the team at Jina AI to improve how things work behind the scenes. We want to be able to control the re-ranking directly inside Jina’s flow.
Additional Questions
What are PDPs and PLPs in e-commerce websites?
PDP stands for Product Detail Page and represents the webpage that describes a single product. PLP stands for Product Listing Page and refers to a page that lists a category of products.
What is Jina AI?
Jina AI is a neural search framework to build scalable deep learning search applications. In this blog post we use Jina NOW, the simplest way to use semantic search in a distributed environment.
How to optimize e-commerce website SEO?
To optimize your e-commerce SEO strategy and get more conversions you can:
- Build a product knowledge graph and invest in structured data markup
- Align data in Google’s merchant feed with structured data
- Improve on-site search
- Use GPT-3 to automatically generate product descriptions when these are missing
- Create new product category pages by analyzing search demand
- Increase the resolution of product images (JPG or PNG formats) using AI-powered Super-Resolution