Top 5 Underutilized Schema Markups for Publishers

Table of contents:

  1. What are the most common schema markups for publishers?
  2. What are the least common, underutilized schema markups for publishers?
  3. What are some potentially useful schema markups that I can watch out for in the future?

News and blogging is not an easy profession. Anyone who has worked in these article-driven businesses will tell you that it is a very dynamic, competitive and time-sensitive industry – you do not have much time to get your bearings. News articles and events come and go, and readers’ attention spans are short. That’s why it’s important to optimize your content while it’s still fresh and interesting to your audience – especially if you report live or are serious about working in a newsroom.

Google’s VP Scott Huffman has said that “discoverability is not a solved problem.” That’s why we analyzed more than 100 web platforms of news, blogs and magazines – such as BBC, BILD, Blick, NYTimes, TNW, SmashingMagazine, CNN, DW, Al Jazeera and Google News – plus the personal websites of renowned SEOs, and identified the most common, but also the most useful (and less known), schema markups for publishers.

We, as schema markup experts, pioneers in using artificial intelligence to grow online audiences and first to market in content organization, feel the need to share our findings with the wider SEO industry, because we fight for:

  1. Open, linked-data knowledge that everyone can benefit from;
  2. Quality competition, which directly increases the quality of the content that we provide online;
  3. Working together with WWW, Schema.org and general schema markup enthusiasts towards making the Internet better, one day at a time;
  4. So much more.

What Are The Most Common Schema Markups For Article-Oriented Websites?

Assuming you have regular workflows for producing and publishing content, you may have dealt with the Article, NewsArticle, and BreadcrumbList schema markups. What are the differences between them and what exactly are they used for? Let us take a look at them together.

  1. Article is a general type of schema markup, mostly used for magazines’ creative work. As stated on Schema.org, “newspapers and magazines have articles of many different types and this is intended to cover them all”. When it is hard to determine the schema category for your content piece, it is always a great idea to start with the Article schema markup;
  2. NewsArticle is a schema markup used for reporting the news – in other words, for articles produced by an established news organization. The NewsArticle schema represents a type of creative work; it formally inherits from the Article schema while being more specific. NewsArticle schema markup works best when it is accompanied by the following attributes:
    • backstory – a brief explanation of why and how an article was created;
    • articleSection – the news category for the given article;
    • speakable – usually to highlight the most important parts of the article or indicate parts which are the most likely to be useful for general speech purposes (text-to-speech);
    • abstract – a short summary about the article and particularly useful when combined with entity referencing and entity tagging by using the mentions attribute;
    • accountablePerson – the person who is responsible for producing the creative work;
    • award – recognitions for the given article;
    • contributor – referencing additional co-authors and helpers for the piece to be properly produced;
    • conditionsOfAccess – explains the rules for accessing the content, e.g. “Available by appointment from the Reading Room” or “Accessible only from logged-in accounts”.
  3. BreadcrumbList – it represents a chain of interconnected webpages, indicating the hierarchy for accessing each one of them. Useful for big websites or category-based websites which need to explain their navigation to users and search engines.
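
To make this concrete, here is a minimal sketch of a NewsArticle annotation combining the attributes above. It is illustrative only – the headline, people and URLs are hypothetical placeholders – and we build the JSON-LD in Python, as you might when injecting markup programmatically:

```python
# Illustrative NewsArticle markup; all names, dates and URLs are placeholders.
import json

news_article = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": "City Council Approves New Transit Plan",
    "articleSection": "Politics",
    "abstract": "The council voted 7-2 to fund a new tram line by 2026.",
    "backstory": "Reported live from the council session; budget figures "
                 "verified against the published city records.",
    "speakable": {
        "@type": "SpeakableSpecification",
        "cssSelector": [".headline", ".summary"],
    },
    "accountablePerson": {"@type": "Person", "name": "Jane Editor"},
    "contributor": [{"@type": "Person", "name": "John Photographer"}],
    "award": "Local Reporting Award 2022",
    "conditionsOfAccess": "Accessible only from logged-in accounts",
    "mentions": [{"@type": "Thing", "name": "Tram",
                  "@id": "https://example.com/entity/tram"}],
}

# Embed the annotation in the page head as a JSON-LD script tag.
print('<script type="application/ld+json">%s</script>' % json.dumps(news_article))
```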

Combining the accountablePerson, award and contributor attributes helps in building Expertise-Authoritativeness-Trustworthiness (E-A-T) for your creative work, which is particularly useful after Google’s Helpful Content Update. It is always great when we can provide search engines with more context about the author and their expertise on the topic. Basically, E-A-T is best developed when we can positively answer questions like:

  • When searchers explore content to solve their needs, will they put their trust in your business and the expertise you provide?
  • Does the content itself demonstrate first-hand expertise and showcase good structure and a clear purpose?
  • Will users satisfy their search needs after absorbing your creative work?

While E-A-T itself is not a direct ranking factor, it is still a framework that encompasses many clues that Google picks up on to evaluate and further elevate quality content. Therefore, it’s a good idea to develop a solid knowledge graph and link between your authors’ expertise and your niche topics.

Here are some results that we observed when implementing some of the previous schema markups for publishers on our clients’ websites:

As you can see from the bar chart, annotated articles outperformed non-annotated articles by about 17%. This is a huge improvement, especially for large publishers, but also valuable for small authors growing their audience.

There is nothing more powerful than utilizing what you have on your side in the first place. Do you want to learn how you can bring your business to the next level? Book a demo.

What Are The Least Common, Underutilized Schema Markups For Publishers?

The news & article webspace is definitely not using the full power of the available, non-pending schema markups, including FAQPage, HowTo, VideoObject, Person, LiveBlogPosting, Thing for entities (entity linking), TVEpisode, TVSeries, CollectionPage, Series, CreativeWork (whitepapers, books), Event and PodcastEpisode. We are going to cover only some of them.

Person

Person is more descriptive than the accountablePerson or contributor attributes, because it provides more context fields to describe the author, like:

  • alumniOf – any connection with formal or informal educational institutions that can demonstrate the author’s competencies is useful to provide;
  • award – a list of concrete achievements that this person managed to secure during their lifetime; think “Best investigative story” for reporters, mentions in relevant associations and so on;
  • jobTitle – the exact job title that the content piece’s author holds, like Editor in Chief or a more junior reporter. Use this to differentiate;
  • knowsAbout – used for expertise building, covering the topics in which the person is an expert;
  • sameAs – this attribute is relatively well known for local businesses but rarely appropriately utilized for the Person schema markup. Use it to connect to the author’s relevant social media profiles and anything else that can help disambiguate their entity across the web.

None of these context-rich extensions are possible or easy to achieve with the simple contributor or accountablePerson attributes alone.
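
As a sketch, a Person annotation for an author page could look like the following (the profile data is invented for illustration):

```python
# Illustrative Person markup for an author page; the profile data is invented.
import json

author = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Maria Reporter",
    "jobTitle": "Editor in Chief",
    "alumniOf": {"@type": "CollegeOrUniversity", "name": "Example University"},
    "award": "Best investigative story 2021",
    "knowsAbout": ["Investigative journalism", "Local politics", "Data journalism"],
    # sameAs disambiguates the author's entity across the web.
    "sameAs": [
        "https://twitter.com/maria_reporter",
        "https://www.linkedin.com/in/maria-reporter",
    ],
}

print(json.dumps(author, indent=2))
```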

LiveBlogPosting

LiveBlogPosting is useful when you need to provide ongoing textual coverage of events that require continuous updates. Think hostage situations, earthquakes, wars or political elections – most of these are events that happen in a defined time interval and are a hot topic for days or weeks (wars excluded, as they are absolutely more complex than that).

Therefore, a typical Article or Blog schema will not be sufficient in these cases. It is also important to note that Google might feature top stories with a live badge when this schema is added, so it is definitely worth a try.
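
Here is a hedged sketch of the shape such markup can take – the times, headlines and updates below are placeholders:

```python
# Illustrative LiveBlogPosting markup with two updates; all values are placeholders.
import json

live_blog = {
    "@context": "https://schema.org",
    "@type": "LiveBlogPosting",
    "headline": "Election Night Live Coverage",
    "coverageStartTime": "2022-10-09T18:00:00+02:00",
    "coverageEndTime": "2022-10-10T02:00:00+02:00",
    # Each update is itself a small BlogPosting appended during the event.
    "liveBlogUpdate": [
        {
            "@type": "BlogPosting",
            "headline": "Polls close across the country",
            "datePublished": "2022-10-09T20:00:00+02:00",
        },
        {
            "@type": "BlogPosting",
            "headline": "First projections announced",
            "datePublished": "2022-10-09T21:15:00+02:00",
        },
    ],
}

print(json.dumps(live_blog, indent=2))
```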

TVEpisode and TVSeries

TVEpisode and TVSeries are interesting schema markups, particularly useful for websites that do some form of TV broadcasting and related online content delivery. You can also think of coverage of TV shows like The Bachelor (Blick.ch, for example, regularly reports on these) – such a piece is not a reportage, analysis or typical news article, so it should be appropriately differentiated.

Series

Series is absolutely different from TVSeries (the two should not be mixed up). It is usually used to connect a group of related items that do not have to belong to the same category or be of the same type. It is useful where there is no established, strict order in which the items should be shown, but they still need a certain structure to interconnect them. In practice, you can observe that Kevin Indig’s blog utilizes this schema markup very cleverly:

  • He writes long-form articles that belong to different topics but still fall under the global SEO topic, ranging from SEO team structures to rocking SEO in the machine learning world;
  • Since there is not an established, clear hierarchical structure between the articles, he uses the Series schema markup to connect all these blog-posts together in a series;
  • The Series schema helps with content comprehension but also content referral and building a powerful article recommendation engine.

It is simple, but still way too underutilized, according to our study.
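
One possible way to express such a series – a sketch, not the exact markup used on his site – is to give the series its own node and point each post at it with isPartOf:

```python
# Hypothetical sketch: a Series node that related blog posts point to via isPartOf.
import json

series = {
    "@context": "https://schema.org",
    "@type": "Series",
    "@id": "https://example.com/series/seo-essays",
    "name": "SEO Essays",
}

post = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "headline": "How to Structure an SEO Team",
    # Connects this post to the series without implying any strict order.
    "isPartOf": {"@id": "https://example.com/series/seo-essays"},
}

print(json.dumps([series, post], indent=2))
```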

What Are Some Potentially Useful Schema Markups That I Can Watch Out For In The Future?

In some contexts, fact checking is important, and adding content-based schemas helps these initiatives: first we need to gather the data in a structured way, and then act on it appropriately. There are important initiatives in the media space working in this direction, like Google’s Fact Check tools, which help spot misinformation.

At the same time, Schema.org and linked-data knowledge enthusiasts regularly invest time in developing additional schema markups that describe various web concepts on the Internet more accurately. Here are some of the schema markups that are under development and once they are approved, can bring new added value to your business:

NewsMediaOrganization, ReportageNewsArticle, AnalysisNewsArticle, OpinionNewsArticle, ReviewNewsArticle, BackgroundNewsArticle, AdvertiserContentArticle, SatiricalArticle and BroadcastEvent are certainly interesting schema markups to watch. However, they are still in the pending phase, waiting to be approved and implemented.

What schema markups are you using for your article-based business? Are you utilizing their power enough? Let’s talk about it together – Book a Demo with one of our SEO experts.

Content Resilience Is Business Resilience: A Lesson From 1600+ Articles

Table of contents:

  1. Content resilience…what does that even mean?
  2. Processes, frameworks, mindsets and knowledge graphs as content resilience enablers
  3. Resilience in your SEO content strategy

Covid-19. Inflation. Remote work. New competitors in the market or more competitive offerings from others in your niche. We have all been there, at least in recent years. One lesson we should have learned by now is how to make ourselves and our businesses more resilient and adaptable to change.

Business resilience is complex and it cannot be built overnight. It requires careful organizational structuring which equally values efficiency and innovation, while ensuring quality supply chain resilience at the same time. You should be able to deliver your products and services to the market in a way that is constant and suffers little change even when the times are uncertain and the market is fluctuating across industries. 

This is where your content and marketing strategy come into play. They should be your most important tools for resilience because they allow you to discover your customers and bring them into your business. They are the tools that guide your online visitors between the awareness and purchase stages, giving you the ultimate data to learn how to improve your offerings and unique selling proposition:

  1. Where did your visitors come from?
  2. What devices are they using to find you?
  3. What makes them stick on the website or leave it immediately?
  4. What does their customer journey on your website look like?
  5. Do you have a mature branding strategy (brand queries dominate) or do you have a pure, established organic growth marketing channel in place?
  6. So much more.

This is why your business teams need to embrace the power of content and learn how to keep going further even when the going becomes tougher.

Content Resilience…What Does That Even Mean?

It all comes down to the mindset.

First things first: your content comes from many places – ideally, multiple departments should be able to inform your content strategy. By gathering and unifying data from all of them, you should be able to answer questions like:

  1. Who are your searchers?
  2. Who are your customers and how did they convert? Which search queries are the most impactful when it comes to converting online and why?
  3. What increases the desirability of your product/service?
  4. What helps to reduce the friction in closing deals?
  5. How do prospects behave when they use the product/service?
  6. What processes and conventions are established at your company, so that user discovery can be a constant process?
  7. How can you present your information so that you have a clear and well-articulated way to change your prospects’ minds, making them switch to your business solutions?

In the content world, the ability to find a new way to understand and profile your audience, and at the same time the ability to present a new talking point and a new type of content, is called content resilience. This comes with time and practice, but it’s important to start and plan for it.

You must be able to find inspiration by taking data from multiple sources: it’s all about being data-driven but also imaginative, innovative and deeply, deeply interested in understanding customers in the first phases of the buying journey, in order to guide them properly later.

Processes, Frameworks, Mindsets And Knowledge Graphs As Content Resilience Enablers

Analyze what you have on your side. Start documenting your work and then try to find the common denominator that applies to all of your content templates. Being able to put structure where there is none – and, even more, to give granularity and interoperability to your data – is the ultimate content resilience skill:

  1. We build knowledge graphs (KG) to provide structure to content;
  2. Structure is what is needed to build resilience;
  3. AI (still) depends on vast amounts of training data – structure helps us scale this training data to “teach” search engines new things and help them find the audience we need;
  4. The return on investment (ROI) of a content KG depends on the value of the semantic annotations over time;
  5. The value of semantic annotations lies in bridging gaps with the right searchers’ personas;
  6. Content that is semantically annotated gets stronger over time. Why? Because it enforces interconnectedness among concepts across the web, and it becomes semantically clear what your website is about.

And yes…it takes domain expertise, but also great business acumen and an innovative mind, to start connecting dots that were thrown away without any framework behind them. Build your foundations just like you would build with LEGO blocks and utilize the power of content knowledge graphs: a blog post on social media lasts for a few hours, while an entity-based content strategy (entity mapping) lasts forever.

Here is an analysis of 1650+ articles from a client of ours, showing the click-through rate (CTR) after we stopped adding new entities. As you can observe, the results go down once the entity tagging process stops.

Know the advantages and disadvantages of different content approaches. It is an ultimate superpower.

Resilience In Your SEO Content Strategy

We won’t make a mistake if we say that content strategy is the first phase of your supply chain process – you’re distributing information to online searchers so that they can make a proper decision on what to buy. It can get hard if you don’t strategize around your content and understand your unique value proposition inside out.

Are your numbers not adding up at the moment, or are the executives not satisfied with the results they see? If your content is set up to succeed, those are just temporary turbulent periods that you need to overcome – and rest assured, the sun is there afterwards! They come from time to time, and managing everyone’s expectations through them is a learned skill. How? Well, here’s the deal:

1. Working with backlinks

If you work with backlinks and you are not a seasoned SEO marketer, you will have a tough time articulating your strategic decisions to upper management. It is very likely that you think investing time in building high domain authority links to your website is beneficial.

The problem with links is that they expire, becoming old or completely obsolete over time. Completely unresilient. They don’t hold value for long, especially if you are not updating them at scale. It’s not that you do not need them or that you should ditch them; it is about being more strategic: investing in classic links but also in linked-data links, enabling a globally shared understanding of concepts across the web. That’s content resilience. That’s link resilience.

2. Being evergreen

Are your content ideas up to date? Do you truly, deeply understand your customers? What problems keep popping up, all the time or from time to time? What do knowledge graphs across the web teach you about content gaps? Can you fill them to address these issues? Is your content evergreen, and will it be reusable for winning new audiences on the Internet over and over again? Be specific and brutally honest with yourself about being evergreen – it is the only way forward.

3. Flexible mindset

Are you a flexible person? Are you able to construct and deconstruct content pieces from the ground up and vice versa? Are you able to bring new perspectives and fresh, independent thinking into your core content processes? Are you learning and experimenting with new things? One potential idea is working with product knowledge graphs. Have you considered these approaches before? Resilience is adapting to change, and if you do not have a flexible mind, you will face difficulties along the way.

4. Third-party SEO vendors

These tools support your keyword research process and link profiling between your website and competitors’ websites; however, they are not reliable in the long run.

Being dependent on SEO software vendors that do not even work with first-party data and that slowly increase the prices of their products is not a resilience-oriented way of thinking. You should build your safety net around the data that you own in the first place, like your Google Search Console data and the unstructured data in your CMS. Your tech stack should be picked in a way that enables you to grow from there and not be strategically dependent on third-party SEO data vendors.

To sum up, it is important to build your foundations right. Put your data in context. Build around what you already have and then expand from there. You’ll thank us later.

Ready to experiment? Give us a go! Book a Demo with one of our SEO Experts 🤩

The Lifecycle [and the Death] of SEO Content Documents As We Know Them

Table of contents:

  1. Documents are getting old…or is it the approach itself?
  2. Documents are not fundamentally efficient
  3. Documents as learning tools
  4. Documents as an audience development tool, sales enablement and science impact without boundaries
  5. Next-gen documents are FAIR

Digital natives experienced the birth and the death of regular content documents.

If you are like me, you bought your first computer in the 90s and performed your first searches in a text-based browser. Back then, it was not easy to be visionary, because no one knew what direction the web would take, even though we were always excited by the idea of connecting people, knowledge and opportunities as efficiently as possible. At least, that was supposed to be our manifesto.

Documents Are Getting Old…or Is It The Approach Itself?

But something does not seem right. The way we used to interact (and still do!) with SEO content documents (articles, blog posts, research papers, whitepapers, webpages, whatever you call them) was:

  1. Research what needs to be written;
  2. Write the article (or document existing work);
  3. Publish it and/or share it through your content distribution channels;
  4. Set and forget: once you’re done, the article stays in the back;
  5. Or re-optimize when (and if) the time is right, provided the resources and the demand allow you to do so.

Documents Are Not Fundamentally Efficient

It turns out that SEO documents are anachronistic for most of their life: Once they have satisfied transient user needs, they are either deleted from the system entirely or content teams forget about them because they do not benefit the user. They are deemed obsolete, which results in them not reaching their full potential. And believe me, their potential is enormous if only they were properly semantically tagged and modularized, as they should have been from the start.

If we could just change that logic and start seeing them as islands of knowledge, we could take advantage of them on more than three fronts: Voice (chatbots), Automation (content operations) and Knowledge Exploration across the web (Linked Data). This has not been possible until now:

  • Behavior changes were happening very slowly;
  • We didn’t have the right, democratized technology in place;
  • We were in the early stages of fostering knowledge developer positions on the job market.

Documents As Learning Tools

Given the multiple touch points and interactions with content, we need to shift to this new way of thinking about SEO documents to discover the ultimate truths about the world around us beyond a tiny fraction of what’s out there. That’s the power of functional, effective content documents: exploring new worlds of knowledge that are yet to be discovered – just like new lands in ancient times.

A document is a magnificent structure that we can only imperfectly understand because we as humans are limited in our ability to reuse and analyze it in a variety of ways, as machines can. Content documents are a tool to express opinions, but also an exploration that allows us to discover new facts about the world around us. They are tools that help us satisfy our desire for more knowledge.

To go beyond what is known, we need to think critically about our current SEO document content operations, systems, and overall strategy for developing SEO content documents over time. Imagine a world where we can work more cohesively with everyone and help others solve their problems through the intelligent use of SEO document content. It’s no longer about dealing with Big Data: from now on, smart data counts through smart content engineering.

Documents As An Audience Development Tool, Sales Enablement And Science Impact Without Boundaries

Interconnected worlds of data are not primarily limited by language comprehension but rather by a lack of structure. They are limited by not being organized in your content management systems (CMSs) according to WWW standards, which limits their usability and discovery potential over time.

So imagine you are an entrepreneur with a limited budget selling your services online via content marketing, or a researcher looking for a way to cure a disease. Even though these are two different professions, both face the same challenge in practice: they lack comprehensive channels to promote their services, get funding, or uncover new facts. The process of document creation and SEO document dissemination is very manual:

  • You go to a given search engine or social media platform.
  • You use some marketing techniques (ads, SEO, search operators) to promote your work, target platforms for prospect opportunities and/or search for related papers.
  • You search for foundations and NGOs that can fund your work.

Very, very manual. It’s not just about automation; it’s about a better approach to how we share knowledge and promote things, so that each document becomes a valuable node in the world of the open knowledge graph. This is a critical factor in finding cures for diseases and supporting online document distribution channels.

Next-Gen SEO Documents Are FAIR

The concept of FAIR data is not new. FAIR stands for findability, accessibility, interoperability, and reusability. These principles enable machine usability: “the capacity of computational systems to work with data with no or minimal human intervention.”

Even though this concept has been around for some time and was first introduced in the world of research, we can apply the same logic to any SEO content document that exists. Watch the video below to learn more about it👇

In this way, the value of SEO content documents will never go away, because the way they are created and maintained is evergreen and strategically different than before. They are:

  • True to their original intent, so their purpose is clear and to the point.
  • Modular, so they can be reorganized and redistributed in different and multiple ways.
  • Accessible, no matter the spoken language in use.
  • Measurable and testable, so it’s easy to restructure them as needed.
  • Trackable and self-describing: you can query and analyze them in the Linked-Open-Data world.

How To Get Perfect SEO Content Documents

If you want to apply these principles to your SEO content documents, you can start with the following:

  • Bring this same layer of metadata (entities) inside your own knowledge graph, which can be created using the cutting-edge techniques that we employ here at WordLift.
  • Integrate the entities in your publishing workflow by adding a unique identifier to each document, to each author and to each relevant content piece (example: in our Knowledge Graph a FAQ has its own ID).
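
As a minimal sketch – with invented identifiers rather than WordLift’s actual scheme – giving each document, author and FAQ a stable @id could look like this:

```python
# A sketch of stable identifiers; the IDs are invented examples, not
# WordLift's actual identifier scheme.
import json

doc = {
    "@context": "https://schema.org",
    "@type": "Article",
    "@id": "https://data.example.com/doc/seo-documents-are-fair",
    # The author and the FAQ are separate nodes, referenced by their own IDs.
    "author": {"@id": "https://data.example.com/person/jane-writer"},
    "mainEntity": {"@id": "https://data.example.com/faq/what-is-fair-data"},
}

print(json.dumps(doc, indent=2))
```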

There is nothing more powerful than utilizing what you have on your side in the first place. Do you want to learn how you can bring your business to the next level? Book a demo.

SEO Content Strategy for More Targeted Customer Traffic: Make Your Writers Feel Respected

Table of contents:

  1. The Content Journey – Past and Future
  2. Valuing Content Writers – Is It Possible?
  3. Democratizing Content
  4. Linked-Data Content As a Competitive Advantage

The Content Journey – Past and Future

Content has evolved a lot in the past decade – we have UX writers, technical writers, content engineering and even content operations. Content has become “the way” to win customers online because they are constantly searching for answers to their problems and opportunities to educate themselves before buying something.

If we analyze the past decade, we see that what passed for an SEO content strategy back then simply does not fit today’s definition. In the past, “content strategy” meant hiring copywriters, handing them prepared, non-data-driven topics to write about, increasing their salaries from time to time and being satisfied with whatever they produced. It was easy to post content online and game the web search systems to get visitors to your website, and that was enough.

Valuing Content Writers – Is It Possible?

We need to adopt a new way of thinking about content. We need to talk about content operations and have deeper conversations about how the work that content writers are doing has an impact across the field. This is necessary if you want to make your writers feel respected: instead of constant content creation that does not demonstrate deep expertise, and creating content for the sake of search engines, you need to shift your mindset and produce content that fills the gaps and helps you stay ahead of your competition.

What we know from experience is that constant content creation, just for the sake of writing something every week, can make your writers feel overwhelmed, disgruntled, disorganized and pressured to deliver something that does not satisfy anyone’s criteria: not the readers’ (end consumers’), not the business’s and not the writers’ own. Who does not want to get up in the morning and produce something that others find useful and that has a purpose? Every true content professional likes to pour their heart into their work and drive business results by doing what they are best at.

This is where advanced SEO content strategy comes into place. Ask yourself the following questions:

  • How do you organize your content in order to pop more on search engines?
  • How can you position your content to be reusable and easy to distribute to ensure semantic interoperability?
  • How can you make your content fit different content management systems (CMSs)?
  • How can you establish new content practices in order to stay ahead of your competition?
  • How can your content be designed so that it can be consumed by AI for processing?
  • How can you tailor your content by persona, vertical, industry and develop audiences based on these criteria?
  • Finally, how can you make your content people feel valued and believe that their work has a real impact on the business?

Advanced SEO content strategy is now shifting: the modern content writer needs to be a skill-based polyglot, combining a bit of everything. It is like having a big tent covering multiple things – it includes not just the skill to create content, but also an understanding of the information systems that hold the data behind it and of the whole lifecycle, from content creation up to content delivery.

The question is: with limited resources and few insights into the ongoing trends in the content industry, how can you deliver something useful, and at scale, while optimizing efficiency for your content writers?

Democratizing Content

Content creation nowadays is more than just opening a Google Doc and writing some stuff down. Today, every content designer should be able to communicate the effectiveness of the content that is produced by defining a set of key performance metrics and standards that need to be taken care of.

However, if you are constantly changing your content team or onboarding new writers in a short time span, it might become challenging to introduce intelligent content practices business-wide. New onboardings mean training for a specific skill set, which requires HR involvement and training budgets that your organization might not have planned for. The result is ongoing frustration and confusion – so how should you approach this problem?

Linked-Data Content As a Competitive Advantage

We have worked with over a dozen customers, gained insights from their content and team structures, and can predict with confidence that you are facing the same problems they are.

Long story short, you need to find a way to democratize the process of content creation and distribution. Every writer should be enabled to do their best possible work, writing user-centric content and not something produced for the sake of search engines.

One way to make things easier for your content writers is to enable them to distribute their content further and go beyond traditional SEO. You need to help your writers understand their end consumers better and increase the clarity of their existing content pieces, rather than forcing them to produce more and more content. The solution is to employ a user-centric approach and a linked-data backbone that makes smart content delivery as easy as possible, ideally in just a few clicks during and after the content creation process.

We at WordLift have helped content writers deliver 2000+ pieces of content in a more structured way by using our own custom tools – plugins and dashboards – and by linking this content with the Linked Open Data Cloud. In simple terms, this means that we enabled our writers to focus on what they are best at – writing genuine, original and useful content that answers their potential customers’ needs – while spending less time on optimizing for search engines.

Structured Data For Semantic Web Analytics

Introduction

Adding structured data to your website means enriching your content with information that makes it easier for Google and other search engines to understand. This way, your website and the search engines can talk to each other, allowing you to have a richer representation of your content in Google’s SERPs and to increase organic traffic. You’ll then get more clicks and growth for your business.

With structured data in modern SEO, you can create an impact, and this impact is measurable whether you have a large or small business.

Let’s focus on the importance of structured data beyond the numbers (clicks, impressions, etc.) and on the advantage that you can gain in modern SEO.

“Much of the adoption we see of modern standards like schema.org (particularly via JSON-LD) appears to be motivated by organizations and individuals who wish to take advantage of search engines’ support (and rewards) for providing data about their pages and content, but outside of this, there’s a rich landscape of people who use structured data to enrich their pages for other reasons.” – Web Almanac

So structured data is not just the data we prepare for Google; it’s data that helps you understand the meaning of web pages. 

If you want to learn how to get semantic analytics with WordLift, read our article.

What Is Structured Data For Semantic Analytics?

The Semantic Web has changed the way we approach web content. As Tim Berners-Lee himself says, the world is not made of words but of something more powerful: data. This means that to improve search engines’ understanding of web content, it is necessary to have a high-quality dataset enriched with information in structured data.

Structured data allows Google and other search engines to understand what your website is talking about, rank it better and return enriched results to users in the SERPs. In this way, users can find relevant information that better meets their search intent.

We talk about entities and no longer about keywords. They represent “concepts” and allow machines (Google and search engines, voice assistants, etc.) to interpret what we know about a person, organization, place, or anything described in a document.

In this scenario, Semantic Web Analytics is the use of named entities and linked vocabularies such as schema.org to analyze the traffic of a website.

With this type of analysis, you’ll start from your website’s structured data, and you’ll be able to cross-reference it with the data from Google Analytics, Google Search Console or your CRM. In this way, you’ll be able to learn more about your users/customers and their behaviors, gaining a strategic advantage over impression and traffic data alone. As we’ll see below, with just a few clicks, you can extract structured data from web pages and blend it, in Google Data Studio, with traffic from Google Analytics. 

How To Use Structured Data For Semantic Analytics

It’s clear that structuring information goes beyond search engine support and can also provide value in web metrics analysis. 

At this point, we show you how you can extract structured data from web pages and blend it with Google Analytics traffic in Google Data Studio. You’ll also see how this will allow you to gain insights into web analytics.

We start from a demo website that we built for demonstration purposes. If you have a small business with a small number of products, you can crawl your content by using a Streamlit application. Otherwise, if you are at a more advanced level and have a large number of products, you can use Colab, working with the SEO crawler from advertools, the free library created by Elias Dabbas, available here. With this system, you can crawl hundreds of thousands of URLs. But it has a pitfall: it is not able to detect structured data that has been injected with JavaScript.
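
For reference, here is a minimal sketch of that crawl, assuming advertools is installed and using a hypothetical demo URL; the output file and column names follow advertools’ conventions, where JSON-LD fields are flattened into “jsonld_” columns:

```python
# A minimal crawl sketch; the site URL and file names are hypothetical.
import advertools as adv
import pandas as pd

# Crawl the demo site; advertools writes one JSON line per crawled page.
adv.crawl(
    "https://example-store.com",  # hypothetical demo site
    "crawl_output.jl",            # output in JSON Lines format
    follow_links=True,
)

crawl_df = pd.read_json("crawl_output.jl", lines=True)

# Keep the URL plus the flattened JSON-LD columns for blending later.
jsonld_cols = [c for c in crawl_df.columns if c.startswith("jsonld")]
structured = crawl_df[["url"] + jsonld_cols]
structured.to_csv("structured_data.csv", index=False)  # import into Google Sheets
```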

The crawled data is then brought into Google Sheets and blended in Google Data Studio in order to have one single view.

You can create a Data Studio dashboard where you can select and view specific insights. Here, for example, you can see the breakdown of Google Analytics sessions by category: clothing accounts for 50% of the sessions.

How Do Blended Sources In Google Data Studio Work? Blending Data Is Simple.

As you can see in the image, you have tables (in our case, Google Sheets and Google Analytics) and a list of available fields that you can use from this table within the join to create a view of combined fields. 

Then you have the join configuration, that is, how you want to blend this data. You can decide to take everything from the left table that overlaps with the right table, or you can look only at the strict overlap of an inner join.

Then you have the name of the blended source that you will create and the fields that you will represent inside this blended source which is a view on one, two or more tables combined by a unique key. In this example, the unique key is the URL. 

You are using the URL on both sides to combine them, and this allows you to look at the analytics – for instance, sessions – by category.
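
Conceptually, the blend is just a join on a shared key. Here is an equivalent pandas sketch, assuming hypothetical export files from the crawl and from Google Analytics:

```python
# The Data Studio blend, expressed as a pandas join; file and column names
# are hypothetical exports from the crawl and from Google Analytics.
import pandas as pd

structured = pd.read_csv("structured_data.csv")    # structured data per URL
analytics = pd.read_csv("ga_sessions_by_url.csv")  # sessions per URL

# Left join: keep every crawled URL and attach sessions where GA has data;
# use how="inner" for the strict overlap instead.
blended = structured.merge(analytics, on="url", how="left")

# Sessions by product category, as in the dashboard example
# (the category column name depends on your markup).
print(blended.groupby("jsonld_category")["sessions"].sum())
```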

If you want to see something more advanced, you can blend a second spreadsheet with Google Analytics. In this case, you have more data, such as the color and the brand name, and you can create a chart using the product category, the sessions, and the price. This way, you can see the traffic and price for each product category. You can also see the breakdown of colors and brands.

You can play with different combinations in order to have the right data. Extracting structured data from your web pages and blending it with Google Analytics data gives you a more precise and more accurate picture of your users’ behavior with just a few clicks. This is particularly useful to improve your marketing strategy and grow your business in a data-driven way. 

Keep In Mind: Structured Data Has Its Pitfalls. 

  • Structured data, when injected using Javascript, cannot be easily crawled;
  • Data is messy and/or inconsistent;
  • Multiple syntaxes appear on the same page; 
  • Multiple tools can add contradicting statements;  
  • Competitors have better data.

We discussed this topic in the webinar Google Data Studio Structuring SEO Data Tips&Tricks, hosted with Windsor.AI – Watch the video.

If you want to know how to create a Web Analytics Dashboard using Google Data Studio, traffic data from Google Analytics, and WordLift, read this article.

Frequently Asked Questions

What is Semantic Web Analysis?

Semantic Web Analytics is the analysis of a website’s traffic done using named entities and related vocabularies such as schema.org.

With this analysis, you can start from the website’s structured data and cross-reference it with data from Google Analytics, Google Search Console, or other CRM. In this way, you can learn more about user and customer behavior and gain a competitive advantage beyond just analyzing impressions and traffic.

We take on a small handful of client projects each year to help them boost their qualified traffic via our SEO Management Service.

Do you want to be part of it?

Yes, send me a quote!

How To Create Content Hubs Using Your Knowledge Graph

Content marketing is about creating compelling content that responds well to searchers’ intents and it’s essential for any inbound marketing strategy. 

The level of competition varies by sector, but it is in general extremely fierce. Just consider the following numbers to get an idea. There are around 500 million active blogs worldwide and, in 2021, according to Internetlivestats.com, we’re publishing over 7 million blog posts every day! These are astonishing numbers, and yet, we can still conquer the competition on broader queries. These queries, in our ultra-small niche, would be something like “structured data”, “linked data” or “semantic SEO”. To succeed in going after these broader search intents, we need to organize our content in topically relevant hubs.

In layman’s terms, we can define topically relevant content as a group of assets (blog posts, webinars, podcasts, FAQs) that cover a specific area of expertise in depth. In reality, though, there are various challenges in compiling this list:

  • we want to identify all the themes and subthemes related to a given concept;
  • we want to do it by using the various formats that we use (content from our academy, blog posts, and ideally content that we might have published elsewhere);
  • we need to keep in mind different personas (the SEO agency, the in-house SEO teams of a large corporation, the bloggers, etc.).  

In this article, I will share how you can build content hubs by leveraging deep learning and the data in a knowledge graph. Specifically, we will use a technique called knowledge graph embeddings (or simply KGE). This is an approach that transforms the nodes and edges (entities and relationships) into a low-dimensional vector space that fully preserves the knowledge graph’s structure.

Here is the link to the Colab that will generate the embeddings.

Here is the link to the TensorFlow Projector to visualize the embeddings.

Let’s first review a few concepts together to make sure we’re on the same page.

What Is A Content Hub?

A content hub is where you want your audience to land after a broad search query on a search engine. A content hub is presented to the user either as a long-form article or as a compact category page (Content Harmony has a nice list of various types of content hubs here). In both cases, it needs to cover the core aspects of that topic.

The content being presented is not limited to what we have on our site but should ideally include assets that have been already developed elsewhere, the contributions of relevant influencers, and everything we have that can be helpful for our audience. 

Experts and influencers in the given subject’s field have a strategic role in establishing the required E-A-T (Expertise, Authoritativeness, and Trustworthiness) for each cluster.

What Is A Knowledge Graph In SEO?

A knowledge graph is a graph-shaped database made of facts: a knowledge graph for SEO describes the content that we produce so that search engines can better understand it.  

Our Dataset: The Knowledge Graph Of This Blog

In today’s experiment, we will use the Knowledge Graph behind this blog to help us create the content hubs. You can use any knowledge graph as long as it describes the content of your website (using RDF/XML or even a plain simple CSV).

In most essential terms, a graph built using WordLift connects articles with relevant concepts using semantic annotations. 

There is more, of course: some entities can be connected to each other with typed relationships; for example, a person (e.g. Jason Barnard) can be affiliated with an organization (Kalicube), and so on, depending on the underlying content model used to describe the content of the site.

Here below, we see a quick snapshot of the classes that we have in our dataset.    

What Are Knowledge Graph Embeddings (KGE)?

Graph embeddings convert entities and relationships in a knowledge graph to numerical vectors and are ideal for “fuzzy” matches. Imagine graph embeddings as the fastest way to make decisions using the Knowledge Graph of your website. Graph embeddings are also helpful to unveil hidden patterns in our content and cluster content into groups and subgroups.

Deep learning and knowledge graphs are in general complex and not easy to visualize, but things have changed, and we can now easily visualize concepts in high-dimensional vector spaces using Google technologies like TensorBoard. Meanings are multi-dimensional, as much as people are. The magic of embeddings lies in their ability to describe these multiple meanings in numbers so that a computer can “understand” them.

We can approach a given concept like “Semantic SEO” from so many different angles. We want to use machine learning to detect these angles and group them in terms of assets (content pieces) and entities (concepts).  

Let’s watch a video to grasp more about clustering topics, as this is what we’re about to do. 

Let’s Build The Knowledge Graph Embeddings  

I have prepared a Colab Notebook that you can use to create the graph embeddings using a Knowledge Graph built with WordLift. We are going to use an open-source library called AmpliGraph (remember to star it on GitHub).

Feel free to play with the code and replace WordLift’s Knowledge Graph with your data. You can do this quite simply by adding the key of your subscription in the cell below. 

If you do not have WordLift, remember that you can still use the code with any graph database organized in triples (subject > predicate > object). 

To create the knowledge graph embeddings, we will train a model using TensorFlow. Embeddings are vector representations of concepts in a metric space. 


While there are various algorithms (TransE, TransR, RESCAL, DistMult, ComplEx, and RotatE) that we can use to achieve this goal, the basic idea is to minimize the loss function when analyzing true versus false statements. In other words, we want to produce a model that can assign high scores to true statements and low scores to statements that are likely to be false. 

The score functions are used to train the model so that entities connected by relations are close to each other while entities that are not connected are far apart.

Our KGE has been created using ComplEx (Complex Embeddings for Simple Link Prediction), which is considered state-of-the-art for link prediction. Here are the parameters used in the configuration.
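
As a rough sketch of that setup – assuming AmpliGraph 1.x and a NumPy array X of (subject, predicate, object) triples already loaded from the graph; the hyper-parameter values below are illustrative, except k=150, which matches our embedding space:

```python
# Training sketch, assuming AmpliGraph 1.x and a NumPy array X of
# (subject, predicate, object) triples loaded from the knowledge graph.
import numpy as np
from ampligraph.evaluation import train_test_split_no_unseen
from ampligraph.latent_features import ComplEx

# Classical 80:20 split that keeps every entity seen during training.
X_train, X_test = train_test_split_no_unseen(X, test_size=int(len(X) * 0.2))

model = ComplEx(
    k=150,                    # 150-dimensional embedding space, as in our setup
    epochs=300,               # illustrative values from here on
    batches_count=50,
    eta=5,                    # negatives generated per positive triple
    loss="multiclass_nll",
    regularizer="LP",
    regularizer_params={"p": 3, "lambda": 1e-5},
    seed=0,
)
model.fit(X_train)
```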

Model Evaluation

The library we’re using allows us to evaluate the model by scoring and ranking a positive element (for example, a true statement) against a list of artificially generated negatives (false statements). 

AmpliGraph’s evaluate_performance function will corrupt triples (generating false statements) to check how the model is behaving. 

As the data gets split between train and test (with a classical 80:20 split), we can use the test dataset to evaluate the model and fine-tune its parameters. 
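
Here is a sketch of that evaluation step, assuming the model and split from the training sketch above:

```python
# Evaluation sketch, reusing model, X_train and X_test from above.
import numpy as np
from ampligraph.evaluation import evaluate_performance, mrr_score, hits_at_n_score

# Filter known true triples so that corruptions which happen to be true
# statements do not unfairly penalize the ranking.
positives_filter = np.concatenate([X_train, X_test])

ranks = evaluate_performance(
    X_test,
    model=model,
    filter_triples=positives_filter,
    use_default_protocol=True,  # corrupt both subjects and objects
    verbose=True,
)

print("MRR:     %.3f" % mrr_score(ranks))
print("Hits@10: %.3f" % hits_at_n_score(ranks, n=10))
```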

I left most of the hyper-parameters untouched for the training, but before training I worked on cleaning up the data to improve the evaluation metrics. In particular, I did two things:

  1. Limited the analysis to only triples that can really help us understand the relationships in our content. To give you an example, in our Knowledge Graph, we store the relationship between each page and the respective image; while helpful in other contexts, for building our content hubs, this information is unnecessary and has been removed.  
  2. As I was loading the data, I also enabled the creation of all the inverse relationships for the predicates that I decided to use. 

Here is the list of the predicates that I decided to use. As you can see, this list includes the “_reciprocal” predicates that AmpliGraph creates while loading the KG into memory.

As a result of these optimizations, I was able to improve the ranking score.

How To Create A Content Hub

Now that we have the KGE – a powerful new tool – let’s get practical and see how to use them to create our content hubs. We will do this by auditing the content that we already have, in three main steps:

  1. Familiarizing ourselves with the model by predicting new facts. These are previously unseen relationships that we want to rank to understand how likely they are to be true. By doing this, we can immediately explore the list of concepts we could work with for each cluster.
  2. Discovering sub-topics and relationships using the TensorFlow Embedding Projector. To enable a more intuitive exploration process, we will use an open-source web application for interactive visualization that will help us analyze the high-dimensional data in our embeddings without installing and running TensorFlow.
  3. Clustering topics. We want to audit the links in our graph and see what content we have and how it can be grouped. The clustering happens in the embedding space of our entities and relations. In our example, we will cluster the entities of a model that uses an embedding space with 150 dimensions (k=150); we will apply a clustering algorithm on the 150-dimensional space to create a two-dimensional representation of the nodes.

Let’s Start Predicting New Facts

To familiarize ourselves with the model, let’s make our first predictions. Here is how it works: we are going to check the likelihood that the following statements are true, based on the analysis of all the links in the graph.

  • semantic_seo > mentions >  content_marketing
  • precision_and_recall > relation_reciprocal > natural_language_processing
  • andrea_volpini > affiliation > wordlift
  • rankbrain > relation > hummingbird 
  • big_data > relation > search_engine_optimization  

I am using one predicate from the schema.org vocabulary (the property mentions) and one predicate from the Dublin Core vocabulary (the property relation, along with its reciprocal) to express a relationship between two concepts. I am also using a more specific property (affiliation from schema.org) to evaluate the connection between Andrea and WordLift.
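
Here is a sketch of how these statements can be scored, assuming the trained model from before; the entity and predicate identifiers are simplified for readability:

```python
# Scoring sketch for the unseen statements, reusing the trained model;
# entity and predicate identifiers are simplified for readability.
import numpy as np
from scipy.special import expit  # squashes raw scores into (0, 1)

X_unseen = np.array([
    ["semantic_seo", "mentions", "content_marketing"],
    ["precision_and_recall", "relation_reciprocal", "natural_language_processing"],
    ["andrea_volpini", "affiliation", "wordlift"],
    ["rankbrain", "relation", "hummingbird"],
    ["big_data", "relation", "search_engine_optimization"],
])

scores = model.predict(X_unseen)  # higher score = more likely to be true
probs = expit(scores)             # probability-like calibration for reading

for triple, prob in zip(X_unseen, probs):
    print(" > ".join(triple), "=> %.2f" % prob)
```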

Let’s review the results.

We can see that our model has been able to capture some interesting insights. For example, there is some probability that Big Data has to do with SEO (definitely not a strong link), a higher probability that Precision and Recall is related to Natural Language Processing, and an even higher probability that RankBrain is connected with Hummingbird. With a high degree of certainty (rank = 1), the model also realizes that Andrea Volpini is affiliated with WordLift.

Exploring Embeddings To Discover Sub-Topics And Relationships 

To translate the things we understand naturally (e.g., concepts on our website) to a form that the algorithms can process, we created the KGE that captures our content’s different facets (dimensions). 

We are now going to use the Embedding Projector (an open-source tool released by Google) to navigate through views of this data directly from our browser and using natural click-and-drag gestures. This will make everything more accessible and easy to use for the editorial team.

Here is the shareable link to the Projector with our KGE already loaded. We will now discover the facets around two clusters: Semantic SEO and Structured Data. 

Let’s review the concept of Semantic SEO.

We can quickly spot a few interesting connections with:

  • Concepts:
    • Google Zero Results SERPS
    • Voice Search
    • Rankbrain
    • Google Knowledge Graph
    • Content findability
    • Google Big Moments
  • Influencers:
    • Aleyda Solis
    • Purna Virji
  • Webinars:
    • Machine learning friendly content with Scott Abel (one of my favourites)
  • Events:
    • The WordCamp Europe 2018 in Belgrade (the talk was indeed about semantic SEO and voice search)
  • Blog Posts:
    • How Quora uses NLP and structured data
    • AI text generation (using data in a KG)

Over the years, we have built a knowledge graph that we can now leverage on. We can use it to quickly find how we have covered a topic such as semantic SEO, the gaps we have when comparing our content with the competition, and our editorial point of view. 

The beauty of this exploration is that the machine is revealing to us years of classification work. While these embeddings have been created using machine learning, the data in the knowledge graph has been primarily human-curated by our editorial team.

Let’s explore Structured Data.

As a tool for automating structured data, this is for sure an attractive cluster. Here we can see the following at first glance:

  • Concepts:
    • Linked Data
    • Context Card (a product feature that uses structured data)
    • WordPress
    • SEO
    • NLP
    • Google Search
    • JSON-LD
    • Impact measurement
  • Influencers:
    • Bill Slawski (thanks for being part of our graph)
    • Richard Wallis (for those of you who might not know, Richard is one of the greatest evangelists of schema)
  • Showcase:
    • The success story of Salzburger Land Tourismus

Using the slider on the right side of the screen, we can capture the most relevant connections. 

I am analyzing the semantic similarity of the nodes using the Euclidean distance, i.e. the distance calculated between the vectors in the multidimensional space. This is definitely valuable, as it provides structure and helps us clearly understand which assets we can use in our content hubs. In our small semantic universe, we can see how the concept of Structured Data is tightly related to Linked Data, Knowledge Graph, and Wikidata.

The Projector uses different dimensionality reduction algorithms (UMAP, t-SNE, PCA) to convert the high-dimensional space of the embeddings (150 dimensions in our case) down to a 2D or 3D space. We can also use a custom setting, at the bottom left corner of the screen, to create a custom map that lays out the concepts around Structured Data so that, for example, we have everything related to e-commerce on the left and everything that has to do with web publishing on the right. We simply add the terms, and the Projector reorganizes the map accordingly.

This way, we can see how “Structured Data” for E-Commerce shall include content related to WooCommerce and GS1 while content for publishers could cover broader topics like content marketing, voice search, and Google News.  

Topic Clustering

We can now visualize the embeddings on a 2D space and cluster them in their original space. Using the code in the Colab notebook, we will use PCA (Principal Component Analysis) as a method to reduce the dimensionality of the embedding and K-Means for clustering.
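
Here is a condensed sketch of that step, assuming the trained model from before and a list entities of entity identifiers taken from the graph:

```python
# Clustering sketch, reusing the trained model; `entities` is a list of
# entity identifiers taken from the graph.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Look up the embedding vector of every entity.
embeddings = model.get_embeddings(entities, embedding_type="entity")

# Cluster in the original embedding space...
n_cluster = 5  # change this to expect more or fewer clusters
clusters = KMeans(n_clusters=n_cluster, random_state=0).fit_predict(embeddings)

# ...then reduce to two dimensions with PCA, for plotting only.
coords = PCA(n_components=2).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1], c=clusters, cmap="tab10", s=10)
plt.title("Entity embeddings: PCA projection with K-Means clusters")
plt.show()
```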

We can see five clusters: core concepts (Semantic SEO, SEO, AI, WordLift, …), entities and posts on SEO (including showcases), connecting entities (like events but also people), blog posts mainly centered on the product and content related to AI and machine learning.  

We can change the number of clusters we expect to see in the code (by setting the n_cluster variable). Also, you might want to add a set of nodes that you want to inspect by adding them to the list called top10entities. 

In general, it is also very practical to directly use the Projector with our KGE and the different clustering techniques for exploring content. Here below, we can see a snapshot of t-SNE, one of the methods we can use to explore our embeddings. 

There is nothing more powerful than utilizing what you have on your side in the first place. Do you want to learn how you can bring your business to the next level? Book a demo.

Conclusion

Semantic SEO has revolutionized how we think about content. We can tremendously increase the opportunities to engage with our audience by structuring content while establishing our topical authority. It takes more time to combine all the bits and pieces, but the rewards go beyond SEO. These clusters become effective for paid campaigns and for onboarding new team members, and they let our clients understand who we are, how we work, and how we can help them grow.

As SEOs, we’re constantly at the frontier of human-machine interfaces; we have to learn and do new and different things (such as building a knowledge graph or training language models) and to do things differently (like using embeddings to create content hubs). 

We need to reimagine our publishing workflows and optimize the collaborative intelligence that arises when humans and artificial intelligence meet. 

The lesson is clear: we can build content hubs using collaborative intelligence. We can transform how we do things and re-organize our website with the help of a knowledge graph. It will be fun, I promise!

More Questions On Content Hubs

Why are content hubs important in Semantic SEO?

In Semantic SEO, we don’t target keywords but rather create topical clusters to help readers understand the content in depth. We don’t simply create pages; we think in terms of concepts and how they relate to each other. We want to become the Wikipedia of our niche, and content hubs help us achieve this goal. Content modeling is an integral part of modern SEO: the more authoritative we become around a given topic, the more new visitors we attract.

How can I create a knowledge graph for my website?

You can simply start a trial of WordLift and build your knowledge graph. Alternatively, Koray Tuğberk GÜBÜR has worked on a Python script that will get you started on building a graph from a single web page. Remember that knowledge graphs are not just for Google to learn about our content; they are meant to help us in multiple ways (here is an in-depth article on why knowledge graphs are important).

What types of content hubs exist?

Here is a list of the different types of content hubs: 

  1. Classic hubs. A parent page and a number of sub-pages, ranging from a minimum of 5 to a maximum of 30. They work well for evergreen content and are easy to implement without too much development work. A great example here is Zapier’s guides (https://zapier.com/learn/remote-work/).
  2. Library style. The main topic is divided into sub-categories, and from there the user sees the content items at the end of the journey. A good example here is Baremetrics’ academy (https://baremetrics.com/academy).
  3. Wiki hub. The page is organized like an article on Wikipedia and links to all relevant assets. It might appear similar to the classic hub, but the content is organized differently and some blocks can be dynamic. Investopedia goes in this direction (https://www.investopedia.com/terms/1/8-k.asp).
  4. Long-tail. Here we have a page that is powered by an internal search engine with faceted navigation. The page is a regular search engine result page, but it has an intro text and some additional content to guide the user across the various facets. A great example is our client Bungalowparkoverzicht (https://www.bungalowparkoverzicht.nl/vakantieparken/subtropisch-zwembad/).
  5. Directory style. Much like a vocabulary (or a phone book), all sub-topics are organized in alphabetical order. The BTC-Echo Academy is a good example (https://www.btc-echo.de/academy/bibliothek/).

References

  1. Luca Costabello: AmpliGraph: Machine Learning on Knowledge Graphs for Fun and Profit
  2. Philipp Cimiano, Universität Bielefeld, Germany: Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods
  3. Koray Tuğberk GÜBÜR: Importance of Topical Authority: A Semantic SEO Case Study
  4. How to Use t-SNE Effectively
  5. Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang: RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. CoRR, abs/1902.10197, 2019.
  6. Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard: Complex Embeddings for Simple Link Prediction (ComplEx). CoRR, abs/1606.06357, 2016.

Learn more about topic clusters in SEO – watch our latest web story!