Touch your SEO: Introducing Physical SEO


Executive summary

This post is part of a series that dives deep into the power of data for e-commerce. You can find the other posts here:

  1. The Power of Product Knowledge Graph for E-commerce
  2. Touch your SEO: Introducing Physical SEO
  3. The GS1 Digital Link explained for SEO Jedis (and their clients)
  4. The Only Thing Missing in Your Omnichannel E-commerce Strategy is…

In the previous post of the Product Knowledge Graphs series we introduced you to the possibilities a Product Knowledge Graph opens up for e-commerce data. In this post, we want to show you something equally exciting: Physical SEO – the intersection of physical and digital UX environments, and the instrumental role of linked data and SEO in your products’ visibility and sales.

Intro

As the boundaries between physical and cyber spaces blur, e-commerce and the digital marketing behind it have to catch up with ever-changing consumer needs and behaviours. Take, for example, an in-store experience. More often than not, people will scan a product to find its ingredients, or search for product reviews, feedback, and discounts.

All in all, we act digitally in the physical world. In such a scenario, the question for digital marketing and e-commerce is: “How do we connect a physical product to the ecosystem of data on the Web?”

And the answer is structured data – the common thread running through the three things we want to connect the dots for here:

  • Physical SEO
  • Barcodes as entry points to digital experiences (GS1 Digital Link technology)
  • Product Knowledge Graphs

1. Physical SEO, yes you heard that right.
2. Making Barcodes Speak Linked Data (GS1 Digital Link)
3. The Product Knowledge Graph: Seeing Connected Data As a Marketing Tool
4. A Bag of Chips Instead of an Epilogue

Physical SEO, yes you heard that right.

 “In the next step, the Semantic Web will break out of the virtual realm and extend into our physical world. URIs can point to anything, including physical entities, which means we can use the RDF language to describe devices such as cell phones and TVs.”

Citation: “The Semantic Web”, Scientific American (2001), by Tim Berners-Lee (now working on Solid), James Hendler (co-author of The Semantic Web for the Working Ontologist) and Ora Lassila (now at Amazon), available here.

The term “physical SEO” is vague but exciting. We use it to refer to search engine optimization activities that trigger digital experiences in physical environments, and vice versa.

Think about it. If you are a manufacturer, for example, you not only have to put your product on the shelf in the store, but also on the digital shelf for anyone browsing the Web for information. If you want to save time and money, you need a unique identifier for this product that gives all your stakeholders fast and easy access to the data about it (price, description and availability, to mention just a few).

In other words, as search is inevitably permeating physical environments, so should SEO activities.

Fortunately, together with the increasingly complex digital user behavior come the increasingly connected and mature technologies to live up to the empowered users’ expectations.

Enter GS1 Digital Link.

Making Barcodes Speak Linked Data (GS1 Digital Link)

GS1 Digital Link was developed by GS1, a not-for-profit global supply-chain standards organization working to improve the efficiency, safety and visibility of supply chains. GS1 is best known for the barcode – the single standard for product identification. As GS1 standards aim to provide a framework that allows products, services and information about them to move efficiently and securely across physical and digital channels, the organization started developing a standard that would “extend the power and flexibility of GS1 identifiers by making them part of the web.”

Today, the standard is ready to facilitate the exchange and comparison of structured data and to bridge the gap left by the usually incomplete data behind a scanned barcode. GS1 Digital Link promises to enhance the shopping experience for consumers by simplifying B2B data sharing and providing a standards-based, unified way of describing products.

Why would a user want to access a Product Knowledge Graph from a barcode?
They might want to:

  • Learn more about the product
  • Compare the product to other products
  • Search for offers
  • Browse reviews
  • Check compatibility
  • Find technical specifications

Very simply put, GS1 Digital Link works by providing a simple, standards-based structure for the data about a product. With GS1 Digital Link, information such as expiration dates, nutritional and medical product data, warranty registration, troubleshooting instructions and even social media links becomes available with a single scan.
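To make the structure concrete, here is a minimal Python sketch of how such a URI is composed, assuming the canonical id.gs1.org resolver and the standard GS1 Application Identifiers (01 = GTIN, 10 = batch/lot, 21 = serial); the GTIN and lot number below are illustrative:

```python
# Build a GS1 Digital Link URI from a GTIN plus optional qualifiers.
# Application Identifiers: 01 = GTIN, 10 = batch/lot, 21 = serial number.

def gs1_digital_link(gtin, lot=None, serial=None, resolver="https://id.gs1.org"):
    """Compose a GS1 Digital Link URI in the canonical path order."""
    path = f"/01/{gtin.zfill(14)}"  # GTIN is zero-padded to 14 digits
    if lot:
        path += f"/10/{lot}"
    if serial:
        path += f"/21/{serial}"
    return resolver + path

# The same number printed under a barcode now resolves as a web address:
print(gs1_digital_link("9506000134352", lot="ABC123"))
# https://id.gs1.org/01/09506000134352/10/ABC123
```

Scanning the code takes the user to that URI, and the resolver can then redirect to product information, reviews, or anything else the brand publishes.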

Image source: GS1 Digital Link Flyer

The GS1 Digital Link work is a treasure trove for anyone interested in publishing clean, structured and meaningful data on the Web. And this is where a Product Knowledge Graph can immensely enrich the exchange by providing an ecosystem of related information about any product in the blink of a scan. 

The Product Knowledge Graph: Seeing Connected Data As a Marketing Tool

As the underlying ecology of e-commerce is changing, the ways you publish product data to serve your customers and other stakeholders best require new approaches.

As the amazing Aaron Bradley wrote five (yes, five!) years ago on his blog SEO with Data:

“if data about you is going to appear in the search results – and it is – make sure it’s accurate by providing it yourself.”

And this is exactly what WordLift aims to do, enabling a scenario where all interested parties (merchants, retailers, customers) have access to interconnected detailed information about anything across both physical and digital channels. In other words, all kinds of data derived from (and accessed from) one comprehensive Product Knowledge Graph.

To understand the value of a Product Knowledge Graph for e-commerce, let’s have a look at a model. In 2018 researchers Turban, Outland et al. created the following model (in their book Electronic Commerce 2018 A Managerial and Social Networks Perspective):

Source: Slides for Chapter 2 

Now, let’s start from this model and enhance it with the understanding and practice of Linked Data and e-commerce. The model stays pretty much the same; however, the data that feeds the portal is now data you have control over.

  1. Person → 2. Portal → 3. Product Knowledge Graph

A Bag of Chips Instead of an Epilogue

Sir Tim Berners-Lee constantly leaves us notes to the future. This is what he did in 2010, with a bag of chips.

In this talk, the Web’s inventor talks about the barcode as a universal way for retailers to “talk” about products and the various other languages packages use to convey information.

A decade later, all these components can be integrated and put into a Product Knowledge Graph, benefiting both consumers and producers. As we saw, GS1 Digital Link opens up the possibilities for such integration. WordLift pushes the idea one step further by helping producers, retailers and e-commerce businesses publish their data so that any search engine or platform can consume it in a way that is meaningful for the user.

Do you want your customers to find you on the digital shelf? Let us help you with that!

Knowledge Connexions 2020


Knowledge Connexions 2020, also known as KnowCon 2020, is an online event about data, semantic technology, Knowledge Graphs, Graph Databases and Graph AI presented by the Knowledge Graph Conference and Connected Data London.

It’s a great opportunity for professionals and entrepreneurs who want to learn more about these technologies from experts and innovators such as Dawn Anderson, Hamlet Batista, Jason Barnard and many more.

The conference will take place from November 30th to December 2nd 2020, with 33 speakers running masterclasses, workshops and presentations.

WordLift at Knowledge Connexions 2020

WordLift will be present too! Our CEO, Andrea Volpini, will join other experts in a panel about the relationship between Knowledge Graphs and SEO. He will also run two masterclasses where he will show attendees how to use schemas, knowledge graphs and NLP to develop a long-tail SEO strategy.

The “From Knowledge Graphs to AI-powered SEO – The Theory” masterclass will be a lecture-based workshop. Focusing on a use case applicable across different industries, participants will learn how to use knowledge graphs to discover new search-demand areas and build dynamic pages that can target long-tail queries. Andrea will also cover some essential elements of natural language generation using Google’s T5 (Text-to-Text Transfer Transformer) model. At the end of the masterclass, participants will be equipped with concrete strategies and techniques for leveraging the existing data within their organization to improve their publishing workflow and discover new long-tail queries.

The “From Knowledge Graphs to AI-powered SEO – The Practice” masterclass will be a highly collaborative, interactive and hands-on class based on the theory learned in the previous workshop. Participants will form small teams, each of which will work on the reference website and run a “search intent investigation” using Python (code will be made available in Google Colab) and the reference website’s Knowledge Graph.
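As a small taste of what such a “search intent investigation” can look like in Python (the intent markers and queries below are invented for illustration and are not the actual workshop code):

```python
# Toy "search intent investigation": bucket queries by intent markers.
# The marker lists and sample queries are illustrative only.
INTENT_MARKERS = {
    "transactional": ["buy", "price", "discount"],
    "informational": ["how", "what", "why"],
}

def classify(query):
    """Assign a query to the first intent whose markers it contains."""
    words = query.lower().split()
    for intent, markers in INTENT_MARKERS.items():
        if any(marker in words for marker in markers):
            return intent
    return "navigational"  # fallback: likely a brand/navigation query

queries = ["how to add schema markup", "buy wordlift plugin", "wordlift"]
print({q: classify(q) for q in queries})
# {'how to add schema markup': 'informational', 'buy wordlift plugin': 'transactional', 'wordlift': 'navigational'}
```

A real investigation would of course use query data from Search Console and entities from the site’s Knowledge Graph rather than hard-coded lists, but the bucketing idea is the same.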

You can book your place at these events by following the links:

Are you ready to turn your organisation’s data into information and knowledge that search engines will love? Come join us at Knowledge Connexions 2020!

Why The Knowledge Panel Is Critical For Your SEO Strategy


As search engines move toward voice search, adoption of mobile personal assistants is growing at a fast rate. While this transition is already happening, there is another interesting phenomenon to notice: the SERP has changed substantially in the last couple of years. The new features Google rolls out “above the fold” (featured snippets, knowledge panels and filter bubbles) give us a glimpse of how voice search might look.

In this article, we’ll focus mainly on the Knowledge Panel, why it is critical, and how you can get it too. 

The Knowledge Panel: Google’s above the fold, worth billions

Knowledge panel

The Knowledge Panel is a feature through which Google provides quick and reliable information about brands (be they personal or company brands). For instance, in the case above, you can see that for the query “who’s Gennaro Cuofano” on the US search results, Google is giving both a featured snippet (on the left) and a knowledge panel (on the right).

While the featured snippet aims to provide a practical answer, fast, the knowledge panel aims to provide a reliable answer (coming from a more authoritative source) and additional information about that brand. In many cases, the knowledge panel is also a “commercial feature” that allows brands to monetize their products. For instance, you can see how my knowledge panel used to point toward books that could be purchased on Amazon.

This space on the SERP, which I like to call “above the fold,” has become the most important asset on the web. While Google’s first page remains an objective for most businesses, it is also true that, as we move toward voice search, traffic will increasingly be eaten by the features that appear on the search results pages before you even get to the first position.

How does Google create Knowledge Panels? And how do you get one?

Knowledge Panel examples

Here are some examples of Google Knowledge Panels, which refer to different search categories (sports, books, movies, celebrities).  

  • Sport: tennis knowledge panel
  • Book: Jane Eyre knowledge panel
  • Movie: Titanic knowledge panel
  • Famous person: Dalai Lama knowledge panel

Knowledge panel: the key ingredient is Google’s knowledge vault

Brand panel

As Google points out:

When people search for a business on Google, they may see information about that business in a box that appears to the right of their search results. The information in that box, called the knowledge panel, can help customers discover and contact your business.

In most cases, you’ll notice two main kinds of Knowledge Panels:

  • Brand panels
  • Local panels

While brand panels provide general information about a person’s or company’s brand, local panels offer local information. In the example above, you can see how the local panel offers the local business’s address, hours and phone number. In short, it is a touchpoint provided by Google between the user and the local business.

Where does Google get the information for the Knowledge Panel? Google itself specifies that “Knowledge panels are powered by information in the Knowledge Graph.”

What is a Knowledge Graph?

In 2012, Google started to build a “massive Semantics Index” of the web called the Knowledge Graph. In short, a knowledge graph is a logical way to organize information on the web. In the past, Google could not rely on the direct meaning of words on a web page. With the knowledge graph, instead, the search engine can collect information on the web and organize it into simple logical statements, called triples (e.g. “I am Gennaro” and “Gennaro knows Jason”).

Those triples are combined according to logical relationships, and those relationships are built on top of a vocabulary called Schema.org. In short, Schema.org defines the possible relationships available among things on the web.

Thus, two people that in Schema are defined as entity type “person” can be associated via a property called “knows.” That is how we might make it clear to Google that the two people know each other.
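Expressed as schema.org JSON-LD, that “knows” relationship takes only a few lines. A minimal sketch in Python – the names and @id URLs are illustrative placeholders, not real profiles:

```python
import json

# Two schema.org Person entities linked by the "knows" property.
# All identifiers below are placeholders for illustration.
gennaro = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "https://example.com/people/gennaro",
    "name": "Gennaro",
    "knows": {
        "@type": "Person",
        "@id": "https://example.com/people/jason",
        "name": "Jason",
    },
}

# Serialized, this is exactly the kind of triple a search engine ingests:
print(json.dumps(gennaro, indent=2))
```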

From those relationships among things (which can be people, organizations, events, or any other item on the web), a Knowledge Graph is born:

Knowledge Graph Fourweekmba

Example of a knowledge graph shaped on a web page from FourWeekMBA that answers the query “Who’s Gennaro Cuofano”

Where does Google get the information to include in its Knowledge Graph? As pointed out on Go Fish Digital, it draws on several sources.

In short, there isn’t a single source from which Google mines the information to include in its Knowledge Panels.

Is a Knowledge Panel worth your time and effort?


wikipedia knowledge panel

A Knowledge Panel isn’t only an avenue toward voice search but also an organic traffic hack. It’s interesting to see how a good chunk of Wikipedia’s traffic comes from Google’s knowledge panels. Of course, Wikipedia is a trusted and authoritative website. One consequence of knowledge panels, though, might be so-called no-click searches (those that don’t produce a click-through from the search results pages).

Yet, as of now, a Knowledge Panel is an excellent opportunity to gain qualified traffic from search and get ready for voice search.

How do you get your Knowledge Panel?

Key takeaways

As search evolves toward AEO (answer engine optimization), it also changes how you need to look at content structuring. As the Google SERP adds features such as featured snippets and Knowledge Panels, those capture a good part of the traffic. Thus, as a company, person, or business, you need to understand how to gain traction via knowledge panels. The key is Google’s Knowledge Graph, which leverages the Google knowledge vault.

It is your turn now to start experimenting to get your Knowledge Panel!


 

The DBpedia Databus – Transforming Linked Data into a Networked Data Economy


DBpedia has served as a unified access platform for the data in Wikipedia for over a decade. During that time, DBpedia has established many of the best practices for publishing data on the web. In fact, it is the project that hosted a knowledge graph even before Google coined the term. For the past 10 years they have been “extracting and refining useful information from Wikipedia”, and they are experts in that field. However, there was always a motivation to extend this with other data and allow users unified access. The community, the board and the DBpedia Association felt an urge to innovate the project. They have been re-envisioning DBpedia’s strategy in a vital discussion over the past two years, resulting in a new mission statement: “global and unified access to knowledge graphs”.

Last September, during the SEMANTiCS Conference in Vienna, Andrea Volpini and David Riccitelli had a very interesting meeting with Dr. Ing. Sebastian Hellmann from the University of Leipzig, who sits on the board of DBpedia. The main topic of that meeting was the DBpedia Databus since we at WordLift are participating as early adopters. It is a great opportunity to add links from DBpedia to our knowledge graph. On that occasion, Andrea asked Sebastian Hellmann to participate in an interview, and he kindly accepted the call. These are the questions we asked him.

 

Sebastian Hellmann is head of the “Knowledge Integration and Language Technologies (KILT)” Competence Center at InfAI. He also is the executive director and board member of the non-profit DBpedia Association. Additionally, he is a senior member of the “Agile Knowledge Engineering and Semantic Web” AKSW research center, focusing on semantic technology research – often in combination with other areas such as machine learning, databases, and natural language processing. Sebastian is a contributor to various open-source projects and communities such as DBpedia, NLP2RDF, DL-Learner and OWLG, and has been involved in numerous EU research projects.

Sebastian Hellmann

Head of the “Knowledge Integration and Language Technologies (KILT)" Competence Center at InfAI, DBpedia

How are DBpedia and the Databus planning to transform linked data into a networked data economy?

We have published data regularly and already achieved a high level of connectivity in the data network. Now, we are planning a hub where everybody uploads data. In that hub, useful operations like versioning, cleaning, transformation, mapping, linking, merging and hosting are done automatically, and the results are then dispersed again through a decentralized network to consumers and applications. Our mission incorporates two major innovations that will have an impact on the data economy.

Providing global access
That mission follows the agreement of the community to include their data sources, as well as any other source, into the unified access. DBpedia has always accepted contributions in an ad-hoc manner, and now we have established a clear process for outside contributions.

Incorporating “knowledge graphs” into the unified access
That means we will reach out to create an access platform not only for Wikipedia (DBpedia Core) but also for Wikidata, and then for all other knowledge graphs and databases that are available.

The result will be a network of data sources that focuses on the discovery of data and also tackles the heterogeneity (or, in Big Data terms, the variety) of data.

What is DBpedia Databus?

The DBpedia Databus is part of a larger strategy following the mission to provide “Global and Unified Access to knowledge”. The DBpedia Databus is a decentralized data publication, integration, and subscription platform.

  • Publication: Free tools enable you to create your own Databus stop on your web space with standards-compliant metadata and clear provenance (private key signature).
  • Integration: DBpedia will aggregate the metadata and index all entities and connect them to clusters.
  • Subscription: Metadata about releases are subscribable via RSS and SPARQL. Entities are connected to Global DBpedia Identifiers and are discoverable via HTML, Linked Data, SPARQL, DBpedia releases and services.
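To make the subscription idea concrete, here is a hedged sketch of how a consumer could poll release metadata over SPARQL; the endpoint path and the dataid: vocabulary prefix are assumptions based on DBpedia’s public documentation, not verified here:

```python
import urllib.parse

# Sketch: ask a Databus SPARQL endpoint for the most recent dataset releases.
# The endpoint URL and the dataid:/dct: prefixes are assumptions for illustration.
QUERY = """
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dct:    <http://purl.org/dc/terms/>

SELECT ?dataset ?issued WHERE {
  ?dataset a dataid:Dataset ;
           dct:issued ?issued .
}
ORDER BY DESC(?issued) LIMIT 10
"""

def sparql_request_url(endpoint="https://databus.dbpedia.org/repo/sparql"):
    """Build the GET request URL for a SPARQL SELECT (results as JSON)."""
    params = urllib.parse.urlencode({"query": QUERY, "format": "json"})
    return f"{endpoint}?{params}"
```

A subscriber would simply fetch this URL on a schedule (or watch the RSS feed mentioned above) and react when a new release appears.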

DBpedia is a giant graph and the result of an amazing community effort – how is the work being organized these days?

DBpedia’s community has two orthogonal, but synergetic motivations:

  • Build a public information infrastructure for greater societal value and access to knowledge;
  • Business development around this infrastructure to drive growth and quality of data and services in the network.

The main motivation is to finally be able to discover and use data easily. Therefore, we are switching to the Databus platform. The DBpedia Core releases (extractions from Wikidata and Wikipedia) will be just one of many datasets published via the Databus platform in the future. One of the many innovations here is that DBpedia Core releases are more frequent and more reliable. Any data provider can benefit from the experience we gained in the last decade by publishing data the way DBpedia does and connecting better with users.

We’re planning to give our WordLift users the option to join the DBpedia Databus. What are the main benefits of doing so?

The new infrastructure allows third parties to publish data in the same way as DBpedia does. As a data provider, you can submit your data to DBpedia and DBpedia will build an entity index over your data. The main benefit of this index is that your data becomes discoverable. DBpedia acts as a transparent middle-layer. Users can query DBpedia and create a collection of entities they are interested in. For these sets, we will provide links to your data, so that users can access them at the source.

For data providers our new system has three clear-cut benefits:

  1. Their data is advertised and receives more attention and traffic redirects;
  2. Once indexed, DBpedia will be able to send linking updates to data providers, therefore aiding in data integration;
  3. The links to the data will disseminate in the data network and generate network-wide integration and backlinks.

Publishing data with us means connecting and comparing your data to the network. In the end, DBpedia is the only database you need to connect to in order to get global and unified access to knowledge graphs.

DBpedia and Wikidata both publish entities based on Wikipedia, and both use RDF and the semantic web stack. They fulfill quite different tasks, though. Can you tell us more about how DBpedia is different from Wikidata, and how these two will co-evolve in the near future?

As a knowledge engineer, I have learned a lot by analyzing the data acquisition processes of Wikidata. In the beginning, the DBpedia community was quite enthusiastic about submitting DBpedia’s data back to Wikimedia via Wikidata. After trying for several years, we found that it is not easy to contribute data in bulk directly to Wikidata, as the processes are volunteer-driven and allow only small-scale edits or bots. Only a small percentage of Freebase’s data was ingested. Wikidata follows a collect-and-copy approach, which ultimately inspired the sync-and-compare approach of the Databus.

Data quality and curation follow the Law of Diminishing Returns in a very unforgiving curve. In my opinion, Wikidata will struggle with this in the future. Doubling the volunteer manpower will improve quantity and quality of data by dwindling, marginal percentages. My fellow DBpedians and I have always been working with other people’s data and we have consulted hundreds of organizations in small and large projects. The main conclusion here is that we are all sitting in the same boat with the same problem. The Databus allows every organization to act as a node in the data network (Wikidata is also one node thereof). By improving the accessibility of data, we open the door to fight the law of diminishing returns. Commercial data providers can sell their data and increase quality with income; public data curators can sync, reuse and compare data and collaborate on the same data across organizations and effectively pool manpower.

 

 

How to add Schema markup to WordPress


If you are a web content writer, there is no need to remind you of the struggle you face to distribute your content. Maybe you spend hours – or even days! – of hard work writing awesome content, but once your article is done, you know that your job has just begun. Now it’s time to fine-tune your content for SEO purposes, share it on several channels, monitor search keywords for your next article… Wouldn’t it be wonderful to just focus on writing and nothing more?

Semantic markup is the key to success. Schema markup can really help your pages get the traffic they deserve. How? To explain it, we need to take a few steps back: first of all, you need to know what schema.org is.

What is schema.org markup

Schema.org is an initiative launched in 2011 by the world’s largest search engines (Bing, Google, and Yahoo!) to implement a shared vocabulary and adopt standard formats to structure data on web pages.

What is Schema.org

Schema.org markup helps machines understand your content, without fail or ambiguity. 

Let’s explore how to use the Schema markup, the benefits of using it and how it can be implemented on your WordPress website.

How to add Schema.org markup to WordPress

To use schema markup on your pages, you can either use a tool like WordLift or do it manually.
The WordLift plugin enables you to add Schema markup in WordPress without writing a single line of code. Once you have configured the plugin, a new menu will appear on the right side of your article in the WordPress editor: it will allow you to annotate your content and, by doing so, create an internal vocabulary for your website or blog.

Adding schema markup to WordPress with WordLift

WordLift uses JSON-LD to inject schema.org markup into your web pages. Click here to see the magic: it’s a GIF that shows you the JSON-LD data representation of this article!

Imagine you have published an event on your website: once you have finished creating your content, the final step would normally be to add a meta description, which will appear on the search page as plain text. But by adding Schema markup to the page, you can really help your content stand out by transforming it into a rich snippet, and therefore getting a lot more clicks!

There are several types of schema you can use to mark up your content; by using the Event schema markup, it is possible to show dates, locations and any other details related to a specific event, helping people easily access all the information they might need:

Event schema markup
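As a rough illustration of what that markup can look like (every value below is a placeholder, not taken from a real event), this is how an Event and its dates, location and offer travel as JSON-LD inside the page:

```python
import json

# Illustrative schema.org Event markup; every value here is a placeholder.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Meetup",
    "startDate": "2018-10-12T19:00:00+02:00",
    "endDate": "2018-10-12T22:00:00+02:00",
    "location": {
        "@type": "Place",
        "name": "Example Hall",
        "address": "Salzburg, Austria",
    },
    "offers": {"@type": "Offer", "price": "0", "priceCurrency": "EUR"},
}

# The JSON-LD is injected into the page inside a script tag:
markup = f'<script type="application/ld+json">{json.dumps(event)}</script>'
```

Search engines read this block directly, which is what makes rich results such as event listings possible.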

Once the purpose of adding structured data is clear – that is, to provide accurate information about what your website’s content is about – you can also see that adding Schema markup to your site is a highly customizable process.

How to increase your traffic with semantic markup

While crawling the web looking for specific content to serve to users, search engines will unquestionably identify the context your articles belong to. Nowadays this is the most effective and affordable way to distribute your content and make it findable to those who are looking for it through search engines.

Salzburgerland Party Meeting Event

The example above shows the results of a long-tail search for the upcoming Salzburgerland Party Meeting event. As you can see, the first result is a rich snippet with two links that allows you to skip directly to the next events. All that is made possible by the markup, which helps search engines detect the structured data matching the user’s query within the whole website. It’s been proven that rich snippets increase the Click-Through Rate: more qualified traffic for you!

Intelligent Agents

Salzburgland.com uses WordLift to structure its content.

Moreover, you can explore new ways to disseminate your content based on chatbots, which can serve your just-baked articles to your readers depending on their interests.

In the image on the right, you can see how intelligent agents such as Google Allo can answer your voice-search questions with appropriate content, provided the content is correctly structured.

To learn more, read this useful article about how to set up your first chatbot.

 

Assess markup quality with Google’s Structured Data Testing Tool

Once you’ve added schema markup to WordPress, it’s easy to verify that everything was done right, simply by using the Structured Data Testing Tool made available by Google. Just enter the URL you need to analyze and let the tool verify your content.

Structured Data Testing Tool

Let’s see, as an example, the markup of the SEMANTiCS 2018 Conference on our blog:

Structured Data markup for the SEMANTiCS 2018 event

As we can see, everything worked just fine; there’s only one warning, about the Offer field, which in this case has no value.

The first rule when adding schema markup is to be accurate – Google will know! Also, remember that adding schema markup to your page doesn’t guarantee immediate results. It’s still always recommended, because it gives you the best chance of success in the SERPs and helps increase your CTR.

Automating structured data markup with WordLift

While developing the WordLift plugin, we focused on making our schema.org markup more accurate than ever.

Now we can say – without fear of contradiction – that our Plugin offers you one of the most extended sets of markup to structure data on a WordPress website… without writing a single line of code!

Since our 3.10 release, WordLift made a lot of improvements and, as the SEO specialist Jarno Van Driel also said (by the way, thanks a lot for your support, Jarno!) our blue plugin generates beautiful 5-star – schema.org powered – linked data graphs.

Here is a list of improvements on the markup that SEO specialists are going to appreciate:

  1. ARTICLE: we’ve added the schema.org Article markup for each article/blog post, publishing it with the mainEntityOfPage property. Simply put: we tell Google and the other search engines that this web page is an article. To learn more about this property, read this how-to by Jarno Van Driel.
  2. PUBLISHER: we also communicate the publisher’s information related to each article as structured data. The publisher can be an individual with his/her proper name or an organization with a brand name and a logo.
  3. ID: with WordLift we also made the Publisher ID available. What is an ID, and why is it so important? For each entity, article and publisher, we generate a permanent ID: a unique identifier which is fundamental in the context of 5-star Linked Data because it allows connections between data on the web. Each entity, article and publisher can be connected to other data, hosted – for example – in Wikidata, with the “sameAs” property, and each of them can also be decoded with a JSON-LD data representation.
  4. RELATED ENTITIES: we use the “mentions” property to say which entities are mentioned. In this way, you’ll have a hierarchy of entities where the main one defines the article itself and the others are recognized as mentioned in it.
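Put together, the four improvements above come out roughly like this in JSON-LD; the URLs, names and identifiers are placeholders for illustration, not the exact markup the plugin emits:

```python
# Illustrative Article markup combining mainEntityOfPage, publisher,
# permanent @id, and mentions. All URLs and names are placeholders.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "@id": "https://example.com/blog/my-post#article",      # 3. permanent ID
    "mainEntityOfPage": "https://example.com/blog/my-post",  # 1. article property
    "headline": "My Example Post",
    "publisher": {                                           # 2. publisher data
        "@type": "Organization",
        "@id": "https://example.com/#publisher",
        "name": "Example Publisher",
        "logo": {"@type": "ImageObject", "url": "https://example.com/logo.png"},
    },
    "mentions": [                                            # 4. related entities
        {
            "@type": "Thing",
            "@id": "https://example.com/entity/schema-markup",
            "sameAs": "https://www.wikidata.org/entity/Q123456",  # placeholder
        }
    ],
}
```

Note how every node carries an @id, so the article, its publisher and the mentioned entities can all be linked to from elsewhere on the web.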

 

To play around with JSON-LD markup that WordLift created for this article head straight to the JSON-LD Playground.

 

 

 

How knowledge graphs can help your travel brand attract more visitors


For the first time this year we can finally say that knowledge graphs and semantic technologies are the hype. People like me, who have played with the semantic web stack for several years now, have long predicted that one day we would have a Graph for Everything. We waited long, and hopefully not in vain: recently, Gartner finally shouted out loud that 2018 is indeed the “Year of the Graph”. We, here at WordLift, are far beyond the hype. We have built technologies, open-source frameworks, companies and products on this vision of the semantic web, knowledge representation and ontologies.

Knowledge Graphs in Gartner's Hype Cycle for 2018.

For many years, way too many, talking with large enterprises or public institutions like the Italian Parliament about the importance of creating taxonomies and labeling information has been extremely frustrating, and yet I am very thankful to everyone who has listened to me and helped us get to the point of writing an article like this one.

Knowledge graphs are real and bring a competitive advantage to large enterprises like Amazon, Google, LinkedIn, Uber, Zalando, Airbnb, Microsoft, and other internet powerhouses but no, this article is not about giant graphs from large enterprises. It is about our direct experience in helping travel brands like bungalowparkoverzicht in the Netherlands, the largest tour operator in Iceland and SalzburgerLand in Austria.

WHAT IS A KNOWLEDGE GRAPH?

A knowledge graph is a way of representing human knowledge to machines. In short, you start by defining the main concepts as nodes and the relationships among these concepts as edges in a graph. READ MORE

Not all graphs are created equal, and each organization has its own business goals and ways of representing relationships between related entities. We model data and build knowledge graphs to create context, to improve content findability by leveraging semantic search engines like Google and Bing, and to provide precise answers to specific questions. Once you have organized your data semantically and built your own taxonomy, many applications can be implemented: from classifying items to integrating data coming from different pipelines, from building complex reasoning systems to publishing metadata on the web. When we built the knowledge graph for a travel brand like bungalowparkoverzicht, our main focus was on the type of information that a traveler would need before reaching the destination.

We model data for the so-called "planning and booking moments". Planning, according to research from Google, starts when digital travelers have chosen a destination and are looking for the right time and place to stay. Booking follows: that's the moment when travelers move on to reserving their perfect hotel, choosing a room and booking it.

Types of Information to model for the planning and booking moments

When modeling hotel-related information in Web content using the schema.org vocabulary, you basically work with three core types of nodes (entity types):

  • A lodging business (e.g. a hotel, hostel, resort, or camping site): essentially the place and local business that houses the actual units of the establishment (e.g. hotel rooms). The lodging business can encompass multiple buildings but is in most cases a coherent place.
  • An accommodation, i.e. the actually relevant units of the establishment (e.g. hotel rooms, suites, apartments, meeting rooms, camping pitches, etc.). These are the actual objects that are offered for rental.
  • An offer to let a hotel room (or another form of accommodation) for a particular amount of money and for a given type of usage (e.g. occupancy), typically further constrained by advance booking requirements and other terms and conditions.
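Putting the three node types together, a minimal JSON-LD sketch might look like the following (the hotel name, room and price are purely illustrative):

```json
{
  "@context": "http://schema.org",
  "@type": "Hotel",
  "name": "Example Beach Hotel",
  "containsPlace": {
    "@type": "HotelRoom",
    "name": "Double Room with Sea View",
    "occupancy": {
      "@type": "QuantitativeValue",
      "maxValue": 2
    }
  },
  "makesOffer": {
    "@type": "Offer",
    "itemOffered": {
      "@type": "HotelRoom",
      "name": "Double Room with Sea View"
    },
    "price": "120.00",
    "priceCurrency": "EUR"
  }
}
```

Here the lodging business (Hotel), the accommodation (HotelRoom) and the Offer are three distinct nodes, connected by the containsPlace and makesOffer edges.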
Schema Markup for hotels and lodging businesses.

Relationships (edges in the graph) between these entities are designed in such a way that several potential conversations between a lodging business and a potential client become possible. We simply:

a) encode these relationships using an open vocabulary and, by doing so,  

b) easily enable search engines and/or virtual assistants to traverse these connections in multiple ways.

As seen above, we can map – using the vocabulary – all the hospitality infrastructures as schema:Organization and create a page listing all the different companies behind these businesses, or we can list these hotels and lodging facilities using their geolocation and the properties of the schema:Place type.

Making it happen

The content management system in the back-end uses a relational database, and this is just great, as most of the data needs to be used in transactional processes (versioning and reviews are all based on efficiently storing data in tables). Our work is to apply to each data point the semantics required to:

  1. publish metadata on the web using structured data that machines can understand
  2. index each item of the property inventory (i.e. all the proposed hotels, all the locations, …) with a unique identifier and a corresponding representation in an RDF knowledge graph
  3. semantically annotate editorial content with all the nodes that are relevant for our target audience (i.e. annotating an article about a camping site in the Netherlands with the same entity that connects that location with the related schema:LodgingBusiness)
  4. have a nice and clean API to query and eventually enrich the data in the graph using other publicly available data coming from Wikidata, GeoNames or DBpedia
  5. provide search engines and virtual assistants with the booking URL using schema:ReserveAction (see the example below) to make this data truly actionable.

1. Publishing metadata on the Web: data quality becomes King

Since the major search providers (including Google, Microsoft, Yahoo, and Yandex) joined forces to define a common language for semantic markup, semantic web technologies have become an important asset for online businesses of all sorts. At the time of writing this article, 10 million websites use Schema.org to mark up their web pages.

Structured Data Growth from the Common Web Crawl.

While there is a growing interest in adding structured data in general, the focus is now shifting from providing any form of structured data to providing high-quality data that can have a real impact on the new entity-oriented search.

WHAT IS ENTITY-ORIENTED SEARCH?

Entity-oriented search, as defined by Krisztian Balog in his book, is the search paradigm of organizing and accessing information centered around entities, and their attributes and relationships.

Ranking high on long tail intents like the ones we see in the travel sector is – in several cases – about providing consistent and reliable information in a structured form.

How structured data might be used in Google synthetic queries.

The importance of geocoding the address

To give you a practical example: when making explicit the data about the addresses of the lodging businesses for the Dutch website, we realised that the data we had in the CMS wasn't good enough to be published online using schema.org, so we decided to reverse geocode the addresses and extract the data in a clean and reliable format using an external API. A simple heuristic like this improves the quality of the data describing thousands of lodging businesses, which can now be unambiguously ranked for various types of searches.
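The cleaned-up result of such a step can then be published as a structured address. A sketch of what the output might look like (the business name, address and coordinates below are invented placeholders):

```json
{
  "@context": "http://schema.org",
  "@type": "LodgingBusiness",
  "name": "Example Vakantiepark",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "Duinweg 1",
    "postalCode": "1234 AB",
    "addressLocality": "Ouddorp",
    "addressCountry": "NL"
  },
  "geo": {
    "@type": "GeoCoordinates",
    "latitude": 51.81,
    "longitude": 3.90
  }
}
```

With the address broken into unambiguous fields (and coordinates attached), search engines no longer have to guess where the property actually is.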

Using well-known datasets to disambiguate location-specific characteristics

In schema.org, when describing most of the hotel-related types and properties – e.g. telling guests that the hotel has a WiFi Internet connection – we can use the amenityFeature property, which is derived from the STI accommodation ontology (our friends in Innsbruck at the Semantic Technology Institute have greatly contributed to the travel extension of Schema).

Unfortunately, there is no common taxonomy yet for describing these properties (the WiFi or the presence of a safe in the room). In order to help search engines and virtual assistants disambiguate these properties, in WordLift we're providing a mapping between these hotel-related properties and entities in Wikidata. In this way, we can add an unambiguous pointer to – let's say – the concept of WiFi, which in Wikidata corresponds to the entity Q29643.
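One possible way to sketch this mapping in JSON-LD (the hotel and feature names are illustrative; Q29643 is the Wikidata entity for WiFi mentioned above):

```json
{
  "@context": "http://schema.org",
  "@type": "Hotel",
  "name": "Example Hotel",
  "amenityFeature": {
    "@type": "LocationFeatureSpecification",
    "name": "Free WiFi",
    "value": true,
    "sameAs": "http://www.wikidata.org/entity/Q29643"
  }
}
```

The sameAs link is what turns an ambiguous label ("WiFi", "wifi", "wireless internet") into an unambiguous pointer machines can resolve.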

2. Creating unique IDs for entities in the graph

When representing the nodes in our graph we create entities and group them in a catalog (we call it a vocabulary). All the entities we have in the catalog belong to different types (i.e. Lodging Business, Organization, Place, Offer). The entity catalog defines the universe we know, and each entity has its own unique identifier. The fact that we can have an ID for each node turns out to be surprisingly useful, as it allows us to have a one-to-one correspondence between a node (represented by its ID) and the real-world object it represents.

An accommodation like the Strand Resort Ouddorp Duin in the South of Holland, for example, has its own unique ID in the graph at http://data.wordlift.io/wl0760/vakantiepark/strand_resort_ouddorp_duin.
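A minimal JSON-LD sketch of such a node might look like this (the @id is the one cited above; the schema type is an assumption made for illustration):

```json
{
  "@context": "http://schema.org",
  "@id": "http://data.wordlift.io/wl0760/vakantiepark/strand_resort_ouddorp_duin",
  "@type": "Resort",
  "name": "Strand Resort Ouddorp Duin"
}
```

Everything else we know about this resort, anywhere in the graph, can now hang off that single @id.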

3. Bridging text and structure

Combining structured and unstructured information is key to improving search breadth and quality from external search engines like Google and Bing. It also becomes very important for providing a consistent user experience within the site. Let's say that, in an article on the blog, you are referring to the South of Holland or to the Strand Resort Ouddorp Duin we talked about before: you want your users to see the latest promotions from this resort and/or offers from other properties nearby. Connecting editorial content from the blog using the data in the graph is called entity linking. It is done by annotating mentions of specific entities (or properties of these entities) described in a text with their unique identifiers from the underlying knowledge graph. This creates context for users (and for external search engines) and a simple way to improve the user experience by suggesting a meaningful navigation path (i.e. "let's see all the resorts in the region" or "let's see the latest offers from the Strand Resort").
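In markup terms, such an annotation can boil down to a mentions statement pointing at the entity's ID in the graph. A sketch (the headline is invented; the @id reuses the identifier cited earlier):

```json
{
  "@context": "http://schema.org",
  "@type": "Article",
  "headline": "A weekend in the South of Holland",
  "mentions": {
    "@id": "http://data.wordlift.io/wl0760/vakantiepark/strand_resort_ouddorp_duin"
  }
}
```

Because the mention references an existing node rather than re-describing it, the article inherits everything the graph already knows about that resort.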

Florian Bauhuber from Tourismuszukunft presenting the SLT Knowledge Graph at Castelcamp Kaprun 2018.

4. Discovering new facts by linking external data

Kaprun in GeoNames.

Having a graph in RDF format is also about linking your data with other data. A great travel destination in Salzburgerland like Kaprun has its own entity ID in the graph, http://open.salzburgerland.com/en/entity/kaprun, built by the Region of Salzburg using WordLift. This entity is linked with the equivalent entities in the Web of Data. In GeoNames it corresponds to the entity http://sws.geonames.org/2774758/ (GeoNames is a freely available geographical database that contains many more properties about Kaprun than what we store in our graph). We can see from GeoNames that Kaprun is 786m above sea level and belongs to the Zell am See region in Salzburgerland. This information is immediately accessible to search engines and can also be stored in the index of the website's internal search engine, to let users find Kaprun when searching for towns in Zell am See or destinations in Salzburgerland close to a lake. This wealth of open data, interlinked with our graph, can be made immediately accessible to our users by adding attributes in Schema that search engines understand. An internal search engine with this information becomes "semantic", and we don't need to maintain or curate this information (unless we find it unreliable). Wow!
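A sketch of how such a link could be expressed in JSON-LD (only the two IDs and the elevation come from the text above; the rest of the shape is an assumption):

```json
{
  "@context": "http://schema.org",
  "@id": "http://open.salzburgerland.com/en/entity/kaprun",
  "@type": "Place",
  "name": "Kaprun",
  "sameAs": "http://sws.geonames.org/2774758/",
  "geo": {
    "@type": "GeoCoordinates",
    "elevation": "786"
  }
}
```

The sameAs edge is the bridge: anyone (or any machine) resolving our node can hop over to GeoNames and pull in the properties we chose not to duplicate.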

WHAT IS RDF?

The Resource Description Framework (RDF) is a W3C standard for describing entities in a knowledge base. An entity such as a hotel can be represented as a set of RDF statements. These statements may be seen as facts or assertions about that entity. A knowledge graph is a structured knowledge repository for storing and organizing statements about entities. READ MORE

The SLT Knowledge Graph in the Linked Open Data Cloud.

5. From answering questions to making it all happen: introducing Schema Actions

We use nodes and edges in the graph to help search engines and virtual assistants answer specific questions like “Where can I find a camping site with a sauna close to a ski resort in Germany?”. These are informational intents that can be covered by providing structured data using the schema.org vocabulary to describe entities.

In 2014 Schema.org, the consortium created by the search engines to build a common vocabulary, introduced a new extension called Actions. The purpose of Schema Actions is to go beyond the static description of entities – people, places, hotels, restaurants, … – and to describe the actions that can be invoked (or have been invoked) using these entities.

In the context of the knowledge graph for a travel brand, we're starting to use Schema Actions to let search engines and virtual assistants know which URL should be used for booking a specific hotel.

Here is an example of the JSON-LD code injected into the page of a camping village, providing the URL that can be used on different devices (see the actionPlatform attribute) to initiate the booking process.


  "potentialAction": {
	"@type": "ReserveAction",
	"target": {
  	    "@type": "EntryPoint",
  	    "urlTemplate": "/boek/canvas-belvedere-village/",
  	    "inLanguage": "nl-NL",
  	    "actionPlatform": [
    	        "http://schema.org/DesktopWebPlatform",
    	        "http://schema.org/IOSPlatform",
    	        "http://schema.org/AndroidPlatform"
  	    ]
	},
	"result": {
  	    "@type": "LodgingReservation",
  	    "name": "Reserveren of meer informatie?"
	}
  }

Next steps and final thoughts

As we’re continuing to explore new ways to collect, improve and reuse the information in the knowledge bases we are building with our clients in the travel industry, a new landscape of applications is emerging. Data is playing a pivotal role in the era of personal assistants, content recommendations and entity-oriented search. We are focusing on making knowledge as explicit as possible inside these organizations, to help searchers traverse it in a meaningful way.

The semantic web is a branch of artificial intelligence specifically designed to transfer human knowledge to machines. Human knowledge, in the travel sector, is really what creates concrete business value for travelers.

When planning our next vacation we are constantly looking for something new, sometimes even unusual, but at the same time we need full reliability, and we want to complete the planning and booking process in the best possible way, with the least amount of effort.

For travel brands, destinations, online travel agencies, and resorts building a knowledge graph is truly the best way to improve the traveler experience, to market the travel offers and to prepare for the “AI-first world” of voice search and personal assistants.

Are you ready to build your travel-oriented knowledge graph? Contact us

Credits

Thanks to Rainer Edlinger and Martin Reichhart, who invited me this year to the Castelcamp in Kaprun, where every year the travel community from Austria, Germany, and Südtirol gathers to share their experiences, best practices and challenges in the digital marketing world. I was also very happy to meet again Reinhard Lanner, with whom I started this journey back in 2014. A great "Grazie" also to our wonderful team that is constantly working to improve our technology and to help our clients get the most out of our stack.

Feel free to connect if you want to know more about SEO for travel websites and if you have any more questions about my experience with Knowledge Graphs for your travel brand!