As search engines move toward voice search, adoption of mobile personal assistants is growing fast. While that transition is already happening, there is another interesting phenomenon to notice: the SERP has changed substantially in the last couple of years. As Google rolls out new features that appear above the fold (featured snippets, knowledge panels, and filter bubbles), those features give us a sense of what voice search might look like.
In this article, we’ll focus mainly on the knowledge panel, why it is critical and how you can get it too.
The Knowledge Panel: Google’s above the fold, worth billions
The knowledge panel is a feature that Google uses to provide quick and reliable information about brands (be they personal or company brands). For instance, in the case above you can see that for the query “who’s Gennaro Cuofano” on the US search results Google serves both a featured snippet (on the left) and a knowledge panel (on the right).
While the featured snippet’s aim is to provide a practical answer, fast, the knowledge panel’s aim is to provide a reliable answer (coming from a more authoritative source) plus additional information about that brand. In many cases, the knowledge panel is also a “commercial feature” that allows brands to monetize their products. For instance, you can see how my knowledge panel points toward books that can be purchased on Amazon.
This space on the SERP, which I like to call “above the fold,” has become the most important asset on the web. While Google’s first page remains an objective for most businesses, it is also true that, as we move toward voice search, traffic will increasingly be captured by the features that appear on the search results pages before you even reach the first organic position.
How does Google create knowledge panels? And how do you get one?
Knowledge panel: the key ingredient is Google’s knowledge vault
When people search for a business on Google, they may see information about that business in a box that appears to the right of their search results. The information in that box, called the knowledge panel, can help customers discover and contact your business.
In most cases, you’ll notice two main kinds of knowledge panels: brand panels and local panels.
While brand panels provide general information about a person’s or company’s brand, local panels offer location-specific information. In the example above, you can see how the local panel provides the address, hours, and phone number of the local business. In short, it is a touchpoint Google provides between the user and the local business.
Where does Google get the information for the knowledge panel? Google itself specifies that “Knowledge panels are powered by information in the Knowledge Graph.”
What is a knowledge graph?
Back in 2012 Google started to build a “massive Semantics Index” of the web called the knowledge graph. In short, a knowledge graph is a logical way to organize information on the web. While in the past Google could not rely on the direct meaning of words on a web page, the knowledge graph allows the search engine to collect information on the web and organize it around simple logical phrases, called triples (e.g., “I am Gennaro” and “Gennaro knows Jason”).
Those triples are combined according to logical relationships, and those relationships are built on top of a vocabulary called Schema.org. In short, Schema.org defines the possible relationships available among things on the web.
Thus, two people that are defined in Schema as entity type “Person” can be associated via a property called “knows.” That is how we can make it clear to Google that the two people know each other.
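As a sketch, the “knows” triple above could be expressed as schema.org JSON-LD, built here in Python (the entity URLs are placeholders, not real identifiers):

```python
import json

# Two schema.org Person entities linked by the "knows" property.
# The @id values are illustrative placeholders.
gennaro = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "https://example.com/entity/gennaro",  # hypothetical entity ID
    "name": "Gennaro",
    "knows": {
        "@type": "Person",
        "@id": "https://example.com/entity/jason",  # hypothetical entity ID
        "name": "Jason",
    },
}

# The nested object encodes one triple: (Gennaro, knows, Jason).
markup = json.dumps(gennaro, indent=2)
print(markup)
```

Each nested object in the JSON-LD corresponds to one node in the graph, and each property to one edge.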
From those relationships among things (which can be people, organizations, events or any other thing on the web) a knowledge graph is born:
Example of a knowledge graph shaped on a web page from FourWeekMBA that answers the query “Who’s Gennaro Cuofano”
Where does Google get the information it includes in its knowledge graph? As pointed out on Go Fish Digital, the sources are varied.
In short, there isn’t a single source from where Google mines the information to include in its knowledge panels.
Is a knowledge panel worth your time and effort?
A knowledge panel isn’t only the avenue toward voice search but also an organic traffic hack. It’s interesting to see how a good chunk of Wikipedia’s traffic comes from Google’s knowledge panels. Of course, Wikipedia is a trusted and authoritative website. One side effect of knowledge panels, though, might be the so-called no-click searches (searches that don’t produce a click-through from the search results pages).
Yet, as of now, a knowledge panel is an excellent opportunity to gain qualified traffic from search and get ready for voice search.
As search evolves toward AEO (answer engine optimization), the way you need to look at content structuring changes too. As Google’s SERP adds features such as featured snippets and knowledge panels, those features end up capturing a good part of the traffic. Thus, as a company, person, or business you need to understand how to gain traction via knowledge panels. The key is Google’s knowledge graph, which leverages Google’s knowledge vault.
It is your turn now to start experimenting to get your knowledge panel!
DBpedia has served as a unified access platform for the data in Wikipedia for over a decade. During that time DBpedia has established many of the best practices for publishing data on the web. In fact, the project hosted a knowledge graph even before Google coined the term. For the past 10 years its contributors have been “extracting and refining useful information from Wikipedia”, and they are experts in that field. However, there was always a motivation to extend this with other data and allow users unified access. The community, the board, and the DBpedia Association felt an urge to innovate the project. Over the past two years they have been re-envisioning DBpedia’s strategy in a vital discussion, resulting in a new mission statement: “global and unified access to knowledge graphs”.
Last September, during the SEMANTiCS Conference in Vienna, Andrea Volpini and David Riccitelli had a very interesting meeting with Dr. Ing. Sebastian Hellmann from the University of Leipzig, who sits on the board of DBpedia. The main topic of that meeting was the DBpedia Databus since we at WordLift are participating as early adopters. It is a great opportunity to add links from DBpedia to our knowledge graph. On that occasion, Andrea asked Sebastian Hellmann to participate in an interview, and he kindly accepted the call. These are the questions we asked him.
Sebastian Hellmann is head of the “Knowledge Integration and Language Technologies (KILT)” Competence Center at InfAI. He also is the executive director and board member of the non-profit DBpedia Association. Additionally, he is a senior member of the “Agile Knowledge Engineering and Semantic Web” AKSW research center, focusing on semantic technology research – often in combination with other areas such as machine learning, databases, and natural language processing. Sebastian is a contributor to various open-source projects and communities such as DBpedia, NLP2RDF, DL-Learner and OWLG, and has been involved in numerous EU research projects.
Head of the “Knowledge Integration and Language Technologies (KILT)" Competence Center at InfAI, DBpedia
How are DBpedia and the Databus planning to transform linked data into a networked data economy?
We have published data regularly and already achieved a high level of connectivity in the data network. Now, we plan a hub where everybody uploads data. In that hub, useful operations like versioning, cleaning, transformation, mapping, linking, merging, and hosting are done automatically, and the results are then dispersed again through a decentralized network to consumers and applications. Our mission incorporates two major innovations that will have an impact on the data economy.
Providing global access: this follows the agreement of the community to include their data sources in the unified access, as well as any other source. DBpedia has always accepted contributions in an ad-hoc manner, and now we have established a clear process for outside contributions.
Incorporating “knowledge graphs” into the unified access: this means we will reach out to create an access platform not only to Wikipedia (DBpedia Core) but also to Wikidata, and then to all other knowledge graphs and databases that are available.
The result will be a network of data sources that focuses on the discovery of data and also tackles the heterogeneity (or, in Big Data terms, the variety) of data.
What is DBpedia Databus?
The DBpedia Databus is part of a larger strategy following the mission to provide “Global and Unified Access to knowledge”. The DBpedia Databus is a decentralized data publication, integration, and subscription platform.
Publication: Free tools enable you to create your own Databus-stop on your web space with standard-compliance metadata and clear provenance (private key signature).
Integration: DBpedia will aggregate the metadata and index all entities and connect them to clusters.
Subscription: Metadata about releases are subscribable via RSS and SPARQL. Entities are connected to Global DBpedia Identifiers and are discoverable via HTML, Linked Data, SPARQL, DBpedia releases and services.
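As an illustration of SPARQL-based access, here is a minimal Python sketch that builds a query request against DBpedia’s public SPARQL endpoint; the query itself is a generic example, not a Databus-specific one, and no network call is made:

```python
from urllib.parse import urlencode

# DBpedia's public SPARQL endpoint (the Databus endpoints may differ).
ENDPOINT = "https://dbpedia.org/sparql"

# A generic illustrative query: fetch a few entities with English labels.
query = """
SELECT ?entity ?label WHERE {
  ?entity rdfs:label ?label .
  FILTER (lang(?label) = "en")
} LIMIT 5
"""

# The GET request URL a SPARQL client would issue, asking for JSON results.
request_url = ENDPOINT + "?" + urlencode(
    {"query": query, "format": "application/sparql-results+json"}
)
print(request_url)
```

Any HTTP client can then fetch `request_url` and parse the standard SPARQL JSON results format.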
DBpedia is a giant graph and the result of an amazing community effort – how is the work being organized these days?
DBpedia’s community has two orthogonal, but synergetic motivations:
Build a public information infrastructure for greater societal value and access to knowledge;
Business development around this infrastructure to drive growth and quality of data and services in the network.
The main motivation is to finally be able to discover and use data easily. Therefore, we are switching to the Databus platform. The DBpedia Core releases (extractions from Wikidata and Wikipedia) will be just one of many datasets published via the Databus platform in the future. One of the many innovations here is that DBpedia Core releases will be more frequent and more reliable. Any data provider can benefit from the experience we gained in the last decade by publishing data the way DBpedia does and connecting better to users.
We’re planning to give our WordLift users the option to join the DBpedia Databus. What are the main benefits of doing so?
The new infrastructure allows third parties to publish data in the same way as DBpedia does. As a data provider, you can submit your data to DBpedia and DBpedia will build an entity index over your data. The main benefit of this index is that your data becomes discoverable. DBpedia acts as a transparent middle-layer. Users can query DBpedia and create a collection of entities they are interested in. For these sets, we will provide links to your data, so that users can access them at the source.
For data providers our new system has three clear-cut benefits:
Their data is advertised and receives more attention and traffic redirects;
Once indexed, DBpedia will be able to send linking updates to data providers, therefore aiding in data integration;
The links to the data will disseminate in the data network and generate network-wide integration and backlinks.
Publishing data with us means connecting and comparing your data to the network. In the end, DBpedia is the only database you need to connect to in order to get global and unified access to knowledge graphs.
DBpedia and Wikidata both publish entities based on Wikipedia and both use RDF and the semantic web stack. They do fulfill quite different tasks though. Can you tell us more about how DBpedia is different from Wikidata and how these two will co-evolve in the near future?
As a knowledge engineer, I have learned a lot by analyzing the data acquisition processes of Wikidata. In the beginning, the DBpedia community was quite enthusiastic about submitting DBpedia’s data back to Wikimedia via Wikidata. After trying for several years, we found that it is not easy to contribute data in bulk directly to Wikidata, as the processes are volunteer-driven and allow only small-scale edits or bots. Only a small percentage of Freebase’s data was ingested. Wikidata follows a collect-and-copy approach, which ultimately inspired the sync-and-compare approach of the Databus.
Data quality and curation follow the Law of Diminishing Returns in a very unforgiving curve. In my opinion, Wikidata will struggle with this in the future. Doubling the volunteer manpower will improve quantity and quality of data by dwindling, marginal percentages. My fellow DBpedians and I have always been working with other people’s data and we have consulted hundreds of organizations in small and large projects. The main conclusion here is that we are all sitting in the same boat with the same problem. The Databus allows every organization to act as a node in the data network (Wikidata is also one node thereof). By improving the accessibility of data, we open the door to fight the law of diminishing returns. Commercial data providers can sell their data and increase quality with income; public data curators can sync, reuse and compare data and collaborate on the same data across organizations and effectively pool manpower.
If you are a web content writer, there is no need to remind you of all the struggles you face to distribute your content. Maybe you spend hours – or even days! – of hard work writing awesome content, but once your article is done, you know that your job has just begun. Now it’s time to fine-tune your content for SEO purposes, share it on several channels, monitor search keywords for your next article… Wouldn’t it be wonderful to just focus on writing and nothing more?
Semantic markup is the key to success. Schema markup can really help your pages get the traffic they deserve. How? To explain, we need to take a few steps back: first of all, you need to know what schema.org is.
What is schema.org markup?
Schema.org is an initiative launched in 2011 by the world’s largest search engines (Bing, Google, and Yahoo!) to implement a shared vocabulary and adopt standard formats to structure data on web pages.
Schema.org markup helps machines understand your content, without fail or ambiguity.
Let’s explore how to use the Schema markup, the benefits of using it and how it can be implemented on your WordPress website.
How to add Schema.org markup to WordPress
To use schema markup on your pages, you can either use a tool like WordLift or do it manually. The WordLift plugin enables you to add Schema markup on WordPress without writing a single line of code. Once you have configured the plugin, a new menu will appear on the right side of your article in the WordPress editor: it will allow you to annotate your content and, by doing so, create an internal vocabulary for your website or blog.
WordLift uses JSON-LD to inject schema.org markup into your web pages. Click here to see the magic: it’s a GIF that shows you the data representation of this article with JSON-LD!
Imagine you have published an event on your website: once you have finished creating your content, the final step will be to add a normal meta description, which will appear on the search page as plain text. But by adding Schema markup to the page, you can really help your content stand out by transforming it into a rich snippet, and therefore get a lot more clicks 😉
There are several types of schema you can use to mark up your content. By using the event schema markup, it is possible to show dates, locations, and any other detail related to a specific event, helping people easily access all the information they might need:
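A minimal sketch of what such event markup could look like, generated here with Python; all names, dates, and places are placeholders:

```python
import json

# Illustrative schema.org Event markup as JSON-LD. Every value below
# is a placeholder, not real event data.
event = {
    "@context": "https://schema.org",
    "@type": "Event",
    "name": "Example Meetup",
    "startDate": "2019-05-01T18:00:00+02:00",
    "endDate": "2019-05-01T21:00:00+02:00",
    "location": {
        "@type": "Place",
        "name": "Example Venue",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "Rome",
            "addressCountry": "IT",
        },
    },
}

# This JSON-LD would be injected into the page inside a
# <script type="application/ld+json"> tag.
print(json.dumps(event, indent=2))
```

With this in place, search engines can render the date and location directly in a rich snippet.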
Once the purpose of adding structured data is clear – to provide accurate information about what your website’s content is about – you can also see that adding Schema markup to your site is a highly customizable process.
How to increase your traffic with semantic markup
While crawling the web looking for specific content to serve to users, search engines will easily identify the context your articles belong to. Nowadays this is the most effective and affordable way to distribute your content and make it “findable” by those who are looking for it through search engines.
The example above shows the results of a long-tail search about the upcoming Salzburgerland Party Meeting event. As you can see, the first result is a rich snippet with two links and allows you to skip directly to the next events. All that is made possible by the markup, which helps search engines detect the structured data matching the user’s query inside the whole website. It’s been proven that rich snippets increase the Click-Through Rate: so, more qualified traffic for you!
Salzburgerland.com uses WordLift to structure its content.
Moreover, you can explore new ways to disseminate your content based on chatbots, which can serve your freshly baked articles to readers depending on their interests.
In the image on the right side, you can see how Intelligent Agents such as Google Allo can answer your voice search questions with appropriate content if they are correctly structured.
Assess markup quality with Google’s Structured Data Testing Tool
Once you have added your schema markup to WordPress, it’s easy to verify that everything was done right, simply by using the Structured Data Testing Tool made available by Google. Just enter the URL you need to analyze and let the tool verify your content.
Let’s see, as an example, the markup of the SEMANTiCS 2018 Conference on our blog:
As we can see, everything worked just fine; there’s only one warning, about the Offer field, which in this case has no value.
The first rule when adding schema markup is to be truthful: Google will know! Also, remember that adding schema markup to your page may not guarantee any results at first. But it’s always recommended because it can definitely give you the best chance of success in SERPs and help increase your CTR.
Automating structured data markup with WordLift
While developing the WordLift plugin, we focused on making our schema.org markup more accurate than ever.
Now we can say – without fear of contradiction – that our Plugin offers you one of the most extended sets of markup to structure data on a WordPress website… without writing a single line of code!
Here is a list of improvements on the markup that SEO specialists are going to appreciate:
ARTICLE: we’ve added the schema.org Article markup to each article/blog post, publishing it with the mainEntityOfPage property. Simply put: we tell Google and the other search engines that this web page is an article. To learn more about this property, read this how-to by Jarno Van Driel.
PUBLISHER: we also communicate the publisher’s information related to each article as structured data. The publisher can be an individual with his/her proper name or an organization with a brand name and a logo.
ID: with WordLift we also made the Publisher ID available. What is an ID, and why is it so important? For each entity, article, and publisher, we generate a permanent ID: a unique identifier which is fundamental in the context of 5-star Linked Data because it allows connections between data on the web. Each entity, article, and publisher can be connected to other data – hosted, for example, in Wikidata – via the “sameAs” property, and each of them can also be decoded with a JSON-LD data representation.
RELATED ENTITIES: we use the “mentions” property to declare which entities are mentioned. In this way, you’ll have a hierarchy of entities where the main one defines the article itself and the others are recognized as mentioned in it.
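The four pieces above can be sketched together in one JSON-LD object; all URLs below are placeholders, and the exact markup WordLift emits may differ:

```python
import json

# Illustrative Article markup combining mainEntityOfPage, publisher,
# permanent @id, and mentions. All URLs and names are hypothetical.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "@id": "https://example.com/data/post/123",       # permanent article ID
    "mainEntityOfPage": "https://example.com/post/123",
    "headline": "Example headline",
    "publisher": {
        "@type": "Organization",
        "@id": "https://example.com/data/publisher",  # publisher ID
        "name": "Example Publisher",
        "logo": {"@type": "ImageObject", "url": "https://example.com/logo.png"},
    },
    "mentions": [
        {
            "@id": "https://example.com/data/entity/knowledge-graph",
            "name": "Knowledge graph",
        }
    ],
}
print(json.dumps(article, indent=2))
```

The permanent `@id` values are what allow other datasets to point back at these nodes via `sameAs`.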
To play around with JSON-LD markup that WordLift created for this article head straight to the JSON-LD Playground.
For the first time this year we can finally say that knowledge graphs and semantic technologies are hyped. People like me, who have played with the semantic web stack for several years now, have long predicted that one day we would have a Graph for Everything. We waited long, and hopefully not in vain 😀, until recently Gartner finally declared that 2018 is indeed the “Year of the Graph”. We, here at WordLift, are far beyond the hype. We have built technologies, open-source frameworks, companies, and products on this vision of the semantic web, knowledge representation, and ontologies.
Knowledge Graphs in Gartner’s Hype Cycle for 2018.
For many years, way too many, talking with large enterprises or public institutions like the Italian Parliament about the importance of creating taxonomies and labeling information has been extremely frustrating, and yet I am very thankful to everyone who has listened to me and helped us get to the point of writing an article like this one.
A knowledge graph is a way of representing human-knowledge to machines. In short, you start by defining the main concepts as nodes and the relationships among these concepts as edges in a graph. READ MORE
Not all graphs are created equal, and each organization has its own business goals and ways of representing relationships between related entities. We model data and build knowledge graphs to create context, to improve content findability by leveraging semantic search engines like Google and Bing, and to provide precise answers to certain questions. Once you have organized your data semantically and built your own taxonomy, many applications can be implemented: from classifying items to integrating data coming out of different pipelines, from building complex reasoning systems to publishing metadata on the web. When we built the knowledge graph for a travel brand like bungalowparkoverzicht, our main focus was on the type of information a traveler would need before reaching a destination.
We modeled data for the so-called “planning and booking moments”. Planning, according to research from Google, starts when a digital traveler has chosen a destination and is looking for the right time and place to stay. Then the booking follows: that’s the moment when travelers move into reserving their perfect hotel, choosing a room, and booking it.
Types of Information to model for the planning and booking moments
When modeling hotel-related information in web content using the schema.org vocabulary, you basically work with three core types of nodes (entity types):
A lodging business, (e.g. a hotel, hostel, resort, or a camping site): essentially the place and local business that houses the actual units of the establishment (e.g. hotel rooms). The lodging business can encompass multiple buildings but is in most cases a coherent place.
An accommodation, i.e. the actually relevant units of the establishment (e.g. hotel rooms, suites, apartments, meeting rooms, camping pitches, etc.). These are the actual objects that are offered for rental.
An offer to let a hotel room (or other forms of accommodations) for a particular amount of money and for a given type of usage (e.g. occupancy), typically further constrained by advance booking requirements and other terms and conditions.
Schema Markup for hotels and lodging businesses.
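A hedged sketch of how these three node types can nest in schema.org JSON-LD; all names and values are illustrative:

```python
import json

# A LodgingBusiness (here a Hotel) containing an Accommodation
# (a HotelRoom), offered for rental via an Offer. Values are placeholders.
hotel = {
    "@context": "https://schema.org",
    "@type": "Hotel",                      # subtype of LodgingBusiness
    "name": "Example Beach Hotel",
    "containsPlace": {
        "@type": "HotelRoom",              # subtype of Accommodation
        "name": "Double Room, Sea View",
        "occupancy": {"@type": "QuantitativeValue", "maxValue": 2},
    },
    "makesOffer": {
        "@type": "Offer",
        "priceSpecification": {
            "@type": "UnitPriceSpecification",
            "price": 120,
            "priceCurrency": "EUR",
            "unitCode": "DAY",             # price per night
        },
    },
}
print(json.dumps(hotel, indent=2))
```

Keeping the three types separate lets the same room appear in multiple offers (different occupancies, seasons, or booking conditions) without duplicating the place data.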
Relationships (edges in the graph) between these entities are designed in such a way that several potential conversations between a lodging business and a potential client become possible. We simply:
a) encode these relationships using an open vocabulary and, by doing so,
b) easily enable search engines and/or virtual assistants to traverse these connections in multiple ways.
As seen above we can map – using the vocabulary – all the hospitality infrastructures as schema:Organization and create a page listing all the different companies behind these businesses or we can list these hotels and lodging facilities using their geolocation and the properties of the schema:Place type.
Making it happen
The content management system in the back-end uses a relational database, and this is just great, as most of the data needs to be used in transactional processes (versioning and reviews are all based on efficiently storing data in tables). Our work is to apply to each data point the semantics required to:
publish metadata on the web using structured data that machines can understand
index each item of the property inventory (i.e. all the proposed hotels, all the locations, …) with a unique identifier and a corresponding representation in an RDF knowledge graph
semantically annotate editorial content with all the nodes that are relevant for our target audience (i.e. annotating an article about a camping site in the Netherlands with the same entity that connects that location with the related schema:LodgingBusiness)
have a nice and clean API to query and eventually enrich the data in the graph using other publicly available data coming from Wikidata, GeoNames or DBpedia
provide search engines and virtual assistants with the booking URL using schema:ReserveAction (see the example below) to make this data truly actionable.
1. Publishing metadata on the Web: data quality becomes King
Since the major search providers (including Google, Microsoft, Yahoo, and Yandex) joined forces to define a common language for semantic markup, semantic web technologies have become an important asset for online businesses of all sorts. At the time of writing this article, 10 million websites use Schema.org to mark up their web pages.
While there is a growing interest in adding structured data in general, the focus is now shifting from providing whatever form of structured data to providing high-quality data that can have a real impact on the new entity-oriented search.
WHAT IS ENTITY-ORIENTED SEARCH?
Entity-oriented search, as defined by Krisztian Balog in his book, is the search paradigm of organizing and accessing information centered around entities, and their attributes and relationships.
Ranking high on long tail intents like the ones we see in the travel sector is – in several cases – about providing consistent and reliable information in a structured form.
How structured data might be used in Google synthetic queries.
The importance of geocoding the address
To give you a practical example: when making the address data of each lodging business explicit for the Dutch website, we realised that the data we had in the CMS wasn’t good enough to be published online using schema, so we decided to geocode the addresses and extract the data in a clean and reliable format using an external API. A simple heuristic like this improves the quality of the data describing thousands of lodging businesses, which can now be unambiguously ranked for various types of searches.
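A sketch of that heuristic, assuming a hypothetical geocoding endpoint that returns structured address fields; here the API call is replaced by a canned response:

```python
import json

# Hypothetical geocoder endpoint; any geocoding API that returns
# structured address fields would work the same way.
GEOCODER = "https://geocoder.example.com/v1/search"  # placeholder

def to_postal_address(geocoded: dict) -> dict:
    """Map a geocoder response into a schema.org PostalAddress node."""
    return {
        "@type": "PostalAddress",
        "streetAddress": geocoded.get("street"),
        "postalCode": geocoded.get("postcode"),
        "addressLocality": geocoded.get("city"),
        "addressCountry": geocoded.get("country_code"),
    }

# Canned response standing in for a live API call.
sample = {
    "street": "Strandweg 1",
    "postcode": "3253 MB",
    "city": "Ouddorp",
    "country_code": "NL",
}
address = to_postal_address(sample)
print(json.dumps(address, indent=2))
```

Running every CMS address through such a mapping yields clean, unambiguous PostalAddress nodes ready for publication.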
Using well-known datasets to disambiguate location-specific characteristics
In schema, when describing most of the hotel-related types and properties – e.g., telling guests that the hotel has a WiFi Internet connection – we can use the amenityFeature property, which is derived from the STI accommodation ontology (our friends in Innsbruck at the Semantic Technology Institute have greatly contributed to the travel extension of Schema).
Unfortunately, there is no common taxonomy yet for describing these properties (the WiFi, or the presence of a safe in the room). To help search engines and virtual assistants disambiguate these properties as well as possible, in WordLift we provide a mapping between these hotel-related properties and entities in Wikidata. In this way, we can add an unambiguous pointer to – let’s say – the concept of WiFi, which in Wikidata corresponds to the entity Q29643.
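For illustration, such a mapping could look like this; the hotel name is a placeholder, and Q29643 is the Wikidata entity mentioned above:

```python
import json

# Disambiguating an amenity by pointing the feature specification at
# the Wikidata entity for WiFi (Q29643). Hotel name is a placeholder.
hotel = {
    "@context": "https://schema.org",
    "@type": "Hotel",
    "name": "Example Hotel",
    "amenityFeature": {
        "@type": "LocationFeatureSpecification",
        "name": "Free WiFi",
        "value": True,
        "sameAs": "https://www.wikidata.org/entity/Q29643",
    },
}
print(json.dumps(hotel, indent=2))
```

The `sameAs` pointer turns an ambiguous label ("Free WiFi", "WLAN", "wireless internet") into a single, machine-resolvable concept.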
2. Creating unique IDs for entities in the graph
When representing the nodes in our graph we create entities and group them in a catalog (we call it a vocabulary). All the entities in the catalog belong to different types (i.e. LodgingBusiness, Organization, Place, Offer). The entity catalog defines the universe we know, and each entity has its own unique identifier. Having an ID for each node turns out to be surprisingly useful, as it gives us a one-to-one correspondence between a node (represented by its ID) and the real-world object it represents.
An accommodation like the Strand Resort Ouddorp Duin in the south of Holland, for example, has its own unique ID in the graph at http://data.wordlift.io/wl0760/vakantiepark/strand_resort_ouddorp_duin.
3. Bridging text and structure
Combining structured and unstructured information is key for improving search breadth and quality from external search engines like Google and Bing. It also becomes very important to provide a consistent user experience within the site. Let’s say that you are referring, in an article from the blog, to South of Holland or to the Landal Strand Resort we talked about before: you want your users to see the latest promotions from this resort and/or offers from other properties nearby. Connecting editorial content from the blog using the data in the graph is called entity-linking. It is done by annotating mentions of specific entities (or properties of these entities) being described in a text, with their own unique identifiers from the underlying knowledge graph. This creates a context for the users (and for external search engines) and a simple way to improve the user experience by suggesting a meaningful navigation path (i.e. “let’s see all the resorts in the region” or “let’s see the latest offers from the Strand Resort”).
Florian Bauhuber from Tourismuszukunft presenting SLT Knowledge Graph at Castelcamp Kaprun 2018.
4. Discovering new facts by linking external data
Kaprun in GeoNames.
Having a graph in RDF format is also about linking your data with other data. A great travel destination in Salzburgerland like Kaprun has its own entity ID in the graph, http://open.salzburgerland.com/en/entity/kaprun, built by the Region of Salzburg using WordLift. This entity is linked with the equivalent entities in the Web of Data. In GeoNames it corresponds to the entity http://sws.geonames.org/2774758/ (GeoNames is a freely available geographical database that contains many more properties about Kaprun than what we store in our graph). We can see from GeoNames that Kaprun is 786m above sea level and belongs to the Zell am See region in Salzburgerland. This information is immediately accessible to search engines and can also be stored in the index of the website’s internal search engine, to let users find Kaprun when searching for towns in Zell am See or destinations in Salzburgerland close to a lake. This wealth of open data, interlinked with our graph, can be made immediately accessible to our users by adding attributes in Schema that search engines understand. An internal search engine with this information becomes “semantic”, and we don’t need to maintain or curate it (unless we find it unreliable). Wow!
WHAT IS RDF?
The Resource Description Framework (RDF) is a W3C standard for describing entities in a knowledge base. An entity such as a hotel can be represented as a set of RDF statements. These statements may be seen as facts or assertions about that entity. A knowledge graph is a structured knowledge repository for storing and organizing statements about entities. READ MORE
SLT Knowledge Graph in the Linked Open Data Cloud.
5. From answering questions to making it all happen: introducing Schema Actions
We use nodes and edges in the graph to help search engines and virtual assistants answer specific questions like “Where can I find a camping site with a sauna close to a ski resort in Germany?”. These are informational intents that can be covered by providing structured data using the schema.org vocabulary to describe entities.
In 2014 Schema.org, the consortium created by the search engines to build a common vocabulary, introduced a new extension called Actions. The purpose of Schema Actions is to go beyond the static description of entities – people, places, hotels, restaurants, … – and to describe the actions that can be invoked (or have been invoked) using these entities.
In the context of the knowledge graph for a travel brand, we're starting to use Schema Actions to let search engines and virtual assistants know which URL should be used to book a specific hotel.
Here is an example of the JSON-LD code injected into the page of a camping village, indicating the URL that can be used on different devices (see the actionPlatform attribute) to initiate the booking process.
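A sketch of what such markup looks like, using schema.org's ReserveAction (the business name and booking URL are placeholders, not the production markup):

```json
{
  "@context": "http://schema.org/",
  "@type": "Campground",
  "name": "Example Camping Village",
  "potentialAction": {
    "@type": "ReserveAction",
    "target": {
      "@type": "EntryPoint",
      "urlTemplate": "https://booking.example.com/camping-village",
      "actionPlatform": [
        "http://schema.org/DesktopWebPlatform",
        "http://schema.org/MobileWebPlatform"
      ]
    }
  }
}
```

The `actionPlatform` values tell an agent which entry point to use on a desktop browser versus a mobile one; schema.org also defines values for iOS and Android apps.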
As we’re continuing to explore new ways to collect, improve and reuse the information in the knowledge bases we are building with our clients in the travel industry, a new landscape of applications is emerging. Data is playing a pivotal role in the era of personal assistants, content recommendations and entity-oriented search. We are focusing on making knowledge as explicit as possible inside these organizations, to help searchers traverse it in a meaningful way.
The semantic web is a branch of artificial intelligence specifically designed to transfer human knowledge to machines. Human knowledge, in the travel sector, is really what creates concrete business value for travelers.
When planning our next vacation, we are constantly looking for something new, sometimes even unusual, but at the same time we need full reliability, and we want to complete the planning and booking process in the best possible way, with the least amount of effort.
For travel brands, destinations, online travel agencies, and resorts, building a knowledge graph is truly the best way to improve the traveler experience, to market travel offers, and to prepare for the “AI-first world” of voice search and personal assistants.
Are you ready to build your travel-oriented knowledge graph? Contact us
Thanks to Rainer Edlinger and Martin Reichhart, who this year invited me to the Castel Kamp in Kaprun, where every year the travel community from Austria, Germany, and Südtirol gathers to share their experiences, best practices, and challenges in the digital marketing world. I was also very happy to meet again Reinhard Lanner, with whom I started this journey back in 2014. A great “Grazie” also to our wonderful team, which is constantly working to improve our technology and to help our clients get the most out of our stack.
Feel free to connect if you have any more questions about my experience with Knowledge Graphs for your travel brand!
Linked Data is a simple way to link datasets online and is a key element in the Web of Data.
The Linked Open Data Cloud is a diagram that depicts publicly available linked datasets. The diagram is updated regularly and maintained by the Insight Centre for Data Analytics (a joint initiative between researchers at Dublin City University, NUI Galway, University College Cork, University College Dublin, and other partner institutions). Starting in June 2018, the first datasets created with WordLift have entered the famous diagram.
The diagram of the LOD Cloud is published under the Creative Commons Attribution License and it is, therefore, free to use. Since the first edition back in 2007, the diagram has been widely used by researchers and academics around the world to talk about linked data in research papers, posters, and presentations.
Almost every day we now hear of new knowledge graphs being implemented: from Bloomberg to Thomson Reuters, from Amazon to Airbnb. And it is not just multinational organizations and top-notch publishers: research institutions, governments, libraries, and universities from all over the world are constantly contributing, one way or another, to the expansion of linked data.
The exponential growth of interlinked datasets that conform to the linked data principles, introduced by the inventor of the Web, Sir Tim Berners-Lee, is immediately visible by looking at the evolution of the Linked Open Data Cloud diagram.
Started in May 2007 with only 12 interlinked datasets, year after year this bubble chart, where each bubble is essentially a new knowledge graph, has witnessed the expansion of connected knowledge. In July 2018 the LOD Cloud diagram contains 1,224 datasets (a 100x growth since the first edition).
Lod Cloud in 2007.
Since January 2017, after a few years during which the chart was not updated, the LOD Cloud has been divided into nine subclouds, each one representing a separate knowledge domain: geography, government, linguistics, life sciences, media, publications, social networking, user-generated content, and cross-domain (anything else, like DBpedia, Wikidata, or datasets created with WordLift that span multiple topics).
WordLift in the cross-domain subcloud of the LOD Cloud.
How can I contribute to the LOD Cloud?
First, you have to publish data that follows the Linked Data Principles and this means the following:
Use HTTP URIs to name *things* in your dataset so that others can look them up using the HTTP protocol
Make sure that, when dereferencing these URIs, with or without content negotiation, they resolve to RDF data (a standard format to represent interconnected data) in any of the supported formats (RDFa, RDF/XML, Turtle, N-Triples)
The dataset shall contain at least 1,000 facts (also called triples, or subject-predicate-object statements).
Much like in the hypertextual web, links in the web of data are essential to help us discover new things. Make sure that your dataset links to URIs of datasets that are already in the LOD Cloud; a minimum of 50 links to existing linked datasets is required.
The entire dataset shall be accessible via RDF crawling, an RDF dump, or, even better, a SPARQL endpoint.
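For instance, on a dataset exposed through a SPARQL endpoint, you can check the size requirement (the minimum number of facts) with a single aggregate query:

```sparql
# Count all triples (facts) in the dataset
SELECT (COUNT(*) AS ?triples)
WHERE { ?subject ?predicate ?object . }
```

If the endpoint returns a value of 1,000 or more for `?triples`, the dataset meets the size criterion.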
The team behind the LOD Cloud also asks you to fill out an online form and is extremely kind in guiding you through the submission. All the diagrams, from the first edition to the latest, are available on the LOD Cloud website along with the code used to generate the diagram.
WordLift Knowledge Graphs in the LOD Cloud
We are very happy that, starting with the June 2018 edition of the LOD Cloud, datasets created with WordLift are part of the diagram.
As we are in the process of submitting new datasets from our users on a regular basis, we are proud to share that the first linked datasets that have made it into the diagram are from SalzburgerLand Tourismus (the organization behind tourism in the region of Salzburg), this same blog (the WordLift blog in English and in Italian), and Rainer Edlinger's blog “Whiskey circle”: most probably the very first whisky-centric dataset of the LOD Cloud 🥃.
The dataset from the WordLift Blog (in English) on the LOD Cloud website.
For each dataset, the number of links to other datasets is also used for the creation of the LOD chart. Like most of our datasets, the graph created from this blog links to DBpedia, Freebase, GeoNames, YAGO, and Wikidata.
What are the benefits of using Linked (Open) Data?
Semantic Web technologies and Linked Data have the main goal of interconnecting existing and new data available on the web. This essentially means breaking the information silos in which data is usually locked and providing new ways of accessing, validating, and using data that comes from different sources. Connected knowledge is also key to inferring new knowledge. This is the reason why, in 2010, it was once again Sir Tim Berners-Lee who introduced the 5-star linked data principles to help organizations (both private and public) understand the importance of publishing linked open data online. Five stars, in Tim Berners-Lee's rating, are assigned to datasets published in RDF format as Linked Open Data and interlinked with at least one other dataset in the linked data cloud.
In WordLift we have created a workflow for online publishers, bloggers, businesses and editorial teams to democratise semantic technologies and to build knowledge graphs optimised for content publishing, search engine optimisation and semantic search.
Contact us to learn more about LOD Cloud and to start creating your own knowledge graph ⚡️
The web constantly plays a powerful role in shaping our world and is the result of enlightened thinking. This is an article about the illuminating work of a Persian sultan who lived around 1000 CE, the beauty of web decentralization, and the magic of linked data for libraries and academics.
Tapping into the Magic of Linked Data for Libraries
It all started when Alasdair Watson published a blog post about an illuminated manuscript that was recently made available online by the Bodleian Digital Library: the digital arm of the Bodleian Libraries at the University of Oxford.
We have digitized our copy of The Shāhnāmah of Ibrāhīm Sulṭān, a beautifully illustrated, 60,000-verse poem that recounts an epic history of Greater Persia, from mythical beginnings until the 7th century. https://t.co/Ls4AWzi6t9 pic.twitter.com/lBDxTV0hpw
At this point the conversation was becoming nerdish enough to resonate through my ears. 🤓 The noise online can be overwhelming sometimes, but after so many years in the business, I am sure you, just like me, have angels by your side helping you focus your attention where it matters. Aaron is one of my angels. Half librarian, half web developer, half SEO, Aaron Bradley is a dear friend and the Da Vinci of the Semantic SEO community. It was once again thanks to Aaron on Twitter that I intercepted the second blog post, on how the content from the first academic was being translated into structured data using Wikidata. This means that all the findings about the illuminated manuscript shared by Alasdair had been translated by poulterm into machine-readable form using Wikidata's ontology.
Just to give you an example, Ibrahim Sultan was already represented in Wikidata with his own machine ID, Q3147516, but the manuscript was not, until poulterm decided to add it. This gave the Persian poem a new identity in the world of data, namely the machine ID Q53676578.
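On a web page, this new identity can be referenced by pointing a sameAs link at the Wikidata entity URI built from the machine ID above (a sketch in JSON-LD; the manuscript is modeled here as a Book for simplicity):

```json
{
  "@context": "http://schema.org/",
  "@type": "Book",
  "name": "Shāhnāmah of Ibrāhīm Sulṭān",
  "sameAs": "http://www.wikidata.org/entity/Q53676578"
}
```

Any crawler that understands schema.org can now follow the `sameAs` link and pull in everything the Wikidata community has curated about the manuscript.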
So at this point we have:
1. a first academic reporting on a newly digitized manuscript,
2. a second academic (from the same team) sharing the information from the first to
3. a broader community of geeky marketers like Aaron and myself, and an army of computer robots (like WordLift), smart agents, and search engine crawlers accessing the wealth of publicly available linked data published on Wikidata.
It’s already an interesting mix of linked people ready to share the epic poem of the Persian poet Abū l-Qāsim Firdawsī of Ṭūs – yes, the longest poem ever written by a single person!
As Aaron was reporting the story about the blog post being turned into structured data I decided to prepare a web page on our blog about the manuscript and the sultan. I constantly run experiments to make sure we’re doing our very best to help bloggers and marketers create engaging new content that works well on Google Search.
Let's now re-use it for improving the structured data markup using @wordliftit 😀 When creating new entities we tap into @wikidata and can annotate Ibrahim Sultan. We then realized that to get to the manuscript we need to let WordLift fetch entities that are not yet on Wikipedia pic.twitter.com/rPG1WbaG2N
At that point, I realized that using WordLift (our AI that helps content writers excel at SEO) it was still not possible to annotate the newly added entity of the manuscript created by poulterm. We had to get back “in the kitchen” and revise a parameter that was preventing WordLift from tapping into newly created Wikidata entities that did not yet have a page on Wikipedia. As soon as the unit tests were completed, we released a new version of the WordLift Server (this is where the content analysis in our semantic platform really takes place), and finally David, our CTO, was able to show that WordLift was capable of detecting and interlinking the Shahnamah of Ibrahim Sultan 🎉
Conclusion: the Linked Data Movement and the New Digital Disorder
The utopian dream of a curated global knowledge base, where academics help us discover and organize new facts and where content creators, supported by agentive technologies like WordLift, share and debate these findings, is no longer a myth but an illuminating reality.
The digitally savvy manual curation of experts, combined with machine-generated ontologies and AIs that help us dive into this matrix, is materializing in front of our eyes.
If you conduct research activities, and if you believe in the open sharing of knowledge, remember that there is a “Linked Data Movement” there for you, with the infrastructure to publish and immediately re-use your research work.
The web still is, and will probably remain, a huge mess, but from this New Digital Disorder (as David Weinberger would say) so much is happening!