Knowledge validation is an important task in the curation of knowledge graphs. Validation (also known as fact-checking) aims to evaluate whether a statement in a knowledge graph (KG) is semantically correct and corresponds to the so-called “real” world.
In this article, we describe our proposed approach for validating knowledge graphs (KGs).
Nowadays, a variety of KGs exist. For instance, DBpedia and Wikidata have been created semi-automatically and through crowdsourcing, respectively. However, those KGs might contain errors, duplicates, or missing values. This matters because KGs are used in search engines and personal assistant applications. For example, Google shows the statement “Gartenhotel Maria Theresia GmbH’s phone is 05223563130”, which might be wrong because the main switchboard of the hotel can be reached under “0522356313” (see Fig. 1).
Fig. 1: Knowledge Panel of the Gartenhotel.
Apart from the KGs mentioned above, there are various knowledge sources that collect and store content and data from the Web. For instance, Google Places is a service offered by Google that provides data about more than 200 million businesses, places, points of interest, and more. Furthermore, there are other knowledge sources, such as Yandex Places or OpenStreetMap (OSM), that provide data about places, restaurants, points of interest, and much more. Those knowledge sources can be used to validate the statements of a KG.
To ensure that a KG is of high quality, one important task is knowledge validation, which measures the degree to which statements are correct. For instance, knowledge validation measures the degree to which information about a hotel (e.g., Hotel Alpenhof’s phone number is +4352878550) is correct by comparing it with information about the same hotel in a set of knowledge sources. See Fig. 2.
Fig. 2: Validating information about the Hotel Alpenhof.
KG validation carried out by human experts is time-intensive and does not scale. Therefore, an approach to validate data semi-automatically is needed. There are some approaches that can help with this process, see Fig. 3.
In the following, we present our approach to validating KGs based on weighted sources.
The process overview of the Validator is shown in Fig. 4. At first, users have to provide their KG as input to the Validator. Internally, the Validator retrieves data from external knowledge sources (e.g., Google Places, Yandex Places). It then performs instance matching to identify which instances (e.g., hotels) refer to the same “real-world” entity. For example, it retrieves the entries for the Alpenhof hotel from the external sources and matches them with the Alpenhof hotel in the user’s KG.
Fig. 4: KG Validation process overview.
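The instance matching step described above can be illustrated with a minimal sketch. It is not the Validator's actual implementation: the name-based similarity via Python's `difflib`, the function names, the 0.8 threshold, and the candidate records are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names (difflib ratio, an assumed measure)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_instance(kg_instance: dict, candidates: list, threshold: float = 0.8):
    """Return the candidate whose name is most similar to the KG instance.

    Returns None when no candidate reaches the (hypothetical) threshold.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        score = name_similarity(kg_instance["name"], candidate["name"])
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# Hypothetical records: one instance from the user's KG and two candidates
# as they might be returned by an external source.
kg_hotel = {"name": "Hotel Alpenhof"}
external_candidates = [
    {"name": "Hotel  alpenhof"},
    {"name": "Gasthof Post"},
]
matched = match_instance(kg_hotel, external_candidates)
```

A real matcher would likely combine several properties (name, address, geo-coordinates) rather than the name alone; the sketch only shows the idea of selecting the best candidate above a threshold.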
Next, the instance scoring process is triggered and the properties (e.g., name, phone number, address) of the matched instances are compared with each other. For example, the address of the Alpenhof Hotel from the user’s KG is compared with the address value of the Alpenhof Hotel from Google Places, Yandex Places, and so on. Note that each external source is weighted according to its importance: a user can set a weight that reflects how important Google Places, Yandex Places, and so on are to him or her. A weight of 0 is the minimum degree of importance and a weight of 1 is the maximum.
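To make the weighting concrete, the following sketch computes a score for a single property as a weighted average of per-source agreement. The exact formula used by the Validator is given in [Huaman et al., 2021]; the weighted average, the exact-match similarity, the function names, and the phone values attributed to the sources here are assumptions made up for illustration.

```python
def normalize(value: str) -> str:
    """Lowercase and drop all whitespace before comparing property values."""
    return "".join(value.lower().split())


def property_similarity(kg_value: str, source_value: str) -> float:
    """Assumed measure: 1.0 on an exact match after normalization, else 0.0."""
    return 1.0 if normalize(kg_value) == normalize(source_value) else 0.0


def weighted_score(kg_value: str, source_values: dict, weights: dict) -> float:
    """Weighted average of the agreement between the KG value and each source."""
    for source, weight in weights.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight for {source} must be in [0, 1], got {weight}")
    total_weight = sum(weights[source] for source in source_values)
    if total_weight == 0:
        return 0.0  # no source is considered important, so nothing to score
    agreement = sum(
        weights[source] * property_similarity(kg_value, value)
        for source, value in source_values.items()
    )
    return agreement / total_weight


# User-defined importance per source (0 = minimum, 1 = maximum).
weights = {"google_places": 1.0, "yandex_places": 0.5}

# Illustrative values only; these are not real responses from the sources.
phone_from_sources = {
    "google_places": "+43 5287 8550",
    "yandex_places": "+43 5287 0000",
}

score = weighted_score("+4352878550", phone_from_sources, weights)
print(round(score, 2))
```

With these made-up inputs only the more heavily weighted source agrees with the KG value, so the score is 1.0 / 1.5, or about 0.67. Lowering the weight of the disagreeing source would raise the score, which is the effect the user-defined weights are meant to have.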
Fig. 5: Screenshot of the Validator [Huaman et al., 2021].
Finally, the computed scores are shown via a graphical user interface (see Fig. 5). It allows users to select multiple properties (e.g., address, name, and/or phone number) to be validated and to assign or change the weights of the external sources.
Full details about the approach, implementation, and evaluation are provided in [Huaman et al., 2021].
Where can you use it?
- A user can check whether the information about his or her hotel or other business provided by Google Places, Yandex Places, and other sources is correct and up-to-date.
- A user might want to validate whether the information in his or her KG is correct based on different knowledge sources.
- The approach might be used to link a user’s KG with Google Places, Yandex Places, and so on.
Tools and Technology
The Validator was developed by three students from the University of Innsbruck, in the context of the Eurostars co-funded project WordLift Next Generation.
- Conceptualization of a new KG Validation approach and a first prototypical implementation thereof.
- Showcasing use cases where it can be used.
- Validating KGs ensures that major search engines show the correct information about your business.
[Huaman et al., 2021] Huaman E., Tauqeer A., Fensel A. (2021) Towards Knowledge Graphs Validation Through Weighted Knowledge Sources. In: Villazón-Terrazas B., Ortiz-Rodríguez F., Tiwari S., Goyal A., Jabbar M. (eds) Knowledge Graphs and Semantic Web. KGSWC 2021. Communications in Computer and Information Science, vol 1459. Springer, Cham. https://doi.org/10.1007/978-3-030-91305-2_4
[Huaman et al., 2020] Huaman E., Kärle E., Fensel D. (2020) Knowledge Graph Validation. arXiv:2005.01389. https://arxiv.org/abs/2005.01389