Finding Similar Entities across Knowledge Graphs

by Sare Aghaei | 3 June 2021 | The Project

With the rise of knowledge graphs (KGs), interlinking KGs has attracted a lot of attention.
Finding similar entities across KGs plays an essential role in knowledge integration and in connecting KGs. It helps end-users and search engines access pertinent information across KGs more easily and effectively.

In this blog post, we introduce a new research paper and the approach we are experimenting with in the context of long-tail SEO.

Long-tail SEO

One of our goals for WordLift NG is to create the technology that helps editors target long-tail search intents. Long-tail queries are search terms that tend to have lower search volume and less competition, as well as a higher conversion rate. For example, “ski touring” is a query that we can intercept with a page like this one (or a similar page). Our goal is twofold:

helping the team at SalzburgerLand Tourismus (the use-case partner of our project) expand their existing positioning on Google by supporting them in finding long-tail queries;
helping them enrich their existing website with content that matches those long-tail queries and can rank on Google.

In order to facilitate the creation of new content we proceed as follows:

analyze the entities behind the top results that Google proposes for a given query (in a given country and language);
find a match for these entities among similar entities in the client's local KG.

To carry out the first step, WordLift has created an API (called long-tail) that analyzes the top results and extracts a short summary, along with the main entities, from each of them.

Now, given a query entity in one KG (say, DBpedia), we propose an approach to find the most similar entity in another KG (the graph WordLift creates from the client's website), as illustrated in Figure 1.

Figure 1. The interlinking problem over the knowledge graphs

The main idea is to combine graph embedding, clustering, regression and sentence embedding, as shown in Figure 2.
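For the graph-embedding step the approach uses RDF2Vec, which turns a KG into sequences of random walks (alternating entity and predicate tokens) and then trains a word2vec model on those sequences so that every entity gets a vector. The sketch below is a minimal illustration of the walk-extraction part only, assuming a hand-made list of toy triples; the entity and predicate names and the helpers `build_adjacency` and `random_walks` are hypothetical and not taken from the paper or from any KG.

```python
import random
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object); not real KG data.
TRIPLES = [
    ("Ski_touring", "type", "Winter_sport"),
    ("Ski_touring", "practisedIn", "Salzburg"),
    ("Salzburg", "country", "Austria"),
    ("Snowshoeing", "type", "Winter_sport"),
    ("Snowshoeing", "practisedIn", "Salzburg"),
]


def build_adjacency(triples):
    """Map each subject to its list of outgoing (predicate, object) edges."""
    adj = defaultdict(list)
    for s, p, o in triples:
        adj[s].append((p, o))
    return adj


def random_walks(adj, start, n_walks, depth, rng):
    """Generate n_walks random walks of at most `depth` hops from `start`.

    Each walk alternates entity and predicate tokens, as RDF2Vec does, and
    stops early at a node without outgoing edges.
    """
    walks = []
    for _ in range(n_walks):
        node = start
        walk = [node]
        for _ in range(depth):
            edges = adj.get(node)  # .get() does not insert keys into the defaultdict
            if not edges:
                break
            pred, node = rng.choice(edges)
            walk.extend([pred, node])
        walks.append(walk)
    return walks


rng = random.Random(42)
adj = build_adjacency(TRIPLES)
# Walk corpus: 3 walks of depth <= 2 from every entity that has outgoing edges.
corpus = [walk for entity in adj for walk in random_walks(adj, entity, 3, 2, rng)]
for walk in corpus[:3]:
    print(" -> ".join(walk))
```

The resulting walks would then be passed to a word2vec implementation (for example gensim's `Word2Vec`) to learn one vector per entity; that training step is left out here to keep the sketch dependency-free.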

In our approach, the RDF2Vec technique is used to generate vector representations of all entities in the second KG, and these vectors are then clustered with the k-medoids algorithm based on cosine similarity. Next, an artificial neural network with a multilayer perceptron (MLP) topology serves as a regression model that predicts, for a given vector from the first KG, the corresponding vector in the second KG. Once the cluster of the predicted vector has been determined, the entities in that cluster are ranked with Sentence-BERT (SBERT), and the highest-ranked entity is chosen as the most similar one. If you are interested in our work, we recommend reading the published paper.
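The clustering and ranking steps can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code: it assumes the RDF2Vec vectors of the second KG, the vector predicted by the regression model and the sentence embeddings of the entity labels are already available, and all of them are tiny hand-made toy arrays with hypothetical entity names. The clustering uses the simple alternating (Voronoi-iteration) variant of k-medoids with cosine distance, and the ranking compares embeddings with cosine similarity, the usual way Sentence-BERT vectors are compared. The helpers `cosine_distance_matrix`, `k_medoids` and `rank_candidates` are hypothetical names.

```python
import numpy as np


def cosine_distance_matrix(a, b):
    """Pairwise cosine distances (1 - cosine similarity) between rows of a and b.

    Assumes no row is the zero vector.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def k_medoids(x, k, n_iter=100, seed=0):
    """Alternating (Voronoi-iteration) k-medoids with cosine distance.

    Returns the row indices of the k medoids and the cluster label of every row.
    """
    rng = np.random.default_rng(seed)
    dist = cosine_distance_matrix(x, x)
    medoids = rng.choice(len(x), size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)  # assign rows to nearest medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance to the others.
                within = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, np.argmin(dist[:, medoids], axis=1)


def rank_candidates(pred_vec, kg_vecs, medoids, labels, query_txt, txt_vecs, names):
    """Pick the cluster whose medoid is closest to pred_vec, then rank its
    members by cosine similarity between sentence embeddings."""
    cluster = np.argmin(cosine_distance_matrix(pred_vec[None, :], kg_vecs[medoids])[0])
    members = np.flatnonzero(labels == cluster)
    sims = 1.0 - cosine_distance_matrix(query_txt[None, :], txt_vecs[members])[0]
    return [names[i] for i in members[np.argsort(-sims)]]


# Hypothetical toy data: 2-D "RDF2Vec" vectors for six entities of the second KG.
names = ["Ski touring", "Ski mountaineering", "Cross-country skiing",
         "Museum", "Cathedral", "Art gallery"]
kg_vecs = np.array([[1.0, 0.0], [0.9, 0.1], [0.95, 0.05],
                    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95]])
# Hypothetical 3-D stand-ins for Sentence-BERT embeddings of the entity labels.
txt_vecs = np.array([[0.9, 0.1, 0.0], [0.7, 0.3, 0.0], [0.5, 0.5, 0.0],
                     [0.0, 0.1, 0.9], [0.0, 0.2, 0.8], [0.1, 0.0, 0.9]])
query_txt = np.array([1.0, 0.0, 0.0])  # embedding of the query entity's label
pred_vec = np.array([0.8, 0.2])        # assumed output of the regression model

medoids, labels = k_medoids(kg_vecs, k=2)
ranking = rank_candidates(pred_vec, kg_vecs, medoids, labels, query_txt, txt_vecs, names)
print(ranking[0])  # prints "Ski touring"
```

In practice, the text embeddings could be computed with the sentence-transformers library (`SentenceTransformer(...).encode(...)`), and a more robust k-medoids solver such as PAM could replace the simple alternating loop.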

Conclusions and future work

To sum up, the proposed approach for finding the entity in a local KG that is most similar to a given entity from another KG consists of four steps: graph embedding, clustering, regression and ranking. To evaluate the approach, DBpedia and the SalzburgerLand KG were used as the two KGs, and the available entity pairs that have the same relation were taken as training data for the regression models. Mean absolute error (MAE), the coefficient of determination (R²) and root mean square error (RMSE) were used to measure the performance of the regression models. As a next step, we will show how the approach can enrich the SalzburgerLand website by taking the main entities returned by the long-tail API and finding the most similar entities in the SalzburgerLand KG.
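The regression step and its evaluation can be sketched as follows. This is an illustration under stated assumptions, not the setup from the paper: the paired vectors are synthetic (random source vectors mapped through a hidden nonlinear function plus noise), standing in for entity pairs from DBpedia and the SalzburgerLand KG, and the multilayer perceptron is a tiny one-hidden-layer network trained with full-batch gradient descent. The helpers `train_mlp` and `regression_metrics` are hypothetical; the latter computes MAE, RMSE and R², with R² averaged over the output dimensions.

```python
import numpy as np


def train_mlp(x, y, hidden=16, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer MLP (tanh hidden units, linear outputs) with
    full-batch gradient descent on the squared error; return a predict function."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 1.0 / np.sqrt(x.shape[1]), (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    n = len(x)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)               # hidden activations
        err = (h @ w2 + b2) - y                # prediction error
        g_out = 2.0 * err / n                  # gradient of the loss w.r.t. the outputs
        g_h = (g_out @ w2.T) * (1.0 - h ** 2)  # backpropagate through tanh
        w2 -= lr * (h.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (x.T @ g_h)
        b1 -= lr * g_h.sum(axis=0)
    return lambda x_new: np.tanh(x_new @ w1 + b1) @ w2 + b2


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R²); R² is computed per output dimension and averaged."""
    err = y_true - y_pred
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    ss_res = np.sum(err ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    r2 = float(np.mean(1.0 - ss_res / ss_tot))
    return mae, rmse, r2


# Synthetic stand-in for entity pairs: x plays the role of source-KG vectors,
# y the role of the matching target-KG vectors (hypothetical, not real data).
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 8))
w_true = rng.normal(size=(8, 4)) / np.sqrt(8)
y = np.tanh(x @ w_true) + 0.05 * rng.normal(size=(200, 4))

predict = train_mlp(x[:160], y[:160])  # 80/20 train/test split
mae, rmse, r2 = regression_metrics(y[160:], predict(x[160:]))
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

A library regressor such as scikit-learn's `MLPRegressor`, together with `mean_absolute_error`, `mean_squared_error` and `r2_score` from `sklearn.metrics`, would be the more practical choice; the hand-written version only shows what each metric measures. Note that MAE can never exceed RMSE on the same errors, so a large gap between the two signals a few large errors.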

Reference to the paper with full details: Aghaei, S., Fensel, A. “Finding Similar Entities Across Knowledge Graphs”, in Proceedings of the 7th International Conference on Advances in Computer Science and Information Technology (ACSTY 2021), Volume Editors: David C. Wyld, Dhinaharan Nagamalai, March 20-21, 2021, Vienna, Austria.