About

by Maria Silvia Sanna | 5 June 2020

What is WordLift New Generation?

WordLift NG aims to expand WordLift’s market reach by creating a new platform that delivers the features offered today to WordPress users to any website, regardless of its CMS.

We plan to improve the backend’s performance by enabling the enrichment and querying of RDF graphs with semantic similarity indices and full-text search, allowing clients of WordLift and Redlink to:

implement semantic search and content recommendations on their website.
integrate with personal digital assistants such as Amazon Alexa.

Artificial Intelligence is shaping the online world, with huge investments made by large corporations. We have successfully brought these technologies to small and mid-sized content owners, SMEs and news publishers worldwide using WordPress.

Expanding outside WordPress

We’re now ready to expand outside of the WordPress ecosystem while adding new services such as semantic search, recommendations and conversational UIs for the Google Assistant and Alexa to help this market segment remain competitive.

The project goal is to create a scalable infrastructure that can sustain a B2B2C business model to broaden the market and create a proper distribution. In the first two years of operations, the product met with great interest from the market in areas such as SEO and digital marketing. To scale up further, a robust infrastructure is critical: one able to sustain a larger number of clients with more complex web properties organized around larger organizations.

From our experience within the digital industry, we learned a valuable lesson: web properties are organized around very few digital hubs. For instance, in Italy two major digital hubs (Mondadori and Triboo Media) together control dozens of valuable web properties. Those web properties are organized around an enterprise-type organization, which requires a B2B platform.

The technological development of the platform will make the product valuable to large digital hubs as well as to smaller web properties. Thus, with a core B2B platform, the product will also be distributed at a B2C level.

A robust platform based on a B2B2C model will allow WordLift to gain three primary strengths:

Brand recognition
Effective distribution
Network effects

The main goal and expected result of the project is to scale up operations.

A new infrastructure for WordLift and Redlink

Since its beginning, WordLift has used an infrastructure provided by Redlink to serve its client base, with a SaaS contract in place between the two organizations. The same platform also directly serves Redlink’s enterprise clients. Creating a new infrastructure has a twofold result: WordLift expands its B2B2C model (targeting website owners) and Redlink grows its B2B model (targeting large enterprises such as Deutsche Bahn and RedBull).

WordLift software, powered by AI and Machine Learning, helps publishers and online businesses gain access and visibility via the main commercial search engines (Google, Bing, Yahoo, and Yandex). Its AI (developed with Redlink) transforms simple text into structured data, which is the language used by search engines.
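As a rough illustration of what "structured data" can look like, here is a minimal sketch that assumes the schema.org vocabulary serialized as JSON-LD, a format search engines commonly read. The function name, the chosen fields and the example.org entity URI are hypothetical and do not represent WordLift's actual output.

```python
import json


def article_to_jsonld(headline, author, date_published, entities):
    """Build a schema.org Article description as a JSON-LD dictionary.

    `entities` maps a label found in the text to a (hypothetical) entity URI.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": date_published,
        # Each recognized entity becomes a machine-readable "mentions" link.
        "mentions": [
            {"@type": "Thing", "name": name, "@id": uri}
            for name, uri in sorted(entities.items())
        ],
    }


doc = article_to_jsonld(
    headline="What is WordLift New Generation?",
    author="Maria Silvia Sanna",
    date_published="2020-06-05",
    # Assumed entity URI, for illustration only.
    entities={"WordPress": "https://example.org/entity/wordpress"},
)
print(json.dumps(doc, indent=2))
```

A page would typically embed such a document in a script element of type application/ld+json so that crawlers can read it alongside the human-readable text.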

Therefore WordLift has three primary target audiences:

Publishers
E-commerce websites
SEO experts and digital marketing agencies

The core business model pattern is SaaS with a monthly subscription. WordLift, currently available on WordPress only, will become available on any CMS with this project. WordLift will also be able to offer new advanced functionalities:

to help its clients improve content findability by adding semantic search capabilities to their websites
to help them engage with their audience in a screenless world using conversational user interfaces, making their content truly accessible to smart speaking devices and personal assistants

This multimodal model, which relies on several revenue streams, also brings a more effective distribution strategy. We also expect the project to introduce a Freemium version: a basic version of the tool, offered for free (to both WordPress and non-WordPress users) with limited features, that will fuel user growth.

These additional patterns will allow WordLift to gain further traction in three areas:

Branding: become a standard in the SEO industry
Virality: generate a continuous stream of leads with minimum marketing expenditure
Growth: scale up organic revenue growth

On the Redlink side, the new platform will help the company increase the revenues generated by WordLift (which will grow its market share with this project) and improve the revenues from its enterprise clients by offering new services such as the creation of advanced contextual chatbots and assistants.

WordLift NG combines some incremental improvements on the existing workflow for WordPress with a breakthrough new technology required to:

decouple the semantic editor from the CMS (bringing WordLift outside of WordPress)
add internal linking and widget embedding support
add support for semantic search and natural language understanding (NLU) to help users “converse” with the structured data on their websites.

At the core of the project is the implementation of a new graph database to which all existing users and data will be migrated.

A new graph database

Knowledge Graphs are the core of this project and are increasingly a key technology enabler for large-scale information processing systems containing massive collections of interrelated facts. Examples include the Google Knowledge Graph with over 70 billion facts (in 2016), dataCommons, DBpedia, YAGO, and Knowledge Vault, a very large-scale probabilistic knowledge graph created by information extraction methods for unstructured or semi-structured information. Specifically, Knowledge Graphs provide the basis for developing the newest methods for data management, data fusion / data merging, and graph and network optimization and modeling, serving as a source of high-quality data and a base for web-scale information integration. In particular, Enterprise Knowledge Graphs help to infer new relationships from existing facts, giving context and meaning to the content, and can be used in applications.

Creating and populating such Knowledge Graphs from data that is often of inferior quality and lacks sufficient context information poses a number of challenges, including duplicate elimination, error correction and range prediction. These can be addressed with data analytics and machine learning techniques, as well as human engagement, to ensure the presence of semantics in the resulting outcome.

Further, intelligent ecosystems for data value chain production and consumption require new methods for the automated exchange of, and reasoning about, information across systems. For example, the data generated by WordLift’s current AI system shall be semantically represented, shared and employed across numerous systems such as DBpedia Databus and Wikidata, taking into account their respective aims and requirements. These methods are to employ semantic technologies, which provide standards for data production and consumption.

Last but definitely not least, data management needs to be explainable and actionable for end users. This involves communication methods such as the voice application that will be integrated with personal assistants, the search functionality to be embedded on the end-user website, and the UIs to edit, query and maintain the data behind each account.

Natural Language Understanding

Another breakthrough is required to create a content analysis processing pipeline as the main building block of the annotation and NLU modules. This pipeline defines the processing stages for incoming text, whether it is text to be annotated or messages coming from users interacting with the search or the voice app.

Most of today’s NLU modules and conversational applications intercept the intent of the user and return a one-turn result. While short answers are typically valuable and generate limited friction, letting the audience interact with the content of a website in a screenless way requires multi-turn conversations, which means working at the intersection of voice user experience, data, and reasoning. This will be possible by leveraging graph technologies to power multi-turn user interactions.
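To make the difference between one-turn and multi-turn interaction concrete, here is a minimal sketch in which a conversation object remembers the entity under discussion, so that a follow-up question can omit it. Everything in it is an assumption made for illustration: the toy graph, the keyword matching and the class name stand in for a real knowledge graph and NLU pipeline, and do not describe the project's design.

```python
# Toy knowledge graph: (subject, predicate) -> object. All facts are illustrative.
GRAPH = {
    ("Salzburg", "type"): "City",
    ("Salzburg", "country"): "Austria",
    ("Salzburg", "known_for"): "Mozart",
    ("Mozart", "type"): "Person",
    ("Mozart", "born_in"): "Salzburg",
}

# Hypothetical mapping from a keyword in the question to a graph predicate.
KEYWORD_TO_PREDICATE = {
    "country": "country",
    "known": "known_for",
    "born": "born_in",
}


class Conversation:
    """Keeps the entity under discussion so follow-up turns can omit it."""

    def __init__(self, graph):
        self.graph = graph
        self.focus = None  # entity carried over between turns
        # Lower-cased entity label -> canonical label used in the graph.
        self.entities = {subject.lower(): subject for subject, _ in graph}

    def ask(self, utterance):
        words = utterance.lower().replace("?", "").split()
        # An entity named explicitly in this turn becomes the new focus;
        # if several are named, the last one mentioned wins.
        for word in words:
            if word in self.entities:
                self.focus = self.entities[word]
        if self.focus is None:
            return "Which entity do you mean?"
        for keyword, predicate in KEYWORD_TO_PREDICATE.items():
            if keyword in words:
                answer = self.graph.get((self.focus, predicate))
                if answer is None:
                    return f"I have no {predicate} for {self.focus}."
                return answer
        return f"What would you like to know about {self.focus}?"


chat = Conversation(GRAPH)
first = chat.ask("What is Salzburg known for?")
second = chat.ask("And which country is it in?")  # "it" resolves via the focus
print(first, second)
```

The second question never names Salzburg; it is answered only because the context from the first turn is kept, which is the behavior a one-turn intent matcher cannot provide.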

The evaluations will be conducted in a threefold manner:

as lab studies (with STI students and invited users),
as user meetings for use cases (with SLT partners),
online (for example with invited beta testers).

Cloud Architecture: The service has to guarantee high throughput whenever required. To keep the service manageable resource-wise, dynamic scaling is a key factor. We aim to address this by setting up a cloud infrastructure with microservices. To prevent service outages and data loss, we will minimize single points of failure by using distributed databases. The data will only be available through dedicated web services that follow high security standards. All in all, this setup will allow us to offer service guarantees such as 99% uptime, data recovery and data security.

Information Extraction: Entity identification and classification in content is a central part of every service provided by the backend, namely semantic search, (batch) analysis and conversational interfaces. As we build on top of the existing Redlink LP Services, this task will focus on porting services to the new microservice architecture.
Knowledge Graph Construction and Evolution: Ensuring the right level of data quality is essential for SEO and is not trivial. The tremendous growth of knowledge graphs, resulting in KGs with billions of nodes, creates new challenges for managing these huge graphs. One important challenge is the (consistent) construction and evolution of such KGs. In the past, ontology schemas were limited, and ontology evolution focused on keeping an ontology consistent by automatically generating additional updates when required. Changes to the related instances were reflected as well, but on the hidden assumption that the rather small number of relevant instances could be changed easily. When managing billions of instances, this assumption no longer holds. Therefore, intended changes to a KG have to consider not only the impact at the schema level but also the effort of adapting a huge number of instances to the new schema.
Access to Knowledge Graph Data: The knowledge graph will be accessible from outside as Linked Data as well as through a query interface. As in the current platform, the Linked Data interface will support content negotiation as well as stable links. The query interface will be implemented as a controlled query language that a) follows the patterns of a widely known query language such as GraphQL, and b) reduces the range of functions, which enables us to provide better query performance and reduce the load on the internal data storage.
Semantic Search: The search uses the enhanced content to return adequate information based on user intention within a minimal response time. In addition to isolated, syntax-based search, we aim to consider user-centric as well as global history, semantic relations between results, and customized retrieval workflows based on the user’s information retrieval intent. The technical challenge here is to find the best trade-off between quality and time. We address the problem by combining state-of-the-art retrieval technology (e.g. Apache Lucene) with high-performance NLP.
Data Migration: The current platform serves data and provides analysis services for existing clients (WordLift and others). To support a smooth switch, data migration with minimal downtime is a crucial step. We address this by running both systems in parallel while preparing and migrating existing data to the new system. The consuming applications can then switch to the new API through an offline adaptation and a well-timed release.
Conversational Interfaces: These interfaces are backed by a workflow engine that allows both manual and rule-based definition of conversation flows. It makes use of platform services for information extraction, namely classification (for intent detection of user messages) and entity detection (for building workflow contexts). The service provides endpoints to connect third-party services that will be developed by WordLift.
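The content negotiation mentioned above for the Linked Data interface can be sketched as follows: the client states the serializations it accepts in the HTTP Accept header, and the server picks the best match among those it can produce. This is a simplified sketch, not the platform's implementation. The list of supported serializations is an assumption, and the matching ignores the specificity rules of full HTTP negotiation (for example, an explicit q=0 exclusion is not honored against a wildcard).

```python
# Serializations a (hypothetical) Linked Data endpoint could offer, in server
# preference order. The media types are standard; the selection is an assumption.
SUPPORTED = ["text/html", "application/ld+json", "text/turtle", "application/rdf+xml"]


def parse_accept(header):
    """Return (media_type, q) pairs parsed from an HTTP Accept header value."""
    result = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        result.append((media_type, q))
    return result


def negotiate(header, supported=SUPPORTED):
    """Pick the supported media type the client prefers, or None if none match."""
    best, best_q = None, 0.0
    for media_type, q in parse_accept(header):
        for candidate in supported:
            matches = (
                media_type == candidate
                or media_type == "*/*"
                or (media_type.endswith("/*")
                    and candidate.startswith(media_type[:-1]))
            )
            # Strictly greater: on equal q, the earliest match is kept.
            if matches and q > best_q:
                best, best_q = candidate, q
    return best


print(negotiate("application/ld+json;q=0.8, text/turtle;q=0.9"))
```

With this approach the same stable link can serve an HTML page to a browser and an RDF serialization to a machine client, depending only on the Accept header each one sends.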

We could bind the development of the platform to a specific public cloud infrastructure and integrate more tightly with its APIs for information extraction, semantics, machine learning and data management. In theory, this would free resources from running, maintaining and updating the infrastructure and give our development team more time to stay product-focused.

Maintaining and continuously updating a physical infrastructure requires a good amount of resources in both WordLift and Redlink. Over the years, however, we have built these competences along with a deep understanding of all the open source technologies that we need to use. This allows us to run applications on our own cloud, bringing the following advantages to our businesses:

We can keep control of the costs and maximize the margins. Particularly for a SaaS business like WordLift, it is strategic to be as independent as possible in order to prevent unforeseen cost increases and keep good margins.

We can leverage the work we do for enterprise clients, which requires us to work with open source technologies and on their premises. We learn from large enterprises how to run these platforms, and we use the same competences to run our own infrastructure.

Most commercial AI services run outside the European Union, which leads to unclear legal conditions. Especially when providing Redlink services to integrators from the enterprise sector, this will be a major barrier. We will address this issue by building an independent service platform that can be consumed either as a service running in data centers bound by European data law or run inside the secure infrastructure of an enterprise.

The Partners of the Consortium

WordLift NG is co-funded by the Eurostars H2020 programme and is based on the collaboration between Italian and Austrian scientists and top-notch experts to democratize the usage of Agentive SEO.

To reach this goal, WordLift is partnering with leading European organizations in the fields of AI, NLP, Semantic Technologies and tourism: Redlink GmbH, the Semantic Technology Institute at the Department of Computer Science of the University of Innsbruck, and SalzburgerLand Tourismus GmbH.

What is the Eurostars programme?

Eurostars is a joint programme between EUREKA and the European Commission, co-funded from the national budgets of 36 Eurostars participating states and partner countries and by the European Union through Horizon 2020.

Eurostars supports international innovative projects led by R&D-performing SMEs. With its bottom-up approach, Eurostars supports the development of rapidly marketable innovative products, processes and services that help improve the daily lives of people around the world. Eurostars has been carefully developed to meet the specific needs of SMEs.