Named-entity recognition (NER), also known as entity identification or entity extraction, is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations and places, expressions of time, quantities, monetary values, percentages and more.
Most NER systems take an unannotated block of text, such as “WordLift is a plugin for WordPress”, and extract all relevant information from it:
- WordLift | schema-org:CreativeWork | http://data.redlink.io/91/be9/entity/wordlift
- Plugin | dbc:Software-add_ons | http://dbpedia.org/page/Plug-in_(computing)
- WordPress | dbc:Content_management_software | http://dbpedia.org/page/WordPress
What is an Entity anyway?
An entity is the “thing” described in a document. An entity helps computers understand everything you know about a person, an organization or a place mentioned in a document. All these facts are organized in statements known as triples that are expressed in the form of subject, predicate, and object.
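To make the idea of triples concrete, here is a minimal sketch that stores a few statements as (subject, predicate, object) records and looks up everything known about one entity. The `ex:` identifiers and the predicate `ex:isPluginFor` are made up for illustration and are not WordLift's actual vocabulary.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical statements; the "ex:" identifiers are placeholders, not real URIs.
triples = [
    Triple("ex:WordLift", "rdf:type", "schema:CreativeWork"),
    Triple("ex:WordLift", "ex:isPluginFor", "ex:WordPress"),
    Triple("ex:WordPress", "rdf:type", "ex:ContentManagementSoftware"),
]


def facts_about(subject: str, store: List[Triple]) -> List[Tuple[str, str]]:
    """Return every (predicate, object) pair recorded for a subject."""
    return [(t.predicate, t.obj) for t in store if t.subject == subject]


print(facts_about("ex:WordLift", triples))
```

Each fact about an entity is one row, so adding knowledge means appending a triple rather than changing a schema.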
How WordLift is using Named Entity Recognition
Let’s get into more detail, as this is one of WordLift’s key technologies:
First and foremost, named-entity recognition (NER) uses a knowledge base (KB) that contains all the known concepts (named entities) that need to be extracted from a block of text.
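A knowledge-base lookup of this kind can be sketched as a simple dictionary match. The example below is an illustrative assumption, not WordLift's implementation: the toy `KB` holds only the three entities from the sample sentence, and the helper `extract_entities` is a hypothetical name that matches single words only.

```python
import re
from typing import Dict, List, Tuple

# Toy knowledge base: lower-cased surface form -> (type, URI).
KB: Dict[str, Tuple[str, str]] = {
    "wordlift": ("schema-org:CreativeWork",
                 "http://data.redlink.io/91/be9/entity/wordlift"),
    "plugin": ("dbc:Software-add_ons",
               "http://dbpedia.org/page/Plug-in_(computing)"),
    "wordpress": ("dbc:Content_management_software",
                  "http://dbpedia.org/page/WordPress"),
}


def extract_entities(text: str, kb: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (mention, type, uri) for every word found in the knowledge base."""
    results = []
    for match in re.finditer(r"\w+", text):
        mention = match.group(0)
        entry = kb.get(mention.lower())  # case-insensitive lookup
        if entry is not None:
            results.append((mention, entry[0], entry[1]))
    return results


for row in extract_entities("WordLift is a plugin for WordPress", KB):
    print(" | ".join(row))
```

A real system would also need to match multi-word names and handle ambiguous surface forms, which this sketch leaves out.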
WordLift derives semantic information from the user’s content by leveraging freely available datasets such as DBpedia and the user’s local vocabulary.
As new concepts are added to the local vocabulary, WordLift learns the knowledge domain of the user and improves its understanding of the content.
WordLift uses a sophisticated named-entity disambiguation (NED) mechanism to correctly link detected locations, companies and people to unique “instances” in the web of data.
During the extraction phase, low-level NLP functions take place, including part-of-speech (POS) tagging, tokenization, sentence boundary detection, capitalization rules and in-document coreference resolution.
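Three of these low-level steps can be sketched with regular expressions alone. This is a deliberately naive illustration using hypothetical helper names, not the pipeline WordLift runs. POS tagging and coreference need trained models and are left out.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive sentence boundary detection: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> List[str]:
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def capitalized_candidates(tokens: List[str]) -> List[str]:
    """Capitalization rule: keep capitalized tokens, skipping the sentence-initial
    one because it is capitalized regardless of whether it is a name."""
    return [t for i, t in enumerate(tokens) if i > 0 and t[:1].isupper()]


text = "The WordLift plugin runs on WordPress. It uses DBpedia."
for sentence in split_sentences(text):
    print(capitalized_candidates(tokenize(sentence)))
```

Skipping the first token keeps ordinary words like "The" out, but it would also miss a name that opens a sentence.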
As a result of the extraction, WordLift proposes to the user a set of candidate KB entities for each mention.
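One simple way to produce such a candidate list is to rank KB entries by how closely their labels match the mention. The sketch below uses string similarity from Python's standard `difflib` module. The scoring method, the `ex:` identifiers and the function name `propose_candidates` are all assumptions for illustration, since the article does not describe how WordLift actually ranks candidates.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical KB: entity id -> label. The "ex:" ids are placeholders.
KB_LABELS: Dict[str, str] = {
    "ex:WordPress": "WordPress",
    "ex:WordLift": "WordLift",
    "ex:Word_processor": "Word processor",
}


def propose_candidates(mention: str, kb: Dict[str, str], limit: int = 2) -> List[Tuple[str, float]]:
    """Rank KB entities by string similarity between the mention and each label."""
    scored = [
        (entity_id, SequenceMatcher(None, mention.lower(), label.lower()).ratio())
        for entity_id, label in kb.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return scored[:limit]


for entity_id, score in propose_candidates("Wordpress", KB_LABELS):
    print(f"{entity_id}\t{score:.2f}")
```

Returning a short ranked list rather than a single answer leaves the final choice to the user, which matches the "proposes to the user" step described above.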
Learn more about Natural Language Processing
Natural language processing (or NLP) is a field of computer science, artificial intelligence, and linguistics that has to do with the interactions between computers and humans using natural languages. As such, NLP is related to the area of human-computer interaction. Many challenges in NLP involve natural language understanding — that is, enabling computers to derive meaning from human or natural language input.