Extractive vs Abstractrive
Let’s have a quick look at the different methods we have for compressing a web page.
“Extractive summarization methods work by identifying important sections of the text and generating them verbatim; […] abstractive summarization methods aim at producing important material in a new way. In other words, they interpret and examine the text using advanced natural language techniques in order to generate a new shorter text that conveys the most critical information from the original text”
With simple words with extractive summarization we will use an algorithm to select and combine the most relevant sentences in a document. Using abstractive summarization methods, we will use sophisticated NLP techniques (i.e. deep neural networks) to read and understand a document in order to generate novel sentences.
In extractive methods a document can be seen as a graph where each sentence is a node and the relationships between these sentences are weighted edges. These edges can be computed by analyzing the similarity between the word-sets from each sentence. We can then use an algorithm like Page Rank (we will call it Text Rank in this context) to extract the most central sentences in our document-graph.