Friday, November 3, 2017
11 AM - 12 PM
In natural language processing, the summarization of information in a large amount of text has typically been viewed as a type of natural language generation problem, e.g. "produce a 250 word summary of some documents based on some input query". An alternative view, which will be the focus of this talk, is to use natural language parsing to extract facts from a collection of documents and then use information visualization to provide an interactive summarization of these facts.
The first step is to extract detailed facts about events from natural language text using a predicate-centered view of events (who did what to whom, when and how). We exploit semantic roles to create a predicate-centric ontology for entities, which is then used to build a knowledge base of facts about entities and their relationships with other entities.
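As a rough illustration of the predicate-centered view (not the speaker's actual system), the sketch below turns semantic-role frames into facts and indexes them by entity. It assumes the frames have already been produced by some semantic role labeler and use PropBank-style labels (ARG0, ARG1, ARGM-TMP, ARGM-MNR). The names `Fact`, `frame_to_fact`, and `build_knowledge_base`, and the two hand-written input frames, are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One event: who did what to whom, when and how."""
    predicate: str
    agent: Optional[str]          # "who"  (assumed to come from ARG0)
    patient: Optional[str]        # "whom" (assumed to come from ARG1)
    when: Optional[str] = None    # assumed to come from ARGM-TMP
    how: Optional[str] = None     # assumed to come from ARGM-MNR


def frame_to_fact(frame: dict) -> Fact:
    """Map one semantic-role frame (a plain dict) to a Fact."""
    return Fact(
        predicate=frame["predicate"],
        agent=frame.get("ARG0"),
        patient=frame.get("ARG1"),
        when=frame.get("ARGM-TMP"),
        how=frame.get("ARGM-MNR"),
    )


def build_knowledge_base(frames: list) -> dict:
    """Index every fact under each entity that takes part in it."""
    kb = defaultdict(list)
    for frame in frames:
        fact = frame_to_fact(frame)
        for entity in (fact.agent, fact.patient):
            if entity is not None:
                kb[entity].append(fact)
    return dict(kb)


# Hand-written toy frames standing in for real labeler output.
frames = [
    {"predicate": "invade", "ARG0": "Napoleon", "ARG1": "Russia", "ARGM-TMP": "1812"},
    {"predicate": "defeat", "ARG0": "Wellington", "ARG1": "Napoleon", "ARGM-TMP": "1815"},
]
kb = build_knowledge_base(frames)
```

Looking up `kb["Napoleon"]` returns both facts, one where the entity is the agent and one where it is the patient, which is the kind of entity-to-entity relationship the knowledge base is meant to expose.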
The next step is to use information visualization to summarize the facts in this automatically extracted knowledge base. The user can interact with the visualization to obtain summaries at different granularities, which makes it easy to discover even extremely uncommon facts.
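To make the idea of summaries at different granularities concrete, here is a minimal sketch that counts facts at a coarse level (by predicate) and at a finer level (by predicate and century), then lists the groups that occur rarely. It is only an assumption about how such aggregation might look, not a description of the visualization presented in the talk. The function names, the two granularity levels, and the toy `(predicate, year)` pairs are all hypothetical.

```python
from collections import Counter

# Toy facts reduced to (predicate, year) pairs; illustrative only.
facts = [
    ("invade", 1812),
    ("defeat", 1815),
    ("invade", 1939),
    ("found", 1602),
    ("invade", 1066),
]


def century_of(year: int) -> int:
    """Return the century number of a CE year, e.g. 1812 -> 19."""
    return (year - 1) // 100 + 1


def summarize(facts, granularity: str) -> Counter:
    """Count facts grouped at the requested (hypothetical) granularity."""
    if granularity == "predicate":
        return Counter(predicate for predicate, _ in facts)
    if granularity == "century":
        return Counter((predicate, century_of(year)) for predicate, year in facts)
    raise ValueError(f"unknown granularity: {granularity}")


def rare_groups(summary: Counter, max_count: int = 1) -> list:
    """Return the groups seen at most max_count times, in sorted order."""
    return sorted(key for key, count in summary.items() if count <= max_count)


coarse = summarize(facts, "predicate")
fine = summarize(facts, "century")
```

Moving from `coarse` to `fine` splits the single "invade" group into one group per century, so a group that looks common at the coarse level can turn out to be rare once the user drills down.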
We have used this methodology to build an interactive visualization of events in human history by machine reading Wikipedia articles. I will demo the visualization and describe the results of a user study that evaluates this interactive visualization for a summarization task.
Anoop Sarkar is a Professor at Simon Fraser University in British Columbia, Canada, where he co-directs the Natural Language Laboratory (http://natlang.cs.sfu.ca). His research focuses on machine learning approaches to multilingual natural language processing.
Dr. Joyce Chai