Exploratory Relation Extraction in Large Text Corpora

Alan Akbik, Thilo Michael, and Christoph Boden

The 25th International Conference on Computational Linguistics (COLING 2014)
Dublin, Ireland, August 23-29, 2014

Abstract:

In this paper, we propose and demonstrate Exploratory Relation Extraction (ERE), a novel approach to identifying and extracting relations from large text corpora based on user-driven and data-guided incremental exploration. We draw upon ideas from the information-seeking paradigm of Exploratory Search (ES) to enable an exploration process in which users begin with a vaguely defined information need and progressively sharpen their definition of extraction tasks as they identify relations of interest in the underlying data. This process extends the application of Relation Extraction to use cases characterized by imprecise information needs and uncertainty regarding the information content of available data. We present an interactive workflow that allows users to build extractors based on entity types and human-readable extraction patterns derived from subtrees in dependency trees. To evaluate the viability of our approach on large text corpora, we conduct experiments on a dataset of over 160 million sentences with mentions of over 6 million Freebase entities extracted from the ClueWeb09 corpus. Our experiments indicate that even non-expert users can intuitively use our approach to identify relations and create high-precision extractors with minimal effort. This work is made publicly available at http://130.149.21.47/.
