Publications tagged: #information_extraction

  • Parma: A predicate argument aligner

    We introduce PARMA, a system for cross-document, semantic predicate and argument alignment. Our system combines a number of linguistic resources familiar to researchers in areas such as recognizing textual entailment and question answering, integrating them into a simple discriminative model. PARMA achieves state-of-the-art results on an existing and a new dataset. We suggest that previous efforts have focussed on data that is biased and too easy, and we provide a more difficult dataset based on translation data with a low baseline, which we beat by 17% F1.

    Travis Wolfe, Benjamin Van Durme, Mark Dredze, Nicholas Andrews, Charley Beller, Chris Callison-Burch, Jay DeYoung, Justin Snyder, Jonathan Weese, Tan Xu, and others

    Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2013

  • Entity Clustering Across Languages

    Standard entity clustering systems commonly rely on mention (string) matching, syntactic features, and linguistic resources like English WordNet. When co-referent text mentions appear in different languages, these techniques cannot be easily applied. Consequently, we develop new methods for clustering text mentions across documents and languages simultaneously, producing cross-lingual entity clusters. Our approach extends standard clustering algorithms with cross-lingual mention and context similarity measures. Crucially, we do not assume a pre-existing entity list (knowledge base), so entity characteristics are unknown. On an Arabic-English corpus that contains seven different text genres, our best model yields a 24.3% F1 gain over the baseline.

    Spence Green, Nicholas Andrews, Matthew R. Gormley, Mark Dredze, Christopher D. Manning

    Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2012

  • Seeded Discovery of Base Relations in Large Corpora

    Relationship discovery is the task of identifying salient relationships between named entities in text. We propose novel approaches for two sub-tasks of the problem: identifying the entities of interest, and partitioning and describing the relations based on their semantics. In particular, we show that term frequency patterns can be used effectively instead of supervised NER, and that the p-median clustering objective function naturally uncovers relation exemplars appropriate for describing the partitioning. Furthermore, we introduce a novel application of relationship discovery: the unsupervised identification of protein-protein interaction phrases.

    Nicholas Andrews, Naren Ramakrishnan

    Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, 2008
