Publications from 2016

  • Twitter at the Grammys: A Social Media Corpus for Entity Linking and Disambiguation

    Work on cross-document coreference resolution (CDCR) has primarily focused on news articles, with little to no work on social media. Yet social media may be particularly challenging, since short messages provide little context and informal names are pervasive. We introduce a new Twitter corpus that contains entity annotations for entity clusters and supports CDCR. Our corpus draws from Twitter data surrounding the 2013 Grammy music awards ceremony, providing a large set of annotated tweets focused on a single event. To establish a baseline, we evaluate two CDCR systems and consider the performance impact of each system component. Furthermore, we augment one system to include temporal information, which can be helpful when documents (such as tweets) arrive in a specific order. Finally, we include annotations linking the entities to a knowledge base to support entity linking. Our corpus is available at https://bitbucket.org/mdredze/tgx

    Mark Dredze, Nicholas Andrews, Jay DeYoung

    Proceedings of The Fourth International Workshop on Natural Language Processing for Social Media, 2016


    #social_media #benchmark
