Publications from 2018

  • Convolutions Are All You Need (For Classifying Character Sequences)

    While recurrent neural networks (RNNs) are widely used for text classification, they demonstrate poor performance and slow convergence when trained on long sequences. When text is modeled as characters rather than words, the resulting longer sequences make RNNs a poor choice. Convolutional neural networks (CNNs), although somewhat less ubiquitous than RNNs, have an internal structure better suited to long-distance character dependencies. To better understand how CNNs and RNNs differ in handling long sequences, we apply both to text classification tasks on several character-level social media datasets. The CNN models vastly outperform the RNN models in our experiments, suggesting that CNNs are superior to RNNs at learning to classify character-level data. (A minimal character-level CNN classifier is sketched after this entry.)

    Zach Wood-Doughty, Nicholas Andrews, Mark Dredze

    Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, 2018

    PDF BibTeX

    #social_media #efficiency
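
    As a rough illustration of the kind of model the abstract describes, the sketch below builds a character-level CNN text classifier in PyTorch. It is not the paper's architecture; the vocabulary size, embedding width, filter counts, and kernel widths are all illustrative assumptions.

        import torch
        import torch.nn as nn

        class CharCNN(nn.Module):
            """Character-level CNN text classifier (illustrative sketch,
            not the architecture from the paper)."""

            def __init__(self, vocab_size=128, embed_dim=32,
                         num_filters=64, kernel_sizes=(3, 5, 7), num_classes=2):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, embed_dim)
                # Parallel convolutions with different kernel widths capture
                # character n-gram patterns of different lengths.
                self.convs = nn.ModuleList(
                    nn.Conv1d(embed_dim, num_filters, k, padding=k // 2)
                    for k in kernel_sizes
                )
                self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

            def forward(self, char_ids):
                # char_ids: (batch, seq_len) integer character indices
                x = self.embed(char_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
                # Max-pool each convolution's output over the whole sequence,
                # so distant characters can contribute regardless of position.
                pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
                return self.fc(torch.cat(pooled, dim=1))  # (batch, num_classes)

    Because the pooled features are position-independent, sequence length affects only compute, not the parameter count, which is one reason convolutions scale more comfortably to long character sequences than recurrence does.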

  • Predicting Twitter User Demographics from Names Alone

    Social media analysis frequently requires tools that can automatically infer demographics to contextualize trends. These tools often require hundreds of user-authored messages for each user, which may be prohibitive to obtain when analyzing millions of users. We explore character-level neural models that learn a representation of a user's name and screen name to predict gender and ethnicity, allowing for demographic inference with minimal data. We release trained models, which may enable new demographic analyses that would otherwise require enormous amounts of data collection. (A minimal character-level name encoder is sketched after this entry.)

    Zach Wood-Doughty, Nicholas Andrews, Rebecca Marvin, Mark Dredze

    Proceedings of the Second Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media, 2018

    PDF BibTeX

    #social_media
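
    The sketch below shows one way such a character-level name model might look in PyTorch: two small recurrent encoders, one over the name and one over the screen name, feeding a shared classifier. The class name, layer sizes, and the choice of GRU encoders are illustrative assumptions, not a description of the released models.

        import torch
        import torch.nn as nn

        class NameDemographicModel(nn.Module):
            """Character-level encoders over a user's name and screen name
            (illustrative sketch, not the released models)."""

            def __init__(self, vocab_size=128, embed_dim=32,
                         hidden_dim=64, num_classes=2):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, embed_dim)
                self.name_enc = nn.GRU(embed_dim, hidden_dim, batch_first=True)
                self.screen_enc = nn.GRU(embed_dim, hidden_dim, batch_first=True)
                self.fc = nn.Linear(2 * hidden_dim, num_classes)

            def forward(self, name_ids, screen_ids):
                # Each input: (batch, seq_len) integer character indices.
                # Use each encoder's final hidden state as the string summary.
                _, h_name = self.name_enc(self.embed(name_ids))
                _, h_screen = self.screen_enc(self.embed(screen_ids))
                return self.fc(torch.cat([h_name[-1], h_screen[-1]], dim=1))

    Because the inputs are just two short strings, a model like this can run over millions of users without collecting any message history.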
