Twitter's terms of service prevent sharing large-scale Twitter corpora. Instead, we share the 1000-dimensional PCA vectors produced for each user's tweet and network views. These embeddings can be used in place of the user data to reproduce our methods and to compare new methods against our work.
File: user_6views_tfidf_pcaEmbeddings_userTweets+networks.tsv.gz (1.4 GB)
Vector dimensions are sorted in order of decreasing variance, so evaluating a 50-dimensional PCA vector means just using the first 50 values in each view.
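As a minimal sketch of that truncation, the Python snippet below keeps the first k values of a view. It assumes each view has already been parsed into a NumPy array; the layout of the TSV file is not described here, so loading is left out. The name truncate_view and the random toy data are hypothetical stand-ins, not part of the released files, and the count of six views is inferred from the file name.

```python
import numpy as np


def truncate_view(view, k=50):
    """Keep the first k PCA dimensions of one view.

    Hypothetical helper: relies on the dimensions being sorted by
    decreasing variance, so the first k values form the k-dimensional
    PCA vector.
    """
    view = np.asarray(view, dtype=float)
    if k > view.shape[-1]:
        raise ValueError(f"view has only {view.shape[-1]} dimensions, cannot keep {k}")
    # Slice along the last axis so this works for one vector or a matrix of users.
    return view[..., :k]


# Toy stand-in for one user's views: random numbers, NOT real embeddings.
# Six views is an assumption based on the "6views" part of the file name.
rng = np.random.default_rng(0)
views = [rng.standard_normal(1000) for _ in range(6)]

# Reduce every view to its 50 highest-variance dimensions.
reduced = [truncate_view(v, k=50) for v in views]
```

Slicing along the last axis means the same helper can be applied to a whole users-by-dimensions matrix for a given view, should the data be loaded that way.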
For details on how the data was generated, or to cite it in your work, see:
Adrian Benton, Raman Arora, and Mark Dredze. Learning Multiview Representations of Twitter Users. Association for Computational Linguistics (ACL), 2016.
Direct your questions or comments to:
adrian dot author1_surname at gmail dot com