Users on social media platforms routinely interact by posting text messages, sharing images and videos, and establishing connections with other users through friending. Learning latent user representations from these observations is an important problem for marketers, public policy experts, social scientists, and computer scientists.
In this thesis, we show how user representations can be learned from multiple types of user behavior on social media. We apply several extensions of generalized canonical correlation analysis, a set of multiview learning objectives, to learn these representations, and evaluate them on three tasks: predicting future hashtag mentions, friending behavior, and demographic features. We then show how user features can improve three additional downstream tasks: topic model fit, mental health status classification, and message-level stance classification.
Adrian Benton received the B.A. degree in Linguistics from the University of Pennsylvania in 2008. He received the M.S. degree in Computer Science from the University of Pennsylvania in 2012, and enrolled in the Computer Science Ph.D. program at Johns Hopkins University in 2013. He is advised by Dr. Mark Dredze. His research centers on applying machine learning techniques to analyze social media data.