Inferring User Gender, Age and Political Preferences using Batch Log-Linear Models


Gender Prediction: Male vs. Female

Gender prediction accuracy estimated over 383 users in the gender dataset: models are learned from (I) user, (II) neighbor or (III) user-neighbor tweets and compared to the results of Zamal et al., 2012.

Male vs. Female top-ranked features learned from user tweets.


Age Prediction: 18 - 23 y.o. vs. 25 - 30 y.o.

Age prediction accuracy estimated over 381 users in the age dataset: models are learned from (I) user, (II) neighbor or (III) user-neighbor tweets and compared to the results of Zamal et al., 2012.

18 - 23 y.o. vs. 25 - 30 y.o. top-ranked features learned from user tweets.


Political Preference Prediction: Democratic vs. Republican

Political preference prediction accuracy estimated over 371 users in the active dataset, 1031 users in the candidate-centric dataset and 270 users in the geo-centric dataset: models are learned from (I) user, (II) neighbor or (III) user-neighbor tweets.

Democratic vs. Republican top-ranked features learned from user tweets.


Summary

The most predictive neighborhoods for gender, age and political preference classification: models learned exclusively from neighbor content (top) or from the combined user-neighbor content (bottom).