(Not) Your Average Twitter User

Discovering groups of similar Twitter users based on their profiles.

Conclusion

In this project, we attempted to discover groups of similar Twitter users based on how they use the platform. We hypothesised that six groups of users could be detected: incognito, newbie/inactive, retweeter, community builder, celebrity/business and bot/spammer. To test this hypothesis, we performed clustering, augmenting the statistics available for each ego with additional quantities that we expected to improve the clustering. DBSCAN did not produce satisfactory results, as it labelled a large proportion of the users as noise; $K$-means performed better. Unexpectedly, some of the features that we had deemed highly relevant appear to have contributed little to the formation of the clusters, which suggests that the feature engineering could be improved further.
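The comparison between the two algorithms can be sketched as follows. This is only a minimal illustration using scikit-learn, not the project's actual pipeline: the feature matrix is synthetic, the three columns stand in for hypothetical per-user features, and the DBSCAN parameters `eps` and `min_samples` are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for per-user features (hypothetical: e.g. log follower
# count, log tweet count, retweet ratio). The real features are not shown here.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))

# Scale features so that no single one dominates the distance computations.
X_scaled = StandardScaler().fit_transform(X)

# K-means with K = 6, matching the number of hypothesised user types.
kmeans_labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X_scaled)

# DBSCAN marks points it cannot assign to a dense region with the label -1.
# eps and min_samples are assumed values chosen for illustration only.
dbscan_labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X_scaled)
noise_fraction = float(np.mean(dbscan_labels == -1))

print(f"K-means cluster sizes: {np.bincount(kmeans_labels)}")
print(f"DBSCAN noise fraction: {noise_fraction:.2f}")
```

A high noise fraction is the symptom described above: DBSCAN declines to assign many users to any cluster, whereas $K$-means always assigns every user to one of the $K$ groups.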

We then tried to link the hypothesised user types to the clusters. Although the results showed some of the expected properties, there are no clear boundaries between different user types on Twitter, so it is very challenging to artificially force groups to form. In addition, our discussion offers only possible interpretations, which might not correspond to the truth. There is no ground truth in the data (hence the need for unsupervised learning), although one could verify the clustering by collecting labelled data. Even then, labelling may be subjective, even with the full profiles at hand.
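If a labelled sample were collected, one possible way to check the clusters against it would be the adjusted Rand index, which compares two partitions without requiring cluster ids to match label names. The sketch below uses scikit-learn's `adjusted_rand_score`; the manual labels and cluster ids are made up for illustration and do not come from the project's data.

```python
from sklearn.metrics import adjusted_rand_score

# Hypothetical hand-assigned user types for a small sample of users.
manual_labels = ["bot", "bot", "celebrity", "newbie", "newbie", "retweeter"]

# Hypothetical cluster ids assigned to the same users by a clustering algorithm.
cluster_ids = [2, 2, 0, 1, 1, 3]

# The score is 1.0 when both partitions group the users identically
# (regardless of naming) and close to 0.0 for random assignments.
ari = adjusted_rand_score(manual_labels, cluster_ids)
print(f"Adjusted Rand index: {ari:.2f}")
```

Because labelling is subjective, such a score would measure agreement with one annotator's view rather than with an objective truth.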