Understanding International Migration Using Tensor Factorization


In this work, we explore the feasibility of using tensor decomposition to understand trends in global human migration. Understanding international migration patterns is of great interest to demographers, social scientists and governments. The availability of large-scale geo-coded social media data has made it possible to study these migration patterns in an unobtrusive way. However, extracting insights from these typically large datasets is a time-consuming process. In this paper, we explore whether tensor factorization can help in quickly gaining insights from such large-scale social media data. Our experiments on over 100M geo-tagged tweets reveal interesting patterns of migration.


Using the publicly available 1% Twitter stream, we first collected all geo-tagged tweets.

We then bootstrapped from this data: starting from the individual users who had posted at least two geo-tagged tweets, we collected more data.

A detailed description of the data collection can be found in our paper.

Since the original dataset is huge (3 GB compressed), you can find a sample here.

For the full dataset, please email Kiran at kiran.garimella XatX aalto.fi

Description of the data: Each geo-tagged tweet has been mapped to a potential country using its latitude and longitude. Each line of the file has the format latitude [tab] longitude [tab] username [tab] tweet_createdAt [tab] tweetId [tab] country_code.
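As a rough illustration of how a file in this format might be read, here is a minimal Python sketch. It is not the code used in the paper. It assumes the file has no header row, that rows for each user appear in chronological order, and that the timestamp can be kept as a raw string (its exact format is not specified here). The names GeoTweet, read_tweets and country_transitions are hypothetical, and the sample rows are made up. Counting changes in country_code per user is only a crude proxy for movement, not the migration definition used in the paper.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class GeoTweet(NamedTuple):
    latitude: float
    longitude: float
    username: str
    created_at: str  # raw string; the timestamp format is an unknown here
    tweet_id: str
    country_code: str


def read_tweets(lines: Iterable[str]) -> Iterator[GeoTweet]:
    """Parse tab-separated rows in the documented six-column order."""
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 6:
            continue  # skip rows that do not have exactly six fields
        lat, lon, user, created_at, tweet_id, country = row
        yield GeoTweet(float(lat), float(lon), user, created_at, tweet_id, country)


def country_transitions(tweets: Iterable[GeoTweet]) -> Counter:
    """Count (previous_country, next_country) changes per user.

    Assumes tweets arrive in chronological order for each user.
    """
    last_country = {}
    counts = Counter()
    for tweet in tweets:
        previous = last_country.get(tweet.username)
        if previous is not None and previous != tweet.country_code:
            counts[(previous, tweet.country_code)] += 1
        last_country[tweet.username] = tweet.country_code
    return counts


# Made-up rows, for illustration only.
sample = (
    "60.17\t24.94\talice\t2015-01-01\t1\tFI\n"
    "48.85\t2.35\talice\t2015-06-01\t2\tFR\n"
    "40.71\t-74.00\tbob\t2015-02-01\t3\tUS\n"
)
tweets = list(read_tweets(io.StringIO(sample)))
transitions = country_transitions(tweets)
print(transitions)
```

To read a real file, pass an open file handle (for example `open(path, encoding="utf-8", newline="")`, where `path` is wherever you saved the sample) to `read_tweets` instead of the `io.StringIO` object.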


To visualise the biases in our data and give a sense of the migration patterns it contains, we modified the visualization by Abel et al. You can find the demo here. The demo has two tabs: 2000-2005 (data from Abel et al.) and 2005-2010 (our data). The comparison shows that Africa is clearly underrepresented in our data.


The code used in the paper is on GitHub.