Data

The information on this website is distributed for academic and research purposes only.

Twitter Timeseries

Every case study in the paper (except Higgs and Unfiltered Stream) comes in two "flavors", corresponding to two alternative geographical partitions.
In general, the time series in *.dat files are stored in columns: each column represents a geographic unit (city, state, country, etc.) and each row corresponds to a time unit. THE EXCEPTION TO THIS IS THE HIGGS DATASET, IN WHICH TIME SERIES ARE TRANSPOSED.
In general, a "time unit" is 1 minute: every value in the time series aggregates the number of messages for a given minute. THE EXCEPTION TO THIS IS THE UNFILTERED STREAM DATASET, IN WHICH EACH ROW CORRESPONDS TO 1 SECOND.
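
As a minimal sketch of the layout described above, the following Python reads a time-series file into a list of rows and extracts the series of one geographic unit. It assumes the *.dat files are plain text with whitespace-separated numeric values, which this page does not state. The function names (parse_timeseries, load_timeseries, column) and the sample values are made up for illustration.

```python
from pathlib import Path


def parse_timeseries(text):
    """Parse whitespace-delimited numeric text into a list of rows.

    Each row is one time unit; each position in a row is one geographic unit.
    ASSUMPTION: values are separated by spaces or tabs.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            rows.append([float(x) for x in fields])
    return rows


def load_timeseries(path):
    """Read a *.dat file from disk and parse it (same format assumption)."""
    return parse_timeseries(Path(path).read_text())


def column(rows, j):
    """Return the time series of geographic unit j (0-based column index)."""
    return [row[j] for row in rows]


# Made-up sample: 3 time units (rows) x 2 geographic units (columns).
sample = "0 4\n1 5\n2 6\n"
rows = parse_timeseries(sample)
series_unit_1 = column(rows, 1)
```

With a real file, a call such as load_timeseries("batmanstates-timeseries.dat") would, if the format assumption holds, return rows of 52 values each, one per state in the order given by "usstates_order.txt".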

1.   Datasets focused on the US [Batman, Google]
1.1. The finer partition is at the level of states (50 states + DC + Puerto Rico), so each file has 52 columns. They are sorted according to "usstates_order.txt".
1.2. The alternative partition is at the level of "divisions" (see US Census Bureau-designated regions and divisions: https://en.wikipedia.org/wiki/List_of_regions_of_the_United_States#Census_Bureau-designated_regions_and_divisions), so each file has 9 columns. They are sorted according to the labelling in Fig S10 of the paper.

Relevant files:

  • batmandivisions-timeseries.dat
  • batmanstates-timeseries.dat
  • googledivisions-timeseries.dat
  • googlestates-timeseries.dat
  • usstates_order.txt

2.   Brazil dataset
2.1. The finer partition is at the level of basins, which are roughly equivalent to metropolitan areas (supra-city level), so the file has 97 columns. They are sorted according to "brazilbasins_order.txt".
2.2. The alternative partition is at the level of "states", so the file has 27 columns. They are sorted according to the labelling in Fig S10 of the paper.

Relevant files:
  • brazilbasins-timeseries.dat
  • brazilbasins_order.txt
  • brazilstates-timeseries.dat

3.   Spain dataset
3.1. The finer partition is at the level of metropolitan areas, so the file has 57 columns: 56 correspond to metropolitan areas, and the extra one ("Resto") covers any place in Spain not included in the other 56. They are sorted according to "spainmetro_order.txt".
3.2. The alternative partition is at the level of "autonomous communities" (see Spain autonomous communities list: https://en.wikipedia.org/wiki/Ranked_lists_of_Spanish_autonomous_communities#By_population), so the file has 16 columns (Extremadura, Ceuta and Melilla were left out because of lack of data from those areas). They are sorted according to the labelling in Fig S10 of the paper.

Relevant files:
  • spainmetro-timeseries.dat
  • spainmetro_order.txt
  • spaincommunities-timeseries.dat

4.  Higgs dataset:
*****************************************************
Note that, exceptionally, for this dataset rows correspond to countries and columns correspond to time.
*****************************************************
In the paper we report results for a single partition at the level of countries, so the file has 61 rows (only the 61 most active countries in the event were considered). They are sorted according to "higgscountries_order.txt".

Relevant files:
  • higgscountry-timeseries.dat
  • higgscountries_order.txt
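
Because the Higgs file is transposed relative to the other datasets, it may help to convert it to the common layout (rows as time units, columns as geographic units) before reusing code written for the other files. The sketch below shows one way to do that and to attach labels to each series. It assumes the order file lists one label per line, which this page does not state. The function names and the sample values (including the labels "AA" and "BB") are hypothetical.

```python
def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*rows)]


def read_order(text):
    """Return the labels in an order file.

    ASSUMPTION: one label per non-empty line.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def label_series(series_list, names):
    """Map each label to its series; both sequences must have equal length."""
    if len(series_list) != len(names):
        raise ValueError("number of series and number of labels differ")
    return dict(zip(names, series_list))


# Made-up Higgs-like sample: 2 countries (rows) x 3 time units (columns).
higgs_like = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
names = read_order("AA\nBB\n")

by_country = label_series(higgs_like, names)  # rows are already per-country
time_major = transpose(higgs_like)            # rows are now time units
```

For the files in the common layout, the same labelling can be obtained by transposing first, so that each inner list is the series of one geographic unit.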

5.   Unfiltered Twitter dataset
In the paper we report results for a single partition at the level of countries, so the file has 61 columns (only the 61 most active countries in the event were considered). They are sorted according to "unfilteredcountries_order.txt".

Relevant files:
  • unfilteredcountry-timeseries.dat
  • unfilteredcountries_order.txt
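
Since each row of the unfiltered stream covers 1 second while the other datasets use 1-minute rows, a reader who wants a common time unit could sum consecutive blocks of 60 rows. The sketch below is one possible way to do this and is not a step described in the paper. The function name aggregate_rows and the sample values are made up, and the choice to drop a trailing incomplete block is an assumption.

```python
def aggregate_rows(rows, factor=60):
    """Sum consecutive blocks of `factor` rows, column by column.

    With factor=60, 1-second rows become 1-minute rows.
    ASSUMPTION: a trailing block with fewer than `factor` rows is dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(rows) - len(rows) % factor
    aggregated = []
    for start in range(0, usable, factor):
        block = rows[start:start + factor]
        aggregated.append([sum(col) for col in zip(*block)])
    return aggregated


# Made-up sample: 5 time units x 2 geographic units, aggregated in pairs.
sample_rows = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
coarse = aggregate_rows(sample_rows, factor=2)
```

In the sample, the fifth row is discarded because it does not fill a complete block of two.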


Source (citation)
J. Borge-Holthoefer, N. Perra, B. Gonçalves, S. González-Bailón, A. Arenas, Y. Moreno and A. Vespignani - Science Advances, Vol. 2, no. 4, e1501158, DOI: 10.1126/sciadv.1501158 (2016)