
Analysis of changes measured on Twitter data of users reporting their infection to SARS-CoV-2

digitalepidemiologylab/content_changes_paper

Dynamics of social media behavior before and after SARS-CoV-2 infection

Methods

The first task is to identify Twitter users who reported that they tested positive for Covid-19. This is done with positive_filter.py (which depends on filters.py). The resulting test-positive tweets are stored in daily Parquet files under data/positive and are then grouped into a single file (data/df_positive.pkl). The Twitter timelines of the selected users are then retrieved with download_timelines.py and stored as Pickle files in data/timelines/raw. The script parse_timelines.py then parses these raw timelines (in JSON Lines files) and stores the output in Parquet files. The following analyses are applied to the parsed timelines:
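The content of filters.py is not described in this README. As a rough illustration only, a test-positive filter could match first-person reports with a regular expression. The pattern, the function name is_test_positive, and the rule that skips retweets are all hypothetical assumptions, not the repository's actual filter:

```python
import re

# Hypothetical pattern for first-person test-positive reports; the actual
# filters.py may use different phrasings, languages, and exclusion rules.
TEST_POSITIVE = re.compile(
    r"\bi(?:['’]ve|\s+have|\s+just)?\s+tested\s+positive\s+(?:for|to)\s+"
    r"(?:covid(?:-?19)?|coronavirus|sars-cov-2)\b",
    re.IGNORECASE,
)


def is_test_positive(text):
    """Return True if `text` looks like a first-person positive-test report."""
    # Hypothetical rule: ignore retweets so that only original reports are kept.
    if text.startswith("RT @"):
        return False
    return TEST_POSITIVE.search(text) is not None
```

A pure keyword match like this also accepts jokes and hypotheticals, so a real filter would likely need additional exclusion rules or manual validation.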

The results of these analyses are collected and concatenated with timeline_combine_all.py, which generates user-specific files in data/language/all_timelines.
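timeline_combine_all.py is not reproduced here. A minimal sketch of the combine step, using only the standard library and assuming each analysis yields a mapping from tweet id to feature values, could look like the following. The field names id, user_id and created_at and the function combine_analyses are hypothetical:

```python
from collections import defaultdict


def combine_analyses(tweets, *analyses):
    """Merge per-tweet analysis outputs and group the records by user.

    tweets: list of dicts with hypothetical keys 'id', 'user_id', 'created_at'.
    analyses: each a dict mapping tweet id -> dict of feature values.
    Returns a dict user_id -> list of merged records sorted by created_at.
    """
    by_user = defaultdict(list)
    for tweet in tweets:
        record = dict(tweet)
        # Tweets missing from an analysis simply get no features from it.
        for analysis in analyses:
            record.update(analysis.get(tweet["id"], {}))
        by_user[tweet["user_id"]].append(record)
    for records in by_user.values():
        records.sort(key=lambda r: r["created_at"])
    return dict(by_user)
```

Each value of the returned dict corresponds to one user-specific timeline, which in the repository is written to its own file.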

Pre/post comparisons

After the tweets of the users who reported testing positive for Covid-19 have been processed with the various ML-based methods described above, the output files are stored in data/language/all_timelines. Individual-level pre/post comparisons on these data are then performed with statistical_analysis.py. The collective analyses consist of Wilcoxon signed-rank tests, as detailed in wilcoxon_features.R and adjusted_pvalues.py.

Note that the collective analyses must be performed after executing statistical_analysis.py, since that script contains preprocessing steps that filter the users retained in the pre/post comparisons. More information about the output of statistical_analysis.py is provided here.
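The exact filtering criteria in statistical_analysis.py are not given in this README. Purely as a hypothetical illustration, a filter might keep only users with enough tweets in fixed windows before and after their positive-test date; the window length, the minimum count, and the function names below are assumptions:

```python
from datetime import date, timedelta


def split_pre_post(tweet_dates, positive_date, window_days=60):
    """Split tweet dates into pre/post windows around the positive-test date.

    Tweets posted on positive_date itself are excluded from both windows
    (a hypothetical choice).
    """
    window = timedelta(days=window_days)
    pre = [d for d in tweet_dates if positive_date - window <= d < positive_date]
    post = [d for d in tweet_dates if positive_date < d <= positive_date + window]
    return pre, post


def retain_users(users, min_tweets=10, window_days=60):
    """Return the ids of users with at least `min_tweets` tweets in each window.

    `users` maps user_id -> (positive_date, list of tweet dates).
    """
    kept = []
    for user_id, (positive_date, tweet_dates) in users.items():
        pre, post = split_pre_post(tweet_dates, positive_date, window_days)
        if len(pre) >= min_tweets and len(post) >= min_tweets:
            kept.append(user_id)
    return kept


# Example with made-up dates: user "a" is kept, user "b" has too few post tweets.
positive = date(2020, 4, 1)
example = {
    "a": (positive, [date(2020, 3, 20), date(2020, 3, 25), date(2020, 4, 5), date(2020, 4, 10)]),
    "b": (positive, [date(2020, 3, 20), date(2020, 3, 25), date(2020, 4, 5)]),
}
print(retain_users(example, min_tweets=2))  # ['a']
```

Requiring a minimum number of tweets on both sides ensures that every retained user has a pre value and a post value for the paired test.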

Figures

The figures shown in the article can be generated as follows:
