Causal Analysis between User Stress Level and Contextually Filtered Features Extracted from Mobile Sensor Data
- Introduction
- Paper Summary
- Repository Overview
- Data Description
- Code Structure & Workflow
- Results
- How to Cite
CausalCFF is a framework for analyzing how high-level contextual features extracted from mobile sensor data influence user stress levels. By applying causal inference (specifically, Convergent Cross Mapping, or CCM), the project identifies potential stress triggers that go beyond simple correlations. This repository contains all the code needed to reproduce the analysis, including data preprocessing, feature extraction, and causal inference.
This repository accompanies the paper: “Causal-CFF: Causal Analysis between User Stress Level and Contextually Filtered Features Extracted from Mobile Sensor Data.”
- Motivation & Background: Stress is a key factor in mental well-being. Traditional sensor-based studies often examine features from a single sensor and miss the influence of multi-sensor contexts.
- Methodology:
  - Contextual Feature Extraction: Association rule mining on multimodal sensor data (location, app usage, physical activity) extracts high-level behavioral patterns.
  - Causal Inference (CCM): Determines causal relationships between these contextually filtered features (CFFs) and self-reported stress levels.
- Results & Implications:
  - Identified top behavior patterns (e.g., frequent workplace visits with little time at home) as potential stressors.
  - Emphasized the need for personalized interventions, since causal strengths vary across individuals.
  - Suggested future work on sequential rule mining and on improving the interpretability of complex features.
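To make the CCM step more concrete, here is a minimal NumPy sketch of convergent cross mapping with simplex-projection weighting. It is not the repository's implementation: the function names `embed` and `ccm_skill`, the embedding dimension `E=3`, the lag `tau=1`, and the coupled logistic maps used as toy data are all illustrative assumptions. The idea is that if `cause` drives `effect`, nearest neighbours on the delay-embedded (shadow) manifold of `effect` should be able to predict `cause`, and that skill should grow as the library size increases.

```python
import numpy as np

def embed(x, E, tau):
    """Time-delay embedding: row j is [x_t, x_{t-tau}, ..., x_{t-(E-1)tau}], t = j + (E-1)*tau."""
    n = len(x) - (E - 1) * tau
    return np.column_stack([x[(E - 1 - i) * tau:(E - 1 - i) * tau + n] for i in range(E)])

def ccm_skill(cause, effect, E=3, tau=1, lib_size=None):
    """Cross-map skill for the hypothesis 'cause -> effect'.

    Neighbours on the shadow manifold of `effect` are used to estimate `cause`;
    the return value is the Pearson correlation between estimate and observation.
    Assumes `cause` and `effect` have equal length."""
    cause = np.asarray(cause, dtype=float)
    M = embed(np.asarray(effect, dtype=float), E, tau)
    target = cause[(E - 1) * tau:]
    L = len(M) if lib_size is None else lib_size
    M, target = M[:L], target[:L]
    preds = np.empty(L)
    for t in range(L):
        d = np.linalg.norm(M - M[t], axis=1)
        d[t] = np.inf                                  # leave the query point out
        nn = np.argsort(d)[:E + 1]                     # E+1 nearest neighbours (simplex)
        w = np.exp(-d[nn] / max(d[nn[0]], 1e-12))      # weights relative to closest neighbour
        preds[t] = np.sum(w * target[nn]) / np.sum(w)
    return np.corrcoef(preds, target)[0, 1]

# Toy data (hypothetical, not from the stress dataset): x forces y.
n = 500
x, y = np.empty(n), np.empty(n)
x[0], y[0] = 0.4, 0.2
for t in range(n - 1):
    x[t + 1] = x[t] * (3.8 - 3.8 * x[t])
    y[t + 1] = y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])

# Skill in both directions for growing library sizes (convergence check).
for L in (50, 100, 200, 400):
    print(L, round(ccm_skill(x, y, lib_size=L), 3), round(ccm_skill(y, x, lib_size=L), 3))
```

In practice one compares the skill in both directions and checks that it converges as the library grows. Assessing significance (e.g., the p < 0.05 criterion used in the results) is commonly done against surrogate series, which this sketch leaves out.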
The repository is structured around Jupyter notebooks and scripts that:
- Load and preprocess the dataset.
- Extract high-level contextually filtered features (CFFs).
- Perform causal analysis using Convergent Cross Mapping (CCM).
- Produce interpretable metrics and visualizations for stress analysis.
You can use this code to replicate the entire pipeline, from data preparation to the final causal results.
- Original Dataset: An open-source dataset with mobile sensor data and stress self-reports from 24 university students over six weeks (https://github.com/Kaist-ICLab/DeepStress_Dataset).
  - GPS Data: Categorized into home, work, or other places.
  - Physical Activity: Walking, running, sitting, etc.
  - App Usage: Grouped into categories like social media, productivity, entertainment.
  - Self-Reported Stress: 5-point Likert scale.
Note: This repository assumes you already have access to the dataset in the correct format (e.g., `combined.csv`, `proc_interpretable_updated.pkl`). For privacy and licensing reasons, the dataset may not be included in this repository. Please refer to the dataset’s license and usage conditions on the original dataset page.
- Notebook Sections:
  - Markdown cells describing the dataset and the rationale behind the exploratory data analysis (EDA).
  - Binarizing stress labels and other label processing steps.
- Key Operations:
  - Load `combined.csv` and parse timestamps.
  - Plot the distribution of stress levels across all participants.
  - Assign `[uid, timestamp]` as the primary index for subsequent analyses.
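The EDA steps above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's code: the column names `uid`, `timestamp`, and `stress`, the in-memory stand-in for `combined.csv`, and the binarization rule (a response counts as "stressed" when it exceeds that participant's own mean) are all assumptions.

```python
import io
import pandas as pd

# Hypothetical stand-in for combined.csv; the real column names may differ.
csv_text = """uid,timestamp,stress
P01,2019-05-08 10:00:00,2
P01,2019-05-08 14:00:00,4
P02,2019-05-08 11:30:00,1
P02,2019-05-08 16:45:00,5
P02,2019-05-09 09:15:00,3
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Distribution of stress levels across all participants
# (e.g., stress_counts.plot.bar() if matplotlib is installed).
stress_counts = df["stress"].value_counts().sort_index()

# Binarize (assumed rule): 1 if the response is above the participant's own mean.
user_mean = df.groupby("uid")["stress"].transform("mean")
df["stress_bin"] = (df["stress"] > user_mean).astype(int)

# Use [uid, timestamp] as the primary index for later joins.
df = df.set_index(["uid", "timestamp"]).sort_index()
print(df)
```

A per-participant threshold is one way to account for individuals using the Likert scale differently; a fixed cut-off would be an equally simple alternative.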
- Notebook Sections:
  - Introduction to high-level features derived from sensor data.
  - Markdown explaining how the dataset is re-organized by participant code (`pcode`).
- Key Operations:
  - Load `proc_interpretable_updated.pkl`.
  - Map sensor data to each participant (P01, P02, …).
  - Remove or exclude certain data types (e.g., `APP_DUR_UNKNOWN`, `LOC_DUR_others`) to focus on interpretable features.
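A minimal sketch of the per-participant mapping and feature exclusion, assuming the pickle loads as a wide pandas DataFrame with a `pcode` column and one column per feature. That layout, the tiny example frame, and the helper `drop_uninterpretable` are hypothetical; only the excluded names `APP_DUR_UNKNOWN` and `LOC_DUR_others` come from the notebook description.

```python
import pandas as pd

# Real pipeline would start from: data = pd.read_pickle("proc_interpretable_updated.pkl")
# Here: a tiny hypothetical frame in an assumed wide layout.
data = pd.DataFrame({
    "pcode": ["P01", "P01", "P02"],
    "APP_DUR_SOCIAL": [120, 30, 0],
    "APP_DUR_UNKNOWN": [5, 0, 12],
    "LOC_DUR_home": [600, 0, 300],
    "LOC_DUR_others": [0, 45, 10],
})

# Feature types treated as uninterpretable.
EXCLUDED = ("APP_DUR_UNKNOWN", "LOC_DUR_others")

def drop_uninterpretable(frame, excluded=EXCLUDED):
    """Drop every column whose name contains one of the excluded feature types."""
    keep = [c for c in frame.columns if not any(tag in c for tag in excluded)]
    return frame[keep]

# Map each participant code to that participant's interpretable feature rows.
per_participant = {
    pcode: drop_uninterpretable(group.drop(columns="pcode"))
    for pcode, group in data.groupby("pcode")
}
print(per_participant["P01"])
```

Matching on substrings keeps the rule robust if excluded types appear with suffixes (e.g., per-window variants), at the cost of possibly dropping columns with similar names.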
- Notebook Sections:
  - Discussion on resampling sensor data at a 1-second interval.
  - Use of Ray for parallel processing.
- Key Operations:
  - Identify unique user IDs.
  - Create a dictionary for categorical feature distributions.
  - Resample sensor readings, removing duplicates for efficiency.
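The resampling step can be illustrated with plain pandas, processing users sequentially rather than in parallel with Ray. The event-log layout (`uid`, `timestamp`, `activity`), the helper `resample_user`, and the reading of "removing duplicates" as collapsing runs of identical consecutive values after forward-filling onto a 1-second grid are assumptions for this sketch.

```python
import pandas as pd

# Hypothetical event log with one categorical sensor stream per user.
log = pd.DataFrame({
    "uid": ["P01", "P01", "P01", "P02", "P02"],
    "timestamp": pd.to_datetime([
        "2019-05-08 10:00:00", "2019-05-08 10:00:03", "2019-05-08 10:00:05",
        "2019-05-08 09:00:00", "2019-05-08 09:00:02",
    ]),
    "activity": ["SITTING", "SITTING", "WALKING", "WALKING", "SITTING"],
})

# Unique user IDs.
users = log["uid"].unique()

# Dictionary of categorical feature distributions (only 'activity' here).
category_dist = {"activity": log["activity"].value_counts(normalize=True).to_dict()}

def resample_user(frame):
    """Forward-fill a categorical stream onto a 1-second grid, then keep only state changes."""
    s = frame.set_index("timestamp")["activity"].sort_index()
    s = s[~s.index.duplicated(keep="last")]   # drop repeated timestamps
    grid = s.resample("1s").ffill()           # 1-second resolution
    return grid[grid.ne(grid.shift())]        # collapse runs of identical values

# Sequential loop; in the repository this per-user work is parallelized with Ray.
resampled = {uid: resample_user(log[log["uid"] == uid]) for uid in users}
print(resampled["P01"])
```

Because `resample_user` depends only on one user's rows, it is an independent task per user, which is what makes this step easy to distribute across workers.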
- Notebook Sections:
  - Association rule mining to generate contextually filtered features (CFFs).
  - Explanation of sub-features based on time windows preceding each stress measurement (ESM response).
- Key Operations:
  - Define the window size (e.g., 160 minutes) and the number of sub-windows (e.g., 8).
  - Use `extract_extended_parallel()` to generate extended features for each time window.
  - Save the extracted CFFs (e.g., in the `Features/arm/` directory).
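The windowing and rule-mining idea can be sketched as below. This is a simplified stand-in, not `extract_extended_parallel()`: the helpers `sub_window_transactions` and `pair_rules`, the event layout, the `feature=value` item encoding, and the support/confidence thresholds are hypothetical. Each of the 8 sub-windows in the 160 minutes before an ESM response becomes one transaction, and rules are mined by brute force over item pairs (Apriori restricted to 2-itemsets).

```python
from itertools import combinations
import pandas as pd

WINDOW = pd.Timedelta(minutes=160)  # window before each ESM response
N_SUB = 8                           # number of sub-windows (20 min each here)

def sub_window_transactions(events, esm_time, window=WINDOW, n_sub=N_SUB):
    """Split [esm_time - window, esm_time) into n_sub slices; each slice becomes
    one transaction: the set of 'feature=value' items observed in it."""
    start, step = esm_time - window, window / n_sub
    transactions = []
    for k in range(n_sub):
        lo, hi = start + k * step, start + (k + 1) * step
        part = events[(events["timestamp"] >= lo) & (events["timestamp"] < hi)]
        transactions.append(frozenset(f"{f}={v}" for f, v in zip(part["feature"], part["value"])))
    return transactions

def pair_rules(transactions, min_support=0.25, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) for single-item -> single-item rules."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for a, b in combinations(sorted(set().union(*transactions)), 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = s_ab / support({lhs})
            if conf >= min_confidence:
                rules.append((lhs, rhs, round(s_ab, 3), round(conf, 3)))
    return rules

# Hypothetical events in the 160 minutes before an ESM response at 12:40.
events = pd.DataFrame({
    "timestamp": pd.to_datetime(["2019-05-08 10:05", "2019-05-08 10:06", "2019-05-08 10:25",
                                 "2019-05-08 10:30", "2019-05-08 10:45", "2019-05-08 11:05",
                                 "2019-05-08 11:10", "2019-05-08 11:25", "2019-05-08 11:26"]),
    "feature": ["LOC", "APP", "LOC", "APP", "LOC", "LOC", "APP", "LOC", "APP"],
    "value": ["work", "productivity", "work", "productivity", "work",
              "home", "social", "work", "productivity"],
})
tx = sub_window_transactions(events, pd.Timestamp("2019-05-08 12:40"))
rules = pair_rules(tx)
print(rules)
```

A mined rule such as `APP=productivity -> LOC=work` can then serve as a candidate contextual filter whose occurrence per window becomes a feature; a full implementation would consider larger itemsets and use a library Apriori or FP-growth.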
- Notebook Sections:
  - Aggregating sub-features into final feature vectors for each participant.
  - Handling missing data and normalization methods.
- Key Operations:
  - Load the sub-feature file (e.g., `subfeature_160MIN_8.csv`).
  - Check for missing values and decide on an imputation strategy.
  - Finalize the aggregated feature set to be fed into the causal inference module.
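One possible way to handle missing values and normalization before the causal step is sketched below. The columns `pcode`, `cff_1`, `cff_2` and the in-memory stand-in for `subfeature_160MIN_8.csv` are hypothetical, as are the chosen strategies: per-participant median imputation followed by per-participant z-scoring, with constant columns set to 0.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical stand-in for subfeature_160MIN_8.csv (real columns may differ).
csv_text = """pcode,cff_1,cff_2
P01,1.0,0.0
P01,,2.0
P01,3.0,4.0
P02,0.5,1.0
P02,1.5,
"""
sub = pd.read_csv(io.StringIO(csv_text))
features = [c for c in sub.columns if c != "pcode"]

# 1) Inspect missingness per feature before choosing a strategy.
missing_ratio = sub[features].isna().mean()

for c in features:
    # 2) Impute with each participant's own median (one possible strategy).
    sub[c] = sub[c].fillna(sub.groupby("pcode")[c].transform("median"))
    # 3) Z-score within participant; constant (zero-variance) columns become 0.
    grp = sub.groupby("pcode")[c]
    mean = grp.transform("mean")
    sd = grp.transform(lambda s: s.std(ddof=0))
    sub[c] = ((sub[c] - mean) / sd.replace(0, np.nan)).fillna(0.0)

print(missing_ratio)
print(sub)
```

Normalizing within each participant keeps individual baselines from dominating, which fits the per-person causal analysis; note that the final `fillna(0.0)` also fills features that are entirely missing for a participant.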
Upon successful execution of the notebooks, you will obtain:
- Top Contextually Filtered Features (CFFs) that have a statistically significant causal relationship with stress levels (p < 0.05).
- Visualizations & Summary Tables indicating personalized variations in causal strengths.
- Insights into Behavioral Patterns, highlighting the importance of context-aware stress management strategies.
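As an illustration of how such summary tables might be assembled, the sketch below filters a results table at p < 0.05 and pivots causal strengths by participant. The table layout (`pcode`, `cff`, `rho`, `p_value`), the placeholder feature names `CFF_A` to `CFF_C`, and all numbers are hypothetical and do not reflect the paper's results.

```python
import pandas as pd

# Hypothetical CCM output: one row per (participant, CFF) with skill and p-value.
results = pd.DataFrame({
    "pcode": ["P01", "P01", "P01", "P02", "P02"],
    "cff": ["CFF_A", "CFF_B", "CFF_C", "CFF_A", "CFF_B"],
    "rho": [0.42, 0.10, 0.31, 0.05, 0.37],
    "p_value": [0.01, 0.40, 0.03, 0.70, 0.02],
})

ALPHA = 0.05
significant = results[results["p_value"] < ALPHA]

# Top-2 significant CFFs per participant, strongest cross-map skill first.
top_k = (significant.sort_values(["pcode", "rho"], ascending=[True, False])
                    .groupby("pcode")
                    .head(2))

# Participant x CFF table of causal strengths (NaN = not significant for that person).
strength_table = significant.pivot(index="pcode", columns="cff", values="rho")
print(top_k)
print(strength_table)
```

The sparse pattern of the pivoted table is what makes person-to-person variation in causal strength easy to see.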
Panyu Zhang, Gyuwon Jung, Uzair Ahmed, and Uichin Lee. 2025. Causal-CFF: Causal Analysis between User Stress Level and Contextually Filtered Features Extracted from Mobile Sensor Data. In Extended Abstracts of the 2025 CHI Conference on Human Factors in Computing Systems (CHI EA ’25), April 26–May 01, 2025, Yokohama, Japan. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3706599.3719776
Questions? Feel free to open an issue or reach out!