missing data imputation

Goal

Use the sparse GPS data collected from NFTA buses to impute the traffic condition for the time points when no GPS data is available.

Requirements

Python 3.x
PyTorch

Plans

Data Process

PEMS

Existing Downloaded Data

Some raw PEMS data can be found here. Download the files, unzip them, and put them under the folder data_raw/d[xx]/, where xx is the two-digit district ID.

Steps to download more raw data and sensor metadata from official website

Register if you have not done so yet (it might take some time for a new account to be approved) and sign in.
Download by following these steps:
1. Click on Data Clearinghouse at the bottom left of the homepage.
2. To download data:
  1. At the top of the page, set the dropdown lists as follows:
    - Type: select Station 5-Minute
    - District: select the target district, e.g., District 7
  2. Click the Submit button.
  3. In the table below the Submit button, click on the cell for the desired year and month.
  4. Download the data from the Available Files table.
3. To download the metadata file, choose Station Metadata in the Type dropdown list, and then select the desired District.
Put the data and metadata files under the folder data_raw/d[xx]/, where xx is the two-digit district ID.
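
The folder convention above can be sketched in a few lines of Python. This is only an illustration: the helper name `district_dir` is hypothetical and is not part of the repository's scripts.

```python
from pathlib import Path

def district_dir(district_id: int, root: str = "data_raw") -> Path:
    """Return the raw-data folder for a district, e.g. data_raw/d07 for district 7.

    Hypothetical helper: it only mirrors the data_raw/d[xx]/ naming rule,
    where xx is the district ID zero-padded to two digits.
    """
    return Path(root) / f"d{district_id:02d}"

folder = district_dir(7)
print(folder.as_posix())
```

Calling `folder.mkdir(parents=True, exist_ok=True)` before unzipping would create the folder if it does not exist yet.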

Steps to process PEMS data and generate samples

Run the following commands under the root directory of this repository.

Select sensors/stations based on some rules, and calculate the distance between each pair of sensors

$ python -m scripts.select_sensors
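
As a rough illustration of the pairwise-distance part of this step, the sketch below computes great-circle (haversine) distances between sensors. It assumes each sensor has a latitude and longitude; the sensor IDs and coordinates are made up, and the actual script may use a different distance definition (for example, road-network distance).

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def pairwise_distances(sensors):
    """Map each unordered pair of sensor IDs to their distance in km."""
    return {
        (i, j): haversine_km(sensors[i], sensors[j])
        for i, j in combinations(sorted(sensors), 2)
    }

# Made-up sensor IDs and coordinates, for illustration only.
sensors = {"s1": (34.05, -118.25), "s2": (34.05, -118.20), "s3": (34.10, -118.25)}
distances = pairwise_distances(sensors)
for pair, km in distances.items():
    print(pair, round(km, 2))
```
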
Generate the distance matrix among the selected sensors, where elements smaller than a threshold are set to 0. Currently, 200 sensors are selected at random.

$ python -m scripts.generate_adj_matrix
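
The thresholding described in this step can be sketched with NumPy as follows. The function name, the example matrix, and the threshold value are assumptions made for illustration; the script's own normalization and threshold may differ.

```python
import numpy as np

def threshold_matrix(matrix, threshold):
    """Return a copy of the matrix with every element below the threshold set to 0."""
    matrix = np.asarray(matrix, dtype=float)
    return np.where(matrix < threshold, 0.0, matrix)

# Made-up 3x3 matrix among three sensors, for illustration only.
weights = np.array([
    [1.00, 0.05, 0.40],
    [0.05, 1.00, 0.20],
    [0.40, 0.20, 1.00],
])
sparse = threshold_matrix(weights, threshold=0.1)
print(sparse)
```

Zeroing the small entries keeps the matrix sparse, so only the sensor pairs with the larger entries stay connected in the graph.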
(Run only when needed) Generate graphs for different time intervals

$ python -m scripts.generate_more_graphs

Two methods to generate samples:

Method 1 (Tested and Recommended):

Generate samples from raw data without preprocessing

$ python -m scripts.generate_data_samples_from_raw

Method 2:

For each district, select data based on selected sensors and merge them together

$ python -m scripts.process_pems
Generate samples

$ python -m scripts.generate_data_samples --source_data_filename=data_raw/d07/data.npz --output_dir=data/d07

The train, val, and test files will have the following format

x: (number of samples, input length, number of nodes, number of traffic measurements)
y: (number of samples, prediction length, number of nodes, number of traffic measurements)
mask_x: has the same shape as x; mask_x[idx] == 0 means the data was missing at that point during data collection, whereas mask_x[idx] == 2 means the missingness was added manually.
mask_y: has the same shape as y.
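
To make the shapes and mask values concrete, here is a minimal sketch that slices a toy series into x/y windows and marks manually removed entries with 2. It is not the repository's sample generator: the function names are hypothetical, the value 1 for observed entries is an assumption (the description above only specifies 0 and 2), and the real scripts may add the missingness at a different stage.

```python
import numpy as np

def make_windows(series, input_len, pred_len):
    """Slice a (time, nodes, measurements) array into sliding x/y windows."""
    num_samples = series.shape[0] - input_len - pred_len + 1
    x = np.stack([series[i:i + input_len] for i in range(num_samples)])
    y = np.stack([series[i + input_len:i + input_len + pred_len]
                  for i in range(num_samples)])
    return x, y

def add_missingness(mask, missing_rate, rng):
    """Mark a random fraction of the observed entries as manually removed (value 2)."""
    mask = mask.copy()
    drop = (rng.random(mask.shape) < missing_rate) & (mask == 1)
    mask[drop] = 2
    return mask

rng = np.random.default_rng(0)
series = rng.random((20, 5, 2))                # 20 time steps, 5 nodes, 2 measurements
observed = np.ones(series.shape, dtype=np.int64)  # assumption: 1 means observed
observed[3, 0, :] = 0                          # pretend this reading was never collected

x, y = make_windows(series, input_len=6, pred_len=3)
clean_mask_x, mask_y = make_windows(observed, input_len=6, pred_len=3)
mask_x = add_missingness(clean_mask_x, missing_rate=0.2, rng=rng)

print(x.shape, y.shape, mask_x.shape, mask_y.shape)
```

With 20 time steps, an input length of 6, and a prediction length of 3, the sliding window yields 12 samples, so x has shape (12, 6, 5, 2) and y has shape (12, 3, 5, 2).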

The train, val, and test data files are named {mode}_{input length}_{predict length}_{missing rate}.npz, where mode is train, val, or test; input length is the length of the input sequence in time intervals; predict length is the length of the prediction sequence; and missing rate is the missing rate in the data samples.
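
The naming scheme can be illustrated with a pair of small helpers that build and parse such file names. Both helpers are hypothetical, and the way the missing rate is written (here as a plain decimal such as 0.2) is an assumption; check the generated files for the exact form.

```python
from pathlib import Path

def sample_filename(mode, input_len, pred_len, missing_rate):
    """Build a name of the form {mode}_{input length}_{predict length}_{missing rate}.npz."""
    if mode not in ("train", "val", "test"):
        raise ValueError(f"unknown mode: {mode}")
    return f"{mode}_{input_len}_{pred_len}_{missing_rate}.npz"

def parse_sample_filename(name):
    """Split a sample file name back into its four fields."""
    mode, input_len, pred_len, missing_rate = Path(name).stem.split("_")
    # The missing rate is kept as a string because its exact format is an assumption.
    return mode, int(input_len), int(pred_len), missing_rate

name = sample_filename("train", 12, 12, 0.2)
print(name)
print(parse_sample_filename(name))
```
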

References

KDD'14: Travel Time Estimation of a Path using Sparse Trajectories
AAAI'20: GMAN: A Graph Multi-Attention Network for Traffic Prediction (GitHub)
