Commit

Add Weather Prediction dataset.
carschno committed Jan 14, 2025
1 parent aea49c2 commit ce7d8cc
Showing 2 changed files with 16 additions and 0 deletions.
15 changes: 15 additions & 0 deletions paper.bib
@@ -161,6 +161,21 @@ @misc{horst_allisonhorstpalmerpenguins_2020
doi = {10.5281/zenodo.3960218}
}

@misc{huber_weather_2022,
title = {Weather prediction dataset},
copyright = {Creative Commons Attribution 4.0 International, Open Access},
url = {https://zenodo.org/record/4770936},
doi = {10.5281/zenodo.4770936},
abstract = {Dataset created for machine learning and deep learning training and teaching purposes. It can, for instance, be used for classification, regression, and forecasting tasks. Complex enough to demonstrate realistic issues such as overfitting and unbalanced data, while still remaining intuitively accessible.
Description and units of weather features: Data includes the following features/variables for several European cities, each listed as feature (type), column name, description, physical unit:
mean temperature, \_temp\_mean, mean daily temperature, in 1 °C;
max temperature, \_temp\_max, max daily temperature, in 1 °C;
min temperature, \_temp\_min, min daily temperature, in 1 °C;
cloud\_cover, \_cloud\_cover, cloud cover, oktas;
global\_radiation, \_global\_radiation, global radiation, in 100 W/m2;
humidity, \_humidity, humidity, in 1 \%;
pressure, \_pressure, pressure, in 1000 hPa;
precipitation, \_precipitation, daily precipitation, in 10 mm;
sunshine, \_sunshine, sunshine hours, in 0.1 hours;
wind\_gust, \_wind\_gust, wind gust, in 1 m/s;
wind\_speed, \_wind\_speed, wind speed, in 1 m/s.
File descriptions:
weather\_prediction\_dataset.csv: Main data file, tabular data, comma-separated CSV. Contains the data for different weather features (daily observations, see the feature list above for more details) for 18 European cities or places through the years 2000 to 2010.
weather\_prediction\_picnic\_labels.csv: Optional data to be used as potential labels for classification tasks. Contains booleans to characterize the daily weather conditions as suitable for a picnic (True) or not (False) for all 18 locations in the dataset.
weather\_prediction\_dataset\_map.png: Simple map showing all 18 locations in Europe.
metadata.txt: Further information on the dataset, the data processing, and conversion, as well as the description and units of all weather features.
ORIGINAL DATA TAKEN FROM: EUROPEAN CLIMATE ASSESSMENT \& DATASET (ECA\&D), file created on 22-04-2021. THESE DATA CAN BE USED FREELY PROVIDED THAT THE FOLLOWING SOURCE IS ACKNOWLEDGED: Klein Tank, A.M.G. and Coauthors, 2002. Daily dataset of 20th-century surface air temperature and precipitation series for the European Climate Assessment. Int. J. of Climatol., 22, 1441-1453. Data and metadata available at http://www.ecad.eu. For more information see metadata.txt file.
The dataset has also been presented at the Teaching Machine Learning Workshop at ECML 2022: https://teaching-ml.github.io/2022/. The Python code used to create the weather prediction dataset from the ECA\&D data can be found on GitHub: https://github.com/florian-huber/weather\_prediction\_dataset (this repository also contains Jupyter notebooks with teaching examples).
Versions:
v5: updated metadata.txt file.
v4: to be more future-proof in times of climate change/crisis, "BBQ weather" prediction is now "picnic weather" prediction. Data itself remains unchanged.
v3: added "light" version of the dataset with fewer features (only 11 locations and fewer variables, reduction from 163 to 89 features); this is meant to be used if training times for hands-on sessions become an issue.
v2: now also contains additional `BBQ\_weather` labels; the dataset itself has not changed between versions v1 and v2.},
language = {en},
urldate = {2025-01-14},
publisher = {Zenodo},
author = {Huber, Florian and van Kuppevelt, Dafne and Steinbach, Peter and Sauze, Colin and Liu, Yang and Weel, Berend},
month = sep,
year = {2022},
keywords = {machine learning, deep learning, training data, teaching material},
}

@article{gaviria_rojas_dollar_2022,
title = {The {Dollar} {Street} {Dataset}: {Images} {Representing} the {Geographic} and {Socioeconomic} {Diversity} of the {World}},
volume = {35},
1 change: 1 addition & 0 deletions paper.md
@@ -90,6 +90,7 @@ such as convolutional layers.
We use datasets that have permissive licenses and are designed for real-world use cases:

- The Penguin dataset (@horst_allisonhorstpalmerpenguins_2020)
- The Weather prediction dataset (@huber_weather_2022)
- The Dollar Street Dataset (@gaviria_rojas_dollar_2022), which is representative and contains accurate demographic information to ensure robustness and fairness, especially for smaller subpopulations.
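
The column layout of the Weather prediction dataset can be illustrated with a short sketch. It assumes that each column is named by prefixing a location to one of the feature suffixes given in the dataset description (for example `BASEL_temp_mean`). The location names, the `DATE` column, the sample values, and the helper names `split_column` and `group_by_feature` are hypothetical and only serve the illustration; they are not taken from the dataset itself.

```python
import csv
import io

# Column-name suffixes as listed in the dataset description.
FEATURE_SUFFIXES = (
    "_temp_mean", "_temp_max", "_temp_min", "_cloud_cover",
    "_global_radiation", "_humidity", "_pressure", "_precipitation",
    "_sunshine", "_wind_gust", "_wind_speed",
)


def split_column(name):
    """Split a column name into (location, feature).

    Matching on known suffixes, rather than on the first underscore,
    keeps locations that contain underscores intact.
    Returns None when no known suffix matches.
    """
    for suffix in FEATURE_SUFFIXES:
        if name.endswith(suffix) and len(name) > len(suffix):
            return name[: -len(suffix)], suffix[1:]
    return None


def group_by_feature(header):
    """Map each feature name to the list of locations that provide it."""
    grouped = {}
    for name in header:
        parts = split_column(name)
        if parts is not None:
            location, feature = parts
            grouped.setdefault(feature, []).append(location)
    return grouped


# Tiny made-up sample (assumed layout, not real data).
SAMPLE = """DATE,BASEL_temp_mean,BASEL_humidity,DE_BILT_temp_mean
20000101,2.9,0.89,6.1
"""

header = next(csv.reader(io.StringIO(SAMPLE)))
features = group_by_feature(header)
print(features)
```

With the real file, the header row could be read from `weather_prediction_dataset.csv` in the same way, provided the naming assumption above holds.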

# Statement of Need
