The dataset is anonymized and contains a sample of over 3 million grocery orders from more than 200,000 Instacart users.
- Data: https://www.instacart.com/datasets/grocery-shopping-2017 or https://www.kaggle.com/c/instacart-market-basket-analysis/data
- Description: https://gist.github.com/jeremystan/c3b39d947d9b88b3ccff3147dbcf6c6b
- Backup copy (data and description): https://drive.google.com/drive/folders/1JC-D3vtYI6iOSGaZ9DoSXDQ4GrvMzqLL
Household-level transactions over two years from a group of 2,500 households who are frequent shoppers at a retailer. The data contains all of each household's purchases within the store, not just those from a limited number of categories, along with demographics and direct marketing contact history for select households.
- Data: https://www.dunnhumby.com/careers/engineering/sourcefiles
- Description: https://www.dunnhumby.com/careers/engineering/sourcefiles
- Backup copy (data and description): https://drive.google.com/drive/folders/1PAe62y3fgxPSgzvkMph3295Ah9WCMrhR
Historical data are provided on 250,000 borrowers.
- Data: https://www.kaggle.com/c/GiveMeSomeCredit/data
- Description: https://www.kaggle.com/c/GiveMeSomeCredit/data
- Backup copy (data and description): https://drive.google.com/drive/folders/14Ss_wSOHP8L7KmHxZelttTR6OadA2ELU
KKBOX offers a subscription-based music streaming service. The dataset includes user transaction and behavior features.
- Data: https://www.kaggle.com/competitions/kkbox-churn-prediction-challenge/data
- Description: https://www.kaggle.com/competitions/kkbox-churn-prediction-challenge/overview
This is a public Brazilian e-commerce dataset of orders made at Olist Store. The dataset has information on 100k orders from 2016 to 2018 made at multiple marketplaces in Brazil.
- Data: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
- Description: https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce
- Backup copy (data and description): https://drive.google.com/drive/folders/1vyoG0z7QpAcI4w47HD605-HA3lL45c92
This is a transnational dataset which contains all the transactions occurring between 1 December 2010 and 9 December 2011 for a UK-based and registered non-store online retailer.
- Data: https://archive.ics.uci.edu/static/public/352/online+retail.zip
- Description: https://archive.ics.uci.edu/dataset/352/online+retail
- Backup copy (data and description): https://drive.google.com/drive/folders/1_u0CjaTPQAh5esUW6ZCyNmvSSs_esgVp
The data represents six months of activity of a large e-commerce business in Europe selling a wide range of goods such as garden tools, toys, clothes, electronics and much more.
- Data: https://recsys.yoochoose.net/challenge.html
- Description: https://recsys.yoochoose.net/challenge.html
- Backup copy (data and description): https://drive.google.com/drive/folders/1pQXY_Pl6UaLYcfvN92pqbDyk2auvyibA
MovieLens 100K movie ratings. Stable benchmark dataset. 100,000 ratings from 1000 users on 1700 movies. Released 4/1998.
- Data: https://grouplens.org/datasets/movielens/100k/
- Description: https://grouplens.org/datasets/movielens/100k/
- Backup copy (data and description): https://drive.google.com/drive/folders/1GhJGkFAwNb95Jnah6OEKJH2oDg0ls25g
MovieLens 1M movie ratings. Stable benchmark dataset. 1 million ratings from 6000 users on 4000 movies. Released 2/2003.
- Data: https://grouplens.org/datasets/movielens/1m/
- Description: https://grouplens.org/datasets/movielens/1m/
- Backup copy (data and description): https://drive.google.com/drive/folders/1GhJGkFAwNb95Jnah6OEKJH2oDg0ls25g
MovieLens 25M movie ratings. Stable benchmark dataset. 25 million ratings and one million tag applications applied to 62,000 movies by 162,000 users. Includes tag genome data with 15 million relevance scores across 1,129 tags. Released 12/2019.
- Data: https://grouplens.org/datasets/movielens/25m/
- Description: https://grouplens.org/datasets/movielens/25m/
- Backup copy (data and description): https://drive.google.com/drive/folders/1GhJGkFAwNb95Jnah6OEKJH2oDg0ls25g
This dataset was created by Elo, one of the largest payment brands in Brazil. The dataset contains up to 3 months' worth of transactions for every card.
- Data: https://www.kaggle.com/c/elo-merchant-category-recommendation/data
- Description: https://www.kaggle.com/c/elo-merchant-category-recommendation/overview
- Backup copy (data and description): https://drive.google.com/drive/folders/1HmrVX4nAT3AVD9jHIe_z-pTKn7Jh-J-h?usp=sharing
This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).
- Data: https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
- Description: https://cseweb.ucsd.edu/~jmcauley/datasets/amazon_v2/
- Backup copy (data and description), beauty and apparel categories only: https://drive.google.com/drive/folders/1-hQPQ-fmwMDQCWCYBFfx8Mlj3TgJaxYp
This dataset represents a sample of 30 days of Criteo live traffic data. Each line corresponds to one impression (a banner) that was displayed to a user. For each banner we have detailed information about the context, if it was clicked, if it led to a conversion and if it led to a conversion that was attributed to Criteo or not. Data has been sub-sampled and anonymized so as not to disclose proprietary elements.
- Data: https://s3-eu-west-1.amazonaws.com/attribution-dataset/criteo_attribution_dataset.zip
- Description: http://ailab.criteo.com/criteo-attribution-modeling-bidding-dataset/
- Backup copy (data and description): https://drive.google.com/open?id=1WY6DdbbL6nzcxLA3z3vWYAbqNXeCg9Qu
Engine degradation simulation was carried out using C-MAPSS. Four different sets were simulated under different combinations of operational conditions and fault modes. Each set records several sensor channels to characterize fault evolution. The data set was provided by the Prognostics CoE at NASA Ames.
- Data and description: https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/#turbofan
- Backup copy (data and description): https://drive.google.com/drive/folders/1Bc4IrINVjDarZMQrSh5MzAj99k5To20F?usp=sharing
Over 5,000 clothing images of 20 different classes.
- Data: https://github.com/alexeygrigorev/clothing-dataset
- Description: https://github.com/alexeygrigorev/clothing-dataset
- Backup copy (data and description): https://drive.google.com/drive/folders/1mmNWO_1yF2cgkeAuS5BwsWJEqcmPJym5?usp=sharing
High-resolution street fashion photos with 59 tags in total. More than 1,000 images have pixel-level annotations.
- Data: https://github.com/bearpaw/clothing-co-parsing
- Description: https://github.com/bearpaw/clothing-co-parsing
- Backup copy (data and description): https://drive.google.com/drive/folders/1I8yNgNsPHVC8SXUyG8PxpGsL4ut4vyVR?usp=sharing
MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection. It contains over 5000 high-resolution images divided into fifteen different object and texture categories.
- Data: https://www.mvtec.com/company/research/datasets/mvtec-ad
- Description: https://www.mvtec.com/company/research/datasets/mvtec-ad
- Backup copy (data and description): https://drive.google.com/drive/folders/1mmNWO_1yF2cgkeAuS5BwsWJEqcmPJym5?usp=sharing