--collected by Freda Xiaoyun Yu
-
Kaggle datasets: https://www.kaggle.com/datasets One of the largest dataset-hosting websites.
-
Google dataset search engine: https://datasetsearch.research.google.com/
-
data.world: https://data.world/ Datasets can be copied into your own workspace. A project may contain several linked datasets and sometimes code.
-
AggData – locational data: https://www.aggdata.com/
-
UK government datasets: https://data.gov.uk/
-
Google public datasets: https://www.google.com/publicdata/directory
-
US government open data (including census data): https://www.data.gov/
-
Yelp dataset: https://www.yelp.com/dataset Not available at the time of collection.
-
UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php
-
Datahub, mostly financial data: https://datahub.io/collections
-
NASA Earth data: https://earthdata.nasa.gov/
-
CERN particle physics datasets: http://opendata.cern.ch/
-
WHO Global Health Observatory data repository: https://apps.who.int/gho/data/node.home
-
BFI film industry statistics: https://www.bfi.org.uk/industry-data-insights
-
New York taxi trip data: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page
-
FBI Crime Data Explorer: https://crime-data-explorer.fr.cloud.gov/pages/home
-
PLOS (Public Library of Science) open data: https://plos.org/open-science/open-data/
-
Harvard’s Cultural Observatory’s Bookworm: http://bookworm.culturomics.org/
-
City of Miami open datasets: https://data.miamigov.com/
-
IGSR: The International Genome Sample Resource: https://www.internationalgenome.org/data
-
UC Irvine Machine Learning Repository, full dataset list: https://archive.ics.uci.edu/ml/datasets.php
-
TED talks: https://www.kaggle.com/datasets/ashishjangra27/ted-talks
(File location: ./D:/University-of-London-2020/CM3015-Machine-Learning-and-Neural-Networks/Reliable_big_open_datasets.doc)