I have found these datasets in research papers.
-
Coil-20
http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php
-
STL-10: Self-taught learning
-
MS COCO
-
US Post Office Zip Code Data
https://web.stanford.edu/~hastie/StatLearnSparsity_files/DATA/zipcode.html
-
Google Conceptual Captions dataset
-
Visual Storytelling Dataset (VIST)
-
NVIDIA food Image classification
-
CIFAR-10, CIFAR-100
-
Large-scale CelebFaces Attributes (CelebA) Dataset
-
Street View House Numbers (SVHN)
-
MNIST
-
Facial Database
-
Labeled Faces in the Wild
-
Simple Vector Drawing Datasets
-
Places2 (scene photos and associated information)
-
Yelp dataset (restaurant information and photos)
-
DeepFashion
-
Image to LaTeX (a dataset for converting images of formulas into LaTeX code)
-
NIST Dataset (Fingerprint, Mugshot, OCR)
-
Biometrics Ideal Test dataset (iris, fingerprint, face, palmprint, handwriting, etc.; login required)
-
PASCAL VOC 2012 Dataset (Classification & Detection)
http://host.robots.ox.ac.uk/pascal/VOC/voc2012/index.html#data
-
Flickr Image Dataset
http://www.robots.ox.ac.uk/~vgg/data/oxbuildings/flickr100k.html
-
Stanford dogs dataset
-
CUB-200 dataset (birds)
-
Facial beauty score dataset
https://github.com/HCIILAB/SCUT-FBP5500-Database-Release
-
Tumblr GIF dataset
https://www.kaggle.com/raingo/tumblr-gif-description-dataset
-
Totally looks like dataset
-
CASIA WebFace database
http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html
-
Behance Artistic Media Dataset
-
IAM Handwriting Database
http://www.fki.inf.unibe.ch/databases/iam-handwriting-database
-
ImageCLEF dataset - Cross language image retrieval task
-
Yale-B - The Extended Yale Face Database
-
Visual Relationship Detection dataset
-
Visual Genome dataset
-
Oxford-102 dataset (Flower)
-
UCSD Pedestrian dataset (video anomaly detection)
-
Lung cancer dataset
-
Brain tumor dataset
-
Breast cancer dataset (kaggle)
-
The cancer image archive
-
Mammography dataset
-
Bio Image Dataset @ IIIT Delhi
-
CAMELYON16 - metastasis detection in lymph nodes
-
CAMELYON17 Dataset
https://camelyon17.grand-challenge.org/
-
YouTube-BoundingBoxes Dataset
-
YouTube-8M Dataset
-
The Kinetics Human Action Video Dataset
https://deepmind.com/research/open-source/open-source-datasets/kinetics/
-
AVA: A Finely Labeled Video Dataset for Human Action Understanding
https://research.googleblog.com/2017/10/announcing-ava-finely-labeled-video.html?m=1
-
Microsoft Kinect dataset
https://www.microsoft.com/en-us/download/details.aspx?id=52283
-
StatMT (datasets for tasks such as machine translation and summarization, organized as language pairs)
http://www.statmt.org/wmt14/translation-task.html
http://www.statmt.org/wmt15/translation-task.html
-
UN Parallel Corpus
-
IWSLT Dataset (including TED Translation)
-
The Stacks Project (a paired set of an algebraic geometry book's source text and its LaTeX code?)
-
Google sentence compression (data in which Google has normalized sentences)
http://storage.googleapis.com/sentencecomp/compression-data.json
-
Annals of the Joseon Dynasty (조선왕조실록; Korean/Classical Chinese translations)
-
OpenSubtitles
-
20 Newsgroups
-
Reuters-21578 dataset
https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection
-
SNLI (Stanford Natural Language Inference) dataset
-
Tweet data, a subset of TREC 2011 microblog track
-
Title data, including news titles with class labels from some news websites
-
Italy earthquake Twitter dataset
-
Paraphrase database
-
bAbI dataset (Facebook Question Answering)
-
Question/answer pairs (cloze-style, fill-in-the-blank inference questions) built from CNN/Daily Mail articles
-
Stanford Question Answering Dataset
-
Korean SQuAD dataset
-
RACE Reading Comprehension dataset
-
GLUE (General Language Understanding Evaluation) benchmark dataset
-
ClueWeb12 dataset (information retrieval)
-
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
-
WikiReading dataset
https://github.com/google-research-datasets/wiki-reading
-
SEMPRE: Semantic Parsing with Execution
-
Dialogue system datasets
-
WikiSQL dataset
-
SynthText dataset
-
Cornell Movie dialogue corpus
http://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html
-
Datasets used for Word2Vec (Wikipedia, WMT11, etc.)
https://code.google.com/archive/p/word2vec/
-
fastText pre-trained vector set
https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
-
Stanford Sentiment Treebank(SST)
-
Multi-Domain Sentiment Dataset
-
Visual sentiment ontology
http://www.ee.columbia.edu/ln/dvmm/vso/download/flickr_dataset.html
-
Radboud Faces Database (RaFD)
-
Aspect sentiment analysis with aspect category
https://github.com/hsqmlzno1/MGAN
-
Common Crawl dataset
-
Nottingham music dataset
-
A large-scale dataset of manually annotated audio events (Google research)
-
Speech Command Dataset
https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html
-
Mozilla DeepSpeech
-
Freebase
-
WordNet
-
Microsoft Concept Graph
-
DBpedia Dataset
The DBpedia data set uses a large multi-domain ontology which has been derived from Wikipedia as well as localized versions of DBpedia in more than 100 languages.
http://wiki.dbpedia.org/services-resources/datasets/dbpedia-datasets
-
YAGO
YAGO3 is a huge semantic knowledge base, derived from Wikipedia, WordNet, and GeoNames.
-
Google Knowledge Graph API
-
AMiner - Datasets for social network Analysis
-
Netflix Prize Data Set
http://academictorrents.com/details/9b13183dc4d60676b773c9e2cd6de5e5542cee9a
-
Paper bibliography dataset, Author Citation Networks
-
Politics subreddit
-
Amazon dataset
-
Twitter Spammer network
-
Twitter tweets
-
Online reviews
-
Rumor Detection dataset
https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/BFGAVZ
-
MovieLens
-
CiteULike
-
LastFM - Music, network dataset
-
Delicious Bookmarks (with other datasets)
-
Check-in dataset
https://sites.google.com/site/yangdingqi/home/foursquare-dataset
-
Social Event Detection 2014 dataset
-
Kaggle fake news dataset
-
Facebook fact check
https://github.com/BuzzFeedNews/2016-10-facebook-fact-check
-
FakeNewsNet dataset
-
FakeNews corpus
-
Liar dataset
-
Word2Vec
-
GloVe
-
fastText
https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
-
Harvard Dataverse
-
SKT Bigdata hub
-
ETRI corpus
-
Titanic survivors dataset
-
Obama’s political speeches
-
Yahoo Finance dataset
-
Linux code
-
NYC Taxi dataset
http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
-
US Census dataset
https://www.census.gov/topics/income-poverty/income/data/datasets.html