DEPRECATED: Go to github.com/GoogleCloudPlatform/generative-ai/language/use-cases/document-summarization
ScisummNet - Scientific Article Summarization Dataset
- Google Cloud Storage Bucket: gs://cloud-samples-data/documentai/ScisummNet
pdf
- Original PDF files of papers from the ACL Anthology
summary_txt
- Human-written summaries of papers
json
- Contains Document.json files processed by the Document AI OCR Processor
full_txt
- Contains the full OCR-extracted text of each document, taken from the Document.json files
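
The folder layout and the relationship between the json and full_txt folders can be sketched in Python. This is a minimal illustration, not the script used to build the dataset. It assumes the four folders sit directly under the bucket root, and that each Document.json follows the Document AI Document format with the OCR output in a top-level "text" field. The helper names folder_uri and full_text_from_document_json are hypothetical, and the sample document is hand-written, not taken from the bucket.

```python
import json

# Bucket root as given in this README.
BUCKET_ROOT = "gs://cloud-samples-data/documentai/ScisummNet"

# The four folders described in this README (assumed to sit directly under the root).
FOLDERS = ("pdf", "summary_txt", "json", "full_txt")


def folder_uri(folder: str) -> str:
    """Return the gs:// URI of one of the dataset folders."""
    if folder not in FOLDERS:
        raise ValueError(f"unknown folder: {folder!r}")
    return f"{BUCKET_ROOT}/{folder}/"


def full_text_from_document_json(raw: str) -> str:
    """Return the OCR text stored in a serialized Document AI Document.

    Assumes a top-level "text" field; returns an empty string if it is absent.
    """
    return json.loads(raw).get("text", "")


if __name__ == "__main__":
    # Hypothetical stand-in for the contents of a real Document.json file.
    sample = json.dumps({"text": "Sample OCR text.", "pages": []})
    print(folder_uri("json"))
    print(full_text_from_document_json(sample))
```

To read real files, the raw string would come from a Document.json downloaded from the json folder, for example with the gsutil CLI or the google-cloud-storage client library.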