Accurate detection and classification of pulmonary malignancies are crucial for early diagnosis, treatment planning, and patient prognosis. Traditional histopathological analysis is time-consuming and labor-intensive, which limits its feasibility in clinical practice. To address this issue, we present a dataset of 691 high-resolution (1200×1600 pixels) histopathological images of lung tissue from 45 patients, covering adenocarcinoma, squamous cell carcinoma, and normal tissue. Each of the two malignant types is graded into three differentiation levels (well, moderately, and poorly differentiated); together with normal tissue, this gives seven classification categories. The dataset includes images at 20x and 40x magnification, reflecting real clinical diversity. We evaluated image classification using deep neural networks and multiple-instance learning approaches. Each approach was used to classify the 20x and 40x images into three superclasses (adenocarcinoma, squamous cell carcinoma, and normal). Depending on the method and magnification, accuracy ranged from 81% to 92%, demonstrating the utility of the dataset.
Dimensions | Modality | Task Type | Anatomical Structures | Anatomical Area | Number of Categories | Data Volume | File Format |
---|---|---|---|---|---|---|---|
2D | Pathological Images | Classification | Adenocarcinoma, Squamous Cell Carcinoma, Normal Tissue | Lung | 7 | 691 | JPG |
Dataset Statistics | Image size (pixels) |
---|---|
min | (1200, 1600) |
median | (1200, 1600) |
max | (1200, 1600) |
Class | Differentiation Level |
---|---|
Adenocarcinoma | Well differentiated |
Adenocarcinoma | Moderately differentiated |
Adenocarcinoma | Poorly differentiated |
Pulmonary squamous cell carcinoma | Well differentiated |
Pulmonary squamous cell carcinoma | Moderately differentiated |
Pulmonary squamous cell carcinoma | Poorly differentiated |
Normal | Normal |
Figure 1. Images of adenocarcinomas showing different degrees of differentiation and resolution.
Figure 2. Images of squamous cell carcinoma showing different degrees of differentiation and resolution.
Figure 3. Normal lung tissue images at different resolutions.
LungHist700/
├── images
│ ├── aca_bd
│ ├── aca_md
│ ├── aca_pd
│ ├── nor
│ ├── scc_bd
│ ├── scc_md
│ └── scc_pd
└── data.csv
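The layout above can be indexed with a few lines of Python. The following is a minimal sketch, not part of the official dataset tooling. It assumes that each subfolder of `images` holds the JPG files of one of the seven categories, and that the prefix before the underscore (`aca`, `scc`, `nor`) identifies the superclass. The helper names `superclass_of`, `index_images`, and `count_by_superclass` are hypothetical. The columns of `data.csv` are not described here, so the sketch does not read that file.

```python
from collections import Counter
from pathlib import Path

# Folder names copied from the directory tree above.
CLASS_FOLDERS = ["aca_bd", "aca_md", "aca_pd", "nor", "scc_bd", "scc_md", "scc_pd"]


def superclass_of(folder: str) -> str:
    """Map a seven-class folder name to a superclass.

    Assumption: the text before the first underscore names the superclass
    ("aca", "scc", or "nor").
    """
    return folder.split("_")[0]


def index_images(root):
    """Return (path, folder, superclass) for every JPG under <root>/images."""
    records = []
    images_dir = Path(root) / "images"
    for folder in CLASS_FOLDERS:
        class_dir = images_dir / folder
        if not class_dir.is_dir():
            # Skip folders that are missing in a partial download.
            continue
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in {".jpg", ".jpeg"}:
                records.append((path, folder, superclass_of(folder)))
    return records


def count_by_superclass(records):
    """Count indexed images per superclass."""
    return Counter(superclass for _, _, superclass in records)
```

Calling `count_by_superclass(index_images("LungHist700"))` on a local copy gives the number of images per superclass, which is a quick way to check a download against the 691 images reported above.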
Jorge Diosdado (Dept. of Mathematics and Computer Science, University of Barcelona, Barcelona, Spain)
Pere Gilabert (Dept. of Mathematics and Computer Science, University of Barcelona, Barcelona, Spain)
Santi Seguí (Dept. of Mathematics and Computer Science, University of Barcelona, Barcelona, Spain)
Henar Borrego (University Clinical Hospital of Valladolid, Valladolid, Spain)
Official Website: https://www.nature.com/articles/s41597-024-03944-3
Article Address: https://www.nature.com/articles/s41597-024-03944-3
Publication Date: 2024-10
@ARTICLE{Diosdado2024-zg,
title = "{LungHist700}: A dataset of histological images for deep learning in pulmonary pathology",
author = "Diosdado, Jorge and Gilabert, Pere and Segu{\'\i}, Santi and Borrego, Henar",
journal = "Scientific Data",
volume = 11,
number = 1,
pages = "1088",
month = oct,
year = 2024
}