The loading speed of hard drives is well below the processing speed of modern GPUs. This is problematic for machine learning algorithms, especially for medical imaging datasets with large instances.
For example, consider the following case: we have a dataset containing 500 whole-slide-images (WSIs), each of which is approximately 100000x100000 pixels. We want the dataloader to repeatedly do the following steps (a naive version is sketched after the list):
- randomly select one of those huge images (i.e., WSIs).
- crop and return a random 224x224 patch from the huge image.
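A naive dataloader performs both steps against the disk on every call, which is exactly the bottleneck described above. The sketch below illustrates this; it uses openslide purely for illustration, and the file paths and function name are hypothetical, not part of PyDmed.

```python
# Naive approach: every call opens a huge WSI on disk and reads a 224x224
# patch from it, so the disk read dominates and the GPU(s) sit idle.
import random
import openslide

wsi_paths = ["slide_000.svs", "slide_001.svs"]  # hypothetically, 500 such files

def naive_random_patch(patch_size=224):
    # Step 1: randomly select one of the huge images (WSIs).
    path = random.choice(wsi_paths)
    slide = openslide.OpenSlide(path)
    width, height = slide.dimensions  # roughly 100000 x 100000
    # Step 2: crop and return a random patch_size x patch_size patch.
    x = random.randint(0, width - patch_size)
    y = random.randint(0, height - patch_size)
    patch = slide.read_region((x, y), 0, (patch_size, patch_size))  # PIL image
    slide.close()
    return patch
```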
PyDmed solves this issue.
The following two classes are pretty much the whole API of PyDmed.
BigChunk
: a relatively big chunk from a patient. It can be, e.g., a 5000x5000 patch from a huge whole-slide-image.

SmallChunk
: a small data chunk collected from a big chunk. It can be, e.g., a 224x224 patch cropped from a 5000x5000 big chunk. In the figure below, SmallChunks are the small blue patches.
The figure below illustrates the idea of PyDmed.
As long as some BigChunks are loaded into RAM, we can quickly collect some SmallChunks and pass them to GPU(s).
As illustrated below, BigChunks are loaded/replaced from disk from time to time.
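To make the pattern concrete, here is a minimal, framework-agnostic sketch of the same idea: read one big region from disk occasionally, then crop many small patches from it in RAM. It uses openslide and NumPy purely for illustration; the function names and paths are hypothetical, and this is not the PyDmed API.

```python
# Sketch of the BigChunk/SmallChunk idea (illustration only, not the PyDmed API).
import random
import numpy as np
import openslide

def load_bigchunk(wsi_path, bigchunk_size=5000):
    # Slow, infrequent step: read one big region from disk into RAM.
    slide = openslide.OpenSlide(wsi_path)
    width, height = slide.dimensions
    x = random.randint(0, width - bigchunk_size)
    y = random.randint(0, height - bigchunk_size)
    region = slide.read_region((x, y), 0, (bigchunk_size, bigchunk_size))
    slide.close()
    return np.asarray(region)[:, :, :3]  # drop the alpha channel

def collect_smallchunk(bigchunk, smallchunk_size=224):
    # Fast, frequent step: crop a small patch from the in-RAM big chunk.
    h, w, _ = bigchunk.shape
    x = random.randint(0, w - smallchunk_size)
    y = random.randint(0, h - smallchunk_size)
    return bigchunk[y:y + smallchunk_size, x:x + smallchunk_size, :]

# Usage: load a BigChunk once, then quickly collect many SmallChunks from it.
# bigchunk = load_bigchunk("slide_000.svs")  # hypothetical path
# batch = [collect_smallchunk(bigchunk) for _ in range(32)]
```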