Excessive resource demands #37
Comments
For the record: The conversion has now finished after about an hour, with a peak memory demand of ~24 GB.
Is this with the master branch? I made a change that should make speed and memory use much better when the '--extract' option isn't being used.
Thanks for your response! Yes, this was using the current master at that time. By --extract do you mean --extract-private?
Ah, sorry. I meant --embed-meta (or --dump-meta). I guess you are not using those options, though. Which version of pydicom are you using? Could you run dcm2nii on this data and check the speed and memory use? When I made the improvement for speed/memory I did some basic benchmarking and it seemed comparable to dcm2nii.
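For reference, one way to record wall time and peak memory from Python looks roughly like the sketch below. This is not part of dcmstack or this thread; the dcm2nii invocation and the path are just placeholders, and the ru_maxrss units assume Linux.

```python
import resource
import subprocess
import time

def run_and_measure(cmd):
    """Run cmd, return wall-clock seconds and peak child RSS in MB (Linux)."""
    start = time.time()
    subprocess.run(cmd, check=True)
    elapsed = time.time() - start
    # ru_maxrss is in kilobytes on Linux; it covers all waited-for children,
    # so measure each converter in a fresh Python process to compare them.
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_kb / 1024.0

if __name__ == "__main__":
    elapsed, peak_mb = run_and_measure(["dcm2nii", "dicom_dir/"])  # placeholder command
    print("dcm2nii: %.0f s, peak RSS ~%.0f MB" % (elapsed, peak_mb))
```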
Just started to look at this in some more depth. I missed that you are using the '-d' flag (--dump-meta) when I was looking before. This definitely does slow things down and causes increased memory use. I just redid some quick benchmarks against dcm2nii, and while the speed is similar when the --embed/--dump options are not used, the memory use is still quite a bit worse.

I think memory use can be improved, but it would be tricky and it would cause some backwards compatibility issues for the Python API. One relatively simple thing that could be done is to remove any pydicom objects immediately after parsing them, and just keep the pixel array plus the extracted meta data in memory. Of course, if you are extracting almost all the meta data then memory use will still be higher than dcm2nii, unless the meta data is "simplified" (avoiding duplicate values) on the fly.
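To make that idea concrete, a stripped-down sketch of the approach (not the actual dcmstack code; the kept tags, glob pattern, and stacking order are placeholders) might look like:

```python
from glob import glob

import numpy as np
import pydicom

# Placeholder selection of meta data to keep; real code would extract much more.
KEEP_TAGS = ("EchoTime", "InstanceNumber", "ImagePositionPatient")

slices = []
for path in sorted(glob("dicom_dir/*")):
    ds = pydicom.dcmread(path)
    meta = {kw: getattr(ds, kw, None) for kw in KEEP_TAGS}
    pixels = ds.pixel_array.copy()  # keep only the numpy array
    del ds                          # drop the full pydicom dataset right away
    slices.append((meta, pixels))

# Naive stacking by file order; real code would sort by position/instance number.
volume = np.stack([px for _, px in slices])
```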
@Hanke - I wrote some proof-of-concept code to do faster meta data summarization on the fly, using much less memory. Could you try benchmarking this script on your large dataset: https://gist.github.com/moloney/c3b3d46383f4618ae29e. It won't actually make a Nifti, but it should give some idea of what the performance would be. This does require the "bitarray" package as well.
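In case it helps to see the general idea in miniature: "summarizing on the fly" amounts to keeping a single copy of values that are constant across slices and only storing per-slice lists for values that actually vary. Below is a toy sketch, much simpler than the gist (which uses bitarray for its bookkeeping); the keyword list is a placeholder.

```python
import pydicom

def summarize(paths, keywords):
    const = {}    # keyword -> single shared value
    varying = {}  # keyword -> list of per-slice values
    for idx, path in enumerate(paths):
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        for kw in keywords:
            val = getattr(ds, kw, None)
            if kw in varying:
                varying[kw].append(val)
            elif idx == 0:
                const[kw] = val
            elif const[kw] != val:
                # Value differs from earlier slices: promote to a per-slice list.
                varying[kw] = [const.pop(kw)] * idx + [val]
        del ds
    return const, varying
```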
Thanks! I started running the code. I only had to adjust the glob, as Philips DICOM filenames do not come with '.dcm' by default. While running I see:
Runtime was approximately the same as with stock dcmstack (1h 06min); the memory consumption, however, was ~800 MB -- a fraction of what it was before. I used the same DICOM tarball as for the previous test. If you like, I can give you access to a DICOM tarball of similar size.
Thanks! If you can share a similar data set, that would be very helpful.
I spent some time looking at the data you provided offline. Here are a couple of general findings:
Also, one general comment about your data: I guess you are trying to run the whole study through at once? I highly recommend sorting the files into per-series folders first and then running dcmstack on those directories. This keeps peak memory use down, and it allows you to convert multiple series in parallel to decrease run time. Of course, if your data is not already sorted, the total run time may not improve much...
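As an illustration, the pre-sorting step can be done with a few lines of pydicom, reading headers only. This is a rough sketch, not part of dcmstack; the directory names are placeholders.

```python
import os
import shutil
from glob import glob

import pydicom

src_dir = "dicom_dir"  # placeholder: the mixed directory
out_dir = "sorted"     # placeholder: per-series output root

for path in glob(os.path.join(src_dir, "*")):
    # Read only the header, skipping pixel data, to keep this step fast.
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    series_dir = os.path.join(out_dir, str(ds.SeriesInstanceUID))
    os.makedirs(series_dir, exist_ok=True)
    shutil.copy(path, series_dir)
```

Each per-series folder can then be passed to dcmstack separately, and in parallel if desired.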
I am trying to convert Philips DICOMs from a ~1 hour scan: about 36k single-slice DICOMs in a single directory, with several image series mixed together. The total size of the tarball is ~850 MB (~160 MB gzipped). I convert via the following call:
These DICOMs have no file name extensions, hence the option.
At this point the process has been running for 40 min and consumes 18 GB of RAM. However, no files have been created yet, so I assume it will keep going.
The memory consumption is more than 20 times the size of the input data. This seems excessive. Any idea what is happening?
Thanks!