Excessive resource demands #37
Comments
For the record: The conversion has now finished after about an hour, with a peak memory demand of ~24 GB.
Is this with the master branch? I made a change that should make speed and memory use much better when the '--extract' option isn't being used.
Thanks for your response! Yes, this was using the current master at that time. By --extract do you mean --extract-private?
Ah, sorry. I meant --embed-meta (or --dump-meta). I guess you are not using those options, though. Which version of pydicom are you using? Could you run dcm2nii on this data and check the speed and memory use? When I made the improvement for speed/memory I did some basic benchmarking and it seemed comparable to dcm2nii.
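For reference, one way to record wall time and peak memory from Python looks roughly like the sketch below. This is not part of dcmstack or this thread; the dcm2nii invocation and the path are just placeholders, and the ru_maxrss units assume Linux.

```python
import resource
import subprocess
import time

def run_and_measure(cmd):
    """Run cmd, return wall-clock seconds and peak child RSS in MB (Linux)."""
    start = time.time()
    subprocess.run(cmd, check=True)
    elapsed = time.time() - start
    # ru_maxrss is in kilobytes on Linux; it covers all waited-for children,
    # so measure each converter in a fresh Python process to compare them.
    peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return elapsed, peak_kb / 1024.0

if __name__ == "__main__":
    elapsed, peak_mb = run_and_measure(["dcm2nii", "dicom_dir/"])  # placeholder command
    print("dcm2nii: %.0f s, peak RSS ~%.0f MB" % (elapsed, peak_mb))
```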
Just started to look at this in some more depth. I missed that you are using the '-d' flag (--dump-meta) when I was looking before. This definitely does slow things down and causes increased memory use. I just redid some quick benchmarks against dcm2nii, and while the speed is similar when the --embed/--dump options are not used, the memory use is still quite a bit worse.

I think memory use can be improved, but it would be tricky and it would cause some backwards compatibility issues for the Python API. One relatively simple thing that could be done is to remove any pydicom objects immediately after parsing them, and just keep the pixel array plus the extracted meta data in memory. Of course, if you are extracting almost all the meta data then memory use will still be higher than dcm2nii, unless the meta data is "simplified" (avoiding duplicate values) on the fly.
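To make that idea concrete, a stripped-down sketch of the approach (not the actual dcmstack code; the kept tags, glob pattern, and stacking order are placeholders) might look like:

```python
from glob import glob

import numpy as np
import pydicom

# Placeholder selection of meta data to keep; real code would extract much more.
KEEP_TAGS = ("EchoTime", "InstanceNumber", "ImagePositionPatient")

slices = []
for path in sorted(glob("dicom_dir/*")):
    ds = pydicom.dcmread(path)
    meta = {kw: getattr(ds, kw, None) for kw in KEEP_TAGS}
    pixels = ds.pixel_array.copy()  # keep only the numpy array
    del ds                          # drop the full pydicom dataset right away
    slices.append((meta, pixels))

# Naive stacking by file order; real code would sort by position/instance number.
volume = np.stack([px for _, px in slices])
```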
@Hanke - I wrote some proof-of-concept code to do faster meta data summarization on the fly, using much less memory. Could you try benchmarking this script on your large dataset: https://gist.github.com/moloney/c3b3d46383f4618ae29e. It won't actually make a Nifti, but it should give some idea of what the performance would be. This does require the "bitarray" package as well.
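In case it helps to see the general idea in miniature: "summarizing on the fly" amounts to keeping a single copy of values that are constant across slices and only storing per-slice lists for values that actually vary. Below is a toy sketch, much simpler than the gist (which uses bitarray for its bookkeeping); the keyword list is a placeholder.

```python
import pydicom

def summarize(paths, keywords):
    const = {}    # keyword -> single shared value
    varying = {}  # keyword -> list of per-slice values
    for idx, path in enumerate(paths):
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        for kw in keywords:
            val = getattr(ds, kw, None)
            if kw in varying:
                varying[kw].append(val)
            elif idx == 0:
                const[kw] = val
            elif const[kw] != val:
                # Value differs from earlier slices: promote to a per-slice list.
                varying[kw] = [const.pop(kw)] * idx + [val]
        del ds
    return const, varying
```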
Thanks! I started running the code. I only had to adjust the glob, as Philips DICOM filenames do not come with '.dcm' by default. While running I see:
Runtime was approximately the same as with stock dcmstack (1h 06min); the memory consumption, however, was ~800 MB -- a fraction of what it was before. I used the same DICOM tarball as for the previous test. If you like, I can give you access to a DICOM tarball of similar size.
Thanks! If you can share a similar data set, that would be very helpful.
I spent some time looking at the data you provided offline. Here are a couple of general findings:
Also, one general comment about your data: I guess you are trying to run the whole study through at once? I highly recommend sorting the files into per-series folders first and then running dcmstack on those directories. This keeps peak memory use down, and it allows you to convert multiple series in parallel to decrease run time. Of course, if your data is not already sorted, the total run time may not improve much...
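As an illustration, the pre-sorting step can be done with a few lines of pydicom, reading headers only. This is a rough sketch, not part of dcmstack; the directory names are placeholders.

```python
import os
import shutil
from glob import glob

import pydicom

src_dir = "dicom_dir"  # placeholder: the mixed directory
out_dir = "sorted"     # placeholder: per-series output root

for path in glob(os.path.join(src_dir, "*")):
    # Read only the header, skipping pixel data, to keep this step fast.
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    series_dir = os.path.join(out_dir, str(ds.SeriesInstanceUID))
    os.makedirs(series_dir, exist_ok=True)
    shutil.copy(path, series_dir)
```

Each per-series folder can then be passed to dcmstack separately, and in parallel if desired.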
I am trying to convert Philips DICOMs from a ~1 hour scan: about 36k single-slice DICOMs in a single directory, with several image series mixed together. The total size of the tarball is ~850 MB (~160 MB gzipped). I convert via the following call:
These DICOMs have no file name extensions, hence the option.
At this point the process has been running for 40 min and consumes 18 GB of RAM. However, no files have been created yet, so I assume it will keep going.
The memory consumption is more than 20 times the size of the input data. This seems excessive. Any idea what is happening?
Thanks!