Memory optimization #67

Gijom · 2017-08-01T08:11:52Z

Hello, I am facing some memory problems when reading big databases (which I would like to store completly in the memory). I noticed that the wfdb functions generally return float64 numpy arrays. However, if I understood correctly from the link below, the signals are coded on maximum 32 bits.
https://www.physionet.org/physiotools/wag/signal-5.htm

Here are my questions / remarks:

does this mean it is safe to convert all signals to 32bits ?
if this is safe, would it be possible to make your functions return float32 arrays ? Or even better to return the minimum size format depending on the header format field ?

G.

cx1111 · 2017-08-01T20:16:00Z

I think the second option sounds reasonable. Can anyone think of any instances where returning numpy arrays with dtype=floatXX !=64 will do any harm with the average user's scripts?

cx1111 · 2017-08-14T20:39:55Z

I'm nearly done with this, though I have some uncertainties regarding default logic and usability.

When digital values are returned, it is easy to use the numpy dtype corresponding to the same resolution or higher, ie. int16 for fmt 16 signals. But when physical values are requested, it may be a bit more ambiguous.

Consider a fmt 16 or fmt 32 signal. Converting those to float16 and float32 will make them lose some resolution if indeed the input signal was really captured with 16 or 32 bits.

In the end, I wonder what values the returnres argument I am adding should be allowed to be. I thought of:

64, 32, 16 to be very explicit. But this is annoying because even though there is only float16+, there is int8.
64, 32, 'minres' for a more automatic case where minres makes the package decide the optimal dtype. Once again, when it comes to floats, what would be the correct automatic formats? int32 --> float64 or float32? etc.

The default will be 64 in any case. Regarding my previous comment, people could easily create overflows/underflows if they unintentionally use a low resolution dtype.

cx1111 · 2017-08-14T21:08:55Z

I've decided to allow it to be set to either 64, 32, 16 or 8. The user can mind the dtype since different cases may warrant different options.

Regarding your questions, for all formats aside from 32, it will be safe to convert to float32 with insignificant precision loss. And even with fmt 32 signals, it is unlikely that we have any (at this point in time on physionet) in which the capturing device actually recorded with 32 bit precision, so float32 would be generally safe in those cases too.

One last note, we have some ridiculously long signals in certain databases (many many days long). When processing these kinds of signals, reading them part by part will be inevitable since this package reads all values into memory.

I'll close this issue once I push the changes.

Gijom · 2017-08-16T18:29:25Z

Great, many thanks for your prompt changes!

cx1111 · 2017-08-16T21:09:01Z

Sure, no problem. Let me know if my changes actually did anything :)

cx1111 · 2017-08-31T15:18:51Z

I should go and edit the code to use np.add/subtract/divide/multiply during adc/dac conversion when only one version of the signal (digital/physical) is desired, to prevent extra memory use via temporary variable allocation.

@jshaffer94247

As discussed in #490, https://github.com/MIT-LCP/wfdb-python/blob/34b989e08435c1a82d31bdd2800c4c14147e3e93/wfdb/io/convert/csv.py#L10 currently "strips the path from the input .csv, then writes the output to .dat and .hea". It's inconvenient not to be able to specify the output directory. This pull request adds a new `output_dir` argument to the `csv_to_wfdb` function. By default `output_dir` is set to None, which will maintain backwards compatibility. Setting `output_dir` to a directory will mean that output files are saved to this directory. I have set this to a WIP, because I haven't tested the new behaviour (other than running `pytest`). @jshaffer94247, if you have an opportunity to test the fix, I'd appreciate your feedback.

@ajadczaksunriselabs

This pull request adds a changelog for `v4.2.0`. The changelog is based on the following auto-generated summary of merge commits generated by GitHub: ``` ## What's Changed * bug-fix: Numpy ValueError when cheking empty list equality by @ajadczaksunriselabs in #459 * bug-fix: Pandas set indexing error by @ajadczaksunriselabs in #460 * fix for /issues/452 by @tecamenz in #465 * Use numpydoc to render documentation by @SnoopJ in #472 * build(deps): bump readthedocs-sphinx-search from 0.1.1 to 0.3.2 in /docs by @dependabot in #477 * Update style by @bemoody in #482 * Fix NaN handling in Record.adc, and other fixes by @bemoody in #481 * Set upper bound on Numpy version (numpy = ">=1.10.1,<2.0.0"). Ref #493. by @tompollard in #494 * Update actions to use actions/checkout@v3 and actions/setup-python@v4. by @tompollard in #495 * Fix: Indent code to ensure 'j' is within for-loop in GQRS algorithm by @tompollard in #499 * Add write_dir argument to csv_to_wfdb. Fixes #67. by @tompollard in #492 * Fix warnings by @cbrnr in #502 * README improvements by @bemoody in #503 * Change in type promotion. Fixes to annotation.py by @tompollard in #506 * Use uv by @cbrnr in #504 * Change in type promotion. Fixes to _signal.py by @tompollard in #507 * Test round-trip write/read of supported binary formats by @bemoody in #509 * Corrected typo and extended allowed types for MultiSegmentRecord by @agent3gatech in #514 * Allow expanded physical signal in `calc_adc_params` by @briangow in #512 * Add capability to write signal with unique `samps_per_frame` to `wfdb.io.wrsamp` by @briangow in #510 * Fix selection of channels when converting to EDF by @SamJelfs in #519 * Change in type promotion introduced in Numpy 2.0. Fixes to edf.py. by @tompollard in #527 * Bump dependencies for NumPy 2 compatibility by @cbrnr in #511 * Bump version to v4.2.0 and update notes on creating new releases by @tompollard in #497 ## New Contributors * @ajadczaksunriselabs made their first contribution in #459 * @tecamenz made their first contribution in #465 * @SnoopJ made their first contribution in #472 * @dependabot made their first contribution in #477 * @agent3gatech made their first contribution in #514 * @SamJelfs made their first contribution in #519 **Full Changelog**: v4.1.2...v4.2.0 ```

Dubrzr assigned cx1111 Aug 1, 2017

cx1111 closed this as completed Aug 16, 2017

cx1111 reopened this Aug 31, 2017

cx1111 closed this as completed Sep 27, 2017

tompollard added a commit that referenced this issue Jul 1, 2024

Add output_dir argument to csv_to_wfdb. Fixes #67.

e54af3e

tompollard added a commit that referenced this issue Jul 1, 2024

Add output_dir argument to csv_to_wfdb. Fixes #67.

d6d5aa7

tompollard added a commit that referenced this issue Jul 1, 2024

Add output_dir argument to csv_to_wfdb. Fixes #67.

caff031

tompollard added a commit that referenced this issue Jul 3, 2024

Add output_dir argument to csv_to_wfdb. Fixes #67.

048e0e5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory optimization #67

Memory optimization #67

Gijom commented Aug 1, 2017

cx1111 commented Aug 1, 2017 •

edited

Loading

cx1111 commented Aug 14, 2017 •

edited

Loading

cx1111 commented Aug 14, 2017 •

edited

Loading

Gijom commented Aug 16, 2017

cx1111 commented Aug 16, 2017

cx1111 commented Aug 31, 2017

Memory optimization #67

Memory optimization #67

Comments

Gijom commented Aug 1, 2017

cx1111 commented Aug 1, 2017 • edited Loading

cx1111 commented Aug 14, 2017 • edited Loading

cx1111 commented Aug 14, 2017 • edited Loading

Gijom commented Aug 16, 2017

cx1111 commented Aug 16, 2017

cx1111 commented Aug 31, 2017

cx1111 commented Aug 1, 2017 •

edited

Loading

cx1111 commented Aug 14, 2017 •

edited

Loading

cx1111 commented Aug 14, 2017 •

edited

Loading