Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory optimization #67

Closed
Gijom opened this issue Aug 1, 2017 · 6 comments
Closed

Memory optimization #67

Gijom opened this issue Aug 1, 2017 · 6 comments
Assignees

Comments

@Gijom
Copy link

Gijom commented Aug 1, 2017

Hello, I am facing some memory problems when reading big databases (which I would like to store completly in the memory). I noticed that the wfdb functions generally return float64 numpy arrays. However, if I understood correctly from the link below, the signals are coded on maximum 32 bits.
https://www.physionet.org/physiotools/wag/signal-5.htm

Here are my questions / remarks:

  • does this mean it is safe to convert all signals to 32bits ?
  • if this is safe, would it be possible to make your functions return float32 arrays ? Or even better to return the minimum size format depending on the header format field ?

G.

@cx1111
Copy link
Member

cx1111 commented Aug 1, 2017

I think the second option sounds reasonable. Can anyone think of any instances where returning numpy arrays with dtype=floatXX !=64 will do any harm with the average user's scripts?

@cx1111
Copy link
Member

cx1111 commented Aug 14, 2017

I'm nearly done with this, though I have some uncertainties regarding default logic and usability.

When digital values are returned, it is easy to use the numpy dtype corresponding to the same resolution or higher, ie. int16 for fmt 16 signals. But when physical values are requested, it may be a bit more ambiguous.

Consider a fmt 16 or fmt 32 signal. Converting those to float16 and float32 will make them lose some resolution if indeed the input signal was really captured with 16 or 32 bits.

In the end, I wonder what values the returnres argument I am adding should be allowed to be. I thought of:

  • 64, 32, 16 to be very explicit. But this is annoying because even though there is only float16+, there is int8.
  • 64, 32, 'minres' for a more automatic case where minres makes the package decide the optimal dtype. Once again, when it comes to floats, what would be the correct automatic formats? int32 --> float64 or float32? etc.

The default will be 64 in any case. Regarding my previous comment, people could easily create overflows/underflows if they unintentionally use a low resolution dtype.

@cx1111
Copy link
Member

cx1111 commented Aug 14, 2017

I've decided to allow it to be set to either 64, 32, 16 or 8. The user can mind the dtype since different cases may warrant different options.

Regarding your questions, for all formats aside from 32, it will be safe to convert to float32 with insignificant precision loss. And even with fmt 32 signals, it is unlikely that we have any (at this point in time on physionet) in which the capturing device actually recorded with 32 bit precision, so float32 would be generally safe in those cases too.

One last note, we have some ridiculously long signals in certain databases (many many days long). When processing these kinds of signals, reading them part by part will be inevitable since this package reads all values into memory.

I'll close this issue once I push the changes.

@Gijom
Copy link
Author

Gijom commented Aug 16, 2017

Great, many thanks for your prompt changes!

@cx1111
Copy link
Member

cx1111 commented Aug 16, 2017

Sure, no problem. Let me know if my changes actually did anything :)

@cx1111 cx1111 closed this as completed Aug 16, 2017
@cx1111
Copy link
Member

cx1111 commented Aug 31, 2017

I should go and edit the code to use np.add/subtract/divide/multiply during adc/dac conversion when only one version of the signal (digital/physical) is desired, to prevent extra memory use via temporary variable allocation.

@cx1111 cx1111 reopened this Aug 31, 2017
@cx1111 cx1111 closed this as completed Sep 27, 2017
tompollard added a commit that referenced this issue Jul 11, 2024
As discussed in #490,
https://github.com/MIT-LCP/wfdb-python/blob/34b989e08435c1a82d31bdd2800c4c14147e3e93/wfdb/io/convert/csv.py#L10
currently "strips the path from the input .csv, then writes the output
to .dat and .hea".

It's inconvenient not to be able to specify the output directory. This
pull request adds a new `output_dir` argument to the `csv_to_wfdb`
function. By default `output_dir` is set to None, which will maintain
backwards compatibility. Setting `output_dir` to a directory will mean
that output files are saved to this directory.

I have set this to a WIP, because I haven't tested the new behaviour
(other than running `pytest`). @jshaffer94247, if you have an
opportunity to test the fix, I'd appreciate your feedback.
tompollard added a commit that referenced this issue Jan 21, 2025
This pull request adds a changelog for `v4.2.0`. The changelog is based
on the following auto-generated summary of merge commits generated by
GitHub:

```
## What's Changed

* bug-fix: Numpy ValueError when cheking empty list equality by @ajadczaksunriselabs in #459
* bug-fix: Pandas set indexing error by @ajadczaksunriselabs in #460
* fix for /issues/452 by @tecamenz in #465
* Use numpydoc to render documentation by @SnoopJ in #472
* build(deps): bump readthedocs-sphinx-search from 0.1.1 to 0.3.2 in /docs by @dependabot in #477
* Update style by @bemoody in #482
* Fix NaN handling in Record.adc, and other fixes by @bemoody in #481
* Set upper bound on Numpy version (numpy = ">=1.10.1,<2.0.0"). Ref #493. by @tompollard in #494
* Update actions to use actions/checkout@v3 and actions/setup-python@v4. by @tompollard in #495
* Fix: Indent code to ensure 'j' is within for-loop in GQRS algorithm by @tompollard in #499
* Add write_dir argument to csv_to_wfdb. Fixes #67. by @tompollard in #492
* Fix warnings by @cbrnr in #502
* README improvements by @bemoody in #503
* Change in type promotion. Fixes to annotation.py by @tompollard in #506
* Use uv by @cbrnr in #504
* Change in type promotion. Fixes to _signal.py by @tompollard in #507
* Test round-trip write/read of supported binary formats by @bemoody in #509
* Corrected typo and extended allowed types for MultiSegmentRecord by @agent3gatech in #514
* Allow expanded physical signal in `calc_adc_params` by @briangow in #512
* Add capability to write signal with unique `samps_per_frame` to `wfdb.io.wrsamp` by @briangow in #510
* Fix selection of channels when converting to EDF by @SamJelfs in #519
* Change in type promotion introduced in Numpy 2.0. Fixes to edf.py. by @tompollard in #527
* Bump dependencies for NumPy 2 compatibility by @cbrnr in #511
* Bump version to v4.2.0 and update notes on creating new releases by @tompollard in #497

## New Contributors

* @ajadczaksunriselabs made their first contribution in #459
* @tecamenz made their first contribution in #465
* @SnoopJ made their first contribution in #472
* @dependabot made their first contribution in #477
* @agent3gatech made their first contribution in #514
* @SamJelfs made their first contribution in #519

**Full Changelog**: v4.1.2...v4.2.0
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants