Memory optimization #67
I think the second option sounds reasonable. Can anyone think of any instances where returning numpy arrays with a float dtype other than float64 would do any harm to the average user's scripts?
I'm nearly done with this, though I have some uncertainties regarding default logic and usability. When digital values are returned, it is easy to use the numpy dtype with the same resolution or higher, i.e. int16 for fmt 16 signals. But when physical values are requested, it is more ambiguous. Consider a fmt 16 or fmt 32 signal: converting it to float16 or float32 will lose some resolution if the input signal was really captured with 16 or 32 bits. In the end, I wonder what values the dtype option should accept, and what the default should be.
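As a minimal sketch of the precision concern (illustrative values, not from the thread): float32 carries a 24-bit significand, so every 16-bit sample is exactly representable, but genuine 32-bit samples beyond 2**24 in magnitude are rounded.

```python
import numpy as np

# Every int16 value fits exactly in float32.
print(np.float32(np.int16(32767)) == 32767)  # True: exact

# A 32-bit sample above 2**24 does not: 33554433 rounds to 33554432.0.
d = np.int32(2**25 + 1)
print(np.float32(d) == d)  # False: precision lost
```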
The default will be 64 in any case. Regarding my previous comment: people could easily create overflows or underflows if they unintentionally use a low-resolution dtype.
I've decided to allow it to be set to 64, 32, 16, or 8. The user can choose the dtype, since different cases may warrant different options. Regarding your questions: for all formats aside from 32, it is safe to convert to float32 with insignificant precision loss. And even with fmt 32 signals, it is unlikely that any on PhysioNet at this point were actually recorded with full 32-bit precision, so float32 would generally be safe in those cases too. One last note: some databases contain ridiculously long signals (many days long). When processing those, reading them part by part is inevitable, since this package reads all values into memory. I'll close this issue once I push the changes.
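For context, a minimal sketch of how this looks from the user's side, assuming the option landed as the `return_res` argument of the read functions (as in current wfdb releases); the record path and sampling rate are illustrative:

```python
import wfdb

# Request float32 physical values instead of the float64 default.
record = wfdb.rdrecord("sample-data/100", return_res=32)
print(record.p_signal.dtype)  # float32

# For very long records, read one window at a time with sampfrom/sampto
# (here, one minute at an assumed 360 Hz).
signals, fields = wfdb.rdsamp(
    "sample-data/100", sampfrom=0, sampto=360 * 60, return_res=32
)
```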
Great, many thanks for your prompt changes!
Sure, no problem. Let me know if my changes actually did anything :)
I should go and edit the code to use np.add/subtract/divide/multiply during ADC/DAC conversion when only one version of the signal (digital or physical) is desired, to prevent extra memory use from temporary array allocation.
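A minimal sketch of the idea (the calibration values and samples are illustrative, not the package's actual code): NumPy ufuncs accept an `out=` argument, so the conversion p = (d - baseline) / adc_gain can reuse one buffer instead of allocating a fresh temporary for each intermediate result.

```python
import numpy as np

# Illustrative digital samples and calibration values.
adc_gain, baseline = 200.0, 1024
d_signal = np.array([1024, 1224, 824], dtype=np.int16)

# One allocation for the output; both operations then write into it in place.
p_signal = d_signal.astype(np.float32)
np.subtract(p_signal, baseline, out=p_signal)
np.divide(p_signal, adc_gain, out=p_signal)
print(p_signal)  # [ 0.  1. -1.]
```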
As discussed in #490, https://github.com/MIT-LCP/wfdb-python/blob/34b989e08435c1a82d31bdd2800c4c14147e3e93/wfdb/io/convert/csv.py#L10 currently "strips the path from the input .csv, then writes the output to .dat and .hea". It's inconvenient not to be able to specify the output directory. This pull request adds a new `output_dir` argument to the `csv_to_wfdb` function. By default `output_dir` is set to None, which will maintain backwards compatibility. Setting `output_dir` to a directory will mean that output files are saved to this directory. I have set this to a WIP, because I haven't tested the new behaviour (other than running `pytest`). @jshaffer94247, if you have an opportunity to test the fix, I'd appreciate your feedback.
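A hypothetical usage sketch of the proposed argument; the input path, `fs`, and `units` values are illustrative:

```python
from wfdb.io.convert.csv import csv_to_wfdb

# output_dir is the argument proposed in this pull request. With
# output_dir=None (the default), files are written to the working
# directory as before.
csv_to_wfdb("data/recording.csv", fs=360, units="mV", output_dir="converted/")
```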
This pull request adds a changelog for `v4.2.0`. The changelog is based on the following auto-generated summary of merge commits generated by GitHub:

```
## What's Changed

* bug-fix: Numpy ValueError when cheking empty list equality by @ajadczaksunriselabs in #459
* bug-fix: Pandas set indexing error by @ajadczaksunriselabs in #460
* fix for /issues/452 by @tecamenz in #465
* Use numpydoc to render documentation by @SnoopJ in #472
* build(deps): bump readthedocs-sphinx-search from 0.1.1 to 0.3.2 in /docs by @dependabot in #477
* Update style by @bemoody in #482
* Fix NaN handling in Record.adc, and other fixes by @bemoody in #481
* Set upper bound on Numpy version (numpy = ">=1.10.1,<2.0.0"). Ref #493. by @tompollard in #494
* Update actions to use actions/checkout@v3 and actions/setup-python@v4. by @tompollard in #495
* Fix: Indent code to ensure 'j' is within for-loop in GQRS algorithm by @tompollard in #499
* Add write_dir argument to csv_to_wfdb. Fixes #67. by @tompollard in #492
* Fix warnings by @cbrnr in #502
* README improvements by @bemoody in #503
* Change in type promotion. Fixes to annotation.py by @tompollard in #506
* Use uv by @cbrnr in #504
* Change in type promotion. Fixes to _signal.py by @tompollard in #507
* Test round-trip write/read of supported binary formats by @bemoody in #509
* Corrected typo and extended allowed types for MultiSegmentRecord by @agent3gatech in #514
* Allow expanded physical signal in `calc_adc_params` by @briangow in #512
* Add capability to write signal with unique `samps_per_frame` to `wfdb.io.wrsamp` by @briangow in #510
* Fix selection of channels when converting to EDF by @SamJelfs in #519
* Change in type promotion introduced in Numpy 2.0. Fixes to edf.py. by @tompollard in #527
* Bump dependencies for NumPy 2 compatibility by @cbrnr in #511
* Bump version to v4.2.0 and update notes on creating new releases by @tompollard in #497

## New Contributors

* @ajadczaksunriselabs made their first contribution in #459
* @tecamenz made their first contribution in #465
* @SnoopJ made their first contribution in #472
* @dependabot made their first contribution in #477
* @agent3gatech made their first contribution in #514
* @SamJelfs made their first contribution in #519

**Full Changelog**: v4.1.2...v4.2.0
```
Hello, I am facing some memory problems when reading big databases (which I would like to store completely in memory). I noticed that the wfdb functions generally return float64 numpy arrays. However, if I understand correctly from the link below, the signals are encoded with at most 32 bits.
https://www.physionet.org/physiotools/wag/signal-5.htm
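To put rough numbers on the overhead (a back-of-the-envelope sketch; the sampling rate and duration are illustrative):

```python
import numpy as np

# One hour of a single 360 Hz channel: float64 doubles the footprint of
# float32, and quadruples that of the int16 samples in a fmt 16 .dat file.
n = 360 * 60 * 60
print(np.zeros(n, dtype=np.float64).nbytes / 1e6)  # ~10.4 MB
print(np.zeros(n, dtype=np.float32).nbytes / 1e6)  # ~5.2 MB
print(np.zeros(n, dtype=np.int16).nbytes / 1e6)    # ~2.6 MB
```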
Here are my questions / remarks:
G.