BUG: Don't over-optimize memory with jagged CSV #23527
Hello @gfyoung! Thanks for submitting the PR.
Codecov Report
@@ Coverage Diff @@
## master #23527 +/- ##
=======================================
Coverage 92.23% 92.23%
=======================================
Files 161 161
Lines 51324 51324
=======================================
Hits 47339 47339
Misses 3985 3985
@jreback : Any thoughts on this one? (I only removed the |
Can you run the current benchmarks on this? Do we have any that specifically target this?
@jreback : No specific benchmarks as far as I know, though I did not observe any meaningful changes to the benchmarks after making this change.
@jreback : Any other thoughts on this?
feccd27 to 015a193
@gfyoung OK, I reviewed the original issue, so LGTM. If you could add some comments on the parts you added for future readers, that would be great. Ping on green.
With jagged CSV's, we risk being too quick to dump memory that we need to allocate because previous chunks would have indicated much larger rows than we can anticipate in subsequent chunks. Closes gh-23509.
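To make "jagged" concrete, here is a minimal sketch using only the Python standard library. The sample data and the helper names `row_widths` and `chunk_max_widths` are made up for illustration; this is not code from the PR or from the pandas tokenizer. It only shows the situation the commit message describes: an early chunk containing a much wider row than the chunks that follow.

```python
import csv
import io

# Hypothetical sample data: the first row is wide, later rows are narrow.
JAGGED = "a,b,c,d,e,f\n1,2\n3,4\n5,6\n"


def row_widths(text):
    """Return the number of fields in each row of the CSV text."""
    return [len(row) for row in csv.reader(io.StringIO(text))]


def chunk_max_widths(widths, chunk_size):
    """Return the widest row seen in each consecutive chunk of rows."""
    return [
        max(widths[i:i + chunk_size])
        for i in range(0, len(widths), chunk_size)
    ]


widths = row_widths(JAGGED)
print(widths)
print(chunk_max_widths(widths, 2))
```

A parser that sizes its buffers from the widest row of one chunk would see very different requirements from chunk to chunk on input like this.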
015a193 to 17f7822
@jreback : Addressed the doc comments, and all is still green. PTAL.
thanks @gfyoung
* upstream/master:
  BUG: Don't over-optimize memory with jagged CSV (pandas-dev#23527)
  DEPR: Deprecate usecols as int in read_excel (pandas-dev#23635)
  More helpful Stata string length error. (pandas-dev#23629)
  BUG: astype fill_value for SparseArray.astype (pandas-dev#23547)
  CLN: datetimelike arrays: isort, small reorg (pandas-dev#23587)
  CI: Check in the CI that assert_raises_regex is not being used (pandas-dev#23627)
  CLN:Remove unused **kwargs from user facing methods (pandas-dev#23249)
  DOC: Enhancing pivot / reshape docs (pandas-dev#21038)
  TST: Fix xfailing DataFrame arithmetic tests by transposing (pandas-dev#23620)
…fixed
* upstream/master:
  DOC: avoid SparseArray.take error (pandas-dev#23637)
  CLN: remove incorrect usages of com.AbstractMethodError (pandas-dev#23625)
  DOC: Adding validation of the section order in docstrings (pandas-dev#23607)
  BUG: Don't over-optimize memory with jagged CSV (pandas-dev#23527)
  DEPR: Deprecate usecols as int in read_excel (pandas-dev#23635)
  More helpful Stata string length error. (pandas-dev#23629)
  BUG: astype fill_value for SparseArray.astype (pandas-dev#23547)
  CLN: datetimelike arrays: isort, small reorg (pandas-dev#23587)
  CI: Check in the CI that assert_raises_regex is not being used (pandas-dev#23627)
  CLN:Remove unused **kwargs from user facing methods (pandas-dev#23249)
The edge case where we hit powers of 2 every time during allocation can be painful. Closes gh-24805. xref gh-23527.
* Fix memory growth bug in read_csv
  The edge case where we hit powers of 2 every time during allocation can be painful. Closes gh-24805. xref gh-23527.
* TST: Add ASV benchmark for issue
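The commit message does not spell out the allocation logic, so the following is only a toy model of a doubling buffer, not the pandas tokenizer code. The function `grow_capacity` and its `inclusive` flag are hypothetical. It illustrates why sizes that land exactly on a power of two are a boundary case: whether the comparison is `>=` or `>` decides if the buffer doubles.

```python
def grow_capacity(capacity, needed, inclusive):
    """Double `capacity` until `needed` fits.

    With inclusive=True the buffer doubles while needed >= capacity,
    so a request exactly equal to the capacity still triggers growth.
    With inclusive=False it doubles only while needed > capacity.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    while (needed >= capacity) if inclusive else (needed > capacity):
        capacity *= 2
    return capacity


# Compare both policies just below, at, and just above a power of two.
for needed in (255, 256, 257):
    print(
        needed,
        grow_capacity(256, needed, inclusive=True),
        grow_capacity(256, needed, inclusive=False),
    )
```

Starting from a capacity of 256, a request of exactly 256 doubles to 512 under the inclusive check and stays at 256 under the strict one. The two policies agree for 255 and 257.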
With jagged CSV's, we risk being too quick to dump memory that we need to allocate because previous chunks would have indicated much larger rows than we can anticipate in subsequent chunks.
Closes #23509.