Fixing memory leaks in read_csv #23072
Codecov Report
@@ Coverage Diff @@
## master #23072 +/- ##
=======================================
Coverage 92.28% 92.28%
=======================================
Files 161 161
Lines 51434 51434
=======================================
Hits 47467 47467
Misses 3967 3967
Can you run the asv's for csv to see if there are any effects, and add a whatsnew note?
Can you run the code at the very top of the issue here and show the leak has disappeared?
@kuraga, would you be able to test this change on your reproducer from #21353? @jreback, I'm not sure what the asv's for csv are. This patch fixes a memory leak in some proprietary code I was working on, so I can't post it here, and I don't have a standalone reproducer for my issue, sorry. I mentioned #21353 because the leak I was seeing was also coming from read_csv.
@zhezherun I understand. Look at the top of the issue: there is a script there. Pls run that with the new version.
@zhezherun do you know whether this patch will address #19941 as well? Would this memory leak have been exacerbated by multiple threads, or do you think that's a different issue?
I pulled down this branch and confirmed that it does not fix #19941 (but @zhezherun, if you have any guesses about what may be going on there, they'd be appreciated).
@zhezherun can you add a whatsnew note in the bug fixes / io section, mentioning the issue number? Ping on green.
@zhezherun can you update
Added a release note. Ping on green.
small comments, ping on green.
@gfyoung can you rebase and fix up?
* Move allocation of na_hashset down to avoid a leak on continue
* Delete na_hashset if there is an exception
* Clean up table before raising an exception

Closes gh-21353.
lgtm. ping on green.
@jreback: Comments addressed, and all is green. PTAL.
Thanks all.
…fixed * upstream/master: (46 commits) DEPS: bump xlrd min version to 1.0.0 (pandas-dev#23774) BUG: Don't warn if default conflicts with dialect (pandas-dev#23775) BUG: Fixing memory leaks in read_csv (pandas-dev#23072) TST: Extend datetime64 arith tests to array classes, fix several broken cases (pandas-dev#23771) STYLE: Specify bare exceptions in pandas/tests (pandas-dev#23370) ENH: between_time, at_time accept axis parameter (pandas-dev#21799) PERF: Use is_utc check to improve performance of dateutil UTC in DatetimeIndex methods (pandas-dev#23772) CLN: io/formats/html.py: refactor (pandas-dev#22726) API: Make Categorical.searchsorted returns a scalar when supplied a scalar (pandas-dev#23466) TST: Add test case for GH14080 for overflow exception (pandas-dev#23762) BUG: Don't extract header names if none specified (pandas-dev#23703) BUG: Index.str.partition not nan-safe (pandas-dev#23558) (pandas-dev#23618) DEPR: tz_convert in the Timestamp constructor (pandas-dev#23621) PERF: Datetime/Timestamp.normalize for timezone naive datetimes (pandas-dev#23634) TST: Use new arithmetic fixtures, parametrize many more tests (pandas-dev#23757) REF/TST: Add more pytest idiom to parsers tests (pandas-dev#23761) DOC: Add ignore-deprecate argument to validate_docstrings.py (pandas-dev#23650) ENH: update pandas-gbq to 0.8.0, adds credentials arg (pandas-dev#23662) DOC: Improve error message to show correct order (pandas-dev#23652) ENH: Improve error message for empty object array (pandas-dev#23718) ...
Hi, I am facing this issue on Google Compute Engine (Windows Server 2012 R2 Datacenter, 64-bit). How do I fix it? I have installed the latest version of pandas.
This PR fixes a memory leak in parsers.pyx detected by valgrind, and also adds some further cleanup that should avoid memory leaks on exceptions.
* closes #21353
* Move the allocation of na_hashset down, to avoid a leak when continue is executed
* Delete na_hashset if there is an exception
* Clean up the table in kset_from_list before raising an exception
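
The three changes listed in this PR follow a common pattern for manually managed resources, and a small model may make the leak paths easier to see. The sketch below is not the code from parsers.pyx: `TrackingAllocator`, `build_na_set`, `convert_columns_leaky`, and `convert_columns_fixed` are hypothetical names, plain Python stands in for Cython, and integer handles stand in for the hash tables. The fixed version uses `try`/`finally` as the Python idiom for "delete on exception", which is an assumption about how one would express it rather than a description of the actual patch.

```python
class TrackingAllocator:
    """Stand-in for malloc/free: records handles that were never released."""

    def __init__(self):
        self.live = set()
        self._next = 0

    def alloc(self):
        self._next += 1
        self.live.add(self._next)
        return self._next

    def free(self, handle):
        self.live.discard(handle)


def build_na_set(na_values, allocator):
    """Build a (modelled) hash set, releasing it before raising on bad input."""
    table = allocator.alloc()
    for value in na_values:
        if not isinstance(value, bytes):
            allocator.free(table)  # clean up the table before raising
            raise ValueError("NA values must all be bytes")
    return table


def convert_columns_leaky(columns, skip, allocator):
    """Allocates before the skip check, so every `continue` leaks a handle."""
    results = {}
    for name, values in columns.items():
        na_set = allocator.alloc()  # allocated too early
        if name in skip:
            continue  # na_set is never freed on this path
        results[name] = [int(v) for v in values]  # a raise here leaks too
        allocator.free(na_set)
    return results


def convert_columns_fixed(columns, skip, allocator, na_values):
    """Allocates after the skip check and frees on every exit path."""
    results = {}
    for name, values in columns.items():
        if name in skip:
            continue  # nothing has been allocated yet, so nothing leaks
        na_set = build_na_set(na_values, allocator)
        try:
            results[name] = [int(v) for v in values]
        finally:
            allocator.free(na_set)  # runs on success and on exception
    return results
```

After either function runs, `allocator.live` holds the handles that were never freed. With one skipped column the leaky version leaves one handle behind, while the fixed version leaves none, even when a conversion raises.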