Default to_* methods to compression='infer' #22011
Changes from 9 commits
@@ -28,7 +28,7 @@
 # interface to/from
 def to_json(path_or_buf, obj, orient=None, date_format='epoch',
             double_precision=10, force_ascii=True, date_unit='ms',
-            default_handler=None, lines=False, compression=None,
+            default_handler=None, lines=False, compression='infer',

Not sure where to update the to_json docs... didn't see a docstring in this function.

Line 1905 in 322dbf4

             index=True):
 
     if not index and orient not in ['split', 'table']:

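To make the effect of the new default concrete, here is a minimal sketch of what "inferring compression from the path" could look like. The function name `resolve_compression` and the extension table are hypothetical and written only for illustration; they are not pandas' internal implementation.

```python
import os

# Hypothetical extension table for illustration; not pandas' internal mapping.
_EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".zip": "zip",
    ".xz": "xz",
}


def resolve_compression(path_or_buf, compression="infer"):
    """Return the compression name to use, or None for no compression."""
    # An explicit value (including None) always wins over inference.
    if compression != "infer":
        return compression
    # Inference needs a string path; an open buffer has no extension.
    if not isinstance(path_or_buf, str):
        return None
    _, ext = os.path.splitext(path_or_buf)
    return _EXTENSION_TO_COMPRESSION.get(ext.lower())
```

Under this sketch, a path such as `data.json.gz` resolves to `'gzip'` with the new default, whereas the old default of `compression=None` would have written the file uncompressed despite its extension.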
@@ -1,5 +1,6 @@
 # -*- coding: utf-8 -*-
 
+import gzip
 import sys
 
 import pytest

@@ -351,3 +352,15 @@ def test_to_csv_compression(self, compression_only,
             result = pd.read_csv(path, index_col=0,
                                  compression=read_compression)
             tm.assert_frame_equal(result, df)
+
+    def test_compression_defaults_to_infer(tmpdir):
+        """
+        Test that to_csv defaults to inferring compression from paths.
+        https://github.com/pandas-dev/pandas/pull/22011
+        """
+        df = DataFrame({"A": [1]})

While I suppose this "works" it is definitely focused on We have a fixture called

The possible values that compression can take for I think it may make sense to make a similar test for

My point was that this test only makes sure that gzip compression works by default with

In abd19e3, I modified an existing parametrized test to look for compression by default for paths where inference should occur. This actually caught an issue (we hadn't switched default for Should I delete the gzip test? Note the parametrized test doesn't test that the right compression is occurring, just that a compression is occurring.

Instead of going about it in this fashion can you not do a round trip with

    with tm.ensure_clean('compressed.csv.{}'.format(compression_only)) as path:
        df.to_csv(path)
        result = pd.read_csv(path, compression=compression_only)
        tm.assert_frame_equal(result, df)

I'm not a big fan of the roundtrip approach here, since it never actually tests that the file is compressed on disk. Given that the to_* and read_* methods rely on much of the same compression infrastructure, I think it's possible to modify the code such that all compression gets disabled and the roundtrip still works perfectly. Hopefully there are enough other tests to catch such a situation.

+        with tm.ensure_clean('compressed.csv.gz') as path:
+            df.to_csv(path, index=False)
+            with gzip.open(path, 'rt') as read_file:
+                lines = read_file.read().splitlines()
+            assert lines == ['A', '1']

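The thread above raises two concerns: a roundtrip never proves the file is compressed on disk, and the parametrized test only shows that some compression occurred, not which one. One possible way to address both without going through a reader is to inspect the file's leading magic bytes. The sketch below uses only the standard library; `detect_compression` is a hypothetical helper, not something in pandas or its test utilities, and zip is deliberately left out.

```python
import gzip
import os
import tempfile

# Leading "magic" bytes that identify each container format on disk.
_MAGIC = {
    "gzip": b"\x1f\x8b",
    "bz2": b"BZh",
    "xz": b"\xfd7zXZ\x00",
}


def detect_compression(path):
    """Return the compression format found on disk, or None if unrecognized."""
    with open(path, "rb") as handle:
        head = handle.read(6)
    for name, magic in _MAGIC.items():
        if head.startswith(magic):
            return name
    return None


# Usage: write one gzip file and one plain file, then inspect both on disk.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "sample.csv.gz")
    with gzip.open(gz_path, "wt") as handle:
        handle.write("A\n1\n")

    plain_path = os.path.join(tmp, "sample.csv")
    with open(plain_path, "w") as handle:
        handle.write("A\n1\n")

    detected_gz = detect_compression(gz_path)
    detected_plain = detect_compression(plain_path)
```

A check like this would fail if compression were silently disabled on both the write and read sides, which is the scenario the roundtrip cannot catch.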
can you add versionchanged in each of the modified doc-strings
Done in 1ba8f3a
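
For readers unfamiliar with the request, a `versionchanged` note is a Sphinx directive placed inside the parameter description of a docstring. The sketch below shows the general shape on a hypothetical stand-in function; the function name, the wording of the note, and the version number `0.24.0` are assumptions for illustration and are not copied from the commit referenced above.

```python
def to_csv_sketch(path_or_buf=None, compression="infer"):
    """
    Hypothetical stand-in for a to_* method; only the docstring matters here.

    Parameters
    ----------
    path_or_buf : str or file handle, optional
        Destination to write to.
    compression : str or None, default 'infer'
        Compression mode. With 'infer' and a string path, the mode is
        chosen from the file extension.

        .. versionchanged:: 0.24.0

           'infer' option added and set as the default.
    """
    # No behaviour is implemented; this function exists to carry the docstring.
    return None
```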