
BUG: Expand encoding for C engine beyond utf-16 #30771

Merged: 1 commit, Jan 7, 2020


@gfyoung (Member) commented Jan 7, 2020

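And by "utf-16", we mean the literal string "utf-16": the C engine special-cased only that exact encoding name, so other spellings and the UTF-32 encodings were not handled.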

Closes #24130
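To show why matching on one literal encoding name is fragile, here is a minimal stdlib sketch. It is not the pandas fix. It normalizes an encoding name with `codecs.lookup` and then checks whether the result is a UTF-16 or UTF-32 codec. The helper name `is_wide_unicode` is hypothetical.

```python
import codecs

def is_wide_unicode(encoding: str) -> bool:
    """Hypothetical helper: True if `encoding` names a UTF-16 or UTF-32 codec.

    codecs.lookup() resolves aliases and spelling variants ("UTF16",
    "utf_32", "UTF-32LE") to a canonical codec name, so the check does not
    depend on the caller passing the exact string "utf-16".
    """
    canonical = codecs.lookup(encoding).name  # e.g. "utf-16", "utf-32-le"
    return canonical.startswith(("utf-16", "utf-32"))

# Compare a naive substring check with the normalized check.
for name in ["utf-16", "UTF-16", "utf16", "utf_32", "UTF-32LE", "utf-8"]:
    naive = "utf-16" in name
    print(f"{name!r:12} naive={naive!s:5} normalized={is_wide_unicode(name)}")
```

The design choice here is to let Python's codec registry handle aliases rather than maintaining a hand-written list of spellings.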

@gfyoung added the IO CSV (read_csv, to_csv) label Jan 7, 2020
@gfyoung added this to the 1.0 milestone Jan 7, 2020
@gfyoung force-pushed the utf-xx-encodings-csv branch 3 times, most recently from 60ca4e3 to 74ab9cd, January 7, 2020 07:12
@gfyoung added the Bug label Jan 7, 2020
@simonjayhawkins (Member) left a comment

@gfyoung thanks for the PR. A couple of comments.

pandas/tests/io/parser/conftest.py (2 resolved review threads)
@@ -5,6 +5,7 @@

from io import BytesIO
import os
import tempfile
Reviewer (Member):

Can a pytest builtin fixture or ensure_clean be used instead?

@gfyoung (Member, author):

I need the encoding to be parametrized in this test, which ensure_clean can't do at the moment (though that could be a useful follow-up enhancement).

The pytest builtin fixture has a similar limitation. (As an aside, if I had to choose between ensure_clean and the pytest fixture, I would generally go with our in-house one, since it's a little more flexible.)
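The point of the reply is that the temporary file's encoding must vary per test case. A minimal stdlib sketch of such a roundtrip follows. It uses `tempfile` and the `csv` module directly rather than pandas or `ensure_clean`, and the `roundtrip` helper and `ENCODINGS` list are hypothetical, not the test added in this PR.

```python
import csv
import os
import tempfile

# Hypothetical set of encodings to exercise, one roundtrip per entry.
ENCODINGS = ["utf-8", "utf-16", "utf-16-le", "utf-32", "utf-32-be"]

def roundtrip(rows, encoding):
    """Write rows to a temp CSV in `encoding`, read them back, then clean up."""
    # delete=False so the file can be reopened by name (required on Windows).
    with tempfile.NamedTemporaryFile("w", encoding=encoding, newline="",
                                     suffix=".csv", delete=False) as f:
        csv.writer(f).writerows(rows)
        path = f.name
    try:
        with open(path, encoding=encoding, newline="") as f:
            return [row for row in csv.reader(f)]
    finally:
        os.remove(path)  # always remove the temp file, even on failure

rows = [["a", "b"], ["1", "ünïcödé"]]
for enc in ENCODINGS:
    assert roundtrip(rows, enc) == rows, enc
```

In a pytest suite the loop would typically become a parametrized test, one case per encoding.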

@gfyoung force-pushed the utf-xx-encodings-csv branch from 74ab9cd to 2a62025, January 7, 2020 11:05
@WillAyd (Member) left a comment

lgtm - nice find

@jreback merged commit 21fd692 into pandas-dev:master Jan 7, 2020

@jreback (Contributor) commented Jan 7, 2020

thanks @gfyoung very nice

gfyoung added a commit to forking-repos/pandas that referenced this pull request Jan 11, 2020
These keywords will be passed through to
tempfile constructor functions.

Follow-up:

pandas-dev#30771
jreback pushed a commit that referenced this pull request Jan 15, 2020
These keywords will be passed through to
tempfile constructor functions.

Follow-up:

#30771
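The follow-up commit passes extra keywords through to the tempfile constructor functions. Here is a sketch of that pattern, assuming a context manager in the spirit of ensure_clean that forwards `**kwargs` to `tempfile.NamedTemporaryFile`. The name `clean_tempfile` and its signature are hypothetical and do not reflect pandas' actual `ensure_clean` code.

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def clean_tempfile(suffix="", **kwargs):
    """Hypothetical helper: yield an open temp file, always deleting it afterwards.

    Extra keyword arguments (e.g. mode, encoding, newline) are forwarded
    unchanged to tempfile.NamedTemporaryFile.
    """
    handle = tempfile.NamedTemporaryFile(suffix=suffix, delete=False, **kwargs)
    try:
        yield handle
    finally:
        handle.close()  # safe even if the caller already closed it
        if os.path.exists(handle.name):
            os.remove(handle.name)

# Usage: the encoding can now vary per test case.
for enc in ["utf-16", "utf-32"]:
    with clean_tempfile(suffix=".csv", mode="w", encoding=enc) as f:
        f.write("a,b\n1,2\n")
        f.close()  # flush to disk so the file can be reopened by name
        with open(f.name, encoding=enc) as g:
            assert g.read() == "a,b\n1,2\n"
    assert not os.path.exists(f.name)
```

Text-mode keywords such as `encoding` require a text `mode` like `"w"`, because the default `NamedTemporaryFile` mode is binary (`"w+b"`).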
Labels: Bug, IO CSV (read_csv, to_csv)
Linked issue: read_csv can't roundtrip with UTF16/32 encodings
4 participants