-
-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace codecs with TextIOWrapper to fix newline issues when reading text files #578
Conversation
Looks like flake8 picked up some boo-boos in your code. Can you please have a look and fix them? |
There seem to be some issues with the windows tests. I need to look at this in more detail but it will be hard for me to debug if the solution is not obvious since I don't have any windows systems around. |
Seems all good now, the two tests failing on windows are now fixed. The pathlib test didn't consider universal newline mode which is the default with python 3. It only worked before because smart_open didn't correctly implement this. The csv test failed because the encoding wasn't explicitly specified and defaulted to cp852 on windows which broke the test when I added the unicode newline separator. |
OK, looks great. Thank you for your contribution @markopy ! |
This is great, thanks @markopy! |
Thanks. Going back to my usual fare, ping me if there are any issues with this in the wild. |
Title
Use
TextIOWrapper
instead ofcodecs
to fix various newline issues when opening files in text mode.Motivation
smart_open currently uses the
codecs
module to read text files which incorrectly splits lines as described in #557. This patch replacescodecs
withTextIOWrapper
fixing those issues and making thenewline
parameter work as expected for all files. Previously it was ignored unless the file was local and the builtinopen
was used.Tests
New tests
test_newline_read
andtest_newline_write
have been added to ensure behavior of thenewline
parameter is the same as for python's builtinopen
.Unicode line separator has been added to several other test for
readline()
to ensure line separation behavior is correct for differnent use cases.Benchmarks
There seems to be no practical speed difference when reading read/writing text files on S3.