Disable "universal newlines" to avoid unwanted changes #23

Merged: 1 commit, merged on Sep 3, 2014

@dforsi (contributor) commented on Mar 29, 2014

Python 3 uses "universal newlines" by default when reading text files: foreign line terminators ('\r' and '\r\n') are silently translated to '\n', so the program never sees them. To disable this translation, newline='' (empty string) must be passed to open(); see http://docs.python.org/3/library/functions.html#open
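A minimal sketch of the difference, using io.TextIOWrapper over an in-memory io.BytesIO buffer (TextIOWrapper is the text layer that open() builds in text mode, so the newline handling is the same as for a real file). The helper name read_text is hypothetical:

```python
import io


def read_text(raw: bytes, newline=None) -> str:
    """Decode raw bytes as open(path, encoding="ascii", newline=newline) would."""
    # TextIOWrapper applies the same newline handling as a file opened in text mode.
    with io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=newline) as f:
        return f.read()


for raw in (b"teh\r", b"teh\r\n", b"teh\n\r"):
    # Default newline=None translates '\r' and '\r\n' to '\n';
    # newline='' returns the line endings untranslated.
    print(raw, repr(read_text(raw)), repr(read_text(raw, newline="")))
```

With the default, all three inputs lose their '\r'; with newline='' each string keeps exactly the bytes that were on disk.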

Test cases:

python -c "print('teh')," >test
python -c "print('teh\r')," >test-cr
python -c "print('teh\n')," >test-lf
python -c "print('teh\r\n')," >test-crlf
python -c "print('teh\n\r')," >test-lfcr
for f in test*; do echo === $f ===; hexdump -C $f; ~/Programmazione/codespell/codespell.py -w $f; hexdump -C $f; done
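The generator commands above appear to rely on Python 2's print statement, where a trailing comma suppresses the newline; under Python 3, print('teh\r'), prints 'teh\r\n' instead, so most of the files would differ from the hexdumps. A sketch that writes the exact bytes shown in the "before" hexdumps, independent of the Python version (TEST_FILES and make_test_files are hypothetical names):

```python
import tempfile
from pathlib import Path

# File contents copied byte for byte from the "before" hexdumps.
TEST_FILES = {
    "test": b"teh\n",
    "test-cr": b"teh\r",
    "test-crlf": b"teh\r\n",
    "test-lf": b"teh\n",
    "test-lfcr": b"teh\n\r",
}


def make_test_files(directory: Path) -> None:
    """Write each file in binary mode, so no newline translation can occur."""
    for name, content in TEST_FILES.items():
        (directory / name).write_bytes(content)


# Demo in a throwaway directory; pass Path(".") to create the files in place.
with tempfile.TemporaryDirectory() as tmp:
    make_test_files(Path(tmp))
    for path in sorted(Path(tmp).iterdir()):
        print(path.name, path.read_bytes())
```

Writing bytes directly sidesteps both print() semantics and the text-mode translation this PR is about.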

Output when running on Linux before this patch (note how 0d (CR) is changed to 0a (LF), except in test-crlf, where the 0d disappears because CR LF is collapsed into a single LF):

=== test ===
00000000  74 65 68 0a                                       |teh.|
00000004
FIXED: test
00000000  74 68 65 0a                                       |the.|
00000004
=== test-cr ===
00000000  74 65 68 0d                                       |teh.|
00000004
FIXED: test-cr
00000000  74 68 65 0a                                       |the.|
00000004
=== test-crlf ===
00000000  74 65 68 0d 0a                                    |teh..|
00000005
FIXED: test-crlf
00000000  74 68 65 0a                                       |the.|
00000004
=== test-lf ===
00000000  74 65 68 0a                                       |teh.|
00000004
FIXED: test-lf
00000000  74 68 65 0a                                       |the.|
00000004
=== test-lfcr ===
00000000  74 65 68 0a 0d                                    |teh..|
00000005
FIXED: test-lfcr
00000000  74 68 65 0a 0a                                    |the..|
00000005

Output when running on Linux after this patch (0d is preserved):

=== test ===
00000000  74 65 68 0a                                       |teh.|
00000004
FIXED: test
00000000  74 68 65 0a                                       |the.|
00000004
=== test-cr ===
00000000  74 65 68 0d                                       |teh.|
00000004
FIXED: test-cr
00000000  74 68 65 0d                                       |the.|
00000004
=== test-crlf ===
00000000  74 65 68 0d 0a                                    |teh..|
00000005
FIXED: test-crlf
00000000  74 68 65 0d 0a                                    |the..|
00000005
=== test-lf ===
00000000  74 65 68 0a                                       |teh.|
00000004
FIXED: test-lf
00000000  74 68 65 0a                                       |the.|
00000004
=== test-lfcr ===
00000000  74 65 68 0a 0d                                    |teh..|
00000005
FIXED: test-lfcr
00000000  74 68 65 0a 0d                                    |the..|
00000005
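Both sets of hexdumps can be reproduced with an in-memory emulation of the read/fix/write cycle. This is a sketch, not codespell's code: rewrite, BEFORE and AFTER are hypothetical names, a single 'teh' to 'the' replacement stands in for the dictionary lookup, the Linux write is emulated with newline='\n' so the result is the same on any OS, and AFTER assumes newline='' is passed when writing as well as when reading:

```python
import io


def rewrite(raw: bytes, read_newline, write_newline) -> bytes:
    """Emulate reading a file, fixing 'teh', and writing it back, all in memory."""
    with io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=read_newline) as reader:
        lines = reader.readlines()
    fixed = "".join(line.replace("teh", "the") for line in lines)
    out = io.BytesIO()
    writer = io.TextIOWrapper(out, encoding="ascii", newline=write_newline)
    writer.write(fixed)
    writer.flush()
    data = out.getvalue()
    writer.detach()  # release `out` so it is not closed along with the wrapper
    return data


# Before the patch: default reading; a Linux write is emulated with newline="\n".
BEFORE = dict(read_newline=None, write_newline="\n")
# After the patch (assumed): newline='' for both the read and the write.
AFTER = dict(read_newline="", write_newline="")

for raw in (b"teh\n", b"teh\r", b"teh\r\n", b"teh\n\r"):
    print(raw, rewrite(raw, **BEFORE), rewrite(raw, **AFTER))
```

On Linux a default write would give the same AFTER bytes, because os.linesep is '\n' there and '\r' is never translated on output.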

In Python 3, when reading a file opened in text mode, the different
end-of-line conventions are handled transparently by default,
regardless of the underlying operating system; when writing a text
file, however, each '\n' is translated to the system default line
separator. Passing newline='' turns off this translation (universal
newlines still decide where each line ends), so foreign line
separators appear unchanged in the strings returned by readlines();
the regexp then handles them like any other control character
(e.g. \t), and they are written to the output file.
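The writing half of this asymmetry can be checked in memory. A minimal sketch (encode_text is a hypothetical helper): with the default newline=None every '\n' written becomes os.linesep, so on Windows a '\r\n' that was read untranslated would come out as '\r\r\n', whereas newline='' writes the text exactly as given:

```python
import io
import os


def encode_text(text: str, newline=None) -> bytes:
    """Return the bytes a text-mode write of `text` produces with this newline setting."""
    out = io.BytesIO()
    writer = io.TextIOWrapper(out, encoding="ascii", newline=newline)
    writer.write(text)
    writer.flush()
    data = out.getvalue()
    writer.detach()  # release `out` so it is not closed along with the wrapper
    return data


# Default newline=None: each '\n' becomes os.linesep.
print(encode_text("the\n") == ("the" + os.linesep).encode())
# A default write on Windows (emulated with newline='\r\n') turns an
# untranslated '\r\n' into '\r\r\n'.
print(encode_text("the\r\n", newline="\r\n"))
# newline='': the text is written unchanged on every platform.
print(encode_text("the\r\n", newline=""))
```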

The documentation of universal newlines is at
http://docs.python.org/3/library/functions.html#open

lucasdemarchi added a commit that referenced this pull request on Sep 3, 2014: "Disable "universal newlines" to avoid unwanted changes"
@lucasdemarchi merged commit 552e49f into codespell-project:master on Sep 3, 2014