inconsistent with bz2.open on files containing vertical tab ^K #394
Comments
Tried with 1.9.0 as well and got the same problem.
with smart_open.smart_open('debug.tsv.bz2', mode='r') as f:
with smart_open.smart_open('debug.tsv', mode='r') as f:
with open('debug.tsv', mode='r') as f:
with bz2.open('debug.tsv.bz2', mode='rt') as f:
Is it OK to open binary files in text mode ( Neither of your outputs (444 and 852) seems to match the line length of your example (7). I don't understand what you're trying to show.
I think this is a duplicate of #269.
These are two different examples. The input contains special tokens, so I quoted an excerpt of the binary representation.
I think the culprit here is the vertical tab character \x0b. I am not sure why it gets treated as a line break.
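One possible explanation for the vertical tab suspicion, sketched with the standard library only: Python's `str.splitlines()` counts `\x0b` as a line boundary, and `codecs.StreamReader.readline()` is built on it, whereas `io.TextIOWrapper` (which `bz2.open` uses in text mode) only recognizes `\n`, `\r`, and `\r\n`. The sample row below is made up for illustration, and whether smart_open 1.7.1 actually reads through a `codecs` stream reader is an assumption, not something confirmed in this thread.

```python
import codecs
import io

# Hypothetical row: a vertical tab (\x0b) sits inside the first column.
data = "col1\x0bstill col1\tcol2\n".encode("utf-8")

# str.splitlines() treats \x0b as a line boundary.
pieces = data.decode("utf-8").splitlines(keepends=True)

# io.TextIOWrapper in its default universal-newlines mode does not.
text_line = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8").readline()

# codecs stream readers split lines with splitlines(), so they stop at \x0b.
codec_line = codecs.getreader("utf-8")(io.BytesIO(data)).readline()

print(repr(pieces))
print(repr(text_line))
print(repr(codec_line))
```

If a reader behaves like `codec_line` here, a row containing `^K` would be cut short in the way the report describes.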
Please post a minimal reproducible example, including any required data.
Problem description
Be sure your description clearly answers the following questions:
I am trying to use smart_open as a replacement for bz2.open.
Expected: the same behavior as bz2.open with respect to recognizing line breaks.
Observed: a long line is truncated because of the vertical tab character ^K, which is not a line break.
Steps/code to reproduce the problem
In order for us to be able to solve your problem, we have to be able to reproduce it on our end.
Without reproducing the problem, it is unlikely that we'll be able to help you.
Include full tracebacks, logs and datasets if necessary.
Please keep the examples minimal (minimal reproducible example).
Take, for instance, the following uncompressed text, shown as bytes. Compress it with bz2. The number of columns recognized by smart_open(..).readline() differs before and after compression.
\xe5\x93\x81\x0b\xe3\x80\n
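A self-contained reproduction along these lines might look like the sketch below. The sample row is hypothetical (the excerpt above ends in the middle of a UTF-8 sequence, so it cannot be decoded on its own), and `first_line_columns` is a helper name introduced here for illustration. The script writes a small `.bz2` file to a temporary directory and counts the tab-separated columns that `bz2.open` sees on the first line.

```python
import bz2
import os
import tempfile

# Hypothetical sample row: three tab-separated columns, with a vertical tab
# (\x0b) inside the first one. This is not the reporter's actual data.
SAMPLE = b"\xe5\x93\x81\x0bx\tcol2\tcol3\n"


def first_line_columns(opener, path):
    """Read the first line through `opener` and count tab-separated columns."""
    with opener(path) as f:
        return len(f.readline().rstrip("\n").split("\t"))


tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "debug.tsv.bz2")
with bz2.open(path, mode="wb") as f:
    f.write(SAMPLE)

bz2_columns = first_line_columns(
    lambda p: bz2.open(p, mode="rt", encoding="utf-8"), path
)
print("bz2.open columns:", bz2_columns)

# To compare (requires smart_open to be installed):
# import smart_open
# print(first_line_columns(lambda p: smart_open.smart_open(p, mode="r"), path))
```

Passing the opener as a callable keeps the column count identical for both libraries, so any difference in the result comes from how each one splits lines.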
Versions
Please provide the output of:
Instead
pip show smart_open
Name: smart-open
Version: 1.7.1
Checklist
Before you create the issue, please make sure you have: