csv.reader() to support QUOTE_ALL #77100

Closed
PavelShpilev mannequin opened this issue Feb 23, 2018 · 4 comments
Labels
3.7 (EOL) end of life 3.8 (EOL) end of life extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@PavelShpilev
Mannequin

PavelShpilev mannequin commented Feb 23, 2018

BPO 32919
Nosy @bitdancer

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2018-02-23.04:55:23.870>
labels = ['extension-modules', '3.8', 'type-feature', '3.7']
title = 'csv.reader() to support QUOTE_ALL'
updated_at = <Date 2018-03-06.00:42:52.849>
user = 'https://bugs.python.org/PavelShpilev'

bugs.python.org fields:

activity = <Date 2018-03-06.00:42:52.849>
actor = 'Pavel Shpilev'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Extension Modules']
creation = <Date 2018-02-23.04:55:23.870>
creator = 'Pavel Shpilev'
dependencies = []
files = []
hgrepos = []
issue_num = 32919
keywords = []
message_count = 3.0
messages = ['312617', '313194', '313301']
nosy_count = 2.0
nosy_names = ['r.david.murray', 'Pavel Shpilev']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue32919'
versions = ['Python 3.7', 'Python 3.8']

@PavelShpilev
Mannequin Author

PavelShpilev mannequin commented Feb 23, 2018

It appears that in the current implementation csv.QUOTE_ALL has no effect on csv.reader(); it only affects csv.writer(). I know that CSV is a poorly defined format, but I think it would be useful to distinguish None from '' for sources that use such quoting.

Example:

"1","Noneval",,"9"
"2","Emptystr","","10"
"3","somethingelse","","8"

The reader converts all values in the third column to empty strings. The suggestion is to adjust the reader's behaviour so that quoting=csv.QUOTE_ALL instructs it to convert empty unquoted values (like the one in the first row) to None instead.
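The behaviour described above can be reproduced with a short script. This is a minimal sketch: the helper name `third_column` and the `SAMPLE` constant are made up for illustration, and the data is the three-row example from this comment.

```python
import csv
import io

# The three sample rows from the report; row 1 has an unquoted empty field,
# rows 2 and 3 have a quoted empty string in the same position.
SAMPLE = (
    '"1","Noneval",,"9"\n'
    '"2","Emptystr","","10"\n'
    '"3","somethingelse","","8"\n'
)


def third_column(quoting):
    """Parse SAMPLE with the given quoting constant and return column 3."""
    reader = csv.reader(io.StringIO(SAMPLE), quoting=quoting)
    return [row[2] for row in reader]


default_result = third_column(csv.QUOTE_MINIMAL)
quote_all_result = third_column(csv.QUOTE_ALL)

# Both calls give the same list of empty strings, so the information about
# whether the field was quoted is lost either way.
print(default_result == quote_all_result)
print(quote_all_result)
```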

@PavelShpilev PavelShpilev mannequin added 3.7 (EOL) end of life 3.8 (EOL) end of life extension-modules C modules in the Modules dir type-feature A feature request or enhancement labels Feb 23, 2018
@bitdancer
Member

QUOTE_ALL only makes sense as an output control parameter, IMO. It is an output discipline but doesn't say anything about semantics. In csv format, an empty field and a field containing the empty quoted string are completely equivalent. I would be -1 on adding an option that differentiated them.

@PavelShpilev
Mannequin Author

PavelShpilev mannequin commented Mar 6, 2018

I know that the CSV specification says an empty field and an empty string are the same; however, I still believe there is a practical use for unconventional processing of such fields.

In our specific case we parse CSVs produced by Amazon Athena (based on Presto), in which NULL and empty string values are represented as above. Following the CSV spec dogmatically, there's no way to distinguish between the two, but pragmatically you can tell them apart by simply looking at the values.

A brief search shows we aren't the only ones facing the issue. After giving it some more thought, I agree that csv.QUOTE_ALL doesn't make much sense here, but maybe an extra argument to csv.reader() would do the trick? Something like csv.reader(detect_none_values=False/True), with False being the default, and emphasis in the documentation that True goes against the CSV specification.
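Until something like this exists in the module, the distinction can be recovered outside csv.reader(). The following is only a rough sketch of that idea, not a proposed implementation: `split_record` is a hypothetical helper, it assumes one record per line (no newlines inside quoted fields), a single-character delimiter, and doubled quote characters as the only escape.

```python
def split_record(line, delimiter=",", quotechar='"'):
    """Split one CSV line; unquoted empty fields become None, "" stays ''."""
    line = line.rstrip("\r\n")
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quotechar:
            # Quoted field: collect characters up to the closing quote.
            i += 1
            buf = []
            while i < n:
                ch = line[i]
                if ch == quotechar:
                    if i + 1 < n and line[i + 1] == quotechar:
                        buf.append(quotechar)  # doubled quote -> literal quote
                        i += 2
                        continue
                    i += 1  # closing quote
                    break
                buf.append(ch)
                i += 1
            fields.append("".join(buf))
            if i < n and line[i] != delimiter:
                raise ValueError("unexpected text after closing quote")
        else:
            # Unquoted field: read up to the next delimiter.
            start = i
            while i < n and line[i] != delimiter:
                i += 1
            text = line[start:i]
            fields.append(text if text else None)
        if i < n and line[i] == delimiter:
            i += 1
            continue
        return fields


print(split_record('"1","Noneval",,"9"'))
print(split_record('"2","Emptystr","","10"'))
```

This deliberately ignores most of what the csv module handles (dialects, escapechar, multi-line fields), which is part of why a supported option in the reader would be preferable.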

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@serhiy-storchaka
Member

Two new quoting rules, QUOTE_NOTNULL and QUOTE_STRINGS, were introduced in #67230, and #113732 makes the reader support them.
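As a hedged illustration of how that applies to the example from the original report: the sketch below assumes a Python build that has csv.QUOTE_NOTNULL and includes the reader change referenced in #113732. On older builds the constant may be missing, or the reader may still return '' for the unquoted empty field, so the script checks for the constant and prints whatever it gets instead of assuming a result. The names `ROWS` and `parse_with_notnull` are illustrative only.

```python
import csv
import io

ROWS = (
    '"1","Noneval",,"9"\n'
    '"2","Emptystr","","10"\n'
)


def parse_with_notnull(text):
    """Parse text with QUOTE_NOTNULL if this Python provides it, else None."""
    quoting = getattr(csv, "QUOTE_NOTNULL", None)
    if quoting is None:
        return None  # constant not available in this Python version
    return list(csv.reader(io.StringIO(text), quoting=quoting))


parsed = parse_with_notnull(ROWS)
if parsed is None:
    print("csv.QUOTE_NOTNULL is not available in this Python version")
else:
    # With reader support, the unquoted empty field in row 1 is expected to
    # come back as None while the quoted "" in row 2 stays an empty string.
    for row in parsed:
        print(row)
```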

@serhiy-storchaka serhiy-storchaka closed this as not planned Jan 6, 2024