Fix to_csv delimiter handling of timestamp format #7023

Merged


davidwendt
Contributor

Closes #6699
The timestamp formats used by the CSV writer have the form %Y-%m-%dT%H:%M:%SZ. This means that if the column delimiter (',' by default) or the line delimiter ('\n' by default) is set to ':' or '-', the formatted timestamp strings can conflict with these delimiters. The current logic simply removed the conflicting character from the format whenever it detected such a column or line delimiter. For example, specifying a dash '-' as the column delimiter changed the timestamp format to %Y%m%d... (the dashes are removed). I admit this was a bit hacky, and it also made the output inconsistent with Pandas to_csv().
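The conflict and the old workaround can be illustrated with a small Python sketch. This is a hypothetical reconstruction for illustration only: the real writer is C++/CUDA code in libcudf, and strip_conflicting_format is an invented name, not a cudf function. It assumes the format string quoted above and treats '-' and ':' as the only conflicting characters, as the description states.

```python
from datetime import datetime

# Timestamp format as described in the PR text.
TIMESTAMP_FMT = "%Y-%m-%dT%H:%M:%SZ"

def strip_conflicting_format(fmt: str, col_delim: str, line_delim: str) -> str:
    """Hypothetical sketch of the old workaround: if a delimiter is '-' or ':',
    remove that character from the timestamp format entirely."""
    for delim in (col_delim, line_delim):
        if delim in ("-", ":"):
            fmt = fmt.replace(delim, "")
    return fmt

ts = datetime(2020, 12, 16, 19, 0, 0)

# Unmodified format with sep='-': a two-column row splits into four fields.
naive_row = "-".join(["1", ts.strftime(TIMESTAMP_FMT)])
print(naive_row.split("-"))  # ['1', '2020', '12', '16T19:00:00Z']

# Old workaround: the dashes disappear from the timestamp itself.
old_fmt = strip_conflicting_format(TIMESTAMP_FMT, "-", "\n")
print(old_fmt)               # %Y%m%dT%H:%M:%SZ
print(ts.strftime(old_fmt))  # 20201216T19:00:00Z
```

The stripped output parses unambiguously, but the value no longer looks like the timestamps written with any other delimiter, which is the inconsistency described above.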

It is easy enough to simply add double quotes around the formatted timestamp strings instead, which prevents these conflicts and keeps the output consistent. This PR changes the logic to do that.
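A minimal Python sketch of the quoting approach follows. It assumes quoting is triggered by the same '-'/':' delimiter check as before; format_timestamp_field is a hypothetical helper, not the libcudf implementation. It also shows that Python's standard csv module reads the quoted value back as a single field, and that csv.writer with its default QUOTE_MINIMAL quoting (also the default quoting for pandas to_csv) produces the same row.

```python
import csv
import io
from datetime import datetime

TIMESTAMP_FMT = "%Y-%m-%dT%H:%M:%SZ"

def format_timestamp_field(ts: datetime, col_delim: str, line_delim: str) -> str:
    """Hypothetical sketch: keep the full format and wrap the value in double
    quotes when a delimiter could collide with the format's '-' or ':'."""
    text = ts.strftime(TIMESTAMP_FMT)
    if col_delim in ("-", ":") or line_delim in ("-", ":"):
        text = '"' + text + '"'
    return text

ts = datetime(2020, 12, 16, 19, 0, 0)
row = "-".join(["1", format_timestamp_field(ts, "-", "\n")])
print(row)  # 1-"2020-12-16T19:00:00Z"

# A standard CSV reader honors the quotes, so the row yields exactly 2 fields.
fields = next(csv.reader(io.StringIO(row), delimiter="-"))
print(fields)  # ['1', '2020-12-16T19:00:00Z']

# csv.writer with QUOTE_MINIMAL quotes the same field for the same reason.
buf = io.StringIO()
csv.writer(buf, delimiter="-", lineterminator="\n").writerow(
    ["1", ts.strftime(TIMESTAMP_FMT)]
)
print(buf.getvalue() == row + "\n")  # True
```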

Exception logic that rejected a dash as the column separator was also found in csv.py, and its exception message specifically cites issue 6699. There was also a pytest written specifically to check for this exception. This PR removes the exception and updates that pytest function as well.

davidwendt added the labels: bug (Something isn't working); 3 - Ready for Review (Ready for review by team); libcudf (Affects libcudf C++/CUDA code); non-breaking (Non-breaking change) on Dec 16, 2020
davidwendt requested review from a team as code owners on December 16, 2020 19:00
davidwendt self-assigned this on Dec 16, 2020
codecov bot commented Dec 16, 2020

Codecov Report

Merging #7023 (8e1cf1b) into branch-0.18 (8c1f01e) will increase coverage by 0.19%.
The diff coverage is n/a.


@@               Coverage Diff               @@
##           branch-0.18    #7023      +/-   ##
===============================================
+ Coverage        82.03%   82.23%   +0.19%     
===============================================
  Files               96       96              
  Lines            16381    16547     +166     
===============================================
+ Hits             13438    13607     +169     
+ Misses            2943     2940       -3     
Impacted Files Coverage Δ
python/cudf/cudf/io/csv.py 93.33% <ø> (-0.42%) ⬇️
python/cudf/cudf/_fuzz_testing/fuzzer.py 0.00% <0.00%> (ø)
python/cudf/cudf/utils/hash_vocab_utils.py 100.00% <0.00%> (ø)
python/cudf/cudf/core/indexing.py 96.35% <0.00%> (+0.72%) ⬆️
python/cudf/cudf/core/column/numerical.py 95.88% <0.00%> (+1.31%) ⬆️
python/cudf/cudf/core/abc.py 91.48% <0.00%> (+4.25%) ⬆️
python/cudf/cudf/utils/gpu_utils.py 58.53% <0.00%> (+4.87%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8c1f01e...8e1cf1b.

Contributor

galipremsagar left a comment

Python changes LGTM

Contributor

karthikeyann left a comment

LGTM.
I suppose parsing datetime columns with double quotes will work.

davidwendt
Contributor Author

Yes, that already works with every type.

davidwendt
Contributor Author

rerun tests

rapids-bot merged commit ca1a4d6 into rapidsai:branch-0.18 on Jan 4, 2021
davidwendt deleted the bug-csv-write-timestamp-fmt branch on January 4, 2021 16:17
Successfully merging this pull request may close these issues.

[BUG] datetime column being written without any delimiters when sep='-'