Commit
Fix to_csv delimiter handling of timestamp format (#7023)
Closes #6699

The timestamp format(s) used by the CSV writer have the form `%Y-%m-%dT%H:%M:%SZ`. This means that if the column delimiter (`','` by default) or the line delimiter (`\n` by default) is set to either `':'` or `'-'`, the timestamp string output could conflict with these delimiters. The existing logic simply removed these characters from the format when it detected a conflicting column or line delimiter. For example, specifying a dash `'-'` as the column delimiter caused the timestamp format to change to `%Y%m%d...` (the dashes are removed). I admit this was kind of hacky, and it also made the output inconsistent with Pandas `to_csv()`. It is easy enough to add double-quotes around the timestamp format to prevent these conflicts and to make the output consistent. This PR fixes that logic.

Exception logic to check for a dash as the column separator was also found in [csv.py](https://github.com/rapidsai/cudf/blob/8c1f01e1fd713d873cf3d943ab409f3e9efc48f8/python/cudf/cudf/io/csv.py#L139-L149), specifically citing issue 6699 in the exception message. There was also a pytest created specifically to check for this exception. The exception is removed and the pytest function is updated in this PR as well.

Authors:
- davidwendt <[email protected]>

Approvers:
- GALI PREM SAGAR
- Karthikeyan
- null

URL: #7023
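The quoting idea described above can be illustrated with a minimal sketch that uses only Python's standard `csv` module. This is not the cuDF implementation: `write_rows` and `TIMESTAMP_FORMAT` are hypothetical names chosen for the example, and the standard library's `QUOTE_MINIMAL` mode stands in for the writer's quoting logic. It only shows that wrapping the timestamp in double quotes keeps the `%Y-%m-%dT%H:%M:%SZ` layout intact when the column delimiter is `'-'` or `':'`.

```python
import csv
import io
from datetime import datetime, timezone

# Same layout as the format named in the commit message.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def write_rows(rows, delimiter=","):
    """Hypothetical helper: write rows to a CSV string.

    Datetime values are formatted with TIMESTAMP_FORMAT. QUOTE_MINIMAL
    wraps a field in double quotes only when it contains the delimiter,
    so the timestamp text itself is never altered.
    """
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    for row in rows:
        writer.writerow(
            [
                value.strftime(TIMESTAMP_FORMAT)
                if isinstance(value, datetime)
                else value
                for value in row
            ]
        )
    return buffer.getvalue()


rows = [(1, datetime(2020, 12, 15, 10, 30, 0, tzinfo=timezone.utc))]

print(write_rows(rows), end="")                 # default ',' delimiter
print(write_rows(rows, delimiter="-"), end="")  # timestamp gets quoted
print(write_rows(rows, delimiter=":"), end="")  # timestamp gets quoted
```

Because the quoted field keeps its dashes and colons, a reader configured with the same delimiter (for example `csv.reader(..., delimiter="-")`) recovers the timestamp string unchanged, which is what stripping characters from the format could not guarantee.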