io.metadata: Fix newline handling when reading CSV/TSV
The csv module very clearly documents¹ that the "newline" parameter of
open() should be set to the empty string so that the csv module can itself
do proper embedded newline handling.  Follow that recommendation.  This
change shouldn't affect existing inputs that worked fine, but it now allows
inputs with newlines embedded in quoted fields.

The previously-added failing test now passes.

¹ <https://docs.python.org/3/library/csv.html#id4>
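To illustrate what the footnoted recommendation guards against, here is a minimal sketch using only the standard library. The sample bytes and the read_rows() helper are hypothetical and not part of augur; an in-memory io.TextIOWrapper stands in for a file opened with open() or open_file(). It parses a quoted field containing an embedded CRLF once with newline='' and once with the default newline=None:

```python
import csv
import io

# Hypothetical metadata whose "note" field contains an embedded CRLF
# line break inside double quotes.
DATA = b'strain,note\r\nA,"line one\r\nline two"\r\nB,plain\r\n'


def read_rows(newline):
    """Decode DATA as text using the given newline mode, then parse it with csv.reader."""
    text = io.TextIOWrapper(io.BytesIO(DATA), encoding="utf-8", newline=newline)
    with text:
        return list(csv.reader(text))


# newline="" (as the csv docs recommend): line endings reach the csv module
# untranslated, so the quoted value keeps its original "\r\n".
rows_raw = read_rows("")

# newline=None (the default for open()): universal-newlines translation turns
# the embedded "\r\n" into "\n" before the csv module ever sees it.
rows_default = read_rows(None)

print(rows_raw[1])      # ['A', 'line one\r\nline two']
print(rows_default[1])  # ['A', 'line one\nline two']
```

With newline=None the text layer rewrites the embedded line break before parsing, so the parsed value no longer matches the bytes in the file. With newline='' the csv module receives the raw line endings and decides for itself which ones terminate a record.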
tsibley committed Jul 30, 2024
1 parent 7d48743 commit a6b7265
Showing 2 changed files with 9 additions and 2 deletions.
7 changes: 7 additions & 0 deletions CHANGES.md
@@ -2,6 +2,13 @@

## __NEXT__

### Bug Fixes

* Embedded newlines in quoted field values of metadata files are now properly handled. [#1561][] (@tsibley)

[#1561]: https://github.com/nextstrain/augur/pull/1561



## 25.2.0 (24 July 2024)

4 changes: 2 additions & 2 deletions augur/io/metadata.py
@@ -603,7 +603,7 @@ def __init__(self, path: str, delimiters: Sequence[str], id_columns: Sequence[st

     def open(self, **kwargs):
         """Open the file with auto-compression/decompression."""
-        return open_file(self.path, **kwargs)
+        return open_file(self.path, newline='', **kwargs)

     def _find_first(self, columns: Sequence[str]):
         """Return the first column in `columns` that is present in the metadata.
@@ -646,7 +646,7 @@ def _get_delimiter(path: str, valid_delimiters: Iterable[str]):
         if len(delimiter) != 1:
             raise AugurError(f"Delimiters must be single-character strings. {delimiter!r} does not satisfy that condition.")

-    with open_file(path) as file:
+    with open_file(path, newline='') as file:
         try:
             # Infer the delimiter from the first line.
             return csv.Sniffer().sniff(file.readline(), "".join(valid_delimiters)).delimiter
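The two hunks above pass newline='' both when opening metadata for reading and when sniffing the delimiter from its first line. Below is a minimal sketch of that combination under stated assumptions: open_metadata_text() and sniff_delimiter() are hypothetical helpers, not augur's open_file() or _get_delimiter(), and gzip is the only compression handled. It checks that csv.Sniffer still infers the delimiter from a first line read in newline='' mode (where the line keeps its trailing \r\n), and that the rows read afterwards keep their embedded newline:

```python
import csv
import gzip
import io


def open_metadata_text(source, compressed: bool):
    """Hypothetical stand-in for a compression-aware opener: wrap a binary
    source as text, decompressing gzip if asked, always with newline=""."""
    if compressed:
        return gzip.open(source, "rt", encoding="utf-8", newline="")
    return io.TextIOWrapper(source, encoding="utf-8", newline="")


def sniff_delimiter(stream, valid_delimiters):
    """Hypothetical helper: infer the delimiter from the first line only,
    restricted to valid_delimiters, using csv.Sniffer."""
    return csv.Sniffer().sniff(stream.readline(), "".join(valid_delimiters)).delimiter


# Hypothetical gzip-compressed TSV with an embedded CRLF in a quoted field.
DATA = b'strain\tnote\r\nA\t"line one\r\nline two"\r\n'

with open_metadata_text(io.BytesIO(gzip.compress(DATA)), compressed=True) as f:
    delimiter = sniff_delimiter(f, ["\t", ","])
    # The header line was consumed by readline(); parse the remaining rows.
    rows = list(csv.reader(f, delimiter=delimiter))

print(repr(delimiter))  # '\t'
print(rows)             # [['A', 'line one\r\nline two']]
```

Sniffing only the first line keeps delimiter inference cheap, and because the header line contains no quoted fields, its untranslated \r\n does not interfere: the restricted delimiter set means the carriage return is never considered as a candidate.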