BUG: Fix CSV parsing of singleton list header (pandas-dev#17090)
threecgreen authored and jowens committed Sep 20, 2017
1 parent 56957cf commit d2e21c3
Showing 4 changed files with 24 additions and 12 deletions.
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.21.0.txt
@@ -287,6 +287,7 @@ I/O
 - Bug in :func:`read_csv` in which non integer values for the header argument generated an unhelpful / unrelated error message (:issue:`16338`)
 - Bug in :func:`read_csv` in which memory management issues in exception handling, under certain conditions, would cause the interpreter to segfault (:issue:`14696`, :issue:`16798`).
 - Bug in :func:`read_csv` when called with ``low_memory=False`` in which a CSV with at least one column > 2GB in size would incorrectly raise a ``MemoryError`` (:issue:`16798`).
+- Bug in :func:`read_csv` when called with a single-element list ``header`` would return a ``DataFrame`` of all NaN values (:issue:`7757`)
 - Bug in :func:`read_stata` where value labels could not be read when using an iterator (:issue:`16923`)
 - Bug in :func:`read_html` where import check fails when run in multiple threads (:issue:`16928`)

21 changes: 12 additions & 9 deletions pandas/_libs/parsers.pyx
@@ -535,23 +535,26 @@ cdef class TextReader:
             self.parser_start = 0
             self.header = []
         else:
-            if isinstance(header, list) and len(header):
-                # need to artifically skip the final line
-                # which is still a header line
-                header = list(header)
-                header.append(header[-1] + 1)
+            if isinstance(header, list):
+                if len(header) > 1:
+                    # need to artifically skip the final line
+                    # which is still a header line
+                    header = list(header)
+                    header.append(header[-1] + 1)
+                    self.parser.header_end = header[-1]
+                    self.has_mi_columns = 1
+                else:
+                    self.parser.header_end = header[0]
 
+                self.parser_start = header[-1] + 1
                 self.parser.header_start = header[0]
-                self.parser.header_end = header[-1]
                 self.parser.header = header[0]
-                self.parser_start = header[-1] + 1
-                self.has_mi_columns = 1
                 self.header = header
             else:
                 self.parser.header_start = header
                 self.parser.header_end = header
-                self.parser.header = header
                 self.parser_start = header + 1
+                self.parser.header = header
                 self.header = [ header ]

self.names = names
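
Read on its own, the new branching in TextReader can be summarized by a small pure-Python sketch. The function name header_bounds and the HeaderBounds tuple are hypothetical and not part of pandas. The sketch assumes header is an int or a non-empty list of ints, and it leaves out the header-is-None case that precedes this hunk.

```python
from typing import List, NamedTuple, Union


class HeaderBounds(NamedTuple):
    """Hypothetical container for the values the hunk assigns on the reader."""
    header_start: int
    header_end: int
    parser_start: int
    has_mi_columns: bool
    header: List[int]


def header_bounds(header: Union[int, List[int]]) -> HeaderBounds:
    """Mirror the post-commit branching for an int or non-empty list header."""
    if isinstance(header, list):
        has_mi_columns = len(header) > 1
        if has_mi_columns:
            # Multi-row header: per the source comment, the line after the
            # last header row is still a header line and has to be skipped.
            header = list(header)
            header.append(header[-1] + 1)
            header_end = header[-1]
        else:
            # Singleton list: no extra line, no MultiIndex columns.
            header_end = header[0]
        return HeaderBounds(header[0], header_end, header[-1] + 1,
                            has_mi_columns, header)
    # Plain integer header.
    return HeaderBounds(header, header, header + 1, False, [header])


# A singleton list now yields the same bounds as the bare integer.
assert header_bounds([0]) == header_bounds(0)
```

Under these assumptions, header=[0] and header=0 produce identical bounds, which is the behavior the commit is after; before the change, the singleton list took the multi-row path and had an extra line appended.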
7 changes: 4 additions & 3 deletions pandas/io/parsers.py
@@ -2283,10 +2283,11 @@ def _infer_columns(self):
         if self.header is not None:
             header = self.header
 
-            # we have a mi columns, so read an extra line
             if isinstance(header, (list, tuple, np.ndarray)):
-                have_mi_columns = True
-                header = list(header) + [header[-1] + 1]
+                have_mi_columns = len(header) > 1
+                # we have a mi columns, so read an extra line
+                if have_mi_columns:
+                    header = list(header) + [header[-1] + 1]
             else:
                 have_mi_columns = False
                 header = [header]
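
The Python-parser side of the change can be illustrated in isolation with the sketch below. normalize_header is a hypothetical helper, not a pandas function. As a simplification it accepts only ints, lists, and tuples (the real code also accepts np.ndarray) and always returns a list.

```python
def normalize_header(header):
    """Return (header_rows, have_mi_columns) following the post-commit logic."""
    if isinstance(header, (list, tuple)):
        rows = list(header)
        # Only two or more header rows imply MultiIndex columns.
        have_mi_columns = len(rows) > 1
        if have_mi_columns:
            # MultiIndex columns: read one extra line after the last row.
            rows.append(rows[-1] + 1)
    else:
        have_mi_columns = False
        rows = [header]
    return rows, have_mi_columns


print(normalize_header([0]))     # same result as normalize_header(0)
print(normalize_header([0, 1]))  # extra line appended for the multi-row case
```

Deriving have_mi_columns from the length, rather than from the container type alone, is what keeps a one-element list from being treated as a MultiIndex header.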
7 changes: 7 additions & 0 deletions pandas/tests/io/parser/header.py
@@ -286,3 +286,10 @@ def test_non_int_header(self):
             self.read_csv(StringIO(data), sep=',', header=['a', 'b'])
         with tm.assert_raises_regex(ValueError, msg):
             self.read_csv(StringIO(data), sep=',', header='string_header')
+
+    def test_singleton_header(self):
+        # See GH #7757
+        data = """a,b,c\n0,1,2\n1,2,3"""
+        df = self.read_csv(StringIO(data), header=[0])
+        expected = DataFrame({"a": [0, 1], "b": [1, 2], "c": [2, 3]})
+        tm.assert_frame_equal(df, expected)
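
Outside the pandas test harness, the same check can be reproduced roughly as follows. This is a sketch that assumes a pandas version containing this fix and uses only the public read_csv and pd.testing.assert_frame_equal entry points; the variable names are illustrative.

```python
from io import StringIO

import pandas as pd

data = "a,b,c\n0,1,2\n1,2,3"

# Singleton list header (the case fixed here) and the plain integer form.
from_list = pd.read_csv(StringIO(data), header=[0])
from_int = pd.read_csv(StringIO(data), header=0)

expected = pd.DataFrame({"a": [0, 1], "b": [1, 2], "c": [2, 3]})

# Both spellings should parse to the same frame with flat column labels.
pd.testing.assert_frame_equal(from_list, expected)
pd.testing.assert_frame_equal(from_int, expected)
```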
