From 49b90ec3beb36c0438f7bdf5724f02594d8f1fff Mon Sep 17 00:00:00 2001
From: Chris Mazzullo
Date: Mon, 30 Oct 2017 20:57:06 -0400
Subject: [PATCH 1/7] GH17483 Added more explicit documentation of the 'infer' keyword to the 'header' parameter

---
 pandas/io/parsers.py | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/pandas/io/parsers.py b/pandas/io/parsers.py
index 8f6b013558396..fe50b551ea948 100755
--- a/pandas/io/parsers.py
+++ b/pandas/io/parsers.py
@@ -74,15 +74,19 @@
     .. versionadded:: 0.18.1 support for the Python parser.
 
 header : int or list of ints, default 'infer'
-    Row number(s) to use as the column names, and the start of the data.
-    Default behavior is as if set to 0 if no ``names`` passed, otherwise
-    ``None``. Explicitly pass ``header=0`` to be able to replace existing
-    names. The header can be a list of integers that specify row locations for
-    a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not
-    specified will be skipped (e.g. 2 in this example is skipped). Note that
-    this parameter ignores commented lines and empty lines if
-    ``skip_blank_lines=True``, so header=0 denotes the first line of data
-    rather than the first line of the file.
+    Row number(s) to use as the column names, and the start of the
+    data. Default behavior is to infer the column names: if no names
+    are passed the behavior is identical to ``header=0`` and column
+    names are inferred from the first line of the file, if column
+    names are passed explicitly then the behavior is identical to
+    ``header=None``. Explicitly pass ``header=0`` to be able to
+    replace existing names. The header can be a list of integers that
+    specify row locations for a multi-index on the columns
+    e.g. [0,1,3]. Intervening rows that are not specified will be
+    skipped (e.g. 2 in this example is skipped). Note that this
+    parameter ignores commented lines and empty lines if
+    ``skip_blank_lines=True``, so header=0 denotes the first line of
+    data rather than the first line of the file.
 names : array-like, default None
     List of column names to use. If file contains no header row, then you
     should explicitly pass header=None. Duplicates in this list will cause

From 21f1796801542638419a5d5ce5406dc251559350 Mon Sep 17 00:00:00 2001
From: Chris Mazzullo
Date: Mon, 30 Oct 2017 20:59:14 -0400
Subject: [PATCH 2/7] GH17483 More 'infer' keyword documentation

---
 doc/source/io.rst | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/doc/source/io.rst b/doc/source/io.rst
index 5390fc3399e23..785917ad9a738 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -103,15 +103,19 @@ Column and Index Locations and Names
 ++++++++++++++++++++++++++++++++++++
 
 header : int or list of ints, default ``'infer'``
-  Row number(s) to use as the column names, and the start of the data. Default
-  behavior is as if ``header=0`` if no ``names`` passed, otherwise as if
-  ``header=None``. Explicitly pass ``header=0`` to be able to replace existing
-  names. The header can be a list of ints that specify row locations for a
-  multi-index on the columns e.g. ``[0,1,3]``. Intervening rows that are not
-  specified will be skipped (e.g. 2 in this example is skipped). Note that
-  this parameter ignores commented lines and empty lines if
-  ``skip_blank_lines=True``, so header=0 denotes the first line of data
-  rather than the first line of the file.
+  Row number(s) to use as the column names, and the start of the
+  data. Default behavior is to infer the column names: if no names are
+  passed the behavior is identical to ``header=0`` and column names
+  are inferred from the first line of the file, if column names are
+  passed explicitly then the behavior is identical to
+  ``header=None``. Explicitly pass ``header=0`` to be able to replace
+  existing names. The header can be a list of ints that specify row
+  locations for a multi-index on the columns
+  e.g. ``[0,1,3]``. Intervening rows that are not specified will be
+  skipped (e.g. 2 in this example is skipped). Note that this
+  parameter ignores commented lines and empty lines if
+  ``skip_blank_lines=True``, so header=0 denotes the first line of
+  data rather than the first line of the file.
 names : array-like, default ``None``
   List of column names to use. If file contains no header row, then you should
   explicitly pass ``header=None``. Duplicates in this list will cause

From 7a891d802717e7aebef4414d35b29e204d7ab440 Mon Sep 17 00:00:00 2001
From: Chris Mazzullo
Date: Wed, 1 Nov 2017 21:07:58 -0400
Subject: [PATCH 3/7] Added paragraph break

---
 doc/source/io.rst | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/doc/source/io.rst b/doc/source/io.rst
index 785917ad9a738..8715fee4ddaf8 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -108,14 +108,15 @@ header : int or list of ints, default ``'infer'``
   passed the behavior is identical to ``header=0`` and column names
   are inferred from the first line of the file, if column names are
   passed explicitly then the behavior is identical to
-  ``header=None``. Explicitly pass ``header=0`` to be able to replace
-  existing names. The header can be a list of ints that specify row
-  locations for a multi-index on the columns
-  e.g. ``[0,1,3]``. Intervening rows that are not specified will be
-  skipped (e.g. 2 in this example is skipped). Note that this
-  parameter ignores commented lines and empty lines if
-  ``skip_blank_lines=True``, so header=0 denotes the first line of
-  data rather than the first line of the file.
+  ``header=None``.
+
+  Explicitly pass ``header=0`` to be able to replace existing
+  names. The header can be a list of ints that specify row locations
+  for a multi-index on the columns e.g. ``[0,1,3]``. Intervening rows
+  that are not specified will be skipped (e.g. 2 in this example is
+  skipped). Note that this parameter ignores commented lines and empty
+  lines if ``skip_blank_lines=True``, so header=0 denotes the first
+  line of data rather than the first line of the file.
 names : array-like, default ``None``
   List of column names to use. If file contains no header row, then you should
   explicitly pass ``header=None``. Duplicates in this list will cause

From 1666ff9a0f98a1354963b9deda3253bd8a00d6e2 Mon Sep 17 00:00:00 2001
From: Chris Mazzullo
Date: Wed, 1 Nov 2017 21:18:57 -0400
Subject: [PATCH 4/7] Added a note to the 'Handling column names' section

---
 doc/source/io.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/source/io.rst b/doc/source/io.rst
index 8715fee4ddaf8..2347b58401393 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -558,6 +558,11 @@ If the header is in a row other than the first, pass the row number to
     data = 'skip this skip it\na,b,c\n1,2,3\n4,5,6\n7,8,9'
     pd.read_csv(StringIO(data), header=1)
 
+.. note::
+
+  The default behavior of ``read_csv`` is to use ``header='infer'``,
+  which will use the first nonblank row of the file as a header row.
+
 .. _io.dupe_names:
 
 Duplicate names parsing

From 4911dc708af0cb2dcb75be59e9fd7571d7ae7ca4 Mon Sep 17 00:00:00 2001
From: Chris Mazzullo
Date: Thu, 2 Nov 2017 08:43:48 -0400
Subject: [PATCH 5/7] Used text from docstring for note

---
 doc/source/io.rst | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/doc/source/io.rst b/doc/source/io.rst
index 2347b58401393..f9893d106ae49 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -560,8 +560,11 @@ If the header is in a row other than the first, pass the row number to
 
 .. note::
 
-  The default behavior of ``read_csv`` is to use ``header='infer'``,
-  which will use the first nonblank row of the file as a header row.
+  Default behavior is to infer the column names: if no names are
+  passed the behavior is identical to ``header=0`` and column names
+  are inferred from the first line of the file, if column names are
+  passed explicitly then the behavior is identical to
+  ``header=None``.
 
 .. _io.dupe_names:
 

From b8894ea54a8330f6234440f0e782d88454e9eae8 Mon Sep 17 00:00:00 2001
From: Chris Mazzullo
Date: Wed, 1 Nov 2017 21:18:57 -0400
Subject: [PATCH 6/7] Added a note to the 'Handling column names' section

---
 doc/source/io.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/source/io.rst b/doc/source/io.rst
index f9893d106ae49..1367a5350ea44 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -565,6 +565,8 @@ If the header is in a row other than the first, pass the row number to
   are inferred from the first line of the file, if column names are
   passed explicitly then the behavior is identical to
   ``header=None``.
+  The default behavior of ``read_csv`` is to use ``header='infer'``,
+  which will use the first nonblank row of the file as a header row.
 
 .. _io.dupe_names:
 

From 04ee499e10a21518e7c4ebc66b5a212bfb72666b Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Thu, 30 Nov 2017 15:24:54 +0100
Subject: [PATCH 7/7] small fixup

---
 doc/source/io.rst | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/doc/source/io.rst b/doc/source/io.rst
index 1367a5350ea44..2aeafd99f6e72 100644
--- a/doc/source/io.rst
+++ b/doc/source/io.rst
@@ -108,10 +108,10 @@ header : int or list of ints, default ``'infer'``
   passed the behavior is identical to ``header=0`` and column names
   are inferred from the first line of the file, if column names are
   passed explicitly then the behavior is identical to
-  ``header=None``.
+  ``header=None``. Explicitly pass ``header=0`` to be able to replace
+  existing names.
 
-  Explicitly pass ``header=0`` to be able to replace existing
-  names. The header can be a list of ints that specify row locations
+  The header can be a list of ints that specify row locations
   for a multi-index on the columns e.g. ``[0,1,3]``. Intervening rows
   that are not specified will be skipped (e.g. 2 in this example is
   skipped). Note that this parameter ignores commented lines and empty
@@ -562,11 +562,9 @@ If the header is in a row other than the first, pass the row number to
 
   Default behavior is to infer the column names: if no names are
   passed the behavior is identical to ``header=0`` and column names
-  are inferred from the first line of the file, if column names are
-  passed explicitly then the behavior is identical to
+  are inferred from the first nonblank line of the file, if column
+  names are passed explicitly then the behavior is identical to
   ``header=None``.
-  The default behavior of ``read_csv`` is to use ``header='infer'``,
-  which will use the first nonblank row of the file as a header row.
 
 .. _io.dupe_names:
 
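
Reviewer note (not part of the patch series): the ``header='infer'`` behavior documented in these patches can be checked with a short script. This is only a minimal sketch; the sample CSV text and the variable names (``data``, ``inferred``, ``named``, ``replaced``, ``padded``) are made up for illustration, and it assumes a pandas version where ``skip_blank_lines`` defaults to ``True``.

```python
from io import StringIO

import pandas as pd

# Illustrative sample data; the first line looks like a header row.
data = "a,b,c\n1,2,3\n4,5,6"

# No names passed: header='infer' behaves like header=0.
inferred = pd.read_csv(StringIO(data))
explicit = pd.read_csv(StringIO(data), header=0)

# names passed, header left at its default: behaves like header=None,
# so the line "a,b,c" is read as the first data row.
named = pd.read_csv(StringIO(data), names=["x", "y", "z"])

# names passed together with header=0: the existing names are replaced.
replaced = pd.read_csv(StringIO(data), names=["x", "y", "z"], header=0)

# Leading blank lines are skipped when skip_blank_lines=True,
# so the header is taken from the first nonblank line.
padded = pd.read_csv(StringIO("\n\n" + data))

print(list(inferred.columns))
print(named.shape, replaced.shape)
print(list(padded.columns))
```

Comparing ``named`` with ``replaced`` shows why the docs say to pass ``header=0`` explicitly when replacing existing names: without it, the original header line ends up in the data.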