read_csv in combination with index_col and usecols #2654
Comments
This feels buggy or at minimum not intuitive to me. I think it's just an edge case that's not addressed in the test suite. I'll have a look |
Version 0.10.1 * tag 'v0.10.1': (195 commits) RLS: set released to true RLS: Version 0.10.1 TST: skip problematic xlrd test Merging in MySQL support pandas-dev#2482 Revert "Merging in MySQL support pandas-dev#2482" BUG: don't let np.prod overflow int64 RLS: note changed return type in DatetimeIndex.unique RLS: more what's new for 0.10.1 RLS: some what's new for 0.10.1 API: restore inplace=TRue returns self, add FutureWarnings. re pandas-dev#1893 Merging in MySQL support pandas-dev#2482 BUG: fix python 3 dtype issue DOC: fix what's new 0.10 doc bug re pandas-dev#2651 BUG: fix C parser thread safety. verify gil release close pandas-dev#2608 BUG: usecols bug with implicit first index column. close pandas-dev#2654 BUG: plotting bug when base is nonzero pandas-dev#2571 BUG: period resampling bug when all values fall into a single bin. close pandas-dev#2070 BUG: fix memory error in sortlevel when many multiindex levels. close pandas-dev#2684 STY: CRLF BUG: perf_HEAD reports wrong vbench name when an exception is raised ...
Hi. When using usecols with an implicit index column, the index column is now ignored and pandas returns its own indexing (0, 1, 2, 3 etc). There is no way to specify that the index column should be used, as it doesn't have a header... |
Please provide an explicit example (in a new issue) that shows your result |
Here is a related problem. These commands work fine:
>>> data = 'a,b,c\napple,bat,5.7\norange,cow,10'
>>> pd.read_csv(StringIO(data))
a b c
0 apple bat 5.7
1 orange cow 10.0
>>> pd.read_csv(StringIO(data), index_col=[0, 1])
c
a b
apple bat 5.7
orange cow 10.0
>>> pd.read_csv(StringIO(data), usecols=[2])
c
0 5.7
1 10.0
But combining them raises an error:
>>> pd.read_csv(StringIO(data), index_col=[0, 1], usecols=[2])
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-95-c02d1a70fe51> in <module>()
----> 1 pd.read_csv(StringIO(data), index_col=[0, 1], usecols=[2])
~/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
653 skip_blank_lines=skip_blank_lines)
654
--> 655 return _read(filepath_or_buffer, kwds)
656
657 parser_f.__name__ = name
~/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
403
404 # Create the parser.
--> 405 parser = TextFileReader(filepath_or_buffer, **kwds)
406
407 if chunksize or iterator:
~/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
762 self.options['has_index_names'] = kwds['has_index_names']
763
--> 764 self._make_engine(self.engine)
765
766 def close(self):
~/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
983 def _make_engine(self, engine='c'):
984 if engine == 'c':
--> 985 self._engine = CParserWrapper(self.f, **self.options)
986 else:
987 if engine == 'python':
~/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1668 (index_names, self.names,
1669 self.index_col) = _clean_index_names(self.names,
-> 1670 self.index_col)
1671
1672 if self.index_names is None:
~/miniconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _clean_index_names(columns, index_col)
3081 break
3082 else:
-> 3083 name = cp_cols[c]
3084 columns.remove(name)
3085 index_names.append(name)
IndexError: list index out of range
Same is true if I do |
That's not a bug. |
I see. Could be worth clarifying in the docstring. Thanks! |
Exactly. That's part of what I was proposing you do in #9098. |
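For readers landing on this thread, here is a minimal sketch of one way to get the intended result. It assumes the explanation behind "not a bug" is that index_col can only refer to columns that usecols keeps, so usecols=[2] leaves nothing for index_col=[0, 1] to point at. The sketch reuses the data string from the comment above and refers to columns by name to avoid any ambiguity about positions.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the comment above.
data = "a,b,c\napple,bat,5.7\norange,cow,10"

# Keep the index columns in usecols so that index_col has columns to refer to.
df = pd.read_csv(StringIO(data), usecols=["a", "b", "c"], index_col=["a", "b"])

# The frame is indexed by (a, b) and has the single data column c.
print(df)
```

Listing the index columns in usecols costs nothing here, because they are moved into the index and do not remain as data columns.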
Starting point:
http://pandas.pydata.org/pandas-docs/stable/io.html#index-columns-and-trailing-delimiters
If there is one more column of data than there are column names, usecols exhibits some (at least for me) unintuitive behavior:
I was expecting it to be equal to
I am not sure whether my expectation is unfounded, though, or whether this behavior is in fact intentional.
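To make the setup concrete, here is a small sketch of what "one more column of data than column names" means. The data string is a hypothetical illustration, not the input from the original report. It only shows the documented implicit-index behavior without usecols; how usecols interacts with it is the question discussed in this issue.

```python
from io import StringIO

import pandas as pd

# Hypothetical input: three header names, but four fields in every data row.
data = "a,b,c\n0,1,2,3\n4,5,6,7"

# With one extra data column, read_csv uses the first column as the row index.
df = pd.read_csv(StringIO(data))

print(df.index.tolist())    # values taken from the unnamed first column
print(df.columns.tolist())  # the three header names
```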