ENH/DOC/CLN: Document arguments and reconcile C and Python engines for read_csv #12686
Comments
I think |
thanks for the list @kawochen. There are some issues which are relevant for some of the points. Can you link them when you have a chance (put next to the check boxes) |
@kawochen : add |
|
closes #5888, xref #12686 Author: Chris Closes #13293 from chris-b1/low-memory-doc and squashes the following commits: daf9bca [Chris] DOC: low_memory in read_csv
@kawochen, @jreback : |
updated |
@jreback : no, it's still CParser-only but just move it to the list above with an unchecked box. We would still want to give that functionality to the Python parser. |
I checked the box; it's enough. |
Huh? The original classification was that it was undocumented AND only supported in the C engine. The checkbox gives the impression that both issues are resolved. |
@gfyoung better? |
Yes! That works. Thanks, @jreback ! |
Title is self-explanatory. xref #12686 - I don't quite understand why these are marked (if at all) as internal to the C engine only, as the benefits of accepting these options in the Python engine are quite clear from the documentation I added as well. The implementation simply calls the already-written function in `pandas/parsers.pyx`; as it isn't specific to the `TextReader` class, crossing over to grab this function from Cython (instead of duplicating it in pure Python) seems reasonable while maintaining the separation between the C and Python engines. Author: gfyoung Closes #13323 from gfyoung/python-engine-compact-ints and squashes the following commits: 95f7ba8 [gfyoung] ENH: Add support for compact_ints and use_unsigned in Python engine
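To illustrate the idea behind these two options, here is a minimal sketch. It assumes `compact_ints` is meant to store integer columns in the smallest integer dtype that fits, and that `use_unsigned` prefers unsigned dtypes. Because the checklist in this issue links a deprecation for both options, the sketch does not call them; it uses `pd.to_numeric(..., downcast=...)` as a plainly different stand-in technique, and the sample data is made up.

```python
import io

import pandas as pd

# Hypothetical sample data, made up for illustration.
csv_text = "small,large\n1,1000\n2,2000\n3,3000\n"

# Default parsing: integer columns come back as 64-bit integers.
df = pd.read_csv(io.StringIO(csv_text))

# Stand-in for the "compact" behaviour (an assumption, not the original
# compact_ints code path): downcast each column to the smallest signed
# integer dtype that can hold its values.
compact_signed = df.apply(pd.to_numeric, downcast="integer")

# Stand-in for use_unsigned: prefer the smallest unsigned integer dtype.
compact_unsigned = df.apply(pd.to_numeric, downcast="unsigned")

print(df.dtypes)
print(compact_signed.dtypes)
print(compact_unsigned.dtypes)
```

The downcast runs after parsing here, so it works the same way with either engine.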
I updated |
So I wasn't 100% correct when I said that `float_precision` was documented here (#12686, comment 222684918). It was well documented internally for `TextParser` and in a section of `io.rst`, but it wasn't formally listed among the parameters in the `read_csv` documentation. Author: gfyoung Closes #13377 from gfyoung/float-precision-doc and squashes the following commits: a9eed16 [gfyoung] DOC: actually document float_precision in read_csv
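For readers who have not used the option, a minimal sketch of `float_precision` follows. It assumes a pandas version in which `read_csv` accepts the values `"high"` and `"round_trip"` for this parameter with the C engine; the sample values are made up.

```python
import io

import pandas as pd

# Hypothetical sample data, made up for illustration.
csv_text = "x\n0.30000000000000004\n1e-7\n"

# "round_trip" asks the C engine for a converter whose results match
# Python's own float() parsing of the same strings.
round_trip = pd.read_csv(
    io.StringIO(csv_text), float_precision="round_trip", engine="c"
)

# "high" selects the higher-precision converter; no exact values are
# claimed for it here, only that the column is parsed as float64.
high = pd.read_csv(io.StringIO(csv_text), float_precision="high", engine="c")

expected = [float("0.30000000000000004"), float("1e-7")]
print(round_trip["x"].tolist() == expected)
print(high["x"].dtype)
```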
@jorisvandenbossche : You can check-off the |
@gfyoung are all of the open items on the check boxes still open? (IOW have we missed checking anything off). anything we should just take off (and/or just document)? |
All of those are valid differences that should be patched, though the implementation is not straightforward for any of them. It would be worthwhile to double check that they are properly documented for now. |
thanks @gfyoung more docs always welcome! |
@gfyoung can you review the top section and see where we are? |
@jreback : At the time of commenting, this list is correct and up-to-date with our progress. |
@gfyoung ok thanks. feel free to issue PRs to close some of these :> |
Known differences between Python & C engines
Update here
Features supported in the Python engine only
- `skipfooter`/`skip_footer` (API: skipfooter or skip_footer? read_csv can't seem to decide #13349) - number of lines at the bottom of the file to skip
- `sep=None` - deduce the `sep` (ENH: add read_csv sniffing (sep=None) for C engine #9645)
- `sep` - regular expression/multicharacter separator
Features supported in the C engine only
- `dtype` - specify a single dtype or `{column_name: dtype}` (related as this is a conflicting option: read_csv dtype argument not working when there is a footer #5232) (done in API: add dtype= option to python parser #14295)
- `warn_bad_lines` - issue a warning for each bad line (ENH: Support malformed row handling in Python engine #15925)
- `error_bad_lines` - if `False`, drop bad lines instead of raising (ENH: Support malformed row handling in Python engine #15925)
- `lineterminator` - specify the line terminating character
- `float` for `nrows`, but Python engine raises: read_csv python engine errors #10476 (closed by BUG: Properly validate and parse nrows in read_csv #13275)
- `decimal` option, ENH: support decimal option in PythonParser #12933 (closed by ENH: support decimal option in PythonParser #12933 #13189)
- `delim_whitespace` ENH: Python parser now accepts delim_whitespace=True #12958
- `na_filter` ENH: add support for na_filter in Python engine #13321
- `float_precision`, documented here and here, DOC: actually document float_precision in read_csv #13377
In C engine only (but undocumented)
- `low_memory` (PR DOC: low_memory in read_csv #13293)
Marked as internal on C engine only (maybe be a bit louder about this in the internal code)
- `buffer_lines` DEPR, DOC: Deprecate buffer_lines in read_csv #13360
Undocumented arguments to `read_csv`
- `doublequote` DOC: document doublequote in read_csv #13368
- `compact_ints` API: Deprecate compact_ints and use_unsigned in read_csv #13323
- `use_unsigned` API: Deprecate compact_ints and use_unsigned in read_csv #13323
- `as_recarray` (DEPR: Deprecate as_recarray in read_csv #13373)
- `memory_map`, IO: memory_map kw in read_csv #7477, DOC, ENH: Support memory_map for Python engine #13381
Differences
- `names` and its length with respect to `usecols` API/DOC: Specification for names parameter in read_csv #16469
- `na_values` when `converters` is also present. Inconsistent Handling of na_values and converters in read_csv #13302
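A few of the engine-specific options listed above can be tried directly. This is a minimal sketch, not a statement of current behavior: it assumes a pandas version where `engine="python"` accepts `skipfooter`, `sep=None`, and a regular-expression `sep`, and where `engine="c"` accepts `lineterminator`. All sample data is made up, and support may differ between pandas versions.

```python
import io

import pandas as pd

# --- Listed above as Python-engine-only ---------------------------------

# skipfooter: ignore the trailing summary line at the bottom of the file.
with_footer = "a;b;c\n1;2;3\n4;5;6\ntotal;;"
df_footer = pd.read_csv(
    io.StringIO(with_footer), sep=";", skipfooter=1, engine="python"
)

# sep=None: let the parser deduce the separator from the data.
df_sniffed = pd.read_csv(
    io.StringIO("a;b;c\n1;2;3\n4;5;6\n"), sep=None, engine="python"
)

# Regular-expression separator: a pipe with optional surrounding whitespace.
df_regex = pd.read_csv(
    io.StringIO("a | b|c\n1 |2 | 3\n"), sep=r"\s*\|\s*", engine="python"
)

# --- Listed above as C-engine-only --------------------------------------

# lineterminator: use "~" instead of a newline to end each record.
df_term = pd.read_csv(
    io.StringIO("a,b,c~1,2,3~4,5,6"), lineterminator="~", engine="c"
)

for frame in (df_footer, df_sniffed, df_regex, df_term):
    print(frame)
```

Passing `engine=` explicitly keeps the example independent of whichever engine pandas would otherwise pick for a given combination of options.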