Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: read_csv keywords: keep_date_col, delim_whitespace #55569

Open
jbrockmendel opened this issue Oct 18, 2023 · 14 comments
Open

DEPR: read_csv keywords: keep_date_col, delim_whitespace #55569

jbrockmendel opened this issue Oct 18, 2023 · 14 comments
Labels
API Design Deprecate Functionality to remove in pandas IO CSV read_csv, to_csv

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Oct 18, 2023

Issue Description

  • delim_whitespace: "Equivalent to setting sep='\\s+'" -> seems redundant
  • dialect: does anyone use this? looks like it overrides several other keywords
  • cache_dates: once ENH/API: resolution inference in vectorized datetime parsing #55564 is done, this option should not have any behavior impact, so seems unnecessary (also in to_datetime)
  • parse_dates: not the entire keyword, just the list-of-list and dict-of-list variants. seems like users can handle those cases in post-processing.
  • keep_date_col: only relevant in the parse_dates cases above.

Others? ive never used verbose, na_filter, keep_default_na, or na_values.

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 18, 2023
@simonjayhawkins simonjayhawkins added API Design IO CSV read_csv, to_csv Deprecate Functionality to remove in pandas and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 18, 2023
@lithomas1
Copy link
Member

+1 on delim_whitespace, cache_dates, verbose, na_filter, keep_default_na.
-1 on na_values (I think some people use it at least)

I also don't think keeping dialect hurts to much either.

@jbrockmendel
Copy link
Member Author

I could also be convinced to deprecate converters. seems just as easy to do that in post-processing

@rmhowe425
Copy link
Contributor

take

@rmhowe425
Copy link
Contributor

@lithomas1 @jbrockmendel Can I file a PR for this issue?

It looks like the Needs Triage tag has been removed, but we're still deciding on which parameters we'll be deprecating?

@rmhowe425
Copy link
Contributor

Also thinking it would make sense to create a new decorator in pandas.util._decorators.py to handle the deprecation of multiple keyword arguments. That way we can minimize the number of conditional statements in the actual implementation for read_csv.

I already see that there is an IF statement being used to handle the deprecation for infer_datetime_format.

@lithomas1
Copy link
Member

You're free to deprecate everything that we agreed on.

For doing many deprecations, it might be worth studying
https://github.com/pandas-dev/pandas/pull/48849/files#diff-c2c56de582ca256e79e43c1ef22dadfe0628b713bba932b1f599458105c5bc84L497-L508

This was the old deprecation machinery for read_csv.

@rmhowe425
Copy link
Contributor

@lithomas1 thx.

The link provided appears to be the final, formal deprecation of certain features of read_csv. For this particular PR, we would want to alert users of upcoming changes by leveraging calls to warnings.warn wouldn't we?

Something similar to this: https://github.com/pandas-dev/pandas/pull/53710/files#diff-1a2e3df0db7dd8bddc2ec4bff9de8a7a55e328e6c32e2cecde761dc9549fcd46R9351

@lithomas1
Copy link
Member

Looks like the link broke.

I was talking about L497-508 of readers.py that was removed there.

@lithomas1
Copy link
Member

i.e. _deprecated_defaults

@sm-Fifteen
Copy link

Has this gone through? I see it listed in the deprecations for Pandas 2.2:

Deprecated support for combining parsed datetime columns in read_csv() along with the keep_date_col keyword (GH 55569)

...and in the I/O guide section on CSV:

Caution

Deprecated since version 2.2.0: Combining date columns inside read_csv is deprecated. Use pd.to_datetime on the relevant result columns instead.

...but this issue is currently still open?

I'm personally very fond of combining datetime components in the read_csv parameter list, and would hate to see it removed. CSV is a pretty messy format, after all, and I've seen dates, times and timezones stored separately several times before. Being able to group them into a proper timestamp column (especially in indexes and multi-indexes, which they are often a part of) at parsing-time instead of separately both improves readability and avoids the need to manipulate indexes directly.

@rmhowe425 rmhowe425 removed their assignment Apr 3, 2024
@rmhowe425
Copy link
Contributor

@sm-Fifteen It has not yet been completed. Feel free to assign the issue to yourself if you want to work on it

@sm-Fifteen
Copy link

@rmhowe425 : Shouldn't the deprecation notices in the doc be removed, in that case?

@rmhowe425
Copy link
Contributor

rmhowe425 commented Apr 3, 2024

That certainly makes sense to me. Not quit sure why they're in the doc given that my PR was never approved. 🤔

@lithomas1
Copy link
Member

@jbrockmendel deprecated this in #56569

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Deprecate Functionality to remove in pandas IO CSV read_csv, to_csv
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants