[Python] Support pandas future default string dtype #43683
Labels
Breaking Change
Includes a breaking change to the API
Component: Python
Priority: Blocker
Marks a blocker for the release
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Sep 23, 2024
@jorisvandenbossche should I move this to 19.0.0?
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Nov 6, 2024
…as-string-dtype
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Dec 6, 2024
…as-string-dtype
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Dec 11, 2024
…as-string-dtype
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Jan 3, 2025
…as-string-dtype
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Jan 5, 2025
…as-string-dtype
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Jan 9, 2025
…as-string-dtype
jorisvandenbossche added a commit that referenced this issue Jan 9, 2025

…44195)

### Rationale for this change

With pandas' [PDEP-14](https://pandas.pydata.org/pdeps/0014-string-dtype.html) proposal, pandas is planning to introduce a default string dtype (instead of the current object dtype). This will become the default in pandas 3.0, and can be enabled with an option in the upcoming pandas 2.3 (`pd.options.future.infer_string = True`). To prepare for that, we should start using that string dtype in `to_pandas()` conversions when that option is enabled.

### What changes are included in this PR?

- If pandas >= 3.0 is used or the pandas option is enabled, ensure that `to_pandas()` calls use the default string dtype of pandas for string-like columns (string, large_string, string_view)

### Are these changes tested?

It is tested in the pandas-nightly crossbow build. One failure remains, caused by a bug on the pandas side (pandas-dev/pandas#59879).

### Are there any user-facing changes?

**This PR includes breaking changes to public APIs.** Depending on the version of pandas, `to_pandas()` will change to use pandas' string dtype instead of object dtype. This is a breaking user-facing change, but it essentially just follows the equivalent change in default dtype on the pandas side.

* GitHub Issue: #43683

Lead-authored-by: Joris Van den Bossche
Co-authored-by: Raúl Cumplido
Signed-off-by: Joris Van den Bossche
Issue resolved by pull request 44195
amoeba pushed a commit that referenced this issue Jan 9, 2025
…44195)
amoeba pushed a commit that referenced this issue Jan 11, 2025
…44195)
With pandas' PDEP-14 proposal, pandas is planning to introduce a default string dtype in pandas 3.0 (instead of the current object dtype).
This will become the default in pandas 3.0, and can be enabled with an option in the upcoming pandas 2.3 (`pd.options.future.infer_string = True`). To prepare for that, we should start using that string dtype in `to_pandas()` conversions when that option is enabled.