[Python] Support pandas future default string dtype #43683

Closed
jorisvandenbossche opened this issue Aug 14, 2024 · 2 comments
Labels: Breaking Change, Component: Python, Priority: Blocker

@jorisvandenbossche
Member

With pandas' PDEP-14 proposal, pandas is planning to introduce a default string dtype in pandas 3.0 (instead of the current object dtype).

This will become the default in pandas 3.0, and can be enabled with an option in the upcoming pandas 2.3 (`pd.options.future.infer_string = True`). To prepare for that, we should start using that string dtype in `to_pandas()` conversions when that option is enabled.

@raulcd
Member

raulcd commented Oct 10, 2024

@jorisvandenbossche should I move this to 19.0.0?

@raulcd raulcd modified the milestones: 18.0.0, 19.0.0 Oct 11, 2024
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Nov 6, 2024
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Dec 6, 2024
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Dec 11, 2024
@raulcd raulcd added the Priority: Blocker Marks a blocker for the release label Dec 18, 2024
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Jan 3, 2025
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Jan 5, 2025
@amoeba amoeba added the Breaking Change Includes a breaking change to the API label Jan 5, 2025
jorisvandenbossche added a commit to jorisvandenbossche/arrow that referenced this issue Jan 9, 2025
jorisvandenbossche added a commit that referenced this issue Jan 9, 2025
…44195)

### Rationale for this change

With pandas' [PDEP-14](https://pandas.pydata.org/pdeps/0014-string-dtype.html) proposal, pandas is planning to introduce a default string dtype in pandas 3.0 (instead of the current object dtype).

This will become the default in pandas 3.0, and can be enabled with an option in the upcoming pandas 2.3 (`pd.options.future.infer_string = True`). To prepare for that, we should start using that string dtype in `to_pandas()` conversions when that option is enabled.

### What changes are included in this PR?

- If pandas >= 3.0 is used or the pandas option is enabled, ensure that `to_pandas()` calls use the default string dtype of pandas for string-like columns (string, large_string, string_view)
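The rule in this bullet can be restated as a small decision function. This is a plain-Python sketch of the stated condition, not PyArrow's actual implementation: the function names, the version parsing, and the `"string"` / `"object"` / `"unaffected"` labels are all assumptions made for illustration.

```python
# Arrow types named in the change description as "string-like".
STRING_LIKE_ARROW_TYPES = {"string", "large_string", "string_view"}


def uses_pandas_string_dtype(pandas_version: str, infer_string_enabled: bool) -> bool:
    """Hypothetical check: pandas >= 3.0, or the future option is switched on."""
    major = int(pandas_version.split(".")[0])
    return major >= 3 or infer_string_enabled


def expected_dtype_kind(arrow_type: str, pandas_version: str,
                        infer_string_enabled: bool) -> str:
    """Hypothetical summary of which pandas dtype family a column ends up in."""
    if arrow_type not in STRING_LIKE_ARROW_TYPES:
        # This change only concerns string-like columns.
        return "unaffected"
    if uses_pandas_string_dtype(pandas_version, infer_string_enabled):
        return "string"
    return "object"


print(expected_dtype_kind("large_string", "2.3.0", True))
print(expected_dtype_kind("string", "2.2.3", False))
```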

### Are these changes tested?

Yes, in the pandas-nightly crossbow build.

One failure remains, caused by a bug on the pandas side (pandas-dev/pandas#59879).

### Are there any user-facing changes?

**This PR includes breaking changes to public APIs.** Depending on the version of pandas, `to_pandas()` will change to use pandas' string dtype instead of object dtype. This is a breaking user-facing change, but essentially just following the equivalent change in default dtype on the pandas side.
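For downstream code that currently tests `dtype == object` to find text columns, one way to stay compatible across this change is to ask pandas whether a column is string-like instead. The sketch below uses `pd.api.types.is_string_dtype`; the helper name `is_text_column` and the sample data are assumptions for illustration, and the exact treatment of mixed object columns differs between pandas versions.

```python
import pandas as pd


def is_text_column(series: pd.Series) -> bool:
    """Hypothetical helper: accept both object-dtype strings and pandas string dtypes."""
    # Passing the Series (not just its dtype) lets recent pandas versions
    # inspect the values of object-dtype columns.
    return bool(pd.api.types.is_string_dtype(series))


legacy = pd.Series(["a", "b"], dtype=object)     # what to_pandas() returned before
numbers = pd.Series([1, 2])                      # a non-text column for contrast

print(is_text_column(legacy))
print(is_text_column(numbers))
```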

* GitHub Issue: #43683

Lead-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
@jorisvandenbossche
Member Author

Issue resolved by pull request #44195.

amoeba pushed a commit that referenced this issue Jan 9, 2025
…44195)

amoeba pushed a commit that referenced this issue Jan 11, 2025
…44195)
