fix!: to_gbq loads uint8 columns to BigQuery INT64 instead of STRING #814

Merged: 6 commits merged into googleapis:main on Sep 23, 2024

@tswast (Collaborator) commented Sep 20, 2024

fix!: to_gbq loads uint8 columns to BigQuery INT64 instead of STRING (#814)

fix!: to_gbq loads naive (no timezone) columns to BigQuery DATETIME instead of TIMESTAMP (#814)
fix!: to_gbq loads object column containing bool values to BOOLEAN instead of STRING (#814)
fix!: to_gbq loads object column containing dictionary values to STRUCT instead of STRING (#814)
deps: min pyarrow is now 4.0.0 to support compliant nested types (#814)
Release-As: 0.24.0
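The type changes in the commit message above can be illustrated with a small inference function. This is a hypothetical sketch, not pandas-gbq's implementation (the PR places that logic in pandas_gbq.schema.pandas_to_bigquery). To stay runnable without pandas, it looks at dtype *names* and plain Python values. The function name `infer_bigquery_type` and the extra integer and float dtypes are assumptions; only the uint8, naive/aware datetime, bool-object and dict-object rules come from this PR.

```python
# Hypothetical sketch of the dtype -> BigQuery type rules listed in the commit
# message. It inspects dtype names and plain Python values so it runs without
# pandas; it is not pandas-gbq's implementation.

# Integer dtype names mapped to INT64 (assumption beyond uint8). uint64 is left
# out because its values can exceed INT64's range; here it falls through to
# default_type.
_INTEGER_DTYPES = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32",
    "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32",
}
_FLOAT_DTYPES = {"float32", "float64", "Float32", "Float64"}


def infer_bigquery_type(dtype_name, values=(), default_type="STRING"):
    """Return a BigQuery type name for a column, given its dtype name and values."""
    if dtype_name in _INTEGER_DTYPES:
        return "INT64"  # e.g. uint8, which previously loaded as STRING
    if dtype_name in _FLOAT_DTYPES:
        return "FLOAT64"
    if dtype_name in ("bool", "boolean"):
        return "BOOLEAN"
    if dtype_name.startswith("datetime64["):
        # pandas spells tz-aware dtypes like "datetime64[ns, UTC]"; naive ones
        # have no timezone part, e.g. "datetime64[ns]".
        return "TIMESTAMP" if "," in dtype_name else "DATETIME"
    if dtype_name == "object":
        # Object columns are typed by their non-null values. Missing values are
        # assumed to be None here; real columns may also hold NaN or pd.NA.
        non_null = [value for value in values if value is not None]
        if non_null and all(isinstance(value, bool) for value in non_null):
            return "BOOLEAN"
        if non_null and all(isinstance(value, dict) for value in non_null):
            return "STRUCT"  # a real schema would also describe the nested fields
    return default_type


print(infer_bigquery_type("uint8"))                         # INT64
print(infer_bigquery_type("datetime64[ns]"))                # DATETIME
print(infer_bigquery_type("datetime64[ns, UTC]"))           # TIMESTAMP
print(infer_bigquery_type("object", [True, None, False]))   # BOOLEAN
print(infer_bigquery_type("object", [{"a": 1}, {"a": 2}]))  # STRUCT
print(infer_bigquery_type("object", ["x", "y"]))            # STRING
```

The `default_type="STRING"` fallback mirrors the parameter of `_generate_bq_schema` in this PR's diff: anything the rules do not recognize still loads as STRING.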

Note to Googlers: this copies some pandas -> BigQuery logic from both https://github.com/googleapis/python-bigquery and https://github.com/googleapis/python-bigquery-dataframes as part of an effort to reduce redundancy across codebases. My intention is to make those packages depend on pandas-gbq for their pandas -> BigQuery logic.

Fixes #452, #105, #616, #450

@tswast requested review from a team as code owners September 20, 2024 20:53
@product-auto-label bot added the size: xl and api: bigquery labels Sep 20, 2024
@tswast requested review from chelsea-lin and Linchin and removed request for farhan0102 September 20, 2024 20:56

@tswast (Collaborator Author) commented Sep 20, 2024

_________________ ERROR collecting tests/system/test_to_gbq.py _________________
tests/system/test_to_gbq.py:392: in <module>
    dtype=pandas.ArrowDtype(
.nox/system-3-8/lib/python3.8/site-packages/pandas/__init__.py:258: in __getattr__
    raise AttributeError(f"module 'pandas' has no attribute '{name}'")
E   AttributeError: module 'pandas' has no attribute 'ArrowDtype'

Looks like I need to update the tests to be compatible with older pandas.
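The traceback comes from the module-level `__getattr__` in pandas/__init__.py, which raises AttributeError for names that pandas version does not define; `pandas.ArrowDtype` first appeared in pandas 1.5.0. A minimal, hypothetical feature check (the helper name `has_arrow_dtype` is an assumption, not part of pandas or pandas-gbq) could look like this. Stand-in module objects are used so the sketch runs without pandas installed.

```python
import types


def has_arrow_dtype(pandas_module) -> bool:
    """Return True if the given pandas module exposes ArrowDtype.

    getattr() with a default swallows the AttributeError that pandas'
    module-level __getattr__ raises for names it does not define.
    """
    return getattr(pandas_module, "ArrowDtype", None) is not None


# Stand-ins for an older and a newer pandas module, so the sketch runs alone.
old_pandas = types.SimpleNamespace(__version__="1.4.4")
new_pandas = types.SimpleNamespace(__version__="1.5.0", ArrowDtype=object)

print(has_arrow_dtype(old_pandas))  # False
print(has_arrow_dtype(new_pandas))  # True
```

In a test module, such a check could gate the affected cases with `pytest.mark.skipif(not has_arrow_dtype(pandas), reason="requires pandas.ArrowDtype")`, or the test data could fall back to a NumPy-backed dtype. This page does not show which approach the final PR used.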

@tswast added the owlbot:run label Sep 23, 2024
@gcf-owl-bot bot removed the owlbot:run label Sep 23, 2024

@tswast (Collaborator Author) commented Sep 23, 2024

@Linchin @chelsea-lin please take a look. I got it working on Python 3.8 by upgrading the minimum pyarrow version from 3.0.0 (January 2021) to 4.0.0 (April 2021).
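The new pyarrow floor is declared in the package's dependency metadata. As a hypothetical illustration of what the floor means at runtime, a guard like the one below would fail early with a clear message on pyarrow 3.x. The helper names are assumptions; real code would pass `pyarrow.__version__`, and `packaging.version` is the more robust parser. A regex is used here only to keep the sketch stdlib-only.

```python
import re

# Minimum pyarrow version after this PR (from the "deps:" line in the description).
MIN_PYARROW_VERSION = (4, 0, 0)


def parse_version(version: str) -> tuple:
    """Parse the leading 'major[.minor[.patch]]' part of a version string.

    Missing components default to 0; suffixes such as '.dev0' are ignored.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def check_pyarrow_version(version: str) -> None:
    """Raise ImportError if the given pyarrow version is older than the minimum."""
    if parse_version(version) < MIN_PYARROW_VERSION:
        minimum = ".".join(str(part) for part in MIN_PYARROW_VERSION)
        raise ImportError(f"pyarrow>={minimum} is required, but {version} is installed")


check_pyarrow_version("4.0.0")   # OK: exactly the new minimum
check_pyarrow_version("17.0.0")  # OK
try:
    check_pyarrow_version("3.0.0")  # the previous minimum
except ImportError as exc:
    print(exc)  # pyarrow>=4.0.0 is required, but 3.0.0 is installed
```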

@chelsea-lin left a comment

LGTM

@@ -0,0 +1,156 @@
# Copyright (c) 2019 pandas-gbq Authors All rights reserved.

nit: 2024

@tswast (Collaborator Author) replied:

This file contains some code I copied from a file dated 2019, so I don't think I should update the year.

@tswast added the owlbot:run label Sep 23, 2024
@gcf-owl-bot bot removed the owlbot:run label Sep 23, 2024
@tswast merged commit 107bb40 into googleapis:main Sep 23, 2024
25 checks passed
@tswast deleted the b323176126-issue300-streaming-to_gbq branch September 23, 2024 17:18
@@ -1219,9 +1220,16 @@ def _generate_bq_schema(df, default_type="STRING"):
    be overridden: https://github.com/pydata/pandas-gbq/issues/218, this
    method can be removed after there is time to migrate away from this
    method."""
    from pandas_gbq import schema
    fields = pandas_gbq.schema.pandas_to_bigquery.dataframe_to_bigquery_fields(

@tswast (Collaborator Author) commented:

Maybe I should have un-deprecated generate_bq_schema for use in bigframes.
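The hunk above keeps the deprecated `_generate_bq_schema(df, default_type="STRING")` as a thin wrapper that delegates to `pandas_gbq.schema.pandas_to_bigquery.dataframe_to_bigquery_fields`. The following is a hypothetical sketch of that delegation pattern. The names `generate_schema` and `columns_to_fields`, the toy dtype table, the `{"fields": [...]}` return shape, and the DeprecationWarning are all assumptions, and it takes a `{column: dtype name}` dict instead of a DataFrame. It illustrates why un-deprecating would be cheap: the wrapper holds no type logic of its own.

```python
import warnings

# Hypothetical stand-in for the newer shared helper; the name and the tiny
# dtype table are illustrative assumptions, not pandas-gbq's API.
_DTYPE_TO_BIGQUERY = {
    "int64": "INT64",
    "uint8": "INT64",
    "float64": "FLOAT64",
    "bool": "BOOLEAN",
}


def columns_to_fields(columns, default_type="STRING"):
    """Map {column name: dtype name} to a list of BigQuery field dicts."""
    return [
        {"name": name, "type": _DTYPE_TO_BIGQUERY.get(dtype, default_type)}
        for name, dtype in columns.items()
    ]


def generate_schema(columns, default_type="STRING"):
    """Deprecated wrapper kept for compatibility; delegates to columns_to_fields().

    Un-deprecating it would only mean dropping the warning below.
    """
    warnings.warn(
        "generate_schema() is deprecated; use columns_to_fields() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return {"fields": columns_to_fields(columns, default_type=default_type)}


print(generate_schema({"id": "uint8", "payload": "object"}))
# {'fields': [{'name': 'id', 'type': 'INT64'}, {'name': 'payload', 'type': 'STRING'}]}
```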

Labels: api: bigquery, size: xl
Development

Successfully merging this pull request may close these issues.

pyarrow.lib.ArrowTypeError: Expected bytes, got a 'dict' object
3 participants