Miscellaneous fixes to BigQuery connector #959
Merged
This was referenced Dec 14, 2023
austinweisgrau force-pushed the bigquery_fixes branch 7 times, most recently from f32f0c2 to f58c4d9 on December 20, 2023 19:59
austinweisgrau force-pushed the bigquery_fixes branch from f58c4d9 to 42593cc on January 11, 2024 23:13
austinweisgrau force-pushed the bigquery_fixes branch 2 times, most recently from eb7384b to b59e64b on January 23, 2024 23:35
If a Parsons Table column has values like `[None, None, True, False]`, the BigQuery connector will infer that the appropriate type for this column is NoneType, which it will translate into a STRING type. This change ensures that, from the types returned by petl.typecheck(), the connector chooses the first type that isn't 'NoneType', if one is available.
Source types ultimately come from `petl.typeset`, which calls `type(v).__name__`. This call does not include the source module, only the type name itself, e.g. `date` and not `datetime.date`.
It looks like this line was accidentally commented out
Python datetime objects may represent timestamps or datetimes in BigQuery, depending on whether they do or do not have a timezone attached.
Always passing a schema to BigQuery is not necessary, and it creates situations where the provided schema can mismatch the actual schema. When the table already exists in BigQuery, fetch the schema from BigQuery.
austinweisgrau force-pushed the bigquery_fixes branch from 748124a to 2118cab on January 30, 2024 18:45
FYI all these force pushes are rebases on top of main whenever new commits are merged into main.
cmdelrio approved these changes on Jan 30, 2024
Assuming you've tested it, looks great!
The BigQuery.copy() method does not seem to work in a variety of situations. This PR collects fixes for those issues as I encounter and resolve them.
Fixed BigQuery type map
Source types ultimately come from `petl.typeset`, which calls `type(v).__name__`. This call does not include the source module, only the type name itself, e.g. `date` and not `datetime.date`.
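To illustrate why the type map has to be keyed by bare type names, here is a minimal sketch using only the standard library. The helper `type_names` is hypothetical and only mimics the `type(v).__name__` behavior described above; it is not the connector's code.

```python
import datetime
from decimal import Decimal


def type_names(values):
    """Collect bare type names, mimicking the type(v).__name__ call described above."""
    return {type(v).__name__ for v in values}


# The module prefix is dropped: a datetime.date value is reported as "date".
names = type_names([datetime.date(2024, 1, 30), Decimal("1.5"), None])
print(sorted(names))
```

A lookup keyed on `"datetime.date"` would never match any of these names, which is the mismatch this commit addresses.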
Prefer not NoneType when inferring schema for Table load to BigQuery
If a Parsons Table column has values like `[None, None, True, False]`, the BigQuery connector will infer that the appropriate type for this column is NoneType, which it will translate into a STRING type.
This change ensures that, from the types returned by petl.typecheck(), the connector chooses the first type that isn't 'NoneType', if one is available.
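The selection rule can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `first_non_none_type` is a hypothetical helper, and the list of type names is built by hand instead of coming from petl.

```python
def first_non_none_type(type_names):
    """Return the first type name that is not 'NoneType', falling back to 'NoneType'."""
    for name in type_names:
        if name != "NoneType":
            return name
    return "NoneType"


column = [None, None, True, False]
# Ordered, de-duplicated type names for the column: ['NoneType', 'bool'].
observed = list(dict.fromkeys(type(v).__name__ for v in column))

chosen = first_non_none_type(observed)
all_null = first_non_none_type(["NoneType"])
```

With this rule the example column is treated as `bool`, and only a column containing nothing but `None` still falls back to `NoneType`.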
Fix commented out row to use job_config passed as argument to BigQuery.copy()
It looks like this line was accidentally commented out
Parse python datetime objects for BigQuery as datetime or timestamp
Python datetime objects may represent timestamps or datetimes in
BigQuery, depending on whether they do or do not have a timezone
attached.
Before this change, a Parsons Table that included datetimes with
timezones would fail to load to BigQuery because BigQuery
would reject datetime strings with timezone information for the
"datetime" data type.
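A minimal sketch of the distinction, assuming the check is made on the value's timezone awareness. The function name `bigquery_type_for_datetime` is hypothetical; `TIMESTAMP` and `DATETIME` are the BigQuery type names.

```python
from datetime import datetime, timezone


def bigquery_type_for_datetime(value: datetime) -> str:
    """Map a Python datetime to a BigQuery type name based on timezone awareness."""
    # A datetime is "aware" when it has a tzinfo that yields a UTC offset.
    is_aware = value.tzinfo is not None and value.utcoffset() is not None
    return "TIMESTAMP" if is_aware else "DATETIME"


naive = datetime(2024, 1, 30, 18, 45)
aware = datetime(2024, 1, 30, 18, 45, tzinfo=timezone.utc)

naive_type = bigquery_type_for_datetime(naive)
aware_type = bigquery_type_for_datetime(aware)
```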
Only generate schema for BigQuery when table does not already exist
Always passing a schema to BigQuery is not necessary, and it creates
situations where the provided schema can mismatch the actual schema.
When the table already exists in BigQuery, fetch the schema from BigQuery.
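The control flow can be sketched like this. Everything here is a stand-in for illustration: `FakeClient`, `TableNotFound`, and `schema_for_load` are hypothetical names, and the in-memory client replaces a real BigQuery client so the example runs on its own.

```python
class TableNotFound(Exception):
    """Stand-in for the 'not found' error a real client would raise."""


class FakeClient:
    """Hypothetical in-memory stand-in for a BigQuery client."""

    def __init__(self, tables):
        self._tables = tables  # table name -> list of (column, type) pairs

    def get_schema(self, table_name):
        try:
            return self._tables[table_name]
        except KeyError:
            raise TableNotFound(table_name) from None


def schema_for_load(client, table_name, infer_schema):
    """Use the existing table's schema if there is one; otherwise infer it."""
    try:
        return client.get_schema(table_name)
    except TableNotFound:
        return infer_schema()


client = FakeClient({"ds.existing": [("id", "INTEGER")]})
existing = schema_for_load(client, "ds.existing", lambda: [("id", "STRING")])
created = schema_for_load(client, "ds.new", lambda: [("id", "STRING")])
```

The inferred schema is used only for the table that does not exist yet, so an existing table's real schema cannot be contradicted by a guess made from the data.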