Miscellaneous fixes to BigQuery connector #959

Merged: 8 commits, Jan 31, 2024

austinweisgrau (Collaborator) commented on Dec 14, 2023

The BigQuery.copy() method does not seem to work in a variety of situations. Fixes are made here as I encounter and resolve these issues.

Fixed BigQuery type map

Source types ultimately come from petl.typeset, which calls
type(v).__name__. This call returns only the type name itself,
without the source module, e.g. date and not datetime.date
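
To illustrate why the map keys matter, here is a minimal standard-library sketch. It only mirrors the type(v).__name__ call described above; the TYPE_MAP dictionary and both helper functions are hypothetical stand-ins, not the connector's actual type map.

```python
import datetime

# Hypothetical type map keyed by bare type names, as type(v).__name__ yields them.
TYPE_MAP = {
    "str": "STRING",
    "int": "INTEGER",
    "float": "FLOAT",
    "bool": "BOOLEAN",
    "date": "DATE",
    "time": "TIME",
}


def bare_type_name(value):
    """Return the type name without its module, e.g. 'date'."""
    return type(value).__name__


def bigquery_type_for(value, default="STRING"):
    """Look up a BigQuery type by bare type name, falling back to a default."""
    return TYPE_MAP.get(bare_type_name(value), default)


day = datetime.date(2023, 12, 14)
print(bare_type_name(day))     # date
print(bigquery_type_for(day))  # DATE
```

A map keyed on "datetime.date" would never match, because the lookup key is just "date".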

Prefer non-NoneType when inferring schema for Table load to BigQuery

If a Parsons Table column has values like [None, None, True, False],
the BigQuery connector will infer that the appropriate type for this
column is NoneType, which it will translate into a STRING type.

This change ensures that, among the types returned by petl.typecheck(),
the connector chooses the first type that isn't 'NoneType' whenever
one is available.
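
The selection rule described above can be sketched as follows. This is an illustration only: pick_best_type is a hypothetical helper name, and the list of type names is built by hand here rather than taken from petl.

```python
def pick_best_type(type_names):
    """Return the first type name that is not 'NoneType', else 'NoneType'."""
    for name in type_names:
        if name != "NoneType":
            return name
    return "NoneType"


column = [None, None, True, False]

# Collect type names in first-seen order, without duplicates.
seen = list(dict.fromkeys(type(v).__name__ for v in column))

print(seen)                  # ['NoneType', 'bool']
print(pick_best_type(seen))  # bool
```

A column that contains only None values still falls back to NoneType.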

Fix commented-out line to use job_config passed as argument to BigQuery.copy()

It looks like this line was accidentally commented out.

Parse python datetime objects for BigQuery as datetime or timestamp

Python datetime objects may represent timestamps or datetimes in
BigQuery, depending on whether they do or do not have a timezone
attached.

Before this change, a Parsons Table that included datetimes with
timezones would fail to load to BigQuery because BigQuery
would reject datetime strings with timezone information as the
"datetime" data type.

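A minimal sketch of the distinction described above, using only the standard library. The function name bigquery_type_for_datetime is hypothetical, and the rule shown (timezone-aware values map to TIMESTAMP, naive values to DATETIME) is an assumption based on this description, not a copy of the connector's code.

```python
from datetime import datetime, timezone


def bigquery_type_for_datetime(value):
    """Return 'TIMESTAMP' for timezone-aware datetimes, else 'DATETIME'."""
    # A datetime is aware only if tzinfo is set and yields a UTC offset.
    is_aware = value.tzinfo is not None and value.tzinfo.utcoffset(value) is not None
    return "TIMESTAMP" if is_aware else "DATETIME"


naive = datetime(2023, 12, 14, 9, 30)
aware = datetime(2023, 12, 14, 9, 30, tzinfo=timezone.utc)

print(bigquery_type_for_datetime(naive))  # DATETIME
print(bigquery_type_for_datetime(aware))  # TIMESTAMP
print(aware.isoformat())                  # 2023-12-14T09:30:00+00:00
```

The last line shows the kind of string, with a UTC offset, that a "datetime" column would reject.
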
Only generate schema for BigQuery when table does not already exist

Always passing a schema to BigQuery is not necessary, and it creates
opportunities for the provided schema to mismatch the actual schema.

When the table already exists in BigQuery, fetch the schema from BigQuery instead.
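
The decision described above could look roughly like the sketch below. Everything here is hypothetical: resolve_schema and its callable parameters are stand-ins, so the example runs without a BigQuery client. With the google-cloud-bigquery library, the existing schema would presumably come from something like client.get_table(...).schema.

```python
def resolve_schema(table_exists, fetch_existing_schema, infer_schema):
    """Use the existing table's schema if there is one, else infer a new one.

    Both schema sources are passed as callables so only the needed one runs.
    """
    if table_exists:
        return fetch_existing_schema()
    return infer_schema()


# Stand-in schemas as (column name, BigQuery type) pairs.
existing = [("id", "INTEGER"), ("active", "BOOLEAN")]
inferred = [("id", "INTEGER"), ("active", "STRING")]

print(resolve_schema(True, lambda: existing, lambda: inferred))
print(resolve_schema(False, lambda: existing, lambda: inferred))
```

Deferring to the existing table avoids loading against an inferred type (STRING here) that disagrees with the real column type.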

austinweisgrau (Collaborator, Author) commented

FYI, all these force pushes are from rebasing on top of main whenever new commits are merged into main.

cmdelrio (Contributor) left a comment

Assuming you've tested it, looks great!

cmdelrio merged commit e515096 into move-coop:main on Jan 31, 2024
1 check passed
austinweisgrau deleted the bigquery_fixes branch on January 31, 2024 18:45