Use name and type comparison when appending a dataframe into a table #14
Conversation
Codecov Report
@@ Coverage Diff @@
## master #14 +/- ##
===========================================
- Coverage 75.03% 37.81% -37.22%
===========================================
Files 4 4
Lines 1450 1457 +7
===========================================
- Hits 1088 551 -537
- Misses 362 906 +544
Continue to review full report at Codecov.
can you add some tests?
also can you add a release note to here: https://github.com/pydata/pandas-gbq/blob/master/docs/source/changelog.rst (I just added this)
Added a test and supplied changelog.
this is going to close #13 right?
Yes, that's right.
docs/source/changelog.rst
Fixed an issue with appending to a BigQuery table where fields have modes (NULLABLE, REQUIRED, REPEATED). The changes concern solely the comparison of the local (DataFrame) and remote (BQ) schemas in the GbqConnector.verify_schema function. The fix is to omit field attributes other than name and type.
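The comparison described in the changelog entry can be sketched as follows. This is a hedged illustration only, not pandas-gbq's actual implementation: the `schemas_match` helper and the sample field dicts are invented here to show the idea of stripping each field down to its `name` and `type` keys before comparing.

```python
# Illustrative sketch (not the real pandas-gbq code): compare two
# BigQuery-style schemas on 'name' and 'type' only, ignoring 'mode'
# and any other field attributes.

def schemas_match(remote_fields, local_fields):
    """Return True if the schemas agree on field name and type."""
    def strip(fields):
        # Keep only the attributes the local DataFrame schema can supply.
        return [{'name': f['name'], 'type': f['type']} for f in fields]
    return strip(remote_fields) == strip(local_fields)

# Remote BQ field carries a mode; the locally generated schema does not.
remote = [{'name': 'ts', 'type': 'TIMESTAMP', 'mode': 'NULLABLE'}]
local = [{'name': 'ts', 'type': 'TIMESTAMP'}]
print(schemas_match(remote, local))  # True
```

With the mode attribute dropped on both sides, a NULLABLE remote field no longer causes a spurious mismatch against a schema generated from a DataFrame.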
add (:issue:`13`).
is this also the same soln as pandas-dev/pandas#13086 (well is the issue the same)?
Yes, it's the same issue. That case involved a schema with descriptions, whereas mine was just with modes; such violations could be made an explicit exception case at some future time.
@mremes is there anything you can take / tests from that issue to make this more robust? (e.g. a test?)
    'type': 'TIMESTAMP'}]}

self.table.create(TABLE_ID + test_id, test_schema_1)
self.assertTrue(self.sut.verify_schema(
just use
assert self.sut.verify_schema(.......), .....
the self.assertTrue was a nose convention; now using pytest, so we want to switch
OK, I just took the convention from the tests above.
hah, I was going to change it on merge...but forgot...no worries
@jreback the fix should be a solid approach both to my issue and the pandas-dev issue you referenced. Because the local schema is constructed from the DataFrame's column names and types, it's appropriate to select only the name/type subset of BQ fields when comparing. I don't see a point in e.g. adding multiple discarded fields or anything like that.
@mremes ok great. merging. |
thanks @mremes all set (though for some reason inter-sphinx links not working).... |
I modified the GbqConnector.verify_schema function to parse name and type from the remote schema (essentially dropping mode) and include only those in the compared fields.
Currently, when appending to a BQ table, the destination table's schema is compared against the dataframe schema over a superset of the BQ schema definition (name, type, mode), while _generate_bq_schema parses only name and type from a dataframe.
IMO it would be inconvenient to check mode in this module by determining the completeness of columns (whether they contain null values or not), so raising a generic GBQ error is more convenient here.
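The asymmetry described above, where the locally generated schema only ever carries name and type, can be sketched like this. This is a simplified, hypothetical version of schema generation: the `generate_bq_schema` function and its dtype mapping are illustrative assumptions, not pandas-gbq's actual `_generate_bq_schema`.

```python
# Illustrative sketch (assumed names, not pandas-gbq internals): derive
# a BigQuery-style schema from a DataFrame's dtypes. Note each field
# gets only 'name' and 'type' -- there is no 'mode' to compare against.
import pandas as pd

def generate_bq_schema(df):
    # Map NumPy dtype kinds to BigQuery types; anything else -> STRING.
    type_map = {'i': 'INTEGER', 'f': 'FLOAT', 'b': 'BOOLEAN',
                'M': 'TIMESTAMP', 'O': 'STRING'}
    return {'fields': [
        {'name': col, 'type': type_map.get(dtype.kind, 'STRING')}
        for col, dtype in df.dtypes.items()
    ]}

df = pd.DataFrame({'a': [1], 'b': [1.5]})
print(generate_bq_schema(df))
# {'fields': [{'name': 'a', 'type': 'INTEGER'}, {'name': 'b', 'type': 'FLOAT'}]}
```

Since this local schema can never contain a mode attribute, comparing it against remote fields that do carry modes (NULLABLE, REQUIRED, REPEATED) is guaranteed to fail unless mode is excluded, which is exactly what this PR does.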
closes #13