Use name and type comparison when appending a dataframe into table #14

Closed
wants to merge 7 commits into from

Conversation

ghost

@ghost ghost commented Feb 26, 2017

I modified the GbqConnector.verify_schema function to parse only the name and type of each field from the remote schema (effectively dropping mode) and to compare just those attributes.

Currently, when appending to a BQ table, the destination table's schema is compared with the DataFrame schema using the full BQ field definition (name, type, mode), whereas _generate_bq_schema derives only name and type from a DataFrame. The comparison can therefore fail even when the names and types agree.

IMO it would be inconvenient to perform the mode check in the module by determining whether each column is complete (i.e., whether it contains null values). Raising a generic GBQ error is more convenient here.

closes #13
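
The comparison described above can be illustrated with a minimal sketch. This is not the code from the PR: the helper names `name_and_type` and `schemas_match` are hypothetical, and sorting by field name is an assumption made here so that field order does not affect the result.

```python
def name_and_type(fields):
    """Keep only the 'name' and 'type' keys of each schema field.

    Other attributes (for example 'mode' or 'description') are dropped.
    """
    return [{"name": f["name"], "type": f["type"]} for f in fields]


def field_name(field):
    """Sort key: the field's name."""
    return field["name"]


def schemas_match(remote_fields, local_fields):
    """Hypothetical helper: compare two field lists on name and type only.

    Sorting by name is an assumption of this sketch, not something the
    PR discussion specifies.
    """
    remote = sorted(name_and_type(remote_fields), key=field_name)
    local = sorted(name_and_type(local_fields), key=field_name)
    return remote == local


# Remote fields as BigQuery might report them, including a mode.
remote_fields = [
    {"name": "A", "type": "FLOAT", "mode": "NULLABLE"},
    {"name": "B", "type": "TIMESTAMP", "mode": "REQUIRED"},
]
# Local fields as derived from a DataFrame: name and type only.
local_fields = [
    {"name": "A", "type": "FLOAT"},
    {"name": "B", "type": "TIMESTAMP"},
]

match = schemas_match(remote_fields, local_fields)
```

With a plain equality check on the full dictionaries, these two schemas would differ because of the extra `mode` key; after reducing both sides to name and type they compare equal.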

@codecov-io

codecov-io commented Feb 26, 2017

Codecov Report

Merging #14 into master will decrease coverage by -37.22%.
The diff coverage is 14.28%.

@@             Coverage Diff             @@
##           master      #14       +/-   ##
===========================================
- Coverage   75.03%   37.81%   -37.22%     
===========================================
  Files           4        4               
  Lines        1450     1457        +7     
===========================================
- Hits         1088      551      -537     
- Misses        362      906      +544
Impacted Files                  Coverage Δ
pandas_gbq/gbq.py               30.97% <0%> (-46.11%)
pandas_gbq/tests/test_gbq.py    39.55% <16.66%> (-46.35%)

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 065eb15...bf8c378. Read the comment docs.

@jreback
Contributor

jreback commented Feb 26, 2017

can you add some tests?

@jreback
Contributor

jreback commented Feb 26, 2017

also can you add a release note to here: https://github.com/pydata/pandas-gbq/blob/master/docs/source/changelog.rst (I just added this)

@ghost
Author

ghost commented Feb 26, 2017

Added a test and supplied changelog.

@jreback
Contributor

jreback commented Feb 26, 2017

this is going to close #13 right?

@ghost
Author

ghost commented Feb 26, 2017

Yes, that's right.

--------------

Fixed an issue with appending to a BigQuery table where fields have modes (NULLABLE, REQUIRED, REPEATED). The changes concern solely the comparison of the local (DataFrame) and remote (BQ) schemas in the GbqConnector.verify_schema function. The fix is to omit field attributes other than name and type when comparing.
Contributor

add (:issue:`13`).

@jreback
Contributor

jreback commented Feb 26, 2017

is this also the same soln as pandas-dev/pandas#13086 (well is the issue the same)?

@ghost
Author

ghost commented Feb 26, 2017

Yes, it's the same issue. That report involved a schema with descriptions, whereas my case involved only modes; mode violations could be handled as a dedicated exception case at some point in the future.

@jreback
Contributor

jreback commented Feb 26, 2017

@mremes is there anything you can take / tests from that issue for this to be more robust? (e.g. a test?)

'type': 'TIMESTAMP'}]}

self.table.create(TABLE_ID + test_id, test_schema_1)
self.assertTrue(self.sut.verify_schema(
Contributor

@jreback jreback Feb 26, 2017

just use

assert self.sut.verify_schema(.......), .....

the self.assertTrue was a nose convention, now using pytest so want to switch
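
As a rough illustration of the suggested style, the sketch below uses a bare `assert` with a failure message instead of `self.assertTrue`. The function `verify_schema_stub` and the test name are hypothetical stand-ins written for this example; they are not the project's actual `verify_schema` or test.

```python
def verify_schema_stub(remote_fields, local_fields):
    """Hypothetical stand-in: compare fields on name and type only."""
    def key(fields):
        return sorted((f["name"], f["type"]) for f in fields)
    return key(remote_fields) == key(local_fields)


def test_verify_schema_ignores_mode():
    remote = [{"name": "A", "type": "FLOAT", "mode": "NULLABLE"}]
    local = [{"name": "A", "type": "FLOAT"}]
    # pytest style: a bare assert with an optional failure message,
    # in place of unittest-style self.assertTrue(...).
    assert verify_schema_stub(remote, local), \
        "schemas should match on name and type"
```

pytest collects plain functions whose names start with `test_`, so no test class or `self` is needed for a check like this.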

Author

@ghost ghost Feb 26, 2017

OK, I just followed the convention from the tests above.

Contributor

hah, I was going to change it on merge...but forgot...no worries

@ghost
Author

ghost commented Feb 26, 2017

@jreback the fix should be a solid approach both to my issue and to the pandas-dev issue you referenced. Because the local schema is constructed from the DataFrame's column names and types, it's appropriate to select only the (name, type) subset of each BQ field when comparing.

I don't see a point in e.g. adding multiple discarded fields or anything like that.
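
To make the point about the local schema concrete, here is a small sketch of how a name-and-type schema could be built from column names and dtype kinds. It is an illustration under stated assumptions, not the module's `_generate_bq_schema`: the function `local_schema_from_kinds`, the `KIND_TO_BQ_TYPE` table, and the STRING fallback are all hypothetical choices made for this example.

```python
# Assumed mapping from NumPy dtype "kind" codes to BigQuery type names.
# Illustrative only; the real module's mapping may differ.
KIND_TO_BQ_TYPE = {
    "i": "INTEGER",
    "b": "BOOLEAN",
    "f": "FLOAT",
    "O": "STRING",
    "S": "STRING",
    "U": "STRING",
    "M": "TIMESTAMP",
}


def local_schema_from_kinds(columns):
    """Build a schema dict from (column_name, dtype_kind) pairs.

    Only 'name' and 'type' are produced; no 'mode' is generated, which
    is why comparing against a remote schema that includes 'mode'
    needs the extra attributes stripped first.
    """
    return {
        "fields": [
            # Unknown kinds fall back to STRING (an assumption of this sketch).
            {"name": name, "type": KIND_TO_BQ_TYPE.get(kind, "STRING")}
            for name, kind in columns
        ]
    }


schema = local_schema_from_kinds([("A", "f"), ("B", "M"), ("C", "O")])
```

With a real DataFrame, the input pairs could plausibly be taken from the column dtypes, for example `[(name, dtype.kind) for name, dtype in df.dtypes.items()]`.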

@jreback
Contributor

jreback commented Feb 26, 2017

@mremes ok great. merging.

@jreback jreback closed this in 89bf82d Feb 26, 2017
@jreback jreback added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label Feb 26, 2017
@jreback jreback added this to the 0.2.0 milestone Feb 26, 2017
@jreback
Contributor

jreback commented Feb 26, 2017

thanks @mremes

all set (though for some reason inter-sphinx links not working)....
https://pandas-gbq.readthedocs.io/en/latest/changelog.html#id2

Development

Successfully merging this pull request may close these issues.

to_gbq fails to append to table because of alleged schema mismatch
3 participants