Use name and type comparison when appending a dataframe into table
I modified the GbqConnector.verify_schema function to parse name and type
from the remote schema (dropping mode) and to compare only those fields.

Currently, when appending to a BQ table, the comparison between the
destination table's schema and the dataframe schema is done over the full
BQ schema definition (name, type, mode), whereas _generate_bq_schema
parses only name and type from a dataframe.

IMO it would be inconvenient to check the mode in the module by inspecting
the completeness of each column (whether it includes null values or not).
Raising a generic GBQ error is more convenient here.

closes #13

Author: Matti Remes <[email protected]>

Closes #14 from mremes/master and squashes the following commits:

bf8c378 [Matti Remes] added reference to issue #13
77b1fd5 [Matti Remes] changelog for verify_schema changes
70d08ef [Matti Remes] make the syntax of the test flake-pretty
45826f1 [Matti Remes] Merge remote-tracking branch 'upstream/master'
66aa616 [Matti Remes] Added test for validate_schema ignoring field mode when comparing schemas
5dafd55 [Matti Remes] fix bug with selecting key
631d66c [Matti Remes] Use name and type of fields for comparing remote and local schemas when appending to a table
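
The mismatch described in the commit message can be reproduced without BigQuery. The following is a minimal sketch, assuming the remote schema is a plain dict shaped like the `tables().get()` response. The sample schemas and the `serialize_fields` helper are hypothetical and exist only for illustration, and `sort_keys=True` is an addition that the commit itself does not use.

```python
import json

# Hypothetical remote schema: BigQuery reports name, type and mode.
remote_schema = {'fields': [
    {'name': 'A', 'type': 'FLOAT', 'mode': 'NULLABLE'},
    {'name': 'D', 'type': 'TIMESTAMP', 'mode': 'REQUIRED'},
]}

# Hypothetical local schema: only name and type, as derived from a dataframe.
local_schema = {'fields': [
    {'name': 'A', 'type': 'FLOAT'},
    {'name': 'D', 'type': 'TIMESTAMP'},
]}


def serialize_fields(fields):
    """Serialize each field dict so the fields can be compared as a set."""
    # sort_keys is an assumption added here to make key order irrelevant.
    return set(json.dumps(field, sort_keys=True) for field in fields)


# Before the change: the remote side carries 'mode', so the sets differ.
old_match = (serialize_fields(remote_schema['fields']) ==
             serialize_fields(local_schema['fields']))

# After the change: keep only name and type from the remote fields first.
remote_fields = [{'name': f['name'], 'type': f['type']}
                 for f in remote_schema['fields']]
new_match = (serialize_fields(remote_fields) ==
             serialize_fields(local_schema['fields']))

print(old_match, new_match)  # False True
```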
mremes authored and jreback committed Feb 26, 2017
1 parent 570c913 commit 89bf82d
Showing 4 changed files with 39 additions and 6 deletions.
6 changes: 4 additions & 2 deletions docs/source/changelog.rst
@@ -1,8 +1,10 @@
Changelog
=========

-0.2.0 / 2017-?
---------------
+0.2.0 / 2017-03-??
+------------------
+
+- Bug with appending to a BigQuery table where fields have modes (NULLABLE,REQUIRED,REPEATED) specified. These modes were compared versus the remote schema and writing a table via ``to_gbq`` would previously raise. (:issue:`13`)

jreback Feb 26, 2017

Contributor

@jorisvandenbossche do you have any idea why the issue reference is not showing up correctly? (it's the same when I build locally and on RTD): https://pandas-gbq.readthedocs.io/en/latest/changelog.html#id2

jorisvandenbossche Feb 26, 2017

Seems OK now? At least the link works for me

jreback Feb 26, 2017

Contributor

yeah I pushed again, I didn't have all of the needed extensions defined


0.1.2 / 2017-02-23
------------------
5 changes: 2 additions & 3 deletions docs/source/conf.py
@@ -353,6 +353,5 @@
intersphinx_mapping = {'https://docs.python.org/': None}

extlinks = {'issue': ('https://github.com/pydata/pandas-gbq/issues/%s',
-                      'GH'),
-            'wiki': ('https://github.com/pydata/pandas-gbq/wiki/%s',
-                     'wiki ')}
+                      'GH#'),
+            'pr': ('https://github.com/pydata/pandas-gbq/pull/%s', 'GH#')}

jorisvandenbossche Feb 26, 2017

You don't really need this (certainly if you use the same GH# label), as links to PRs using /issues/number work as well

jreback Feb 26, 2017

Contributor

yeah was copying from another project :>
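
The two comment threads on the docs change suggest a simpler configuration. This is a minimal sketch of the relevant part of a Sphinx `conf.py`, assuming that `sphinx.ext.extlinks` was the missing extension (the comment does not name it) and keeping the `(url, prefix)` tuple form shown in the diff. Newer Sphinx releases expect the caption to contain `%s`, so the tuple may need adjusting there.

```python
# Sketch of the relevant part of a Sphinx conf.py (layout assumed).

# The :issue: role only resolves when the extlinks extension is loaded.
# Assumption: this is the extension that was missing from the first build.
extensions = [
    'sphinx.ext.extlinks',
]

# A single role is enough: per the review comment, links to pull requests
# also work through the /issues/<number> URL.
extlinks = {
    'issue': ('https://github.com/pydata/pandas-gbq/issues/%s', 'GH#'),
}

# Show how the URL template expands for a given issue number.
url_template, prefix = extlinks['issue']
print(prefix + '13', '->', url_template % '13')
```

With this in place, ``:issue:`13``` in the changelog renders as a link labelled with the `GH#` prefix.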

6 changes: 5 additions & 1 deletion pandas_gbq/gbq.py
@@ -563,8 +563,12 @@ def verify_schema(self, dataset_id, table_id, schema):
                 datasetId=dataset_id,
                 tableId=table_id).execute()['schema']
 
+            remote_fields = [{'name': field_remote['name'],
+                              'type': field_remote['type']}
+                             for field_remote in remote_schema['fields']]
+
             fields_remote = set([json.dumps(field_remote)
-                                 for field_remote in remote_schema['fields']])
+                                 for field_remote in remote_fields])
             fields_local = set(json.dumps(field_local)
                                for field_local in schema['fields'])
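
The hunk above projects only the remote side onto name and type before the set comparison. As one possible variation (not what the commit does), the sketch below projects both sides and compares tuples instead of `json.dumps` strings, so a local schema that happens to carry `mode`, or dicts built in a different key order, would still compare equal. The function name `fields_match` and the sample fields are hypothetical.

```python
def fields_match(remote_fields, local_fields, keys=('name', 'type')):
    """Return True when both field lists agree on `keys`, ignoring order."""
    def project(fields):
        # Keep only the compared keys; extra keys such as 'mode' are dropped.
        return set(tuple(field[key] for key in keys) for field in fields)
    return project(remote_fields) == project(local_fields)


remote = [{'name': 'A', 'type': 'FLOAT', 'mode': 'NULLABLE'},
          {'name': 'B', 'type': 'STRING', 'mode': 'REQUIRED'}]

# Same columns in a different order and without modes.
local_ok = [{'name': 'B', 'type': 'STRING'},
            {'name': 'A', 'type': 'FLOAT'}]

# Column B has a different type locally.
local_bad = [{'name': 'A', 'type': 'FLOAT'},
             {'name': 'B', 'type': 'INTEGER'}]

print(fields_match(remote, local_ok))   # True
print(fields_match(remote, local_bad))  # False
```

Because the comparison is set-based, column order does not matter, which matches the behaviour the surrounding tests rely on.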

28 changes: 28 additions & 0 deletions pandas_gbq/tests/test_gbq.py
@@ -1161,6 +1161,34 @@ def test_upload_data_flexible_column_order(self):
                    _get_project_id(), if_exists='append',
                    private_key=_get_private_key_path())
 
+    def test_verify_schema_ignores_field_mode(self):
+        test_id = "14"
+        test_schema_1 = {'fields': [{'name': 'A',
+                                     'type': 'FLOAT',
+                                     'mode': 'NULLABLE'},
+                                    {'name': 'B',
+                                     'type': 'FLOAT',
+                                     'mode': 'NULLABLE'},
+                                    {'name': 'C',
+                                     'type': 'STRING',
+                                     'mode': 'NULLABLE'},
+                                    {'name': 'D',
+                                     'type': 'TIMESTAMP',
+                                     'mode': 'REQUIRED'}]}
+        test_schema_2 = {'fields': [{'name': 'A',
+                                     'type': 'FLOAT'},
+                                    {'name': 'B',
+                                     'type': 'FLOAT'},
+                                    {'name': 'C',
+                                     'type': 'STRING'},
+                                    {'name': 'D',
+                                     'type': 'TIMESTAMP'}]}
+
+        self.table.create(TABLE_ID + test_id, test_schema_1)
+        self.assertTrue(self.sut.verify_schema(
+            self.dataset_prefix + "1", TABLE_ID + test_id, test_schema_2),
+            'Expected schema to match')
+
     def test_list_dataset(self):
         dataset_id = self.dataset_prefix + "1"
         self.assertTrue(dataset_id in self.dataset.datasets(),
