Added query_config parameter to read_gbq #14742

necnec · 2016-11-25T13:05:31Z

More complicated queries can now be processed.

jreback · 2016-11-25T14:21:48Z

Accepting kwargs is a more general way of doing this. I don't really want to have to keep adding specific keywords.

Further, this would need some tests.

necnec · 2016-11-28T09:52:26Z

@jreback thanks for your notes. I've changed the parameters to kwargs style.
Could you give me some examples of how to write the tests and publish them, as you mentioned? I've run them locally, but I think that is not enough.

jreback · 2016-11-28T11:03:08Z

https://github.com/pandas-dev/pandas/blob/master/pandas/io/tests/test_gbq.py

for tests and http://pandas-docs.github.io/pandas-docs-travis/contributing.html#running-google-bigquery-integration-tests

codecov-io · 2016-11-28T12:38:39Z

Current coverage is 84.75% (diff: 7.69%)

Merging #14742 into master will decrease coverage by 0.02%

@@             master     #14742   diff @@
==========================================
  Files           145        145          
  Lines         51090      51139    +49   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43315      43344    +29   
- Misses         7775       7795    +20   
  Partials          0          0

Powered by Codecov. Last update e27b296...3a238a5

necnec · 2016-11-28T13:50:37Z

@jreback must I write some additional tests in https://github.com/pandas-dev/pandas/blob/master/pandas/io/tests/test_gbq.py? As far as I can see, the current version passes the bot's checks.
If yes, do I just need to write the tests and push them, or must I go through the steps at http://pandas-docs.github.io/pandas-docs-travis/contributing.html#running-google-bigquery-integration-tests first?

parthea · 2016-11-28T14:14:16Z

must I write some additional tests in https://github.com/pandas-dev/pandas/blob/master/pandas/io/tests/test_gbq.py? As far as I can see, the current version passes the bot's checks.

Correct. There should be new unit test(s) added in test_gbq.py.

If yes, do I just need to write the tests and push them, or must I go through the steps at http://pandas-docs.github.io/pandas-docs-travis/contributing.html#running-google-bigquery-integration-tests first?

Please follow the steps in the link mentioned. Ideally this is done before you push. You only need to do this once and hopefully you won't touch it again, except to change credentials if needed. It should only take 5-10 minutes. Separately, please let me know if any part of the instructions is unclear (I put the instructions together).

necnec · 2016-11-29T11:41:53Z

@parthea thank you for your notes.
I've added 2 new tests based on the query config.
I'm not sure about test_query_with_parameters because, as far as I can see, query parameters are an experimental feature for now.

The BQ testing instructions are quite good, although I have a problem with credentials and my build crashes in my own Travis-CI tests. I get this exception:

InvalidPrivateKeyFormat: Private key is missing or invalid. It should be service account private key JSON (file path or string contents) with at least two keys: 'client_email' and 'private_key'

even though I've put my JSON in single quotes.

So, is this enough for my pull request?
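
The exception text names the two keys that the credential check expects. As a rough way to debug this outside the test suite, the sketch below checks a key value the same way the message describes it (a file path or the JSON contents as a string). `check_private_key` is a hypothetical helper written for illustration; it is not part of pandas.

```python
import json
import os


def check_private_key(private_key):
    """Return the parsed service account key, or raise ValueError.

    Hypothetical helper: accepts either a path to a JSON key file or the
    JSON contents themselves as a string.
    """
    # If the value points at an existing file, read the JSON from it.
    if os.path.isfile(private_key):
        with open(private_key) as fh:
            private_key = fh.read()

    try:
        data = json.loads(private_key)
    except ValueError:
        raise ValueError("private key is not valid JSON")

    if not isinstance(data, dict):
        raise ValueError("private key JSON must be an object")

    # These are the two keys named in the InvalidPrivateKeyFormat message.
    missing = [k for k in ('client_email', 'private_key') if k not in data]
    if missing:
        raise ValueError("private key JSON is missing: " + ", ".join(missing))
    return data
```

Running a check like this on the value that Travis-CI actually receives may show whether shell quoting has mangled the JSON before it reaches the tests.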

jreback · 2016-11-29T23:54:50Z

doc/source/whatsnew/v0.19.0.txt

@@ -379,6 +379,7 @@ Google BigQuery Enhancements

 - The :func:`read_gbq` method has gained the ``dialect`` argument to allow users to specify whether to use BigQuery's legacy SQL or BigQuery's standard SQL. See the :ref:`docs <io.bigquery_reader>` for more details (:issue:`13615`).
 - The :func:`~DataFrame.to_gbq` method now allows the DataFrame column order to differ from the destination table schema (:issue:`11359`).
+- The :func:`read_gbq` method now allows query configuration preferences


move to 0.19.2

jreback · 2016-11-29T23:58:53Z

pandas/io/gbq.py

@@ -682,6 +686,13 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None,

        .. versionadded:: 0.19.0

+    **kwargs: Arbitrary keyword arguments


this is just like using a named argument. what I would like to see is something like:

config = {'query' : ....}
read_gbq(....., configuration=config)

then just add these keys directly.

@jreback Hmm, in this case I could use read_gbq for loading files, couldn't I? I mean:

config = {'load' : ....}

Then read_gbq could be used for more than reading queries, and the query parameter seems unnecessary. Could you give me some more motivation for the configuration parameter?
I think BigQuery jobs like load and copy are outside the pandas philosophy?

not sure what you mean. the point here is that if someone wants to specify a configuration option, then they can just pass it thru as a dict structure. I don't want to have to change pandas code again when someone wants 'another' option. These should just pass thru.

@jreback sorry for the long response, but I haven't got your idea yet:

The things I worry about are the query and dialect arguments. They become redundant.

config = {
    'query' : {
        'query': 'select 1',
        'useLegacySql': dialect == 'legacy'
    }
}
read_gbq(query, dialect, configuration=config)

So, I am going to use this logic: if query is not specified in config, I will use the query parameter of read_gbq. But if query or dialect is specified in config, must I throw an exception?

As I understand your idea, the configuration parameter should look like

config = { 'query' : { "useQueryCache": False, } }

Not like:

config = { "useQueryCache": False }

And in this case I could pass:

config = {'load' : ....}

What should I do with the query parameter of read_gbq then? Just skip it inside?

@necnec I share a similar concern that there can be conflicting settings for the 'useLegacySql' option. One setting could come from the dialect parameter, and the other setting could come from the configuration kwargs. I agree that it may be better to throw an exception if there is a duplicate value for 'useLegacySql' rather than silently ignore the dialect parameter that was specified in read_gbq().

Setting dialect in read_gbq() does not have any effect when 'useLegacySql' is included as part of the configuration. See example below:

from pandas.io import gbq

sql_statement = "SELECT @param1 + @param2 as VALID_RESULT FROM UNNEST([1, 2, 3, 100, 1000])"
config = {
    'query': {
        "useLegacySql": False,
        "parameterMode": "named",
        "queryParameters": [
            {
                "name": "param1",
                "parameterType": {"type": "INTEGER"},
                "parameterValue": {"value": 1}
            },
            {
                "name": "param2",
                "parameterType": {"type": "INTEGER"},
                "parameterValue": {"value": 2}
            }
        ]
    }
}
gbq.read_gbq(sql_statement, project_id='xxxxxx', configuration=config, verbose=False, dialect='legacy')
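
One way the duplicate-setting check proposed here could look, as a sketch only. `resolve_use_legacy_sql` is a hypothetical helper, `dialect=None` is assumed to mean "not specified", and falling back to legacy SQL when nothing is set is also an assumption; the real read_gbq signature and defaults may differ.

```python
def resolve_use_legacy_sql(dialect=None, configuration=None):
    """Return the effective useLegacySql flag, raising on a conflict.

    Hypothetical helper: `dialect` is 'legacy', 'standard' or None, and
    `configuration` is a dict shaped like {'query': {...}} or None.
    """
    query_config = (configuration or {}).get('query', {})
    from_config = query_config.get('useLegacySql')

    if dialect is None:
        # Nothing passed explicitly: use the configuration value if present,
        # otherwise assume legacy SQL (an assumption of this sketch).
        return True if from_config is None else from_config

    if dialect not in ('legacy', 'standard'):
        raise ValueError("dialect must be 'legacy' or 'standard'")

    from_dialect = dialect == 'legacy'
    if from_config is not None and from_config != from_dialect:
        # Refuse to guess which of the two settings the caller meant.
        raise ValueError(
            "dialect={!r} conflicts with 'useLegacySql': {!r} in "
            "configuration".format(dialect, from_config))
    return from_dialect
```

With this check, the call in the example above (dialect='legacy' together with "useLegacySql": False) would raise instead of silently ignoring dialect.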

jorisvandenbossche · 2016-11-30T15:01:58Z

@necnec sorry to ask you to move it again, but as this is not a critical fix or enhancement, let's keep this for 0.20, so can you move the whatsnew notice to v0.20.0.txt?

jreback · 2016-12-01T15:33:06Z

doc/source/whatsnew/v0.20.0.txt

+
+.. _whatsnew_0200.gbq:
+
+Google BigQuery Enhancements


doesn't need a sub-section, just add to enhancements

jreback · 2016-12-01T15:33:32Z

doc/source/whatsnew/v0.20.0.txt

+Google BigQuery Enhancements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- The :func:`read_gbq` method now allows query configuration preferences


add a tiny example, add the issue number

@jreback
Sorry, but I have some newbie questions.
If there is no issue for this feature, do I need to create one now? And must I add the tiny example to the whatsnew file or to the issue?

jorisvandenbossche · 2016-12-22T13:03:02Z

@necnec

@jorisvandenbossche the discussion about kwargs instead of configuration can be found here: #14742 (comment)
I think @jreback has a better explanation of why we use kwargs and not configuration

kwargs can be used to pass through a bunch of keywords, but this is a single specific keyword that pandas implements to hold those kwargs, so I think we should document it like that.

Further, the user can now pass other keys in the config object besides 'query', and those will be ignored but no error/warning is raised?

necnec · 2016-12-22T13:49:49Z

@jorisvandenbossche

kwargs can be used to pass through a bunch of keywords, but this is a single specific keyword that pandas implements to hold those kwargs, so I think we should document it like that.

Sorry, I haven't got your idea. Do you think we should use a config argument instead of kwargs? That was my first version, but @jreback reasonably asked me to change it to kwargs. I'm ready to add more documentation if you show me where.

Further, the user can now pass other keys in the config object besides 'query', and those will be ignored but no error/warning is raised?

config allows only 'query' options. Otherwise, it throws an exception. The unit test named test_configuration_without_query checks that.

jorisvandenbossche · 2016-12-22T13:52:27Z

config allows only 'query' options. Otherwise, it throws an exception. The unit test named test_configuration_without_query checks that.

Yes, but if you pass a config object with 'query' and, say, a 'copy' key, the 'copy' key will be silently ignored?

necnec · 2016-12-22T13:56:45Z

@jorisvandenbossche Yes, it will be ignored. Do you think it should throw an exception in this case?

jorisvandenbossche · 2016-12-23T23:49:20Z

@necnec IMO, yes, I would raise an error, that is more informative to the user (or at least raise a warning)

necnec · 2016-12-30T12:37:15Z

@jorisvandenbossche I've added an exception for when you pass 2 job types in config, but I haven't added an additional unit test.
When I test this feature locally, I get an exception like this:
ValueError: Only one job type must be specified, but given query,load
I'm not sure the message reads well, so I'd be happy if you could help me make it clearer.
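
A sketch of the validation described in this comment, with one possible clearer wording for the message. `extract_query_config` is a hypothetical name and the exact messages are suggestions, not what the PR ships.

```python
def extract_query_config(configuration):
    """Return the 'query' section of a BigQuery job configuration dict.

    Hypothetical helper: raises ValueError unless `configuration` holds
    exactly one job type and that job type is 'query'.
    """
    job_types = sorted(configuration)  # top-level keys, e.g. ['load', 'query']
    if len(job_types) != 1:
        raise ValueError(
            "Exactly one job type must be specified in 'configuration', "
            "but got: {}".format(', '.join(job_types) or 'none'))
    if job_types[0] != 'query':
        raise ValueError(
            "read_gbq supports only the 'query' job type, "
            "but got: {}".format(job_types[0]))
    return configuration['query']
```

Listing the offending keys in the message tells the caller exactly which entry to remove, and raising rather than ignoring extra keys matches the preference stated earlier in the thread.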

jreback · 2016-12-30T19:47:31Z

@necnec the reason I thought it a nice idea to allow arbitrary kwargs to be passed is that I don't want pandas having to add lots of kwargs to the signature when they are simply passed thru.

does this assertion still hold? is this useful?

necnec · 2016-12-31T10:53:25Z

Yes, I think adding kwargs is a good idea for passing arbitrary keys.

jreback · 2016-12-31T15:50:19Z

doc/source/io.rst

@@ -4649,6 +4649,20 @@ destination DataFrame as well as a preferred column order as follows:
                             index_col='index_column_name',
                             col_order=['col1', 'col2', 'col3'], projectid)

+


add starting in 0.20.0 (or you can add a versionadded tag)

jreback · 2016-12-31T15:50:45Z

doc/source/io.rst

+
+You can specify the query config as parameter
+
+.. code-block:: python


say why this is useful as well. If you have a doc-link to things that you might want to pass here, pls add it.

jreback · 2016-12-31T15:51:14Z

pandas/io/gbq.py

            }
        }
+        config = kwargs.get('config')


can you add a comment on what you are doing here (and why)
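
As an illustration of the kind of commented code being asked for, the sketch below layers a user-supplied configuration over the defaults for a query job. `build_job_data` is a hypothetical function, it uses the keyword name `configuration` that the thread later settles on, and the outer 'configuration' key follows the BigQuery job JSON mentioned later in this review. It is not the code in pandas/io/gbq.py.

```python
import copy


def build_job_data(query, configuration=None):
    """Build the request body for a BigQuery query job (illustrative only)."""
    # Defaults that the reader function sets itself for every query job.
    query_config = {'query': query}

    if configuration:
        # Pass the caller's 'query' options straight through, so that new
        # BigQuery options need no change to the function signature.
        # deepcopy keeps the caller's dict from being modified later.
        query_config.update(copy.deepcopy(configuration.get('query', {})))

    return {'configuration': {'query': query_config}}
```

For example, build_job_data('select 1', {'query': {'useQueryCache': False}}) yields a body whose 'query' section contains both the SQL text and the cache setting.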

jreback · 2016-12-31T15:51:38Z

pandas/io/gbq.py

+
+            config = {'query': {'useQueryCache': False}}
+
+        For more information see `BigQuery SQL Reference


yes this is a good reference, add this above where I indicated

jreback · 2016-12-31T15:52:08Z

@necnec just a couple of doc comments. ping when green.

jorisvandenbossche · 2017-01-02T11:53:56Z

pandas/io/gbq.py

+            config = {'query': {'useQueryCache': False}}
+
+        For more information see `BigQuery SQL Reference
+            <https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query>`


no indentation relative to "For more ..." is needed here (otherwise it may give errors when building the docs)

jorisvandenbossche · 2017-01-02T11:59:15Z

@jreback Repeating my question from above: I know you asked to rename the kwarg from configuration to config, but as this is to specifically pass the 'configuration' from GBQ terminology, shouldn't we rather keep it consistent with their naming and use 'configuration' as well?

jreback · 2017-01-02T15:45:35Z

I just thought config is more in line with what is actually used in the JSON (and a bit shorter). If we are actually using configuration (to pass the data) then that would be fine too.

jorisvandenbossche · 2017-01-02T20:09:52Z

@jreback yes, the key in the dict / json that is passed is actually 'configuration', therefore I would use that.

jreback · 2017-01-02T20:21:09Z

@necnec ok pls make the adjustment from config -> configuration as @jorisvandenbossche indicates. ping on green.

jreback · 2017-01-03T11:21:07Z

thanks @necnec

sometimes these things take time. thanks for the patience.

necnec · 2017-01-03T11:28:59Z

Thank you too for this experience! I'm glad to have participated in it.

Added udf_resource_uri parameter to read_gbq

55bf05c

Now more complicated queries could be processed.

jreback added the IO Google label Nov 25, 2016

necnec added 2 commits November 28, 2016 12:37

Change parameter to kwargs

dad9288

Merge branch 'bigquery-udf-resources'

9a16a8c

necnec changed the title ~~Added udf_resource_uri parameter to read_gbq~~ Added query_config parameter to read_gbq Nov 28, 2016

necnec added 2 commits November 28, 2016 15:28

Fix formatting

f9fae0c

Merge remote-tracking branch 'origin/bigquery-udf-resources'

42dc9e6

necnec added 6 commits November 29, 2016 00:10

add read_gbq tests: query parameters and cache

c66169d

add unit tests read_gbq: query parameters, cache

a96811d

fix whatsnew text

ad35a43

Merge branch 'bigquery-udf-resources'

ddb4fd1

test formatting

94fa514

check tests

d69ed7f

Merge branch 'bigquery-udf-resources'

834a2ff

jreback reviewed Nov 29, 2016

Change whatnew 0.19.0->0.19.2

640be7a

Change whatsnew 0.19.2 -> 0.20.0

b849300

jreback reviewed Dec 1, 2016

necnec added 2 commits December 2, 2016 14:46

Move whatsnew BQ Enhancements -> Enhancements

a952710

delete newlines

0b365da

configuration->config & formatting

df5dec6

parthea approved these changes Dec 22, 2016

Delete trailing whitespaces

8720b03

necnec and others added 4 commits December 29, 2016 15:15

Throw exception if more than 1 job type in config

ec590af

Merge remote-tracking branch 'pandas-dev/master' into bigquery-udf-resources

2e02d76

hotfix

e2f801f

formatting

b97a1be

jreback reviewed Dec 31, 2016

jreback reviewed Dec 31, 2016

jorisvandenbossche reviewed Jan 2, 2017

Add some documentation & formatting

82f4409

config->configuration

3a238a5

jreback closed this in ff3c464 Jan 3, 2017
