Added query_config parameter to read_gbq #14742
Now more complicated queries can be processed.
cc @parthea accepting kwargs is a more general way of doing this. I don't really want to have to keep adding specific keywords. Further, this would need some tests.
@jreback thanks for your notes. I've changed the parameters to kwargs style.
Current coverage is 84.75% (diff: 7.69%)
@@             master   #14742   diff @@
==========================================
  Files           145      145
  Lines         51090    51139    +49
  Methods           0        0
  Messages          0        0
  Branches          0        0
==========================================
+ Hits          43315    43344    +29
- Misses         7775     7795    +20
  Partials          0        0
@jreback do I need to write additional tests in https://github.com/pandas-dev/pandas/blob/master/pandas/io/tests/test_gbq.py? As far as I can see, the current version already passes the bot's checks.
Correct. There should be new unit test(s) added in
Please follow the steps in the link mentioned. Ideally this is done before you push. You only need to do this once and hopefully you won't touch it again, except to change credentials if needed. It should only take 5-10 minutes. Separately, please let me know if any part of the instructions is unclear (I put the instructions together).
@parthea thank you for your notes. The BQ testing instructions are quite good, although I have problems with credentials and my build crashes in my own Travis-CI tests. I've got an exception
although I've put my JSON in single quotes. So, is it enough for my pull request?
@@ -379,6 +379,7 @@ Google BigQuery Enhancements

- The :func:`read_gbq` method has gained the ``dialect`` argument to allow users to specify whether to use BigQuery's legacy SQL or BigQuery's standard SQL. See the :ref:`docs <io.bigquery_reader>` for more details (:issue:`13615`).
- The :func:`~DataFrame.to_gbq` method now allows the DataFrame column order to differ from the destination table schema (:issue:`11359`).
- The :func:`read_gbq` method now allows query configuration preferences
move to 0.19.2
@@ -682,6 +686,13 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None,

.. versionadded:: 0.19.0

**kwargs: Arbitrary keyword arguments
this is just like using a named argument. what I would like to see is something like:
config = {'query' : ....}
read_gbq(.....,configuration=config)
then just add these keys directly.
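The pass-through idea above could be sketched roughly as follows. This is only an illustration, not the code from this PR: the helper name build_query_job and the exact shape of the job body are assumptions made for the example.

```python
import copy


def build_query_job(query, configuration=None):
    """Hypothetical helper: merge user-supplied 'query' options into a job body.

    Only the 'query' section of the configuration is considered here.
    """
    job = {'configuration': {'query': {'query': query}}}
    if configuration:
        # Deep-copy so the caller's dict is never mutated.
        user_options = copy.deepcopy(configuration.get('query', {}))
        # Keys are passed through untouched, so new BigQuery options
        # need no change in the calling library.
        job['configuration']['query'].update(user_options)
    return job


job = build_query_job('SELECT 1',
                      configuration={'query': {'useQueryCache': False}})
print(job)
```

Because the keys are copied over without inspection, a user-supplied 'query' key inside the configuration would silently replace the positional query in this sketch; a stricter version would have to check for that.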
@jreback Hmm, in this case I could use read_gbq for loading files, couldn't I? I mean:
config = {'load' : ....}
Then read_gbq could be used for more than reading queries, and the query parameter seems unnecessary. Could you give me some more motivation for the configuration parameter? I think BigQuery jobs like load and copy are outside of the pandas philosophy?
not sure what you mean. the point here is that if someone wants to specify a configuration option, then they can just pass it thru as a dict structure. I don't want to have to change pandas code again when someone wants 'another' option. These should just pass thru.
@jreback sorry for the long response, but I haven't got your idea yet:
- The things I worry about are the arguments query and dialect. They become redundant:
config = {
'query' : {
'query': 'select 1',
'useLegacySql': dialect == 'legacy'
}
}
read_gbq(query, dialect, configuration=config)
So, I am going to use this logic: if the query is not specified in config, I will use the query parameter of read_gbq. But if query or dialect is also specified in config, should I raise an exception?
- As I understand your idea, the configuration parameter should look like:
config = {
'query' : {
"useQueryCache": False,
}
}
Not like:
config = {
"useQueryCache": False
}
And in this case I could pass:
config = {'load' : ....}
What should I do with the query parameter of read_gbq then? Just skip it inside?
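One possible answer to the question of where the SQL text should come from is sketched below: accept it from exactly one place and raise otherwise. The helper name resolve_query and the error wording are hypothetical and are not taken from pandas.

```python
def resolve_query(query=None, configuration=None):
    """Hypothetical helper: pick the SQL text from exactly one source."""
    config_query = (configuration or {}).get('query', {}).get('query')
    if query is not None and config_query is not None:
        # Refuse to guess which of the two statements the user meant.
        raise ValueError(
            "The query statement was given both as an argument and inside "
            "the configuration; pass it only once."
        )
    if query is None and config_query is None:
        raise ValueError("No query statement was given.")
    return query if query is not None else config_query


# The argument wins when the configuration carries only options.
print(resolve_query('select 1', {'query': {'useQueryCache': False}}))
# The configuration is used when no argument is given.
print(resolve_query(None, {'query': {'query': 'select 2'}}))
```

Raising on a duplicate keeps the behaviour explicit; silently preferring one source would hide the kind of redundancy described in the comment above.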
@necnec I share a similar concern that there can be conflicting settings for the 'useLegacySql' option. One setting could come from the dialect parameter, and the other setting could come from the configuration kwargs. I agree that it may be better to throw an exception if there is a duplicate value for 'useLegacySql' rather than silently ignore the dialect parameter that was specified in read_gbq().
Setting dialect in read_gbq() does not have any effect when 'useLegacySql' is included as part of the configuration. See example below:
from pandas.io import gbq

sql_statement = "SELECT @param1 + @param2 as VALID_RESULT FROM UNNEST([1, 2, 3, 100, 1000])"
config = {
    'query': {
        "useLegacySql": False,
        "parameterMode": "named",
        "queryParameters": [
            {
                "name": "param1",
                "parameterType": {
                    "type": "INTEGER"
                },
                "parameterValue": {
                    "value": 1
                }
            },
            {
                "name": "param2",
                "parameterType": {
                    "type": "INTEGER"
                },
                "parameterValue": {
                    "value": 2
                }
            }
        ]
    }
}
gbq.read_gbq(sql_statement, project_id='xxxxxx', configuration=config, verbose=False, dialect='legacy')
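A check for the conflict shown in this example might look like the sketch below. It is an illustration under stated assumptions: the helper resolve_use_legacy_sql is hypothetical, dialect=None is used here to mean "not specified", and legacy SQL is assumed as the fallback when neither setting is given.

```python
def resolve_use_legacy_sql(dialect=None, configuration=None):
    """Hypothetical helper: reconcile `dialect` with 'useLegacySql'."""
    if dialect not in (None, 'legacy', 'standard'):
        raise ValueError(
            "dialect must be 'legacy' or 'standard', got {!r}".format(dialect)
        )
    from_config = (configuration or {}).get('query', {}).get('useLegacySql')
    if dialect is None:
        # Assumption: fall back to legacy SQL when nothing is specified.
        return True if from_config is None else from_config
    from_dialect = (dialect == 'legacy')
    if from_config is not None and from_config != from_dialect:
        # Fail loudly instead of silently ignoring the dialect argument.
        raise ValueError(
            "dialect={!r} conflicts with useLegacySql={!r} in the "
            "configuration".format(dialect, from_config)
        )
    return from_dialect


config = {'query': {'useLegacySql': False}}
try:
    resolve_use_legacy_sql('legacy', config)
except ValueError as exc:
    print(exc)
```

With a check of this kind, the call in the example above would fail with an error rather than quietly running as standard SQL.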
@necnec sorry to ask you to move it again, but as this is not a critical fix or enhancement, let's keep this for 0.20, so can you move the whatsnew notice to v0.20.0.txt?

.. _whatsnew_0200.gbq:

Google BigQuery Enhancements
doesn't need a sub-section, just add to enhancements
Google BigQuery Enhancements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- The :func:`read_gbq` method now allows query configuration preferences
add a tiny example, add the issue number
@jreback
Sorry, but I have some newbie questions. If there is no issue for that feature, do I need to create one now? And should I add the tiny example to the whatsnew file or to the issue file?
kwargs can be used to pass through a bunch of keywords, but this is a single specific keyword that pandas implements to hold those kwargs, so I think we should document it like that. Further, the user can now pass other keys in the |
Sorry, I haven't got your idea. Do you think we should use
|
Yes, but if you pass a config object with 'query' and also, say, a 'copy' key, the 'copy' key will be silently ignored?
@jorisvandenbossche Yes, it will be ignored. Do you think it should throw an exception in this case?
@necnec IMO, yes, I would raise an error, that is more informative to the user (or at least raise a warning)
@jorisvandenbossche I've added an exception for the case where you pass 2 types of job in config. But I haven't added an additional unit test.
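The validation discussed here could be sketched as below. This is not the code added in the PR; the function name validate_job_types, the SUPPORTED_JOB_TYPES tuple, and the messages are assumptions for illustration.

```python
SUPPORTED_JOB_TYPES = ('query',)  # assumption: only query jobs are handled


def validate_job_types(configuration):
    """Hypothetical check: require exactly one, supported, job type."""
    if len(configuration) != 1:
        raise ValueError(
            "Exactly one job type must be specified, but got: {}".format(
                ', '.join(sorted(configuration)) or 'none'
            )
        )
    job_type = next(iter(configuration))
    if job_type not in SUPPORTED_JOB_TYPES:
        raise ValueError(
            "Job type {!r} is not supported here".format(job_type)
        )
    return job_type


print(validate_job_types({'query': {'useQueryCache': False}}))
try:
    validate_job_types({'query': {}, 'copy': {}})
except ValueError as exc:
    print(exc)
```

A unit test for this behaviour would only need plain dicts as input, so it would not require BigQuery credentials.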
@necnec the reason I thought it a nice idea to allow arbitrary kwargs to be passed is that I don't want pandas having to add lots of kwargs to the signature when they are simply passed thru. does this assertion still hold? is this useful?
Yes, I think adding kwargs is a good idea to pass arbitrary keys
@@ -4649,6 +4649,20 @@ destination DataFrame as well as a preferred column order as follows:
index_col='index_column_name',
col_order=['col1', 'col2', 'col3'], projectid)
add starting in 0.20.0 (or you can add a versionadded tag)
You can specify the query config as parameter

.. code-block:: python
say why this is useful as well. If you have a doc-link to things that you might want to pass here, pls add it.
}
}
config = kwargs.get('config')
can you add a comment on what you are doing here (and why)
config = {'query': {'useQueryCache': False}}

For more information see `BigQuery SQL Reference
yes this is a good reference, add this above where I indicated
@necnec just a couple of doc comments. ping when green.
config = {'query': {'useQueryCache': False}}

For more information see `BigQuery SQL Reference
<https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query>`
no indentation relative to "For more ..." is needed here (otherwise it will possibly give errors when building the docs)
@jreback Repeating my question from above: I know you asked to rename the kwarg from configuration to config, but as this is to specifically pass the 'configuration' from GBQ terminology, shouldn't we rather keep it consistent with their naming and use 'configuration' as well?
I just though |
@jreback yes, the key in the dict / json that is passed is actually |
@necnec ok pls make the adjustment from |
thanks @necnec sometimes these things take time. thanks for the patience.
Thank you too for this experience! I'm glad to have participated in it.