Added query_config parameter to read_gbq #14742
@@ -83,3 +83,10 @@ Performance Improvements
 
 Bug Fixes
 ~~~~~~~~~
+
+.. _whatsnew_0200.gbq:
+
+Google BigQuery Enhancements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- The :func:`read_gbq` method now allows query configuration preferences

Review comment: add a tiny example, add the issue number

Review comment: @jreback
@@ -375,7 +375,7 @@ def process_insert_errors(self, insert_errors):
 
         raise StreamingInsertError
 
-    def run_query(self, query):
+    def run_query(self, query, **kwargs):
         try:
             from googleapiclient.errors import HttpError
         except:
@@ -395,6 +395,9 @@ def run_query(self, query):
                 }
             }
         }
+        query_config = kwargs.get('query_config')
+        if query_config is not None:
+            job_data['configuration']['query'].update(query_config)
 
         self._start_timer()
         try:
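The merge added in this hunk can be illustrated outside pandas. What follows is a minimal sketch, assuming the `job_data` dict nests a `query` section under `configuration` as the closing braces in the diff context suggest; `build_job_data` is a hypothetical helper name and is not part of pandas.

```python
import copy


def build_job_data(query, query_config=None):
    """Hypothetical stand-in for the dict assembled inside run_query."""
    # Assumed layout: the diff only shows the closing braces of this dict.
    job_data = {
        'configuration': {
            'query': {
                'query': query,
            }
        }
    }
    # Mirror the added lines: merge user-supplied options into the query
    # section. The deep copy keeps the caller's dict from being shared.
    if query_config is not None:
        job_data['configuration']['query'].update(copy.deepcopy(query_config))
    return job_data


job = build_job_data('SELECT 1', {'useQueryCache': False})
print(job['configuration']['query'])
```

Any key in `query_config` lands next to `query` in the same section, which is why new BigQuery options would pass through without further pandas changes.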
@@ -622,7 +625,8 @@ def _parse_entry(field_value, field_type):
 
 
 def read_gbq(query, project_id=None, index_col=None, col_order=None,
-             reauth=False, verbose=True, private_key=None, dialect='legacy'):
+             reauth=False, verbose=True, private_key=None, dialect='legacy',
+             **kwargs):
     """Load data from Google BigQuery.
 
     THIS IS AN EXPERIMENTAL LIBRARY
@@ -682,6 +686,13 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None,
 
         .. versionadded:: 0.19.0
 
+    **kwargs: Arbitrary keyword arguments

Review comment: This is just like using a named argument. What I would like to see is something like:
then just add these keys directly.

Review comment: @jreback Hmm, in this case I can use
and function

Review comment: Not sure what you mean. The point here is that if someone wants to specify a configuration option, they can just pass it through as a dict structure. I don't want to have to change pandas code again when someone wants 'another' option. These should just pass through.

Review comment: @jreback Sorry for the long response, but I haven't got your idea yet:
So I am going to use this logic: if the query is not specified in the config, I will use the query parameter of read_gbq. But if
Not like:
And in this case I could pass:
What should I do with the query parameter of the function

Review comment: @necnec I share a similar concern that there can be conflicting settings for the Setting

Review comment: Space after 'kwargs' (before the colon, this is a numpydoc peculiarity)

Review comment: Also, the If you want to keep them, you need to make the docstring a raw string by adding a 'r' at the front (so

+        query_config (dict): query configuration parameters for job processing.
+            For more information see `BigQuery SQL Reference

Review comment: Indent this also one level less (same level as the "For example .."

+                <https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs#configuration.query>`

Review comment: No indentation relative to "For more ...) is needed here (otherwise it will possibly give errors when building the docs)

+

Review comment: Can you put the mini example here

+        .. versionadded:: 0.20.0
+
     Returns
     -------
     df: DataFrame
@@ -698,7 +709,7 @@ def read_gbq(query, project_id=None, index_col=None, col_order=None,
     connector = GbqConnector(project_id, reauth=reauth, verbose=verbose,
                              private_key=private_key,
                              dialect=dialect)
-    schema, pages = connector.run_query(query)
+    schema, pages = connector.run_query(query, **kwargs)
     dataframe_list = []
     while len(pages) > 0:
         page = pages.pop()
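The review thread above raises a concern about conflicting settings, for example a `query` key inside `query_config` that disagrees with the positional `query` argument. Because `dict.update` lets the later keys win, the merge as written would silently replace the positional query. The sketch below shows that behaviour and one possible guard; `merge_query_config` and its error message are hypothetical and are not presented as the resolution adopted in this PR.

```python
def merge_query_config(query, query_config=None):
    """Hypothetical helper: merge options, rejecting a conflicting query."""
    query_section = {'query': query}
    if query_config is None:
        return query_section
    config_query = query_config.get('query')
    # Refuse to guess which statement the caller meant.
    if config_query is not None and config_query != query:
        raise ValueError(
            "'query' was given both as an argument and inside query_config")
    query_section.update(query_config)
    return query_section


# Plain dict.update, as in the diff: the config value silently wins.
plain = {'query': 'SELECT 1'}
plain.update({'query': 'SELECT 2'})
print(plain['query'])

# With the guard, non-conflicting options still pass through unchanged.
print(merge_query_config('SELECT 1', {'useQueryCache': False}))
```

Raising an error is only one option; letting the config value win, or letting the argument win, would be equally easy to implement but harder for callers to notice.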
@@ -707,10 +707,57 @@ def test_invalid_option_for_sql_dialect(self):
                          private_key=_get_private_key_path())
 
         # Test that a correct option for dialect succeeds
-        # to make sure ValueError was due to invalid dialect
+        # to make sure ValueError was due to invalid  dialect

Review comment: Please revert the extra space before dialect here

         gbq.read_gbq(sql_statement, project_id=_get_project_id(),
                      dialect='standard', private_key=_get_private_key_path())
 
+    def test_query_with_parameters(self):
+        sql_statement = "SELECT @param1 + @param2 as VALID_RESULT"
+        query_config = {
+            "useLegacySql": False,
+            "parameterMode": "named",
+            "queryParameters": [
+                {
+                    "name": "param1",
+                    "parameterType": {
+                        "type": "INTEGER"
+                    },
+                    "parameterValue": {
+                        "value": 1
+                    }
+                },
+                {
+                    "name": "param2",
+                    "parameterType": {
+                        "type": "INTEGER"
+                    },
+                    "parameterValue": {
+                        "value": 2
+                    }
+                }
+            ]
+        }
+        # Test that an invalid query without query_config
+        with tm.assertRaises(ValueError):

Review comment: Why is this necessary? I thought

Review comment: @jreback Yes, the configuration is optional. But this unit test is a special case: it runs a query with parameters, and in that case you must pass the parameter values in the configuration. I've made 2 unit tests. So if you think this test is too special, I can remove it.

Review comment: It's fine to test. It seems, though, that this test implies it's required somehow.

Review comment: @jreback So I don't need to change anything here?

Review comment: Could we update the comment to better explain why we expect a failure here? For example,

+            gbq.read_gbq(sql_statement, project_id=_get_project_id(),
+                         private_key=_get_private_key_path())
+
+        # Test that a correct query with query config
+        df = gbq.read_gbq(sql_statement, project_id=_get_project_id(),
+                          private_key=_get_private_key_path(),
+                          query_config=query_config)
+        tm.assert_frame_equal(df, DataFrame({'VALID_RESULT': [3]}))
+
+    def test_query_no_cache(self):
+        query = 'SELECT "PI" as VALID_STRING'
+        query_config = {
+            "useQueryCache": False,
+        }
+        df = gbq.read_gbq(query, project_id=_get_project_id(),
+                          private_key=_get_private_key_path(),
+                          query_config=query_config)
+        tm.assert_frame_equal(df, DataFrame({'VALID_STRING': ['PI']}))
+
 
 class TestToGBQIntegration(tm.TestCase):
     # Changes to BigQuery table schema may take up to 2 minutes as of May 2015
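The `queryParameters` list in `test_query_with_parameters` is verbose to write by hand. As a sketch only, a small helper could build the same structure from a mapping of parameter names to integers. `named_int_parameters` is a hypothetical function that is not part of pandas or this PR, and it covers only the `INTEGER` type used in the test.

```python
def named_int_parameters(values):
    """Build a query_config dict for named INTEGER parameters (hypothetical)."""
    return {
        "useLegacySql": False,
        "parameterMode": "named",
        "queryParameters": [
            {
                "name": name,
                "parameterType": {"type": "INTEGER"},
                "parameterValue": {"value": value},
            }
            # One entry per named parameter, in the mapping's order.
            for name, value in values.items()
        ],
    }


query_config = named_int_parameters({"param1": 1, "param2": 2})
print(len(query_config["queryParameters"]))
```

The resulting dict has the same keys and nesting as the literal in the test, so it could be passed as `query_config` in the same way.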
Review comment: doesn't need a sub-section, just add to enhancements