Implementation of DB-API for BigQuery. #2921

tswast · 2017-01-06T17:38:28Z

Implements Cursor.execute() and Cursor.fetchone() without support
for query parameters.

Tested manually with a Jupyter notebook:

# In[1]:
from google.cloud import bigquery
from google.cloud.bigquery import bqdb
connection = bqdb.connect()
cursor = connection.cursor()

# In[2]:
cursor.execute("SELECT (1 + 2) AS s;")

# In[3]:
cursor.fetchone()

# In[4]:
cursor.fetchone()

# In[5]:
cursor.description

# In[6]:
cursor.rowcount

# In[7]:
cursor.execute("DELETE FROM `swast-scratch.hello_world.hello` WHERE id = 1;")

# In[8]:
cursor.rowcount
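
The notebook cells above have no recorded output. As a rough illustration of what PEP 249 prescribes for that call sequence, the sketch below runs the same SELECT against the standard-library `sqlite3` module. `sqlite3` is an assumption chosen only because it is a DB-API module that needs no credentials; it is not part of this PR, and BigQuery's `description` and `rowcount` values may differ.

```python
import sqlite3

# sqlite3 stands in for the BigQuery module here (an assumption for
# illustration); both expose the PEP 249 connect/cursor/execute/fetchone API.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

cursor.execute("SELECT (1 + 2) AS s;")
first = cursor.fetchone()    # the single result row, as a sequence
second = cursor.fetchone()   # None once the result set is exhausted
column_name = cursor.description[0][0]  # first field of the first column

print(first, second, column_name)
connection.close()
```

The second `fetchone()` returning `None` is the behavior the `_has_fetched_all_rows` flag in this PR is meant to provide.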

Makes progress on #2434

bigquery/google/cloud/bigquery/bqdb/__init__.py

+
+apilevel = "2.0"
+
+# Threads may share the module, but not connections.


bigquery/google/cloud/bigquery/bqdb/connection.py

+
+
+class Connection(object):
+    """Connection to Google BigQuery.


bigquery/google/cloud/bigquery/bqdb/cursor.py

+            return None
+
+        rows, _, page_token = self._query_results.fetch_data(
+                max_results=1, page_token=self._page_token)


bigquery/google/cloud/bigquery/bqdb/cursor.py

+        #       infer types from parameter inputs.
+        query_job = client.run_async_query(job_id, operation)
+        query_job.use_legacy_sql = False
+        query_job.begin()


bigquery/google/cloud/bigquery/bqdb/cursor.py

+        self._has_fetched_all_rows = False
+        client = self.connection._client
+        job_id = str(uuid.uuid4())
+        # TODO: parameters: if not ``None``, check if ``dict`` or sequence and


bigquery/google/cloud/bigquery/bqdb/cursor.py

+        if self._has_fetched_all_rows:
+            return None
+
+        rows, _, page_token = self._query_results.fetch_data(


bigquery/google/cloud/bigquery/bqdb/__init__.py

@@ -0,0 +1,49 @@
+# Copyright 2016 Google Inc.


Bamieh · 2017-04-24T15:57:50Z

Thanks for the effort! I want to use BigQuery with Superset via SQLAlchemy, and I believe this is a good starting point. I'm willing to help out if needed!

bigquery/google/cloud/bigquery/dbapi/_helpers.py

+        if job.state == 'DONE':
+            if job.error_result:
+                # TODO: raise a more specific exception, based on the error.
+                # See: https://cloud.google.com/bigquery/troubleshooting-errors


bigquery/google/cloud/bigquery/dbapi/connection.py

+
+    def close(self):
+        """No-op."""
+        pass


bigquery/google/cloud/bigquery/dbapi/cursor.py

+        pass
+
+    def _set_description(self, schema):
+        """Set description from schema."""


bigquery/google/cloud/bigquery/dbapi/cursor.py

+        for field in schema:
+            desc.append(tuple([
+                field.name,
+                None,


bigquery/google/cloud/bigquery/dbapi/cursor.py

+        self.description = tuple(desc)
+
+    def execute(self, operation):
+        """Prepare and execute a database operation."""


bigquery/google/cloud/bigquery/dbapi/cursor.py

+        query_job.begin()
+        _helpers.wait_for_job(query_job)
+        self._query_results = query_job.results()
+        _, total_rows, _ = self._query_results.fetch_data(max_results=0)


bigquery/google/cloud/bigquery/dbapi/cursor.py

+            self._has_fetched_all_rows = True
+
+        self._page_token = page_token
+        return rows[0]


bigquery/tests/system.py

+                self.assertEqual(len(row), 1)
+                self.assertEqual(row[0], example['expected'])
+                row = Config.CURSOR.fetchone()
+                self.assertIsNone(row)


bigquery/tests/unit/test_dbapi_connection.py

+        from google.cloud.bigquery import Client
+        from google.cloud.bigquery.dbapi import connect
+        from google.cloud.bigquery.dbapi import Connection
+        connection = connect()


bigquery/tests/unit/test_dbapi_cursor.py

+        from google.cloud.bigquery.dbapi import Cursor
+        connection = connect(_Client())
+        cursor = connection.cursor()
+        row = cursor.fetchone()


tswast · 2017-06-05T15:52:11Z

Hold off on reviewing. I still need to address a few things from your last review & implement query parameters.

tswast · 2017-06-20T23:14:00Z

I've pushed a new commit. Should be ready to review. (I'll be adding a couple of extra unit tests to make the coverage report happy, but with the integration tests, I'm pretty confident this is working.)

I believe I've addressed most of your comments. I'll file issues for the TODOs once we're confident the PR won't change much before merging.

tswast · 2017-06-22T19:02:24Z

Coverage back @ 100%. @jonparrott PTAL

bigquery/google/cloud/bigquery/dbapi/__init__.py

+   or deprecation policy.
+"""
+
+from google.cloud.bigquery.dbapi.connection import connect  # noqa


bigquery/google/cloud/bigquery/dbapi/__init__.py

+apilevel = "2.0"
+
+# Threads may share the module, but not connections.
+threadsafety = 1


bigquery/google/cloud/bigquery/dbapi/_helpers.py

+def scalar_to_query_parameter(name=None, value=None):
+    """Convert a scalar value into a query parameter.
+
+    Note: the bytes type cannot be distinguished from a string in Python 2.


bigquery/google/cloud/bigquery/dbapi/_helpers.py

+    for value in parameters:
+        query_parameters.append(scalar_to_query_parameter(value=value))
+
+    return query_parameters


bigquery/google/cloud/bigquery/dbapi/_helpers.py

+        value = parameters[name]
+        query_parameters.append(scalar_to_query_parameter(name, value))
+
+    return query_parameters


bigquery/google/cloud/bigquery/dbapi/cursor.py

+        self.rowcount = total_rows
+
+    def _format_operation_list(self, operation, parameters):
+        """Formats parameters in operation in way BigQuery expects.


bigquery/google/cloud/bigquery/dbapi/cursor.py

+            raise exceptions.ProgrammingError(ex)
+
+    def _format_operation_dict(self, operation, parameters):
+        """Formats parameters in operation in way BigQuery expects.


bigquery/google/cloud/bigquery/dbapi/types.py

+Timestamp = datetime.datetime
+DateFromTicks = datetime.date.fromtimestamp
+TimestampFromTicks = datetime.datetime.fromtimestamp
+Binary = bytes


bigquery/tests/system.py

@@ -819,6 +901,93 @@ def test_sync_query_w_query_params(self):
            self.assertEqual(len(query.rows[0]), 1)
            self.assertEqual(query.rows[0][0], example['expected'])

+    def test_dbapi_w_query_parameters(self):
+        EXAMPLES = [


bigquery/tests/unit/test_dbapi__helpers.py

+            self.assertEqual(named_parameter.type_, expected_type, msg=msg)
+            self.assertEqual(named_parameter.value, value, msg=msg)
+
+    @unittest.skipIf(six.PY2, 'Bytes cannot be distinguished from string.')


bigquery/google/cloud/bigquery/dbapi/cursor.py

+            return
+
+        self.description = tuple([
+            Column(


theacodes · 2017-06-26T16:21:56Z

@tseaver @dhermes This LGTM, can one of you do a final pass?

dhermes

I'm still reviewing (just got to cursor.py) but here is some high-level janitorial type feedback:

General fixes needed "everywhere".

- Make sure the copyright year is 2017.

- Use the name of the variable

      :type foo: int
      :param foo: A foo to be ``bar``-ed.

  instead of the current "everywhere" usage

      :type: int
      :param foo: A foo to be ``bar``-ed.

- Use a :returns: section everywhere you have an :rtype:.

- Convert all of your "Raises ..." prose into :raises: Sphinx directives.
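
Put together, a docstring following the conventions requested above might look like the sketch below. The function `divide` and its parameters are hypothetical and exist only to show the field layout; they do not appear in this PR.

```python
def divide(numerator, denominator):
    """Divide one number by another.

    :type numerator: int
    :param numerator: The value to be divided.

    :type denominator: int
    :param denominator: The value to divide by.

    :rtype: float
    :returns: The quotient of the two values.

    :raises: :class:`ZeroDivisionError` if ``denominator`` is zero.
    """
    # float() keeps true division on both Python 2 and Python 3.
    return float(numerator) / denominator
```

Each `:type:` names its variable, the `:rtype:` is paired with a `:returns:`, and the error condition is a `:raises:` field rather than free prose.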

bigquery/google/cloud/bigquery/dbapi/__init__.py

@@ -0,0 +1,70 @@
+# Copyright 2016 Google Inc.


bigquery/google/cloud/bigquery/dbapi/__init__.py

+apilevel = "2.0"
+
+# Threads may share the module, but not connections.
+threadsafety = 1


bigquery/google/cloud/bigquery/dbapi/__init__.py

+from google.cloud.bigquery.dbapi.types import STRING
+
+
+apilevel = "2.0"


bigquery/google/cloud/bigquery/dbapi/__init__.py

+# Threads may share the module, but not connections.
+threadsafety = 1
+
+paramstyle = "pyformat"


bigquery/google/cloud/bigquery/dbapi/_helpers.py

@@ -0,0 +1,131 @@
+# Copyright 2016 Google Inc.


bigquery/google/cloud/bigquery/dbapi/_helpers.py

+    :param parameters: Sequence of query parameter values.
+
+    :rtype:
+        list of :class:`~google.cloud.bigquery._helpers.AbstractQueryParameter`


bigquery/google/cloud/bigquery/dbapi/_helpers.py

+def to_query_parameters_dict(parameters):
+    """Converts a dictionary of parameter values into query parameters.
+
+    :type: Mapping[str, Any]


bigquery/google/cloud/bigquery/dbapi/_helpers.py

+    return [
+        scalar_to_query_parameter(value, name=name)
+        for name, value
+        in six.iteritems(parameters)]


bigquery/google/cloud/bigquery/dbapi/connection.py

@@ -0,0 +1,56 @@
+# Copyright 2016 Google Inc.


bigquery/google/cloud/bigquery/dbapi/_helpers.py

+    elif isinstance(value, six.binary_type):
+        parameter_type = 'BYTES'
+    elif isinstance(value, datetime.datetime):
+        parameter_type = 'TIMESTAMP' if value.tzinfo else 'DATETIME'
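
For readers following the type inference in this hunk, here is a minimal Python 3 sketch of the same idea using only the standard library. Only the `BYTES` and `TIMESTAMP`/`DATETIME` branches are taken from the diff; the helper name `infer_bigquery_type` and the remaining branches are assumptions for illustration, not the PR's code.

```python
import datetime


def infer_bigquery_type(value):
    """Hypothetical helper: guess a BigQuery type name for a scalar."""
    # bool is tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return 'BOOL'
    if isinstance(value, int):
        return 'INT64'
    if isinstance(value, float):
        return 'FLOAT64'
    if isinstance(value, str):
        return 'STRING'
    if isinstance(value, bytes):
        return 'BYTES'
    # datetime is tested before date because datetime subclasses date.
    if isinstance(value, datetime.datetime):
        # As in the diff: timezone-aware values map to TIMESTAMP.
        return 'TIMESTAMP' if value.tzinfo else 'DATETIME'
    if isinstance(value, datetime.date):
        return 'DATE'
    if isinstance(value, datetime.time):
        return 'TIME'
    raise TypeError('cannot infer a type for {!r}'.format(value))
```

On Python 2, `str` and `bytes` are the same type, which is why the PR notes that bytes cannot be distinguished from a string there.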


dhermes

I am now down to the unit tests, do I have to look at them?

bigquery/google/cloud/bigquery/dbapi/cursor.py

+                internal_size=None,
+                precision=None,
+                scale=None,
+                null_ok=field.mode == 'NULLABLE')
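
The named-argument construction in this hunk can be sketched on its own as follows. The seven field names are the ones PEP 249 lists for `description` entries. The `Field` namedtuple and the `describe` function are hypothetical stand-ins for the BigQuery schema field class and the cursor method, and using the field type as `type_code` is an assumption.

```python
import collections

# The seven description attributes defined by PEP 249, in order.
Column = collections.namedtuple(
    'Column',
    ['name', 'type_code', 'display_size', 'internal_size',
     'precision', 'scale', 'null_ok'])

# Hypothetical stand-in for a BigQuery schema field.
Field = collections.namedtuple('Field', ['name', 'field_type', 'mode'])


def describe(schema):
    """Build a PEP 249 description tuple from schema fields."""
    return tuple(
        Column(
            name=field.name,
            type_code=field.field_type,
            display_size=None,
            internal_size=None,
            precision=None,
            scale=None,
            null_ok=field.mode == 'NULLABLE')
        for field in schema)
```

Because a namedtuple is still a tuple, callers that index positionally (`description[0][0]`) keep working alongside callers that use attribute names.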


bigquery/google/cloud/bigquery/dbapi/cursor.py

+
+        try:
+            return operation % tuple(formatted_params)
+        except TypeError as ex:


bigquery/google/cloud/bigquery/dbapi/cursor.py

+            total_rows = num_dml_affected_rows
+        self.rowcount = total_rows
+
+    def _format_operation_list(self, operation, parameters):


bigquery/google/cloud/bigquery/dbapi/cursor.py

+        :type: Sequence[Any]
+        :param parameters: Sequence of parameter values.
+        """
+        formatted_params = ['?' for _ in parameters]


bigquery/google/cloud/bigquery/dbapi/cursor.py

+        """
+        formatted_params = {}
+        for name in parameters:
+            formatted_params[name] = '@{}'.format(name)
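
The two hunks above rewrite `pyformat` placeholders into the markers BigQuery expects: `%s` becomes `?` and `%(name)s` becomes `@name`. A combined sketch of that substitution is shown below. The function name `format_operation` is hypothetical, and it raises `ValueError` so that it stays self-contained, where the PR raises `exceptions.ProgrammingError`.

```python
def format_operation(operation, parameters=None):
    """Hypothetical helper: rewrite pyformat placeholders for BigQuery."""
    if parameters is None:
        return operation
    if isinstance(parameters, dict):
        # %(name)s becomes a named parameter marker, @name.
        formatted = {name: '@{}'.format(name) for name in parameters}
    else:
        # %s becomes a positional parameter marker, ?.
        formatted = tuple('?' for _ in parameters)
    try:
        return operation % formatted
    except (TypeError, KeyError) as ex:
        # Wrong number of placeholders, or a name missing from the dict.
        raise ValueError(ex)
```

Only the placeholder text is substituted; the parameter values themselves are sent separately as query parameters, so they are never interpolated into the SQL string.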


bigquery/tests/system.py

+            'UPDATE {}.{} '
+            'SET greeting = \'Guten Tag\' '
+            'WHERE greeting = \'Hello World\''.format(
+                dataset_name, table_name))


bigquery/tests/system.py

+        with _NamedTemporaryFile() as temp:
+            with open(temp.name, 'w') as csv_write:
+                writer = csv.writer(csv_write)
+                writer.writerow(('Greeting'))


bigquery/tests/system.py

+                Config.CURSOR.execute(
+                    example['sql'], example['query_parameters'])
+            except dbapi.DatabaseError as ex:
+                raise dbapi.DatabaseError('{} {}'.format(ex, msg))


bigquery/tests/system.py

+            self.assertEqual(len(row), 1, msg=msg)
+            self.assertEqual(row[0], example['expected'], msg=msg)
+            row = Config.CURSOR.fetchone()
+            self.assertIsNone(row, msg=msg)


bigquery/tests/system.py

@@ -838,7 +1009,6 @@ def test_large_query_w_public_data(self):
        SQL = 'SELECT * from `{}.{}.{}` LIMIT {}'.format(
            PUBLIC, DATASET_NAME, TABLE_NAME, LIMIT)

-        dataset = Config.CLIENT.dataset(DATASET_NAME, project=PUBLIC)


tswast · 2017-06-26T23:29:37Z

Oops, I think GitHub sent my review early. I haven't uploaded my fixes yet.

tswast · 2017-06-27T00:10:06Z

Okay. I just pushed my latest changes.

tswast · 2017-06-28T19:21:57Z

Ready for another review pass when you get a chance.

I believe this commit now covers all of the required implementation details in the PEP-249 DB-API specification.

- improved docstring formatting
- used namedtuple for column descriptions

Docstring formatting.

tswast · 2017-07-10T22:49:40Z

I've rebased on the latest master. Okay to merge?

The `google.cloud.bigquery.dbapi` package covers all of the required implementation details in the PEP-249 DB-API specification.

tswast added the api: bigquery label Jan 6, 2017

tswast requested a review from theacodes January 6, 2017 17:38

googlebot added the cla: yes label Jan 6, 2017

theacodes suggested changes Jan 6, 2017

theacodes reviewed Jan 6, 2017

lukesneeringer added the priority: p2 label Apr 19, 2017

tswast mentioned this pull request May 15, 2017

BigQuery should have a module that follows the DB-API to allow for a SQLAlchemy dialect #2434

Closed

tswast force-pushed the bq-db-api-2434 branch from b3ab7ba to 432916a Compare May 15, 2017 18:12

tseaver suggested changes May 16, 2017

tswast mentioned this pull request Jun 5, 2017

Google BigQuery Support apache/superset#945

Closed

tswast force-pushed the bq-db-api-2434 branch from 1bf2118 to 91de44a Compare June 20, 2017 23:08

tswast force-pushed the bq-db-api-2434 branch from 91de44a to 5b29a9e Compare June 22, 2017 18:30

theacodes suggested changes Jun 22, 2017

tswast mentioned this pull request Jun 23, 2017

BigQuery DB-API: Support struct and repeated query parameter #3524

Closed

tswast force-pushed the bq-db-api-2434 branch 2 times, most recently from d5e3c3e to 1eddf20 Compare June 24, 2017 00:03

theacodes approved these changes Jun 26, 2017

tswast force-pushed the bq-db-api-2434 branch from 1eddf20 to 7d59e97 Compare June 26, 2017 16:29

dhermes reviewed Jun 26, 2017

tswast force-pushed the bq-db-api-2434 branch from af2c64a to 3b7c505 Compare June 27, 2017 17:43

tswast changed the title from "Partial implementation of DB-API for BigQuery." to "Implementation of DB-API for BigQuery." Jun 30, 2017

tswast requested a review from lukesneeringer June 30, 2017 16:37

lukesneeringer approved these changes Jul 5, 2017

tswast added 4 commits July 10, 2017 15:33

Partial implementation of DB-API for BigQuery.

e01d69d

I believe this commit now covers all of the required implementation details in the PEP-249 DB-API specification.

BQ DB-API: Use unicode for string type in params

087fd2c

- improved docstring formatting
- used namedtuple for column descriptions

BQ-DBAPI: Use named params for description construction.

9be0572

BQ-DBAPI: Escape names in query parameters.

d5eed0e

Docstring formatting.

tswast force-pushed the bq-db-api-2434 branch from 3b7c505 to d5eed0e Compare July 10, 2017 22:49

tswast merged commit 68720f6 into googleapis:master Jul 12, 2017

tswast deleted the bq-db-api-2434 branch July 12, 2017 17:04

