BigQuery: deprecate client.dataset()
part 1
#9032
Conversation
sync forks
*.rst + *.py + test + conf
Methods were divided into 3 files: add label, get labels, delete labels; *.rst docs updated; tests passed successfully
minor corrections; 'dataset_exists' moved to the 'Getting a Dataset' section
grammar fix
deleted extra 'samples/'
changed quotes + added dots
import updates
changed assertion unit
corrected the test asserts
* deleted unnecessary schema (add_empty_column)
* added 'get_dataset' call to check that the dataset actually was updated (label_dataset & delete_dataset_labels)
* renamed the fixture to something more suitable (conftest & test_delete_dataset_labels & test_get_dataset_labels & list_datasets_by_label)
* deleted unnecessary 'labels' variable (test_label_dataset)
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are OK with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only `@googlebot I consent.` Note to project maintainer: there may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s).
bigquery/samples/label_dataset.py
Outdated
dataset = client.update_dataset(dataset, ["labels"])

dataset = client.get_dataset(dataset_id)
This line was added to check that the changes were actually applied on the server.
`update_dataset` returns a full dataset resource, so this should be unnecessary in the sample. I agree that it makes sense to have this in the test for the sample.
Suggested change:
- dataset = client.get_dataset(dataset_id)
dataset = client.update_dataset(dataset, ["labels"])

dataset = client.get_dataset(dataset_id)
Let's move this to the sample test. Since `update_dataset` returns the full resource, `get_dataset` after `update_dataset` is unnecessary for normal use.
Suggested change:
- dataset = client.get_dataset(dataset_id)
@googlebot I consent
CLAs look good, thanks!
Thank you for your contribution. Looking pretty good, but I'd like to make some improvements to the samples while we are making these changes.
Thank you for your patience regarding the code review.
bigquery/samples/label_dataset.py
Outdated
# dataset_id = "your-project.your_dataset"


dataset = client.get_dataset(dataset_id)
This is too much whitespace for a code sample.
bigquery/samples/label_dataset.py
Outdated
dataset = client.get_dataset(dataset_id)

dataset.labels = {"color": "green"}
I think it makes more sense to group all these lines together.
# Load all rows from a table
rows = client.list_rows(table)
if len(list(rows)) == table.num_rows:
Suggested change:
- if len(list(rows)) == table.num_rows:
+ # Iterate over rows to make the API requests to fetch row data.
+ rows = list(rows_iter)
I know we had asserts in these samples before, but I've changed my mind regarding these statements. I think this causes more confusion than it explains and should be removed from the sample.
def test_get_dataset_labels(capsys, client, dataset_id, dataset_with_labels_id):

    get_dataset_labels.get_dataset_labels(client, dataset_id)
Shouldn't this be `dataset_with_labels_id`? The `dataset_id` fixture is not needed, right?
bigquery/samples/tests/conftest.py
Outdated
@@ -78,6 +87,13 @@ def table_id(client, dataset_id):
    client.delete_table(table, not_found_ok=True)


@pytest.fixture
def table_w_data(client):
Let's call this `table_with_data`.
I changed this fixture to return the `table_id` instead of the table reference. That's why I assume a better name would be `table_with_data_id`.
bigquery/samples/tests/conftest.py
Outdated
@@ -65,6 +65,15 @@ def dataset_id(client):
    client.delete_dataset(dataset, delete_contents=True, not_found_ok=True)


@pytest.fixture
def dataset_with_labels_id(client, dataset_id):
Optional: Rather than make this fixture, we could combine all the "labels" tests into a common test file. See the models samples for an example.
I decided to follow your advice and combined all `dataset_label` tests into one.
dataset = client.get_dataset(dataset_id)

print("Dataset ID: {}".format(dataset_id))
I wonder if we really want to be showing how to print the labels every time? It feels redundant. Let's instead make a call to `get_dataset` in the tests for this sample and make sure the color label has been removed.
print("First {} rows of the table are loaded".format(number_of_rows))

# Specify selected fields to limit the results to certain columns
fields = table.schema[:2]  # first two columns
@tswast Most of the changes are clear to me, but I assume there's going to be a problem here. Since I removed `table = client.get_table(table_id)`, I can't find a way to select only two columns to pass to the `selected_fields` option.
Makes sense. Yes, we can keep the `get_table` call for this part of the sample.
Looking good. Thank you for making those updates. Just a few more suggestions.
bigquery/samples/tests/conftest.py
Outdated
def table_with_data_id(client):
    dataset = client.get_dataset("bigquery-public-data.samples")
    table = dataset.table("shakespeare")
    return "{}.{}.{}".format(table.project, table.dataset_id, table.table_id)
Suggested change:
- return "{}.{}.{}".format(table.project, table.dataset_id, table.table_id)
+ return "bigquery-public-data.samples.shakespeare"
bigquery/samples/tests/conftest.py
Outdated
@pytest.fixture
def table_with_data_id(client):
    dataset = client.get_dataset("bigquery-public-data.samples")
    table = dataset.table("shakespeare")
Suggested change:
- table = dataset.table("shakespeare")
bigquery/samples/tests/conftest.py
Outdated
@@ -78,6 +78,13 @@ def table_id(client, dataset_id):
    client.delete_table(table, not_found_ok=True)


@pytest.fixture
def table_with_data_id(client):
    dataset = client.get_dataset("bigquery-public-data.samples")
The API call to `get_dataset` is unnecessary.

Suggested change:
- dataset = client.get_dataset("bigquery-public-data.samples")
if datasets:
    print("Datasets filtered by {}:".format(label_filter))
    for dataset in datasets:  # API request(s)
Since you've already called `list` on the return value, this actually doesn't make any additional API requests.

Suggested change:
- for dataset in datasets:  # API request(s)
+ for dataset in datasets:
* rewrote table_with_data_id fixture
* removed extra "# API request" comments
dataset = client.update_dataset(dataset, ["labels"])
print("Labels deleted from {}".format(dataset_id))
# [END bigquery_delete_label_dataset]
Let's have this function return `dataset` after the `# [END ...]` line, so it's outside the sample. Then update the test for this sample to verify that `dataset.labels.get("color") == None`.
Looks great! Thanks for your patience with the review.
You might have to update to master to get the fix for the failing unit tests (due to a resumable upload package release).
@tswast, all checks have finished OK, so I think it can be merged
Deprecate `client.dataset()` part 1
Towards #8989
This PR contains five snippets: