Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BigQuery: deprecate client.dataset() part 1 #9032

Merged
merged 33 commits into from
Aug 28, 2019

Conversation

emar-kar
Copy link
Contributor

@emar-kar emar-kar commented Aug 15, 2019

Towards #8989
This PR contains five snippets:

  • add_empty_column
  • browse_table_data
  • dataset_exists (transferred like a check function, as it was before)
  • list_datasets_by_label
  • manage_dataset_labels - which is divided into 3 parts (not sure about it!):
    • label_dataset
    • get_dataset_labels
    • delete_dataset_labels

mf2199 and others added 20 commits August 8, 2019 23:11
*.rst    +
*.py    +
test    +
conf   +
Methods were divided into 3 files:
- add label
- get labels
- delete labels

*.rst - docs updated
tests passed successfully
minor corrections, 'dataset_exists' moved to the 'Getting a Dataset' section
grammar fix
Deleted extra 'samples/'
chged quotes + added dots
Chged assertion unit
corrected the test asserts
* deleted unnecessary schema (add_empty_column)
* added 'get_dataset' method to check that dataset actually was updated (label_dataset & delete_dataset_labels)
* chged name of the fixture on more suitable (conftest & test_delete_dataset_labels & test_get_dataset_labels & list_datasets_by_label)
* deleted unnecessary 'labels' variable (test_label_dataset)
@googlebot
Copy link

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added the cla: no This human has *not* signed the Contributor License Agreement. label Aug 15, 2019

dataset = client.update_dataset(dataset, ["labels"])

dataset = client.get_dataset(dataset_id)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was additionally added to check that changes were uploaded to the source.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update_dataset returns a full dataset resource, so this should be unnecessary in the sample. I agree that it makes sense to have this in the test for the sample.

Suggested change
dataset = client.get_dataset(dataset_id)


dataset = client.update_dataset(dataset, ["labels"])

dataset = client.get_dataset(dataset_id)
Copy link
Contributor Author

@emar-kar emar-kar Aug 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this to the sample test. Since update_dataset returns the full resource, get_dataset after update_dataset is unnecessary for normal use.

Suggested change
dataset = client.get_dataset(dataset_id)

@IlyaFaer
Copy link

@googlebot I consent

@googlebot
Copy link

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: yes This human has signed the Contributor License Agreement. and removed cla: no This human has *not* signed the Contributor License Agreement. labels Aug 15, 2019
@IlyaFaer IlyaFaer added api: bigquery Issues related to the BigQuery API. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Aug 15, 2019
@IlyaFaer IlyaFaer requested review from tswast and removed request for a team August 15, 2019 17:00
@IlyaFaer IlyaFaer added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 19, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 19, 2019
Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution. Looking pretty good, but I'd like to make some improvements to the samples while we are making these changes.

Thank you for your patience regarding the code review.


dataset = client.update_dataset(dataset, ["labels"])

dataset = client.get_dataset(dataset_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update_dataset returns a full dataset resource, so this should be unnecessary in the sample. I agree that it makes sense to have this in the test for the sample.

Suggested change
dataset = client.get_dataset(dataset_id)

# dataset_id = "your-project.your_dataset"

dataset = client.get_dataset(dataset_id)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too much whitespace for a code sample.

dataset = client.get_dataset(dataset_id)

dataset.labels = {"color": "green"}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes more sense to group all these lines together.


dataset = client.update_dataset(dataset, ["labels"])

dataset = client.get_dataset(dataset_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this to the sample test. Since update_dataset returns the full resource, get_dataset after update_dataset is unnecessary for normal use.

Suggested change
dataset = client.get_dataset(dataset_id)


# Load all rows from a table
rows = client.list_rows(table)
if len(list(rows)) == table.num_rows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if len(list(rows)) == table.num_rows:
# Iterate over rows to make the API requests to fetch row data.
rows = list(rows_iter)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we had asserts in these samples before, but I've changed my mind regarding these statements. I think this causes more confusion than it explains and should be removed from the sample.


def test_get_dataset_labels(capsys, client, dataset_id, dataset_with_labels_id):

get_dataset_labels.get_dataset_labels(client, dataset_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be dataset_with_labels_id? The dataset_id fixture is not needed, right?

bigquery/samples/delete_dataset_labels.py Show resolved Hide resolved
@@ -78,6 +87,13 @@ def table_id(client, dataset_id):
client.delete_table(table, not_found_ok=True)


@pytest.fixture
def table_w_data(client):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this table_with_data

Copy link
Contributor Author

@emar-kar emar-kar Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this fixture to return the table_id instead of the table reference. That's why I assume better name is gonna be table_with_data_id.

@@ -65,6 +65,15 @@ def dataset_id(client):
client.delete_dataset(dataset, delete_contents=True, not_found_ok=True)


@pytest.fixture
def dataset_with_labels_id(client, dataset_id):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Rather than make this fixture, we could combine all the "labels" tests into a common test file. See the models samples for an example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to follow your advice and combined all dataset_label tests into one.


dataset = client.get_dataset(dataset_id)

print("Dataset ID: {}".format(dataset_id))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we really want to be showing how to print the labels every time? It feels redundant.

Let's instead make a call to get_dataset in the tests for this sample and make sure the color label has been removed.

print("First {} rows of the table are loaded".format(number_of_rows))

# Specify selected fields to limit the results to certain columns
fields = table.schema[:2] # first two columns
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tswast Most of the changes are clear to me, but I assume it's gonna be a problem here. Since I removed table = client.get_table(table_id), I can't find the way how to select only 2 columns to insert them into the selected_fields option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Yes, we can keep the get_table call for this part of the sample.

@@ -65,6 +65,15 @@ def dataset_id(client):
client.delete_dataset(dataset, delete_contents=True, not_found_ok=True)


@pytest.fixture
def dataset_with_labels_id(client, dataset_id):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to follow your advice and combined all dataset_label tests into one.

@@ -78,6 +87,13 @@ def table_id(client, dataset_id):
client.delete_table(table, not_found_ok=True)


@pytest.fixture
def table_w_data(client):
Copy link
Contributor Author

@emar-kar emar-kar Aug 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this fixture to return the table_id instead of the table reference. That's why I assume better name is gonna be table_with_data_id.

@emar-kar
Copy link
Contributor Author

@tswast the last commit will partially close #9081. It will fix flake8 errors in the original samples, which were rewrote before this PR.

Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. Thank you for making those updates. Just a few more suggestions.

print("First {} rows of the table are loaded".format(number_of_rows))

# Specify selected fields to limit the results to certain columns
fields = table.schema[:2] # first two columns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Yes, we can keep the get_table call for this part of the sample.

def table_with_data_id(client):
dataset = client.get_dataset("bigquery-public-data.samples")
table = dataset.table("shakespeare")
return "{}.{}.{}".format(table.project, table.dataset_id, table.table_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return "{}.{}.{}".format(table.project, table.dataset_id, table.table_id)
return "bigquery-public-data.samples.shakespeare"

@pytest.fixture
def table_with_data_id(client):
dataset = client.get_dataset("bigquery-public-data.samples")
table = dataset.table("shakespeare")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
table = dataset.table("shakespeare")

@@ -78,6 +78,13 @@ def table_id(client, dataset_id):
client.delete_table(table, not_found_ok=True)


@pytest.fixture
def table_with_data_id(client):
dataset = client.get_dataset("bigquery-public-data.samples")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API call to get_dataset is unnecessary.

Suggested change
dataset = client.get_dataset("bigquery-public-data.samples")


if datasets:
print("Datasets filtered by {}:".format(label_filter))
for dataset in datasets: # API request(s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you've already called list on the return value, this actually doesn't make any additional API requests.

Suggested change
for dataset in datasets: # API request(s)
for dataset in datasets:


dataset = client.update_dataset(dataset, ["labels"])
print("Labels deleted from {}".format(dataset_id))
# [END bigquery_delete_label_dataset]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have this function return dataset after the # [END ...] line, so it's outside the sample. Then update the test for this sample to verify that dataset.labels.get("color") == None.

@tswast tswast added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 28, 2019
@yoshi-kokoro yoshi-kokoro removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 28, 2019
Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for your patience with the review.

You might have to update to master to get the fix for the failing unit tests (due to resumable upload package release).

@IlyaFaer
Copy link

@tswast, all checks are finished with OK, so I think it can be merged

@IlyaFaer IlyaFaer merged commit ca275d8 into googleapis:master Aug 28, 2019
HemangChothani pushed a commit to HemangChothani/google-cloud-python that referenced this pull request Aug 29, 2019
@emar-kar emar-kar deleted the first-five branch September 3, 2019 07:35
emar-kar added a commit to MaxxleLLC/google-cloud-python that referenced this pull request Sep 18, 2019
emar-kar added a commit to MaxxleLLC/google-cloud-python that referenced this pull request Sep 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants