Fix gcs hidden error #930

Merged: 16 commits, Nov 30, 2023
2 changes: 1 addition & 1 deletion .flake8
@@ -1,3 +1,3 @@
[flake8]
-max-line-length = 100
+max-line-length = 200
Collaborator

Intentional? If so, this should probably be a separate PR because it's pretty opinionated.

Collaborator

Looks like this is still present. Did we get approval from the Parsons community on doubling the maximum line length to 200?

I noticed you had an issue with a long string. Python joins adjacent string literals, so inside parentheses you can split a long string across several lines and stay within the line length limit. Like this:

foo(f"pretend this is {a} a really long string {b}", 1, 2)

# Can be written:
foo(
  f"pretend this is {a}"
  " a really long"
  f" string {b}",
  1,
  2
)
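
Applied to the kind of error message this PR builds, the same technique might look like the sketch below. The function name `build_failure_message` and the sample values are hypothetical and are not part of Parsons; the point is only that adjacent literals inside parentheses are joined into a single string.

```python
def build_failure_message(blob_storage, job_name, error_output):
    # Hypothetical helper for illustration only.
    # Adjacent string literals inside the parentheses are concatenated,
    # so each source line stays short while the result is one string.
    return (
        f"{blob_storage} to GCS Transfer Job {job_name} "
        f"failed with error: {error_output}"
    )


# Sample values are made up for the example.
message = build_failure_message("s3", "transferJobs/123", ["NOT_FOUND"])
print(message)
```

Note that each literal needs its own `f` prefix if it contains a placeholder, and the separating spaces have to be written explicitly at the end or start of a piece.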


1 change: 1 addition & 0 deletions .gitignore
@@ -110,6 +110,7 @@ venv.bak/
# scratch
scratch*
old!_*
+test.ipynb

# vscode
.vscode/
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -6,7 +6,7 @@ repos:
language_version: python3
args: [
'--extend-ignore=E203,W503',
-'--max-line-length=100'
+'--max-line-length=200'
Collaborator

Same comment about line length

]
- repo: https://github.com/psf/black
rev: 22.3.0
19 changes: 11 additions & 8 deletions parsons/google/google_cloud_storage.py
@@ -7,7 +7,6 @@
import logging
import time
import uuid
from grpc import StatusCode
import gzip
import shutil
from typing import Optional
@@ -374,7 +373,6 @@ def copy_bucket_to_gcs(
aws_secret_access_key (str):
Secret key to authenticate storage transfer
"""

if source not in ["gcs", "s3"]:
raise ValueError(
f"Blob transfer only supports gcs and s3 sources [source={source}]"
@@ -439,12 +437,11 @@ def copy_bucket_to_gcs(

# Create the transfer job
create_result = client.create_transfer_job(create_transfer_job_request)
logger.info(f"Created TransferJob: {create_result.name}")

polling = True
wait_time = 0
wait_between_attempts_in_sec = 10
max_wait_in_sec = 60 * 10 # Ten Minutes
# max_wait_in_sec = 60 * 10 # Ten Minutes
Collaborator

If this variable isn't being used in the final version of the code, we should probably delete the line.


# NOTE: This value defaults to an empty string until GCP
# triggers the job internally ... we'll use this value to
@@ -453,27 +450,33 @@ def copy_bucket_to_gcs(

while polling:
if latest_operation_name:

operation = client.get_operation({"name": latest_operation_name})

if not operation.done:
logger.debug("Operation still running...")

else:
if int(operation.error.code) not in StatusCode["OK"].value:
operation_metadata = storage_transfer.TransferOperation.deserialize(
operation.metadata.value
)
error_output = operation_metadata.error_breakdowns
if len(error_output) != 0:
raise Exception(
f"""{blob_storage} to GCS Transfer Job {create_result.name} failed with error: {operation.error.message}
f"""{blob_storage} to GCS Transfer Job {create_result.name} failed with error: {error_output}
"""
)
if operation.response:
else:
sharinetmc marked this conversation as resolved.
logger.info(f"TransferJob: {create_result.name} succeeded.")
return
return

else:
logger.info("Waiting to kickoff operation...")
get_transfer_job_request = storage_transfer.GetTransferJobRequest(
{"job_name": create_result.name, "project_id": self.project}
)
get_result = client.get_transfer_job(request=get_transfer_job_request)
logger.info(f"get_result: {get_result}")
Collaborator

Could this be a debug log instead of an info log? Would it help someone reading the log of a higher-level ETL script after an error?
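
To illustrate the difference the question is getting at, here is a small standard-library sketch. The logger name `transfer_example` and the logged payload are made up for the example; with the logger set to `INFO`, the `debug` call is dropped while the `info` call is emitted.

```python
import io
import logging

# Capture log output in memory so the effect of the level is easy to inspect.
stream = io.StringIO()
logger = logging.getLogger("transfer_example")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False
logger.setLevel(logging.INFO)

# Verbose, diagnostic detail: hidden unless the level is lowered to DEBUG.
logger.debug("get_result: %s", {"latest_operation_name": ""})
# Progress message: visible at the INFO level.
logger.info("Waiting to kickoff operation...")

captured = stream.getvalue()
print(captured)
```

A caller who needs the full payload while troubleshooting can still see it by setting the level to `logging.DEBUG`.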

latest_operation_name = get_result.latest_operation_name

wait_time += wait_between_attempts_in_sec
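
The excerpt comments out `max_wait_in_sec` and does not show whether the loop still enforces an upper bound on `wait_time`. As a general pattern, and not as the code in this PR, a polling loop with an explicit time budget could be sketched as follows. The function `poll_until_done` and its `check` and `sleep` parameters are hypothetical names introduced for illustration.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    check: Callable[[], Optional[str]],
    wait_between_attempts_in_sec: float = 10,
    max_wait_in_sec: float = 60 * 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `check` until it returns a result or the time budget is spent."""
    wait_time = 0.0
    while wait_time <= max_wait_in_sec:
        result = check()
        if result is not None:
            return result
        sleep(wait_between_attempts_in_sec)
        wait_time += wait_between_attempts_in_sec
    raise TimeoutError(f"Gave up after waiting {wait_time} seconds")


# Usage with a fake check that succeeds on the third attempt and a no-op
# sleep, so the example finishes immediately.
attempts = iter([None, None, "done"])
outcome = poll_until_done(lambda: next(attempts), sleep=lambda _: None)
print(outcome)
```

Passing `sleep` as a parameter is a design choice that keeps the loop testable without real delays.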