Cryptic error when calling blob.download_to_file from within Google Cloud Dataflow #3836
Comments
@dhermes That works, thanks. As a favor, would you comment on the following? I'm running a Cloud Dataflow task that pulls videos from my GCP bucket, processes them locally, and reuploads them. Would you expect any difference in speed between gsutil and google-cloud-storage from within Python? My videos are 3 to 4 GB, and while it's hard to track exactly, I was expecting transfer between GCS and the Dataflow workers to be very fast. It doesn't appear so. Any guess at what speed would be reasonable to expect?
@bw4sz Yes.
Thanks, just to follow up: blob.download_to_file is definitely having trouble with very large files. Testing locally on OS X and monitoring du -h in the destination folder, I can see that the 3 GB file just stalls at around 300 MB. Is there a cap (should I submit a new issue)? The file is an .avi; checking du -h 10 minutes apart shows no movement, and the Python call is still hanging. Perhaps this isn't the right use case?
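One thing that may be worth trying for multi-GB objects is an explicit chunk_size on the blob, which makes the client fetch the object in separate ranged requests rather than one long streamed response. A minimal sketch, with placeholder bucket and object names:

```python
from google.cloud import storage

# "my-bucket" and "videos/video.avi" are placeholders for the real names.
client = storage.Client()
bucket = client.bucket("my-bucket")

# chunk_size must be a multiple of 256 KB; 10 MiB here is an arbitrary choice.
blob = bucket.blob("videos/video.avi", chunk_size=10 * 1024 * 1024)
blob.download_to_filename("/tmp/video.avi")
```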
I'm a bit thrown off by that. You say it hangs; do you mean the process freezes up?
I was just calling it in a simple way (a sketch of the pattern follows below). I have two terminals open: one running the Apache Beam pipeline (testing locally with DirectRunner) and one watching the destination of the video file. I can see the video file get created, and it grows until it reaches that 365 MB size and then just sits there. The process is still "running", but it's clearly stuck on blob.download_to_file. I have a logging statement before and after the call, and stdout just sits there; it never reaches the next logging statement. Everything works fine with test clips (~100 MB).
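This is not the poster's exact code (which is not shown here), but a sketch of the pattern described above, with hypothetical names and the logging statements placed before and after the call that appears to stall:

```python
import logging

from google.cloud import storage


def download_video(bucket_name, blob_name, destination):
    """Download one video blob, logging before and after the call."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)

    logging.info("Starting download of %s to %s", blob_name, destination)
    with open(destination, "wb") as handle:
        # This is the call reported to stall around ~365 MB on multi-GB files.
        blob.download_to_file(handle)
    logging.info("Finished download of %s", blob_name)
```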
Forgive me for pestering, but I'm sure others will be interested in this. How does it compare to apache-beam's gcsio? See this SO question.
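For comparison, Beam's own GCS wrapper can be called directly from a worker. A rough sketch, assuming a placeholder gs:// path and a local destination under /tmp:

```python
import shutil

from apache_beam.io.gcp import gcsio

# Placeholder path; substitute the real bucket and object.
source = "gs://my-bucket/videos/video.avi"

gcs = gcsio.GcsIO()
with gcs.open(source, "rb") as src, open("/tmp/video.avi", "wb") as dst:
    # Copy in 16 MiB chunks so memory use stays bounded for large files.
    shutil.copyfileobj(src, dst, length=16 * 1024 * 1024)
```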
I had to uninstall requests 2.18.4 (pip uninstall requests) and install 2.18.0 (pip install requests==2.18.0) to make it work |
Hi, I'm running requests 2.18.0 & google-cloud-storage==1.7.0 and get the same error. |
Hi, I'm running requests 2.19.1 & google-cloud-storage==1.10.0 and get the same error. |
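Since several people report different requests / google-cloud-storage combinations, a quick way to confirm what a worker actually has installed (package names below are just the ones mentioned in this thread):

```python
import pkg_resources

for pkg in ("requests", "google-cloud-storage", "google-resumable-media"):
    try:
        print(pkg, pkg_resources.get_distribution(pkg).version)
    except pkg_resources.DistributionNotFound:
        print(pkg, "not installed")
```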
I am running a Cloud Dataflow pipeline which requires that each worker download some videos from a GCP bucket, process them, and reupload them. This pipeline works locally, but when deployed to Dataflow I get a cryptic error when using google-cloud-storage to download blobs.
returns:
The function in question is
Mostly I'm trying to understand what this error means. Is this a permissions error? A latency error? A local write error? I cannot reproduce it outside of Dataflow. What I can tell from my logging is that the Dataflow worker is trying to write "video.avi" to root (i.e. "/"). I've tried writing to /tmp/, but without knowing whether this is a permissions error, either locally or on the GCP bucket, I'm having trouble debugging. Can you give me a sense of what kind of error this stems from?
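To rule out a local write problem, one option is to download into a temporary directory the worker is guaranteed to own rather than "/". A sketch only, with hypothetical names, not the pipeline's actual function:

```python
import os
import tempfile

from google.cloud import storage


def fetch_to_tempdir(bucket_name, blob_name):
    """Download a blob into a fresh worker-owned temp directory and return the local path."""
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    local_dir = tempfile.mkdtemp()
    local_path = os.path.join(local_dir, os.path.basename(blob_name))
    blob.download_to_filename(local_path)
    return local_path
```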
The SO question is here.
Dataflow workers require a setup.py, so all packages are the current versions from pip install google-cloud.
I've tried explicitly passing credentials using google-auth, but it doesn't seem to make a difference.
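For reference, passing credentials explicitly looks roughly like the following; the key path and project are placeholders, and on Dataflow workers the default service account is normally picked up automatically, so this mainly helps rule auth in or out:

```python
from google.cloud import storage
from google.oauth2 import service_account

# Hypothetical key file path; only needed when not relying on default credentials.
credentials = service_account.Credentials.from_service_account_file(
    "/path/to/service-account.json"
)
client = storage.Client(project="my-project", credentials=credentials)
```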