feat(cloud): Upload through HTTP proxy node #103

Merged: 15 commits, Oct 22, 2024

Conversation

mariocynicys
Member

@mariocynicys mariocynicys commented Nov 12, 2021

Changes:

  • Disable multiperiod when using cloud upload (for now)
  • Use default credentials from GCS or S3 Python libraries to upload
  • Remove gsutil dependency and use threaded HTTP server, without temp files

Closes #47
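
For illustration, here is a minimal sketch of the "threaded HTTP server, without temp files" idea using only the Python standard library. It is not the code from this PR: `UploadProxyHandler`, `start_proxy`, and the in-memory `UPLOADS` dictionary are hypothetical names, and the dictionary stands in for a real GCS or S3 client. The sketch also assumes the client sends a `Content-Length` header; chunked transfer encoding is not handled.

```python
import io
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

CHUNK_SIZE = 64 * 1024

# Hypothetical stand-in for a cloud storage client: request path -> bytes.
UPLOADS = {}


class UploadProxyHandler(BaseHTTPRequestHandler):
    """Accepts PUT requests and forwards the body to a sink in chunks."""

    def do_PUT(self):
        remaining = int(self.headers.get("Content-Length", 0))
        # A real proxy would open a streaming upload here instead of BytesIO.
        sink = io.BytesIO()
        while remaining > 0:
            chunk = self.rfile.read(min(CHUNK_SIZE, remaining))
            if not chunk:
                break  # Client closed the connection early.
            sink.write(chunk)
            remaining -= len(chunk)
        UPLOADS[self.path] = sink.getvalue()
        self.send_response(201)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def start_proxy():
    """Starts the proxy on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), UploadProxyHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

A writer such as a packager would then send each segment with an HTTP PUT to `http://127.0.0.1:<port>/<path>`, where the port comes from `server.server_address[1]`. Each request is handled on its own thread, and no data is written to local disk.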

Member

@joeyparrish joeyparrish left a comment


I'm going to play with this locally for a bit. I might be able to make some concrete suggestions or contribute to your branch.

Resolved review threads (outdated):

  • streamer/pipeline_configuration.py
  • streamer/proxy_node.py (3 threads)
  • shaka-streamer

@joeyparrish
Member

My -H "Authorization=Bearer $(gcloud auth print-access-token)" hack failed after about one hour of running. I started getting HTTP 429 responses from Google Cloud, which makes me think it wasn't an issue with the token itself, but something more fundamental we will need to resolve.

Google's docs say:

If you run into any issues such as increased latency or error rates, pause your ramp-up or reduce the request rate temporarily in order to give Cloud Storage more time to scale your bucket. You should use exponential backoff to retry your requests when:

  • Receiving errors with 408 and 429 response codes.
  • Receiving errors with 5xx response codes.
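
The retry policy quoted above could be sketched roughly as follows. This is an illustration, not code from this PR: `upload_with_backoff`, its parameters, and the `send` callable (which is assumed to perform one upload attempt and return the HTTP status code) are all hypothetical.

```python
import random
import time

# Status codes the quoted guidance says to retry, in addition to any 5xx.
RETRYABLE_STATUSES = {408, 429}


def is_retryable(status):
    """Returns True for 408, 429, and any 5xx status code."""
    return status in RETRYABLE_STATUSES or 500 <= status <= 599


def upload_with_backoff(send, max_attempts=5, base_delay=1.0,
                        max_delay=32.0, sleep=time.sleep):
    """Calls send() until it returns a non-retryable status.

    The delay doubles after each failed attempt, is capped at max_delay,
    and has random jitter added on top. Returns the last status seen.
    """
    status = None
    for attempt in range(max_attempts):
        status = send()
        if not is_retryable(status):
            return status
        if attempt == max_attempts - 1:
            break  # Out of attempts; do not sleep again.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(delay + random.uniform(0, delay / 2))
    return status
```

The `sleep` parameter exists only so the delay can be replaced in tests. The jitter spreads retries out so that many concurrent segment uploads do not all retry at the same moment.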

@joeyparrish
Member

I'm going to revive this and push changes into your branch so you are properly credited for the PR. Changes I'm planning for this PR:

  1. Go back to using the cloud URL in -c for compatibility for now
  2. Revert extra headers feature for now (use case unclear)
  3. Remove upload to HTTP without gs:// or s3:// for now (use case unclear)
  4. Ignore -o if -c given (no need for local files if uploading to cloud)
  5. Revert changes to periodconcat_node and mark it as incompatible with cloud upload (already can't be used for live, helps avoid local temp files)
  6. Remove use_local_proxy from pipeline config, implied by -c
  7. Pipeline directly from incoming request to outgoing request in proxy_node
  8. Try to handle authentication via default credentials in gcs and s3 python modules

I'll follow up later with new changes to:

  1. Allow cloud URLs to be placed into -o
  2. Make -c a deprecated alias to -o (overrides -o if both given)
  3. Add command line arg to pass initial access tokens for oauth via file, maybe also environment variable
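
As a rough sketch of the planned follow-up, the precedence between -o and -c and the recognition of gs:// and s3:// URLs might look like the code below. The helper names `parse_cloud_url` and `resolve_output`, the long option names, and the default output folder are assumptions for illustration and are not taken from this PR.

```python
import argparse
from urllib.parse import urlparse

# URL schemes treated as cloud destinations in this sketch.
CLOUD_SCHEMES = ("gs", "s3")


def parse_cloud_url(url):
    """Splits gs://bucket/path or s3://bucket/path into its parts.

    Returns (scheme, bucket, path) and raises ValueError for anything else.
    """
    parsed = urlparse(url)
    if parsed.scheme not in CLOUD_SCHEMES or not parsed.netloc:
        raise ValueError("Not a gs:// or s3:// URL: %r" % url)
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


def resolve_output(argv):
    """Returns the effective output location for the given arguments.

    -c is treated as a deprecated alias for -o that wins if both are given.
    """
    parser = argparse.ArgumentParser()
    # Long names and the default below are hypothetical.
    parser.add_argument("-o", "--output", default="output_files")
    parser.add_argument("-c", "--cloud-url", default=None)
    args = parser.parse_args(argv)
    if args.cloud_url is not None:
        return args.cloud_url
    return args.output
```

With this shape, `resolve_output(["-o", "gs://my-bucket/live"])` and `resolve_output(["-c", "gs://my-bucket/live"])` return the same location, and `parse_cloud_url` can then decide which storage backend to use.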

@joeyparrish joeyparrish changed the title from "feat(cloud): Adding an upload proxy node" to "feat(cloud): Upload through HTTP proxy node" Oct 22, 2024
@joeyparrish
Member

I've started by rolling up your original commits, rebasing them, and force-pushing.

mariocynicys and others added 2 commits October 22, 2024 12:30
 - Add convenience script for linting and testing
 - Revert -c for compatibility
 - Remove -H (use case unclear)
 - Do not treat proxy node as special
 - Simplify interface to proxy node
 - Simplify proxy node internals to use Python libraries for GCS and S3
 - Disable multiperiod with cloud upload (for now)
 - Revert changes to multiperiod node
@joeyparrish
Member

@mariocynicys, still a WIP since I haven't finished the S3 upload yet, but it's working well enough for GCS that I'm going to deploy it to my lab to revive our live stream. Without this, my live stream was running into errors where gsutil would leave files behind in /tmp and fill up the disk.

Please take a look and give me your thoughts.

@joeyparrish joeyparrish marked this pull request as ready for review October 22, 2024 20:53
@joeyparrish
Member

Okay, S3 is working now, too.

@joeyparrish
Member

This is working in our lab deployment now!

@joeyparrish joeyparrish merged commit 20c2704 into shaka-project:main Oct 22, 2024
2 checks passed
@jspizziri

@joeyparrish, now that 1.0.0 is out, we started using it and discovered that HTTP upload was removed:

Remove upload to HTTP without gs:// or s3:// (use case unclear for now)

This was actually how we were using the streamer, as we use storage that is neither S3 nor GCS, which made HTTP the only option. If there were some kind of storage adapter interface, that would be nice and we'd write our own adapter, but in the absence of that we depend on HTTP.

Any possibility to get it added back?

@joeyparrish
Member

Yes, definitely. What are your requirements?

Do you need specific headers or auth tokens, or just for Packager to write output directly to an HTTP(S) URL?

@joeyparrish
Member

@jspizziri, I filed #210 to discuss your requirements for HTTP upload. Please let us know what you need!

@github-actions github-actions bot added the status: archived Archived and locked; will not be updated label Dec 21, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 21, 2024
Development

Successfully merging this pull request may close these issues.

Replace gsutil-based CloudNode with local authentication proxies and HTTP output support
3 participants