storage: support resumable uploads #299

Merged

Conversation

stephenplusplus
Contributor

  • Allow a choice in upload and createWriteStream between simple/resumable upload technique
  • upload: Stat the incoming file for its size; default to simple for files under 5 MB, resumable for files over 5 MB
  • createWriteStream: default to resumable uploads
  • Integrate { resumableThreshold: n } option on storage instantiation (defaults to 5mb)
  • test integrity upload stream
  • finalize error messaging & objects

Fixes #298

createWriteStream uses the Resumable Upload API: http://goo.gl/jb0e9D.

The process involves these steps:

  1. POST the file's metadata. We get a resumable upload URI back, then cache it with ConfigStore.
  2. PUT data to that URI with a Content-Range header noting the byte position at which the data begins. We also cache, at most, the first 16 bytes of the data being uploaded.
  3. Delete the ConfigStore cache after the upload completes.
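
To illustrate step 2, here is a minimal sketch of how a Content-Range value could be built from the starting offset. It is not code from this PR: `contentRange` is a hypothetical helper, and using `*` for an unknown total size is an assumption based on the standard HTTP byte-range syntax.

```javascript
'use strict';

// Hypothetical helper (not part of this PR): build the Content-Range value
// for a PUT that sends `length` bytes starting at byte `offset`.
// `total` is the full size when known; when omitted, '*' is used instead.
function contentRange(offset, length, total) {
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError('offset must be a non-negative integer');
  }
  if (!Number.isInteger(length) || length <= 0) {
    throw new RangeError('length must be a positive integer');
  }
  const lastByte = offset + length - 1; // byte positions are inclusive
  const size = total === undefined ? '*' : String(total);
  return 'bytes ' + offset + '-' + lastByte + '/' + size;
}
```

For example, sending the first 16 bytes of a 100-byte file would use `contentRange(0, 16, 100)`.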

If the initial upload operation is interrupted, the next time the user uploads the file, these steps occur:

  1. Detect the presence of a cached URI in ConfigStore.
  2. Make an empty PUT request to that URI to get the last byte written to the remote file.
  3. PUT data to the URI starting from the first byte after the last byte returned from the call above.
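
A rough sketch of how steps 2 and 3 fit together: given the range reported by the empty PUT, compute the offset to resume from. `nextOffset` is a hypothetical name, and the `bytes=0-<last>` header format is an assumption about what the status response returns, not something stated in this PR.

```javascript
'use strict';

// Hypothetical helper: turn the range reported by the status request into
// the offset of the next byte to send.
// Assumption: the value looks like "bytes=0-<lastByteWritten>"; a missing
// header is treated as "nothing has been written yet".
function nextOffset(rangeHeader) {
  if (!rangeHeader) {
    return 0;
  }
  const match = /^bytes=0-(\d+)$/.exec(String(rangeHeader).trim());
  if (!match) {
    throw new Error('Unexpected Range header: ' + rangeHeader);
  }
  // Resume from the first byte after the last byte written.
  return Number(match[1]) + 1;
}
```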

If the user tries to upload entirely different data to the remote file:

  1. -- same as above --
  2. -- same as above --
  3. -- same as above --
  4. Compare the first chunk of the new data with the chunk in cache. If it's different, start a new resumable upload (Step 1 of the first example).
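
The comparison in step 4 could look roughly like the sketch below. The 16-byte limit comes from the description above; the helper names `firstChunk` and `isSameUpload` are hypothetical and are not the names used in this PR.

```javascript
'use strict';

// At most this many bytes of the upload are cached (per the description above).
const FIRST_CHUNK_LENGTH = 16;

// Hypothetical helper: the bytes that would be cached for a given payload.
function firstChunk(data) {
  return Buffer.from(data).subarray(0, FIRST_CHUNK_LENGTH);
}

// Hypothetical helper: true when the incoming data starts with the same
// bytes that were cached for the interrupted upload.
function isSameUpload(cachedFirstChunk, incomingData) {
  return Buffer.from(cachedFirstChunk).equals(firstChunk(incomingData));
}
```

If `isSameUpload` returns false, the upload would restart from step 1 of the first example. Note that in a real stream the first chunk may be shorter than 16 bytes, which this sketch treats as a mismatch unless the cached chunk is equally short.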

@stephenplusplus stephenplusplus force-pushed the spp--storage-resumable-uploads branch 3 times, most recently from e5a72fe to ba2e8cf Compare November 11, 2014 22:05

var bytesWritten = 0;
var limitStream = through(function(chunk, enc, next) {
// Determine if this is the same content uploaded previously.

This comment was marked as spam.

@stephenplusplus stephenplusplus force-pushed the spp--storage-resumable-uploads branch 2 times, most recently from a665191 to f76c406 Compare November 12, 2014 19:13
return;
}

lastByteWritten = -1;

This comment was marked as spam.

@ryanseys
Contributor

Overall this looks good. No big issues. We might want to increase RETRY_LIMIT if we add exponential backoff as suggested; 5 seems like a saner default (as suggested here).
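
For reference, exponential backoff with a retry limit might be sketched as follows. Only `RETRY_LIMIT = 5` comes from the comment above; the one-second base delay, the jitter, and the function names are assumptions for illustration, not this PR's implementation.

```javascript
'use strict';

const RETRY_LIMIT = 5; // value suggested in the review comment above

// Hypothetical helper: delay in ms before retry number `attempt` (0-based).
// Doubles each attempt from an assumed 1000 ms base, plus up to 1 s of jitter.
// `random` can be injected for testing; it defaults to Math.random.
function backoffDelay(attempt, random) {
  const jitter = Math.floor((random || Math.random)() * 1000);
  return Math.pow(2, attempt) * 1000 + jitter;
}

// Hypothetical helper: run `operation(cb)` and retry it on error, waiting
// longer each time, for at most RETRY_LIMIT retries after the first try.
function retryWithBackoff(operation, callback, attempt) {
  attempt = attempt || 0;
  operation(function (err, result) {
    if (!err) {
      return callback(null, result);
    }
    if (attempt >= RETRY_LIMIT) {
      return callback(err);
    }
    setTimeout(function () {
      retryWithBackoff(operation, callback, attempt + 1);
    }, backoffDelay(attempt));
  });
}
```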

numBytesWritten = null;

setTimeout(resumeUpload, waitTime);
return;

This comment was marked as spam.

@stephenplusplus
Contributor Author

Recent best practices have emerged, so I figured we should put these in place before merging. I've added a task list in the initial post with the intended revisions so far; more will likely be coming.

One of the revisions is allowing a user to specify a preference for a simple or resumable upload. I'm seeking opinions on converting the upload method to a different signature from the one we have currently.

Current:

myBucket.upload("./photo.jpg", myFile, { my: "metadata" }, function (err, file) {})

Suggested:

myBucket.upload("./photo.jpg", {
  destination: myFile,
  metadata: {
    my: "metadata"
  },
  resumable: true // or false
}, function (err, file) {})

file.createWriteStream() will also need this functionality added.

Current:

myFile.createWriteStream({ my: "metadata" })

Suggested:

myFile.createWriteStream({
  resumable: false, // default: true
  metadata: {
    my: "metadata"
  }
})

Any better ideas?

@silvolu
Contributor

silvolu commented Nov 20, 2014

Looks good, but I'd like the user to be able to change the resumable_threshold (that defaults to 5MB). Could we expose a configuration for storage, or add setters for similar values? In the future we might need it for the chunk size as well, and we could use it to allow the user to change the default for createWriteStream at a global level.

@stephenplusplus
Contributor Author

Config on the storage level makes sense to me.

var gcloud = require("gcloud")({ /* conn info */ })
gcloud.storage({ resumableThreshold: n })

&

var gcloud = require("gcloud")
var storage = gcloud.storage({ /* conn info, */ resumableThreshold: n })

  1. What format do we accept for n? (bytes, kb, mb?)
  2. Is it ok to rely on our docs to explain resumableThreshold won't have an effect on createWriteStream uploads? I can see that being a point of confusion.

@ryanseys
Contributor

> What format do we accept for n? (bytes, kb, mb?)

Bytes. The header is in bytes, so this seems like a simple choice.

> Is it ok to rely on our docs to explain resumableThreshold won't have an effect on createWriteStream uploads?

Wait, why not?

@stephenplusplus
Contributor Author

In a stream, we can't stat a file for its size. It comes to us in small kb chunks, meaning we don't know if it's over a threshold until after we've already formed the request.

I suppose if we wanted to, we could buffer n threshold into memory before beginning the request (which is the time we have to say resumable vs simple), but that seems like a dangerous approach.

& +1 on bytes.
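
To make the distinction concrete, here is a sketch of the decision being discussed: an explicit `resumable` option wins, a known file size is compared against `resumableThreshold` (in bytes), and a stream with no known size falls back to resumable. `chooseUploadType` is a hypothetical name, and treating 5 MB as `5 * 1024 * 1024` bytes and a file exactly at the threshold as simple are assumptions.

```javascript
'use strict';

// Assumption: "5mb" means 5 * 1024 * 1024 bytes.
const DEFAULT_RESUMABLE_THRESHOLD = 5 * 1024 * 1024;

// Hypothetical helper: decide between 'simple' and 'resumable'.
// `fileSize` is the stat'd size in bytes, or undefined for a stream.
function chooseUploadType(options, fileSize) {
  options = options || {};
  // An explicit user preference always wins.
  if (typeof options.resumable === 'boolean') {
    return options.resumable ? 'resumable' : 'simple';
  }
  // A stream has no known size, so the threshold cannot apply.
  if (typeof fileSize !== 'number') {
    return 'resumable';
  }
  const threshold = typeof options.resumableThreshold === 'number'
    ? options.resumableThreshold
    : DEFAULT_RESUMABLE_THRESHOLD;
  // Assumption: a file exactly at the threshold uses a simple upload.
  return fileSize > threshold ? 'resumable' : 'simple';
}
```

In this sketch, `bucket.upload` would pass the result of a stat call as `fileSize`, while `createWriteStream` would pass nothing.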

@ryanseys
Contributor

Fair enough. Plus you don't really know that the readable stream is a file at all. That being said, should resumable even work with streams unless they explicitly give us the filename to use?

@stephenplusplus
Contributor Author

That's a great question, but I think it's impossible to answer. Still, I anticipate resumable will be a desirable default, and technically speaking, we already handle the case where we resume an upload but are sent different data than the original: we bail and start a new upload.

And in any case, the user knows best what they are doing, so we [will] allow them to be explicit about what type of upload to use at the time of the upload.

@ryanseys
Contributor

Can we get access to the readable stream that is piping their data to our writable stream? In theory, if we can, we could try to yank the fd (file descriptor) from it and then we could sneakily stat the file to find out its name and size? This is a total shot in the dark though.

@stephenplusplus
Contributor Author

With a stream, we should only be aware of the data coming in, not of how or where it originates. It would also be a bit magical if we tried to implement something like that. And usually, whenever there's magic, the solution is to add an option or a variation of the method that gives the user explicit control of the outcome. We will have both of those things ({ resumable: false } and bucket.upload).

@ryanseys
Contributor

Yeah, that would be too much magic, agreed. Getting back to the original question, I think that it's safe to say that if the developer is uploading using a stream, they know that resumableThreshold won't work. I would only expect it to work if we explicitly give the filename i.e. like in .upload(), so if you can do better than that, that's exceeding expectations in my mind.

@stephenplusplus stephenplusplus force-pushed the spp--storage-resumable-uploads branch from 7bd17d9 to 7e1dde8 Compare November 20, 2014 17:13