feat: Refactor upchunk to use readable streams for memory usage improvement #95
Conversation
```ts
import xhr, { type XhrUrlConfig, type XhrHeaders, type XhrResponse } from 'xhr';

const DEFAULT_CHUNK_SIZE = 30720;
const DEFAULT_MAX_CHUNK_SIZE = 512000; // in kB
```
512MB seems like a lot for a chunk; will it ever use this much?
tbh I'm not sure, but I think that can happen (since there are tradeoffs in overall time between one big chunk that takes a while and several smaller chunks that each take less time but add overhead in total RTT). This is just keeping the original value (originally defined inline in `UpChunk::constructor()`).
There's a simple algorithm for dynamic chunk sizing at the moment: the chunk size doubles when the last chunk's upload duration (`lastChunkInterval`) was less than 10 seconds, and it halves when the last chunk took more than 30 seconds to upload. So, as-is, we should only see a 512MB chunk if the client was just able to upload a 256MB chunk in under 10 seconds (25.6 MB/s, or roughly 205 Mbit/s).

If we don't scale the chunk size up, then high-bandwidth clients are kneecapped by round-trip time and don't really have the opportunity to hit full throughput. There is, of course, a tradeoff where it starts using too much memory.

Honestly, 512MB is as large as or larger than what I saw from any of the other sites I tested back in February. Google Drive, for example, caps out at 200MB. If there's concern that 512MB is too large, then 256MB is probably a more reasonable limit.

I also worry that 256KB is too low, but nobody will ever hit that unless it's taking them more than 30 seconds to upload 512KB. Really, this setting is about balancing upload throughput vs. memory usage vs. the maximum amount of time lost if a chunk has to retry. A sketch of the doubling/halving logic is below.
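For illustration, here's a minimal sketch of that doubling/halving behavior, with sizes in kB to match the constants above; the bounds and the helper name are assumptions, not the PR's actual code:

```ts
// Assumed bounds from the discussion: 256KB floor, 512000kB (~512MB) ceiling.
const MIN_CHUNK_SIZE = 256; // kB
const MAX_CHUNK_SIZE = 512000; // kB

// Hypothetical helper: pick the next chunk size from the last upload duration.
function nextChunkSize(currentSizeKb: number, lastChunkIntervalSec: number): number {
  let size = currentSizeKb;
  if (lastChunkIntervalSec < 10) {
    size *= 2; // fast upload: grow the chunk to use available bandwidth
  } else if (lastChunkIntervalSec > 30) {
    size = Math.ceil(size / 2); // slow upload: shrink to cut retry cost
  }
  // Clamp to the configured floor and ceiling.
  return Math.min(MAX_CHUNK_SIZE, Math.max(MIN_CHUNK_SIZE, size));
}
```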
```ts
  return this.chunkSize * 1024;
}

async *[Symbol.asyncIterator](): AsyncIterator<Blob> {
```
cool!
```ts
let res: XhrResponse | undefined;
try {
  this.attemptCount = this.attemptCount + 1;
  this.lastChunkStart = new Date();
```
nit (non-blocking): using `Date.now()` is a little more direct.
This was another example of me trying to "leave the code the same as much as possible". Since this is how it was previously implemented, I was planning on keeping it as is.
```ts
// Side effects
const lastChunkEnd = new Date();
const lastChunkInterval =
  (lastChunkEnd.getTime() - this.lastChunkStart.getTime()) / 1000;
```
If using `Date.now()`, update here too; it also saves some characters.
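For reference, the suggested variant might look something like this (an illustrative sketch, not a committed change):

```ts
// Store epoch milliseconds instead of a Date object.
this.lastChunkStart = Date.now();

// ...later, once the chunk upload settles:
const lastChunkInterval = (Date.now() - this.lastChunkStart) / 1000; // seconds
```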
same as above
"private-method-regex": "^\\*?\\[?[a-zA-Z][\\w\\d\\.]*\\]?$", | ||
"protected-method-regex": "^\\*?\\[?[a-zA-Z][\\w\\d\\.]*\\]?$", | ||
"static-method-regex": "^\\*?\\[?[a-zA-Z][\\w\\d\\.]*\\]?$", | ||
"function-regex": "^\\*?\\[?[a-zA-Z][\\w\\d\\.]*\\]?$" |
Are these really needed? It seems like a lot of extra config.
I'm not sure that they're all needed. This was based on a solution to the general problem of tslint not properly supporting symbols, taken from here: microsoft/tslint-microsoft-contrib#459. I can add a comment referencing that? Given that tslint is deprecated/unmaintained, I'd lean towards not changing this and instead having a separate effort to migrate from tslint to eslint (more like our setup in Open Elements).
LGTM. I'll leave it for others to chime in as well.
```ts
  return this.chunkSize * 1024;
}

async *[Symbol.asyncIterator](): AsyncIterator<Blob> {
```
What is upchunk's expected browser support matrix? Does our usage of Symbols, generators, etc. inadvertently change it? Or does it all compile away with the regenerator runtime?
It's always been a bit squishy but, given the hard dependency on `ReadableStream` (and the corresponding update to the File API) for this effort, we should be good. Good callout though.
Cool, and since you're targeting making this a major version, it should be fine. (I also wasn't sure what the overlap between generators/`ReadableStream`/Symbols was, but it's probably large enough not to worry about, except for IE.) Worth solidifying the support matrix somewhere, though.
Overview
This PR is a rearchitecture of `UpChunk` to use `ReadableStream`s as the basis for reading bytes from a file. Unlike the current implementation, which relies on loading the entire file into the JavaScript runtime heap, this new architecture allows us to reduce the memory footprint to a given `read()` from the file's `ReadableStream` (plus any remaining bytes from the previous read that have yet to be uploaded).

Why a separate `AsyncIterable` class (`ChunkedStreamIterable`)?

While the code could have been entirely written inline as part of the `UpChunk` instance, creating a class that conforms to ECMA standards for async iteration allows us to (a sketch of the iteration pattern follows this list):

- simplify `UpChunk::sendChunks()`
- separate concerns, so `ChunkedStreamIterable` shouldn't need to be refactored as frequently
- pave the way for things like `MediaRecorder` support
- support other streaming use cases down the road (`WebSockets` or `WebRTC` data channels, transforms, etc.)
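To make that shape concrete, here is a minimal, hypothetical sketch of an async-iterable chunker over a `ReadableStream`; it is illustrative only, not the PR's actual `ChunkedStreamIterable` (the class name, constructor parameters, and Blob-based buffering are assumptions):

```ts
// Illustrative sketch: re-chunk a ReadableStream<Uint8Array> into fixed-size Blobs.
class SimpleChunkedStreamIterable implements AsyncIterable<Blob> {
  constructor(
    protected readonly stream: ReadableStream<Uint8Array>,
    protected readonly chunkByteSize: number
  ) {}

  async *[Symbol.asyncIterator](): AsyncIterator<Blob> {
    const reader = this.stream.getReader();
    let pending: Blob = new Blob([]); // bytes read but not yet yielded
    try {
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        pending = new Blob([pending, value]);
        // Yield as many full-size chunks as are currently buffered.
        while (pending.size >= this.chunkByteSize) {
          yield pending.slice(0, this.chunkByteSize);
          pending = pending.slice(this.chunkByteSize);
        }
      }
      // Flush whatever is left as the final (possibly short) chunk.
      if (pending.size > 0) yield pending;
    } finally {
      reader.releaseLock();
    }
  }
}
```

Because each iteration pulls only one chunk's worth of data before yielding, memory stays bounded by roughly one chunk (plus any remainder from the previous read) rather than by the whole file.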
Why a "pull"-based use of streams (instead of a "push")

Using things like pipes with backpressure can often be a very clean way to chain together discrete transformations and side effects when working with streams. Unfortunately, the current APIs don't have sufficient ways of cleanly handling:

- chunk sizes that change between reads (with `dynamicChunkSize` enabled)

A usage sketch of the pull-based consumption follows.
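As an illustration of the pull model (again a hypothetical sketch, not the PR's code), the consumer decides when to ask for the next chunk, which makes it natural to pause between reads or pick a new chunk size before the next pull:

```ts
// Hypothetical upload helper standing in for UpChunk's XHR-based upload.
async function uploadChunk(chunk: Blob): Promise<void> {
  await fetch('https://example.com/upload', { method: 'PUT', body: chunk });
}

// Pull-based consumption: the loop requests each chunk only when it's ready.
async function sendChunksExample(chunks: AsyncIterable<Blob>) {
  for await (const chunk of chunks) {
    // Nothing further is read from the underlying stream until this await
    // resolves, so memory stays bounded and pausing is simply not pulling.
    await uploadChunk(chunk);
  }
}
```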
Additional notes

Given the scope of changes here, additional tests have been added to validate that the "uploaded" files are identical (in bytes) to the files provided to `UpChunk`. Also, even though the API has only been changed in an additive way (e.g. adding `off()` and `once()` methods) and should be fully backwards compatible, this will likely be released as a major version change due to the scope of the refactor.

resolves: #89