Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing tasks in parallel steps in Web Cryptography API #368

Closed
10 tasks
dontcallmedom-bot opened this issue Sep 9, 2024 · 16 comments · May be fixed by w3c/strudy#840 or w3c/strudy#842
Closed
10 tasks

Missing tasks in parallel steps in Web Cryptography API #368

dontcallmedom-bot opened this issue Sep 9, 2024 · 16 comments · May be fixed by w3c/strudy#840 or w3c/strudy#842

Comments

@dontcallmedom-bot
Copy link

While crawling Web Cryptography API, the following algorithms fire an event, or resolve or reject a Promise, within a step that runs in parallel without first queuing a task:

See Dealing with the event loop in the HTML specification for guidance on how to deal with algorithm sections that run in parallel.

Cc @dontcallmedom @tidoust

This issue was detected and reported semi-automatically by Strudy based on data collected in webref.

@twiss
Copy link
Member

twiss commented Dec 12, 2024

@dontcallmedom(-bot) Thanks for raising this! I'm trying to address this, and wondering what would be an appropriate task source to queue tasks for this on? None of the generic task sources seem particularly appropriate. Should we define our own, or just leave it up to the implementation? (Are there observable consequences?)

@dontcallmedom
Copy link
Member

a task source is basically a mechanism to enforce order of execution of tasks; unless you need to be interleaved with tasks queued by other specs, defining your own is likely the right approach (caveat: that's the best of my understanding, but I'm by no mean an expert :); @tidoust has been helping some of his groups in this space, so he may have more informed advice

@davidben
Copy link
Collaborator

I do not believe any of these tasks have any reason to be ordered.

(Though they also didn't have any reason to be asynchronous. That was a mistake that cannot be fixed without making a new API, something that is needed for a lot of reasons. Indeed we've gotten reports of performance problems with WebCrypto... while not yet fully investigated, I expect the conclusion to be that it comes from being async.)

@annevk
Copy link
Member

annevk commented Dec 12, 2024

If implementations do it all on the same thread there's not a real need to go in parallel or queue tasks. You'd still have to await the promise, but it'd resolve much more quickly. That's somewhat observable though.

@davidben
Copy link
Collaborator

davidben commented Dec 12, 2024

Chromium currently tosses it onto a worker thread, but I suspect we want to just do it on the calling thread (no concrete plans, just something I've been eying based on WebCrypto's perf problems) because cryptography is broadly fast and the thread hop costs more than the operation. But we'll still be paying for the microtask hop.

Except RSA key generation is extremely not fast and moving that onto the calling thread may regress folks who were relying on that being on a worker thread for interactive performance. So I'm not sure moving that one would be a good idea.

This silliness is purely an artifact of WebCrypto's API. Fast or slow, cryptography is a purely CPU-bound workload. We're not waiting on network I/O, a random bit of external hardware, or user interaction. (You should think of it as Array.sort.) This means what we should have done is just make all synchronous APIs, even for slow ops like RSA keygen. Rather, if the operation is slow, we could have recommended and/or required that developers put it on a worker thread themselves. This is important because a single high-level operation usually calls into many cryptographic primitives in a row (e.g. you may have a whole key schedule of KDF calls). If that exceeds your interactivity budget, you'd punt the whole thing to a worker thread, without paying for task hops everywhere.

@annevk
Copy link
Member

annevk commented Dec 12, 2024

I see, that makes me a bit more hesitant about moving away from tasks, at least outside of dedicated workers.

@dlongley
Copy link

This silliness is purely an artifact of WebCrypto's API. Fast or slow, cryptography is a purely CPU-bound workload. We're not waiting on network I/O, a random bit of external hardware, or user interaction. (You should think of it as Array.sort.)

The asynchronous API does allow for future innovation that puts, for example, a local HSM or cloud-based HSM behind the signature API, where that kind of waiting would happen. Maybe this will never happen or will only ever go through the credentials API instead, but I thought worth noting.

@davidben
Copy link
Collaborator

No, this is entirely the wrong layer for that. WebCrypto contains low-level primitives including even hashing operations. If you're doing something funny with an external HSM, this is no longer some origin-local state it can compute over. It is an external credential that has specific semantics and security boundary that make sense for whatever that use case was. The credentials API makes much more sense for that.

Indeed, it's telling that WebAuthn, which is much closer to what you describe, is built into the credentials API and not WebCrypto.

@dlongley
Copy link

No, this is entirely the wrong layer for that. WebCrypto contains low-level primitives including even hashing operations. If you're doing something funny with an external HSM, this is no longer some origin-local state it can compute over. It is an external credential that has specific semantics and security boundary that make sense for whatever that use case was. The credentials API makes much more sense for that.

I think there are other considerations, but this isn't the right forum to debate this. I just wanted to note one possible use for the asynchronous API, which I understand you dismiss, but that others may not. (And I understand that it may also be that your view of the model here means you'd prefer no one even be tempted to consider using this layer with an external HSM).

@twiss
Copy link
Member

twiss commented Dec 12, 2024

I do not believe any of these tasks have any reason to be ordered.

I agree. However, I'm not sure what that means for the spec text (assuming for a moment that we still want to keep some of these operations as running in parallel)? The HTML spec seems to require a task source to queue a task. Are you saying we should leave it up to the implementation, or define our own, or?

[the discussion on running in parallel]

When called from the main thread, I think it would be polite to continue not to block the UI with expensive operations (not just RSA key generation, but also PBKDF2 and potentially Argon2 if we end up adding that, as well as certain PQC algorithms).

However, when called from a Worker, it might indeed be reasonable to run the operations synchronously. That might even be an improvement in terms of parallelizability (at least compared to Chromium's current implementation) in the sense that applications could then run two operations in parallel by creating two Workers.


As an added wrinkle, if we want to add streaming, I imagine it will complicate things further (if we don't run things synchronously), since we'd have to transfer or clone each chunk to the parallel thread. I'm not sure how much implementation effort this would be, either? If it's complicated, perhaps we could say that this is is only allowed on a Worker thread, where we then run everything on the same thread, to simplify things a bit?

@davidben
Copy link
Collaborator

davidben commented Dec 12, 2024

The (very very few) things where streaming makes sense are also the ones that are fast enough that there is no reason to move them off-thread.

@twiss
Copy link
Member

twiss commented Dec 12, 2024

Right, I suppose we could just say that e.g. crypto.subtle.digest runs on the same thread regardless of where it's called from, and then we allow streaming anywhere.

@davidben
Copy link
Collaborator

To clarify, that's not what is currently implemented in Chromium. I think we probably should make Chromium run it on the same thread (which further demonstrates that this was just a bad API because now we're paying for task overhead for no reason), but no imminent plans to do anything with that, and perhaps we'll find that that won't work for some horrible reason.

When called from the main thread, I think it would be polite to continue not to block the UI with expensive operations (not just RSA key generation, but also PBKDF2 and potentially Argon2 if we end up adding that, as well as certain PQC algorithms).

Keep in mind there are two ways to not block the UI. You can provide an async API that punts to a worker pool under the covers, or you can provide a sync API and either encourage or enforce that callers don't use it on the UI thread.

The latter makes more sense for a CPU-bound process. It's also possible to perform expensive computation in JavaScript, like trying to sort a huge array, but we don't provide async versions of those. We provide developers with the ability to make worker threads so they can manage that themselves. That's the right way to do this because, for instance, your PBKDF2 is probably used as part of a bigger protocol.

Of course, we've already shipped a bad WebCrypto API, so we have to define what it does. In that case, yeah, changing the expensive ops to run synchronously is probably a bad move. But I think ideally the spec would be written to be agnostic to this and just say they might be resolved whenever. (Running a fast operation on the calling thread is indistinguishable from running it on a worker thread that happened to come back very very fast.)

@twiss
Copy link
Member

twiss commented Dec 12, 2024

Yeah, I understand. I'm also not proposing to change that as part of fixing this issue, although perhaps we could try to change it as part of working towards #73 - but yeah it's a somewhat separate issue and perhaps off-topic.

For now, I'll just make a PR to queue tasks to resolve or reject the Promises on a Web Crypto-specific task queue (even though indeed they don't need to be ordered but the spec seems to require it), unless anyone objects or has a better idea; and then we can discuss larger changes later :)

@annevk
Copy link
Member

annevk commented Dec 13, 2024

You could give each algorithm its own task source so it's only ordered with respect to itself. I suppose you could even mint a unique task source each time. In practice though implementations only have a couple of task sources and all of these will likely end up on the same so a single task source seems fine and probably even preferred.

@twiss
Copy link
Member

twiss commented Dec 19, 2024

Fixed by #386.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants