Implement basic "flow escalators" #3517

Open
warner opened this issue Jul 24, 2021 · 7 comments
Labels
enhancement New feature or request SwingSet package: SwingSet

Comments

@warner
Member

warner commented Jul 24, 2021

What is the Problem Being Solved?

In #3465 we outlined a plan for a basic pair of swingset run-queues ("high priority" and "low priority"), as a starting point for more sophisticated scheduling. Today @michaelfig and I sketched out a slightly more interesting scheme that could give users some meaningful control. The basic user story is:

Alice wants to make an AMM trade, but the chain is very busy, and the run-queue is never empty. She submits her transaction, but it gets stuck at the back of the line for a long time. She gets impatient, and is willing to pay more for expedited service. She needs a way to 1: learn about the state of her request, 2: influence it.

The main problems with #3465 are:

  • the queues are not visible from outside the kernel: there is no RPC query you can make to learn about its depth, and no way to correlate your transaction request with the contents of the run-queue
  • there is no way to move your message to a different queue once submitted
  • there are no fees involved in requesting higher priority

Description of the Design

The basic idea is the "Flow", following @dtribble's original formulation. Within a swingset machine, all messages sent on a single Flow are FIFO-ordered, so any related messages that depend upon ordering must be sent on the same Flow. As a starting point, each client machine will have a single Flow. On the chain, a new Flow will be allocated within the kernel during the provisioning process. The kernel uses a flowid to name the Flow. Each Flow is a FIFO queue of messages, possibly (usually) empty. Each Flow has a number of "boost points" (usually zero). Each point buys you one high-priority message delivery. The kernel exposes APIs (to the host) to create/allocate a Flow, and to add boost points to a flow.
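As a rough illustration of the per-Flow bookkeeping described above, here is a minimal sketch. Only `flowid`, `allocateFlow`, `boost`, and the notion of boost points come from this proposal; `makeFlowTable`, `getFlow`, and the `f1`-style id format are hypothetical names chosen for the example, not the kernel's actual data structures.

```javascript
// Hypothetical sketch of the Flow table. Each Flow is a FIFO of
// messages plus a boost-point counter; both usually start empty/zero.
function makeFlowTable() {
  const flows = new Map(); // flowid -> { messages: [], boostPoints: 0 }
  let nextID = 1;

  // Called by the host (e.g. during provisioning) to create a Flow.
  function allocateFlow() {
    const flowid = `f${nextID}`; // the id format is an assumption
    nextID += 1;
    flows.set(flowid, { messages: [], boostPoints: 0 });
    return flowid;
  }

  // Called by the host to add boost points to an existing Flow.
  function boost(flowid, numPoints) {
    const flow = flows.get(flowid);
    if (!flow) {
      throw new Error(`unknown flowid ${flowid}`);
    }
    if (!Number.isInteger(numPoints) || numPoints <= 0) {
      throw new Error('numPoints must be a positive integer');
    }
    flow.boostPoints += numPoints;
    return flow.boostPoints;
  }

  // Inspection helper for this sketch only.
  function getFlow(flowid) {
    return flows.get(flowid);
  }

  return { allocateFlow, boost, getFlow };
}
```

Each point added through `boost` would later pay for one high-priority delivery; the sketch only records the count.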

The old run-queue of messages is replaced by two queues (high and low priority) of Flows. The kernel services the high-priority queue before the low-priority queue. The kernel services all messages from the first flow on a queue before the next flow on a queue (subject to the "boost points" limit, below).

When a message is added to an empty zero-boost-point flow, the flow is moved onto the back of the low-priority queue. If the flow has any boost points, it is instead moved to the back of the high-priority queue. If boost points are added to a flow on the low-priority queue, that flow is moved to the back of the high-priority queue. When a flow becomes empty, it is removed from the queues.

When executing messages from the high-priority queue, one boost point is deducted for each delivery. If the boost point count drops to zero, the flow is moved to the back of the low-priority queue, even if it still has messages waiting to be delivered. New messages may be added to the flow during that delivery, and as long as there are still boost points left, they will be executed before anything from the next flow on the high-priority queue.

When executing messages from the low-priority queue, the kernel will keep processing messages from the first flow until 1: the flow becomes empty, or 2: a message is added to a flow with boost points, or 3: boost points are added to a flow with messages. The kernel will then service the high-priority queue until it is empty, before moving back to the low-priority queue.
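The queueing rules in the last few paragraphs can be sketched as a small state machine. This is an illustration under assumptions, not the intended kernel code: `makeScheduler`, `addMessage`, `step`, the `queue` field, and the `{ high, low }` shape returned by `getSchedulerState` are all hypothetical. Because `step` always looks at the high-priority queue first, the two "stop servicing the low-priority flow" conditions involving boost points fall out without special cases.

```javascript
// Hypothetical two-queue scheduler following the rules above.
function makeScheduler() {
  const flows = new Map();
  const queues = { high: [], low: [] };
  let nextID = 1;

  function allocateFlow() {
    const flowid = `f${nextID}`;
    nextID += 1;
    flows.set(flowid, { flowid, messages: [], boostPoints: 0, queue: null });
    return flowid;
  }

  function enqueue(flow, name) {
    flow.queue = name;
    queues[name].push(flow); // always to the back
  }

  function dequeue(flow) {
    const q = queues[flow.queue];
    q.splice(q.indexOf(flow), 1);
    flow.queue = null;
  }

  function addMessage(flowid, message) {
    const flow = flows.get(flowid);
    flow.messages.push(message);
    // A flow that is not on any queue joins one when it gets a message.
    if (flow.queue === null) {
      enqueue(flow, flow.boostPoints > 0 ? 'high' : 'low');
    }
  }

  function boost(flowid, numPoints) {
    const flow = flows.get(flowid);
    flow.boostPoints += numPoints;
    // Boosting a waiting low-priority flow promotes it.
    if (flow.queue === 'low') {
      dequeue(flow);
      enqueue(flow, 'high');
    }
  }

  // Deliver one message. Returns false when both queues are empty.
  function step(deliver) {
    const name = queues.high.length > 0 ? 'high' : 'low';
    const flow = queues[name][0];
    if (!flow) {
      return false;
    }
    const message = flow.messages.shift();
    if (name === 'high') {
      flow.boostPoints -= 1; // one point per high-priority delivery
    }
    deliver(flow.flowid, message); // may call addMessage() or boost()
    if (flow.messages.length === 0) {
      dequeue(flow); // empty flows leave the queues
    } else if (flow.queue === 'high' && flow.boostPoints === 0) {
      dequeue(flow); // out of points: demote, even with messages left
      enqueue(flow, 'low');
    }
    return true;
  }

  function getSchedulerState() {
    const summarize = q =>
      q.map(f => ({
        flowid: f.flowid,
        boostPoints: f.boostPoints,
        numMessages: f.messages.length,
      }));
    return { high: summarize(queues.high), low: summarize(queues.low) };
  }

  return { allocateFlow, addMessage, boost, step, getSchedulerState };
}
```

For example, if two flows each hold two messages and the second is boosted by one point, the second flow gets exactly one delivery ahead of the first, then falls to the back of the low-priority queue for its remaining message.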

Each delivery comes from a flow, which establishes a default flowid, which will be inherited by events created during that crank. This ensures that the priority of the request is shared with the response. A future mechanism will enable vats to select an alternate flowid for some deliveries (the illustrative use case is a high-priority AMM trade, whose response gets the benefit of the client's prioritization, but the price-change notification that results should not), but for now everything gets the inherited value.
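A minimal sketch of the inheritance rule, assuming a hypothetical `runCrank` helper and a simplified `syscall.send(target, body)` signature (neither is the real kernel interface): every event a vat creates during a crank is tagged with the flowid of the delivery that triggered the crank.

```javascript
// Hypothetical kernel fragment: run one crank and tag each event it
// creates with the flowid of the delivery being processed.
function runCrank(delivery, vatHandler) {
  const newEvents = [];
  const syscall = {
    // For now the flowid is always inherited. A future mechanism might
    // let the vat pick an alternate flowid for some sends.
    send(target, body) {
      newEvents.push({
        type: 'message',
        target,
        body,
        flowid: delivery.flowid,
      });
    },
  };
  vatHandler(delivery, syscall);
  return newEvents;
}
```

With this rule, a response produced while handling a request on flow `f7` is itself queued on `f7`, so it shares whatever priority the request had.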

The Mailbox device will be the one place where the flowid can be set. The host currently calls this with (effectively) a list of (remote name, sequence number, message body) items, plus a single ack number, and the Mailbox device uses syscall.sendOnly() to enqueue each message to vat-vattp, from which they travel to vat-comms, and then on to Zoe and contract vats.

We augment these items to include a flowid. The Mailbox device will use a new argument to syscall.sendOnly to specify the flowid. For now, this will be the only control over flowids. In the future, we'll want to track flows through a c-list, but for now they'll be widely held. Any low-level vat code able to invoke syscall.sendOnly could use any existing flowid they like. Note that liveslots does not currently expose sendOnly to userspace, so for now only devices can use it.
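One way the augmented Mailbox path could look is sketched below. Everything about the shape is an assumption made for illustration: the position of the flowid argument to `sendOnly`, the `peer` field on each item, and the `vattp` target and method name are placeholders, and ack handling is left out because the proposal does not say which flow an ack would use.

```javascript
// Hypothetical Mailbox device fragment. `sendOnly` is injected so the
// sketch is self-contained; the trailing flowid argument stands in for
// the "new argument to syscall.sendOnly" proposed above.
function makeMailbox(sendOnly) {
  function deliverInbound(items, acknum) {
    for (const { peer, msgnum, body, flowid } of items) {
      // Each inbound message is enqueued to vat-vattp on the flow
      // chosen by the host.
      sendOnly('vattp', 'deliverInboundMessage', [peer, msgnum, body], flowid);
    }
    // Ack delivery is omitted from this sketch.
    return acknum;
  }
  return { deliverInbound };
}
```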

All immediate consequences of the vattp-bound message will use the same flow. If the trade or other transaction can be completed without leaving the kernel (as e.g. an IBC exchange would) or waiting for a timer event (e.g. it could all happen within the same block, if blocks had infinite capacity), then all these consequences will happen before any other flows are serviced. This is probably better than:

  • the Mailbox-to-vattp message runs the queue
  • vattp-to-comms runs the queue
  • comms-to-zoe runs the queue
  • zoe-to-contract runs the queue
  • etc

The kernel will also expose a getSchedulerState API to the host. This will return a serializable data structure that details the ordered list of { flowid, boostPoints, numMessages } for each of the two queues. The full external kernel API is:

  • controller.allocateFlow() -> flowid
  • mailbox.deliverInbound([{msgnum, body, flowid}, ..], acknum)
  • controller.boost(flowid, numPoints)
  • controller.getSchedulerState() -> queues
    • where each queue is: [{flowid, boostPoints, numMessages}, ..]

This ticket is mostly about the kernel facilities, but @michaelfig will have a separate ticket for the cosmic-swingset code to match. This code will:

  • during provisioning, call allocateFlow to associate a swingset flowid with the client's address
    • this mapping will be stored in the cosmos state vector
    • the RPC client will be augmented to provide a CLI command that translates client address into flowid
  • when receiving solo-to-swingset messages, provide the flowid to deliverInbound
  • after the block is complete, call getSchedulerState and write the result into the cosmos state vector, under a well-known key.
    • the RPC client will be augmented to provide a CLI command to fetch+validate this data, look for a specific flowid, and report the Flow's place in the queues
    • by following this data over time, the user should be able to track their progress through the queue
  • the RPC client will be augmented to produce a new signed transaction type named "boost", which takes a flowid and a number of boost points
    • the cosmic-swingset handler for "boost" will deduct a fixed amount of RUN per boost point (perhaps 1000 uRUN) and call controller.boost() to increase that flow's boost points
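To make the host-side pieces above more concrete, here is a sketch of the two calculations involved: finding a flow in the published scheduler state, and charging for a "boost" transaction. The `{ high, low }` state shape, the `locateFlow` and `handleBoost` names, the transaction fields, and the `balances` map are all hypothetical; the 1000 uRUN figure is the "perhaps" value mentioned above, not a decided fee.

```javascript
// Assumed fee, in uRUN per boost point.
const FEE_PER_BOOST_POINT = 1000n;

// Report where a flow sits in the scheduler state, or undefined if the
// flow is empty and therefore not on either queue.
function locateFlow(state, flowid) {
  for (const priority of ['high', 'low']) {
    const position = state[priority].findIndex(e => e.flowid === flowid);
    if (position !== -1) {
      return { priority, position, ...state[priority][position] };
    }
  }
  return undefined;
}

// Hypothetical handler for a signed "boost" transaction: deduct the
// fee from the sender, then ask the kernel to add the points.
function handleBoost(tx, balances, controller) {
  const { sender, flowid, numPoints } = tx;
  if (!Number.isInteger(numPoints) || numPoints <= 0) {
    throw new Error('numPoints must be a positive integer');
  }
  const fee = BigInt(numPoints) * FEE_PER_BOOST_POINT;
  const balance = balances.get(sender) || 0n;
  if (balance < fee) {
    throw new Error('insufficient RUN for boost');
  }
  balances.set(sender, balance - fee);
  controller.boost(flowid, numPoints);
  return fee;
}
```

A client polling the state after each block could call something like `locateFlow` repeatedly to watch its `position` shrink, and submit a boost if progress is too slow.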

Security Considerations

Test Plan

kernel-side unit tests

@warner
Member Author

warner commented Jul 26, 2021

cc @rowgraus to think about the UX of this approach

@dckc
Member

dckc commented Jul 28, 2021

> ... inherited by events created during that crank ...

"events" means send and resolve syscalls? I don't remember "events" in kernel-speak before.

@warner
Member Author

warner commented Jul 28, 2021

I've been waffling on the terminology, but the issue is that there isn't a 1:1 relationship between syscalls and run-queue entries:

  • a successful syscall.send always puts a single message event on the run-queue
  • a successful syscall.resolve will cause one notify event to be placed on the run-queue for every current subscriber of that promise
  • a successful syscall.subscribe will add one notify event (for the subscribing vat), but only if the promise was already resolved

It's those two kinds of events that need to inherit a flow, from those three kinds of syscalls.
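The mapping from the three syscalls to the two event kinds could be sketched like this. The `makeEventModel` name, the record layout, and passing the crank's flowid as an explicit first argument are assumptions for illustration, not the kernel's real interfaces.

```javascript
// Hypothetical model of how syscalls turn into run-queue events, each
// tagged with the flowid of the crank that made the syscall.
function makeEventModel() {
  const promises = new Map(); // promiseID -> { resolved, subscribers }
  const runQueue = [];

  function getPromise(promiseID) {
    if (!promises.has(promiseID)) {
      promises.set(promiseID, { resolved: false, subscribers: new Set() });
    }
    return promises.get(promiseID);
  }

  return {
    runQueue,
    // send: always exactly one message event
    send(flowid, target, body) {
      runQueue.push({ type: 'message', target, body, flowid });
    },
    // subscribe: one notify event, but only if already resolved
    subscribe(flowid, vatID, promiseID) {
      const p = getPromise(promiseID);
      if (p.resolved) {
        runQueue.push({ type: 'notify', vatID, promiseID, flowid });
      } else {
        p.subscribers.add(vatID);
      }
    },
    // resolve: one notify event per current subscriber
    resolve(flowid, promiseID) {
      const p = getPromise(promiseID);
      p.resolved = true;
      for (const vatID of p.subscribers) {
        runQueue.push({ type: 'notify', vatID, promiseID, flowid });
      }
      p.subscribers.clear();
    },
  };
}
```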

@warner
Member Author

warner commented Aug 3, 2021

@dtribble was -1 on using "flow" to describe these (I think it doesn't sufficiently match his original definition), and also thought "streams" should be held in reserve for something else (although I need to understand what he has in mind), and suggested "activities". That feels a bit off to me, so I'm thinking of using "activity stream" for this, at least for now.

@michaelfig
Member

@warner: @JimLarson and I were discussing the upcalls needed for the lien mechanism, and we noticed that there would need to be a special "immediate-priority" queue so that synchronous Golang calls would resolve. This could happen during other JS downcalls (such as a transfer caused by vat-bank.js). Such upcalls need to be scheduled and resolve their promises within the same chain context, without expecting the chain to make any further progress.

Otherwise, our liens would deadlock in trying to get a result from JS if users or other calls are pushing themselves onto the same queue and the upcall scheduling is deferred to another block.

@warner
Member Author

warner commented Aug 3, 2021

Hm, it sounds like that queue needs to bypass everything, even the #3582 run-policy that could end the block early. We should talk more, I'm not sure that a queue of any sort is the right mechanism for this. And if vat-bank is waiting for a syscall.callNow() (i.e. device invocation) to return, we should not be allowing anything else to run within the kernel, and certainly not allowing other vats to get time.

What's the nature of the upcall? What swingset/vat-side activity does it need to trigger, and what sort of return data is it expecting?

@michaelfig
Member

> What's the nature of the upcall?

An attempt by cosmos to transfer tokens must first check with the attestation contract how much is locked up in a lien.

This is a bridge device message that must wait on the resolution of a promise to a value before returning that value to cosmos. In short, that delivery and all of its consequences must run immediately. The vat will probably signal its completion by sending back a call over the bridge.

> What swingset/vat-side activity does it need to trigger

Messages via the bridge vat, a middleware vat, and the attestation contract vat.

> and what sort of return data is it expecting?

Purely jsonable data is the resolution.
