docs: ucan invocation (#181)
Closes #175

---------

Co-authored-by: Benjamin Goering <[email protected]>
vasco-santos and gobengo authored Apr 10, 2023
1 parent abe2410 commit 04198ef
54 changes: 54 additions & 0 deletions docs/ucan-invocation-stream.md
# UCAN Invocation Stream

> UCAN invocations cause transactions that mutate the w3up platform. The history of these invocations is an audit log of the system, which we can rely on to replay operations from a given point in time, as well as for asynchronous computations such as telemetry, metrics, and user-facing aggregations.

## Background

UCAN is a chained-capability format. A UCAN contains all of the information that one would need to perform some task, and the provable authority to do so.

We can identify three core components in our services built on UCANs:
- Task to be executed (the `with`, `can` and `nb` fields of a UCAN)
- Invocation (a task to be executed, together with the provable authority to do so: `proofs` + `signature`)
- Workflow (a file containing one or more invocations to be executed)

With the above components, we can say that:
- One task may have many receipts (one per invocation), all with the same result.
- We could (in theory) receive the same invocation in multiple CARs.
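
As a rough illustration only (these are not the actual w3up/ucanto types), the three components could be sketched in TypeScript as:

```ts
// Illustrative shapes only; the real types live in the w3up/ucanto packages.
interface Task {
  with: string                  // resource URI the capability acts on, e.g. a space DID
  can: string                   // ability, e.g. 'store/add'
  nb?: Record<string, unknown>  // caveats / arguments
}

interface Invocation {
  task: Task
  proofs: string[]              // CIDs of delegation UCANs proving authority
  signature: Uint8Array         // issuer signature over the invocation
}

interface Workflow {
  cid: string                   // CID of the encoded file (likely a CAR)
  invocations: Invocation[]     // one or more invocations to be executed
}
```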

## Architecture

The entry point for the UCAN Invocation Stream is the HTTP endpoint `POST /ucan`. It receives `workflows` and `receipts` from other services. All invocations and their receipts are persisted in buckets and added to the stream.
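
A minimal sketch of what the `POST /ucan` handler might do for a workflow, assuming hypothetical bucket and stream names (`workflow-store`, `ucan-stream`) and the AWS SDK v3 clients; the real service also handles receipts and performs validation:

```ts
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
import { KinesisClient, PutRecordCommand } from '@aws-sdk/client-kinesis'

const s3 = new S3Client({})
const kinesis = new KinesisClient({})

// Hypothetical handler: persist the received workflow CAR, then publish
// one stream record per invocation it contains.
export async function handleUcanPost (workflowCid: string, carBytes: Uint8Array, invocationCids: string[]) {
  // Persist the workflow exactly as received, keyed as `${workflow.cid}/${workflow.cid}`.
  await s3.send(new PutObjectCommand({
    Bucket: 'workflow-store', // assumed bucket name
    Key: `${workflowCid}/${workflowCid}`,
    Body: carBytes
  }))

  // Add one event per invocation to the stream for downstream consumers.
  for (const invocationCid of invocationCids) {
    await kinesis.send(new PutRecordCommand({
      StreamName: 'ucan-stream', // assumed stream name
      PartitionKey: invocationCid,
      Data: new TextEncoder().encode(JSON.stringify({ carCid: workflowCid, invocationCid, ts: Date.now() }))
    }))
  }
}
```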

AWS Kinesis is the central piece of this architecture. Multiple stream consumers can be hooked into AWS Kinesis for post processing of UCAN invocations.

![High level Architecture](https://bafybeifub7gefocq2yqw4dbvpbon2aduw6sq4aqfaergaennhgts4d3hpa.ipfs.w3s.link/ucan-log-stream-v2.jpg)

Note that, at the time of writing, the Event Archival flow is still to be implemented.

### Buckets

The UCAN Invocation Stack contains three buckets so that it can keep an audit of the entire system, while allowing this information to be queried in multiple ways.

Firstly, the **`workflow-store` bucket** stores the entire encoded file containing one or more invocations to be executed, exactly as received from the UCAN services interacting with the UCAN Invocation Stream. It is keyed as `${workflow.cid}/${workflow.cid}` and its value is likely in CAR format; however, the CID codec tells us if it is something else.
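
For example, a reader of this bucket could inspect the workflow CID's codec with the `multiformats` library before deciding how to decode the stored bytes (a sketch; `0x0202` is the CAR multicodec code):

```ts
import { CID } from 'multiformats/cid'

const CAR_CODEC = 0x0202 // multicodec code for CAR files

function isCarWorkflow (workflowCid: string): boolean {
  // The codec embedded in the CID tells us how the stored bytes are encoded.
  return CID.parse(workflowCid).code === CAR_CODEC
}
```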

At the invocation level, the **`invocation-store` bucket** is responsible for storing two types of values related to UCAN invocations:
- a pseudo symlink to `/${workflow.cid}/${workflow.cid}` via the key `${invocation.cid}/${workflow.cid}.workflow`, to track where each invocation lives in a workflow file. As a pseudo symlink, it is an empty object.
- a receipt block issued for a specific task invocation, via the key `${invocation.cid}/${invocation.cid}.receipt`, with the block bytes as its value.

In the tasks context, the **`task-store` bucket** stores two types of values related to executed tasks:
- a pseudo symlink to `/${invocation.cid}/${invocation.cid}` via `${task.cid}/${invocation.cid}.invocation` to enable looking up invocations and receipts by a task. As a pseudo symlink, it is an empty object.
- a block containing the `out` field of the receipt, keyed as `${task.cid}/${task.cid}.result`, so that when we receive an invocation with the same task we can read the result and issue a receipt without rerunning the task. It can be written when the first receipt arrives.
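
Putting the key scheme together, a small sketch of helpers that could derive the keys described above (names are illustrative):

```ts
// Illustrative key builders for the three buckets described above.
const workflowKey = (workflowCid: string) =>
  `${workflowCid}/${workflowCid}`            // workflow-store: the encoded file as received

const invocationSymlinkKey = (invocationCid: string, workflowCid: string) =>
  `${invocationCid}/${workflowCid}.workflow` // invocation-store: empty object pointing at the workflow

const receiptKey = (invocationCid: string) =>
  `${invocationCid}/${invocationCid}.receipt` // invocation-store: receipt block bytes

const taskSymlinkKey = (taskCid: string, invocationCid: string) =>
  `${taskCid}/${invocationCid}.invocation`   // task-store: empty object pointing at the invocation

const taskResultKey = (taskCid: string) =>
  `${taskCid}/${taskCid}.result`             // task-store: the receipt's `out` field
```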

### Consumers

We keep 365 days of data history in the stream, which can be replayed as needed when new consumers are added. A consumer of AWS Kinesis is a Lambda function that receives a batch of stream events and handles them.

UCAN Stream Consumers can be added as needed. Each consumer must perform atomic operations and be independent, so that we tolerate failures and can easily replay the stream if needed.
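
A minimal sketch of such a consumer, assuming the `aws-lambda` type definitions and a placeholder `updateMetrics` step standing in for the consumer-specific work:

```ts
import type { KinesisStreamEvent } from 'aws-lambda'

// Placeholder for the consumer-specific work; assumed to be atomic and
// idempotent so a replayed batch does not double-count.
async function updateMetrics (events: unknown[]): Promise<void> {
  console.log('processing %d ucan stream events', events.length)
}

// Hypothetical per-batch handler: decode each record, then apply the update.
export async function handler (event: KinesisStreamEvent): Promise<void> {
  const ucanEvents = event.Records.map(record =>
    // Kinesis delivers record data base64-encoded.
    JSON.parse(Buffer.from(record.kinesis.data, 'base64').toString('utf8'))
  )
  await updateMetrics(ucanEvents)
}
```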

### Databases

Consumers might need other infrastructure resources to track state based on the events that go through the stream. For instance, to track system-wide metrics we have the `admin-metrics` table, and to track space metrics we have the `space-metrics` table.

The `admin-metrics` table has a partition key `name` with the metric name we keep track of. With this, we can easily update and query each of the `admin` metrics we care about.

In the `space-metrics` table, a partition key `space` is used together with a sort key `name` containing the metric name. This way, we are able to track and query each metric for a given space.
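
As an illustration, incrementing a metric in each table could look like the following sketch with the AWS SDK v3 DynamoDB client; only the key attributes come from the description above, while the `value` attribute name is an assumption:

```ts
import { DynamoDBClient, UpdateItemCommand } from '@aws-sdk/client-dynamodb'

const dynamo = new DynamoDBClient({})

// admin-metrics: partition key `name` only.
export async function incrementAdminMetric (name: string, by: number) {
  await dynamo.send(new UpdateItemCommand({
    TableName: 'admin-metrics',
    Key: { name: { S: name } },
    UpdateExpression: 'ADD #v :inc',             // ADD creates the attribute if missing
    ExpressionAttributeNames: { '#v': 'value' }, // assumed attribute holding the count
    ExpressionAttributeValues: { ':inc': { N: String(by) } }
  }))
}

// space-metrics: partition key `space` plus sort key `name`.
export async function incrementSpaceMetric (space: string, name: string, by: number) {
  await dynamo.send(new UpdateItemCommand({
    TableName: 'space-metrics',
    Key: { space: { S: space }, name: { S: name } },
    UpdateExpression: 'ADD #v :inc',
    ExpressionAttributeNames: { '#v': 'value' },
    ExpressionAttributeValues: { ':inc': { N: String(by) } }
  }))
}
```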
