Addressing individual task results as opposed to invocations #6
Comments
Additionally that would imply that I can pass task reference that |
I would love to know more about this! I had assumed that the Merkle proof made these interchangeable relative to a stable root CID.
Because actions can be run multiple times with different results (because they occur over time), they merely need to be made unique. Updating a counter multiple times needs to be counted as independent actions, for example. One possible solution is to move the nonce to each task rather than on the wrapper. This gives you a stable identifier for that specific call. Another is that you could send many single invocations wrapped in an array, and use this for routing multiple invocations. This has a clean separation of concerns, but is an extra level of wrapping. Perhaps to explain how we're thinking about this stuff in IPVM: a coordinator (possibly the initial job creator, maybe external) takes the job, splits it up into new subtasks that are semantically equivalent, and creates subinvocations. These can reference each other's results as promises, so there's no need to return the value to the coordinator: they can communicate directly, or even gossip about receipts on a public pubsub channel. |
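To make the per-task nonce idea concrete, here is a minimal sketch. It assumes a SHA-256 digest over canonical JSON as a stand-in for a real CID (an actual implementation would use an IPLD codec and multihash), and the counter resource, the `counter/increment` ability, and the `task_id` helper are hypothetical names used only for illustration.

```python
import hashlib
import json


def task_id(task: dict) -> str:
    """Stand-in for a CID: SHA-256 over a canonical JSON encoding (assumption)."""
    encoded = json.dumps(task, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


# Two calls that are otherwise identical (hypothetical counter resource).
base = {
    "with": "https://example.com/counter",
    "do": "counter/increment",
    "inputs": {"by": 1},
}

first = {**base, "nonce": "a1"}
second = {**base, "nonce": "b2"}

# Without a nonce, repeated calls collapse into one identifier...
assert task_id(base) == task_id(dict(base))
# ...while a per-task nonce makes each call addressable on its own.
assert task_id(first) != task_id(second)
```

Under these assumptions each increment gets its own stable identifier, so two updates to the same counter are counted as independent actions.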
That is our current design pretty much, because our tasks are basically delegations that you nominate as tasks in the unsigned envelope. |
Perhaps it helps to describe our current design, because I think something in between the two would provide the best compromise. Our notion of a task is basically a UCAN delegation which MUST delegate only one capability. Our notion of an invocation is Signals to the It is also clear that a tuple is a poor signaling mechanism, so we would love to adopt ideas from this spec to address current limitations and improve on this. That said, having a way to link to a specific task just by CID has been very useful and we use it a lot in our infrastructure. Additionally, it is nice that a batch can be taken apart and forwarded simply by repacking it in a different tuple. |
I talked through this with @QuinnWilton a bit earlier today. We both have this intuition that One way that we may be able to have both the friendly names and be able to refer to tasks directly by CID would be to put nonces on every Task instead of only on the outer signature envelope. This gives each Task a unique CID, and thus can be addressed directly:

type Closure struct {
  with   URI
  do     String
  inputs Any
}

type Task struct {
  with   URI
  do     String
  inputs Any
  meta   Any
  nonce  String -- NEW
}

I guess the downside is that you'd need to track the associated wrapper if you wanted to check the signature or other metadata about the invocation. I guess that could be managed via a reverse lookup table? In the use case from DAG House as I understand it, you may not need this. A CAR file can contain multiple disjoint graphs. You can use this to pack several Invocations that each contain a single task, and thus have a single identifier in their array index:

{
  sig: "somebytes",
  nnc: "abcdef",
  meta: {"dev/comments": /* ... */},
  run: [
    {
      with: "https://exmaple.com/posts",
      do: "crud/create",
      inputs: {/* ... */},
    }
  ]
}

...is a special case that's fully isomorphic to...

{
  sig: "somebytes",
  nnc: "abcdef",
  meta: {"dev/comments": /* ... */},
  with: "https://exmaple.com/posts",
  do: "crud/create",
  inputs: {/* ... */}
}
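The isomorphism between the two shapes above can be sketched as a pair of functions. This is only an illustration: it assumes the task fields are exactly `with`, `do`, and `inputs` as in the example, that everything else belongs to the envelope, and `flatten`, `unflatten`, and `TASK_FIELDS` are hypothetical names rather than anything defined by the spec.

```python
# Fields assumed to belong to the task; everything else is envelope data.
TASK_FIELDS = ("with", "do", "inputs")


def flatten(invocation: dict) -> dict:
    """Inline the only task of a single-task invocation into its envelope."""
    tasks = invocation["run"]
    if len(tasks) != 1:
        raise ValueError("only single-task invocations can be flattened")
    envelope = {k: v for k, v in invocation.items() if k != "run"}
    return {**envelope, **tasks[0]}


def unflatten(flat: dict) -> dict:
    """Rebuild the wrapped form by moving the task fields back under `run`."""
    task = {k: flat[k] for k in TASK_FIELDS}
    envelope = {k: v for k, v in flat.items() if k not in TASK_FIELDS}
    return {**envelope, "run": [task]}


nested = {
    "sig": "somebytes",
    "nnc": "abcdef",
    "meta": {"dev/comments": "..."},
    "run": [
        {"with": "https://exmaple.com/posts", "do": "crud/create", "inputs": {}}
    ],
}

# Round-tripping loses nothing, which is what "isomorphic" means here.
assert unflatten(flatten(nested)) == nested
```

The round trip only holds while an invocation carries exactly one task, which is why `flatten` rejects anything else.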
You and I talked a bit about this on our call, but I'd be interested to hear more about how addressing by CID has been useful. You also alluded to how paths inside CIDs have been problematic, and it would be really useful to know more! Is it that the paths can be to something not under that CID? |
I think there are two things bundled here, so let me address them individually:
I am not arguing for |
That's cool, but that still does not make them self-contained, as in I have to carry that envelope around as additional authorization context. This also applies to receipts: linked tasks there don't include proof of authorization to invoke the task (although that may be desired there). |
It is less about how we manage it and more about self-containment as a design principle. E.g. we should be able to anchor things in a blockchain simply via the CID of the task; however, in order to be verifiable we also need to publish extra context or wrap the thing in an envelope to provide it, all of which is difficult to justify given that the only justification (unless I've missed some) is a reduced number of signatures (which could also be addressed differently). If I've missed a rationale for why the current structure is better, please let me know. |
We could do that. However, I'm not yet convinced we'd be making the right tradeoffs there. We can receive invocations that do not adhere to the single-task-per-invocation constraint, forcing us to either refuse them or cope with "additional context on the side". If we do the former, then we're not really implementing the spec and will likely run into problems with toolchains (built around the spec), etc. If we go with the latter, we're back to square one. Perhaps implementing a subset of the spec is better than not implementing it at all, yet I wish we could arrive at a composable set of specs so we could implement some while not the others. |
The following is not unreasonable:

{
  sig: "somebytes",
  nnc: "abcdef",
  meta: {"dev/comments": /* ... */},
  run: {
    task: {
      with: "https://exmaple.com/posts",
      do: "crud/create",
      inputs: {/* ... */},
    }
  },
  prf: [/* ... */]
}

Except now promises aren't simply CIDs; instead they need to carry an additional "task" sub-selector everywhere. That is not terrible, just really uncomfortable: simpler cases end up more verbose than more complex ones. |
We use CIDs to namespace things within storage buckets. We are considering utilizing CIDs for sharding across various buckets as well. In a way a CID is a serialized reference, so it tends to be useful anywhere you want to associate or look up information with a specific reference. However, references are not nearly as useful if what they refer to isn't self-contained, e.g. if we need extra context (like the invocation here) they are no longer verifiable, as in I can't publish such a CID into a blockchain or even a gossip network. An imperfect analogy I'll try to make here is languages with first-class functions vs. languages like Java, which did not have them. The latter end up forcing people to develop silly classes with a single method so it can be passed around; closures were also a bit trickier, since you then had another silly, less universal class to capture context. Tasks as they stand in the spec feel like the lack of first-class functions in Java: they're like methods that you can't pass around, so you're forced to wrap them in invocations. In fact even the proposed solution is equivalent: you just create single-task invocations (single-method classes) and pass those around instead. That works but falls short of first-class functions nevertheless.
Most problems we've faced with CID + path are from IPFS and some may not necessarily apply here, yet it taught us to be cautious around them. Here are a few things I can think of:
I'm sure people who have been at PL longer will have longer lists. The overall sentiment is: while a bunch of things on the list could be addressed in various subsystems, the added complexity seems to have prevented that from happening. Paths tend to introduce problems in various places in the stack, and avoiding them has often led to simpler and more robust designs, so if you see a path 🚨 |
This is the composition I’d like to have; I don’t care that much what things are named:
Everything is a “Task” with a CID; batches are collections of these addresses and, preferably, nothing else, so they maintain hash consistency. This can be recursively composed, and executions can be compared and deduplicated across batches and executions by comparing the CIDs in the Arrays and Sets. Receipts (task results) should compose identically to this but in reverse. |
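A rough sketch of that composition, assuming a SHA-256 digest over canonical JSON as a stand-in for a CID and three hypothetical tasks (`resize`, `store`, `notify`) invented purely for illustration. A batch here is nothing but the sorted set of its members' addresses, so batches can themselves be addressed and nested.

```python
import hashlib
import json


def address(node) -> str:
    """Stand-in for a CID: SHA-256 over canonical JSON (assumption)."""
    encoded = json.dumps(node, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()


def batch(member_ids) -> list:
    """A batch holds only the sorted, deduplicated addresses of its members."""
    return sorted(set(member_ids))


# Hypothetical tasks, used only to have something to address.
resize = {"with": "https://example.com/img/1", "do": "image/resize", "inputs": {"w": 64}}
store = {"with": "https://example.com/img/1", "do": "store/add", "inputs": {}}
notify = {"with": "mailto:alice@example.com", "do": "msg/send", "inputs": {}}

batch_a = batch([address(resize), address(store)])
batch_b = batch([address(store), address(notify)])

# Batches are addressable too, so they compose recursively.
outer = batch([address(batch_a), address(batch_b)])

# Work shared between batches is found by comparing addresses alone.
shared = set(batch_a) & set(batch_b)
assert shared == {address(store)}

# Member order does not change a batch, so its address stays stable.
assert batch([address(store), address(resize)]) == batch_a
```

Because a batch contains only addresses, any member can be handed to another actor without disclosing its siblings.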
The current draft requires that you address individual tasks via a pointer inside the invocation. I find this unfortunate, because it implies that I cannot address a specific task without revealing the batch it was part of.
In our design tasks are explicitly self-sufficient, so you could take a batch apart and hand some tasks to another actor without revealing everything the user intended to do. Unfortunately I don't have a concrete user story to justify the need for this, and our design choice has been mostly informed by intuition and some experiences in IPFS, where addressing by paths inside DAGs as opposed to by CID had proven to be inadequate (I may want to share a subgraph with you without revealing the outer graph or the fact that it exists).
I realize this approach allows reducing the number of signatures, yet I wonder if we could optimize that without having to sacrifice task independence. E.g. perhaps we could attach a bloom filter to each task and sign that instead; that way the signature could be used with any task in the bloom filter without revealing what's in it (I realize there are false positives, but there are ways around that). There might be some other ways to address signatures; my point is that perhaps the number of signatures should be considered separately.
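The bloom filter idea could look roughly like the sketch below. It is an illustration under loud assumptions: the filter size, the number of hash functions, and all function names are made up, task CIDs are plain strings, and HMAC-SHA256 with a shared key stands in for a real public-key signature such as the one a UCAN would carry.

```python
import hashlib
import hmac

FILTER_BITS = 256  # assumed filter size
HASH_COUNT = 3     # assumed number of hash functions


def bit_positions(task_cid: str) -> list:
    """Derive HASH_COUNT bit positions for a task identifier."""
    positions = []
    for i in range(HASH_COUNT):
        digest = hashlib.sha256(f"{i}:{task_cid}".encode()).digest()
        positions.append(int.from_bytes(digest[:4], "big") % FILTER_BITS)
    return positions


def build_filter(task_cids) -> int:
    """Set the bits for every task; the filter does not list its members."""
    bits = 0
    for cid in task_cids:
        for pos in bit_positions(cid):
            bits |= 1 << pos
    return bits


def sign_filter(bits: int, key: bytes) -> bytes:
    """HMAC is a stand-in for a real signature scheme (assumption)."""
    payload = bits.to_bytes(FILTER_BITS // 8, "big")
    return hmac.new(key, payload, hashlib.sha256).digest()


def covers(bits: int, signature: bytes, key: bytes, task_cid: str) -> bool:
    """Check the signature, then check the task's bits are all set."""
    if not hmac.compare_digest(sign_filter(bits, key), signature):
        return False
    return all((bits >> pos) & 1 for pos in bit_positions(task_cid))


key = b"demo-key"
tasks = ["task-cid-1", "task-cid-2"]
bits = build_filter(tasks)
signature = sign_filter(bits, key)

# One signature vouches for every task in the filter, taken individually.
assert all(covers(bits, signature, key, cid) for cid in tasks)
# A modified filter no longer matches the signature.
assert not covers(bits ^ 1, signature, key, "task-cid-1")
```

A task outside the batch may still pass the membership check with some probability, which is the false-positive caveat mentioned above; the sketch does nothing to mitigate it.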