Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster processing of Data objects #6225

Open
effectfully opened this issue Jun 20, 2024 · 8 comments
Open

Faster processing of Data objects #6225

effectfully opened this issue Jun 20, 2024 · 8 comments
Assignees
Labels

Comments

@effectfully
Copy link
Contributor

effectfully commented Jun 20, 2024

Preface

This issue is a continuation of #5777 (I'm going to call it "the previous issue") and offers a different perspective. Reading the previous issue is a prerequisite for reading this one.

The previous issue explained how it'd be beneficial to have a way of matching on Data objects by converting them to SOPs first: this would make processing of Data much more efficient. The issue previous also explaines how there doesn't appear to exist any safe and efficient way to perform the conversion. Hence here we're going to investigate alternatives.

The goal

We want to be able to express the following (pseudocode):

inspectConstr (Constr 2 [d0, d1, d2])
    [ \[v0, v1]         -> body0
    , \[v0, v1, v2, v3] -> body1
    , \[v0, v1, v2]     -> body2
    , \[]               -> body3
    ]

where inspectConstr is some function that takes a Data object and a bunch of branches (in the form of a list or an array or something), picks the right branch depending on the index of the constructor (i.e. the first argument of Constr) and applies the multi-argument function of that branch to the arguments of the constructor (i.e. what's stored in the second argument of Constr). So the example above evaluates to

(\[v0, v1, v2] -> body2) [d0, d1, d2]

where \[v0, v1, v2] is a multi-lambda binding three variables and [d0, d1, d2] is some kind of a spine (a list or an array or etc, can be the same thing as what we store the branches in, can be different). That expression is equivalent to

(\v0 v1 v2 -> body2) d0 d1 d2

and evaluates the same way.

Note that regardless of how we're going to represent this expression in the AST we want to only allow exactly-saturated applications for multi-lambdas, i.e. neither undersaturation (a multi-lambda binds more variables than the number of arguments it's applied to) nor oversaturation (a multi-lambda binds fewer variables than the number of arguments it's applied to) are allowed. The reason for this restriction is precisely what allows us to avoid all the problems described in the "What exactly breaks if we add the unsafe constrTermFromConstrData?" section of the previous issue.

Is there a straightforward way to make inspectConstr work?

The answer is "yes", I believe. The following pseudocode definition would cut it:

inspectConstr : forall r. array (array data -> r) -> data -> r
inspectConstr branches (Constr i args) = apply (branches ! i) (toArray args)

To make it work we need to

(1) add array to the language and not as a built-in type, because values of built-in types can only be constants but here we need to have an array of functions
(2) teach the builtins machinery about array
(3) add multi-lambdas and multi-applications to the language (or something along these lines)
(4) allow for returning applications from within the builtins machinery

(2) and (4) are simple. (4) has a prototype implementation here and (2) is very similar to teaching builtins about SOPs, which has a prototype implementation here

(1) and (3) are hairy but doable.

The real question however is whether we can avoid doing a part of this work by piggy-backing on some existing machinery like SOPs.

Can we use SOPs to avoid extending the language as much?

Instead of having multi-lambdas and multi-applications we could use SOPs to bind a bunch of variables at once, but the problem with that is that it doesn't give us exact saturation, hence SOPs don't seem to be of any help here. In retrospect, we should've probably made SOPs exactly saturated and it's probably too late to do it now. But maybe we could make multi-lambdas play nicely with SOPs and recover exact saturation for SOPs this way, but that requires adding multi-lambdas to the language in the first place.

But let's assume we somehow have exact saturation for SOPs and the following is expressible in the AST (note that it's Term.Constr where it was Data.Constr before):

case (Term.Constr 2 [d0, d1, d2])
    [ \[v0, v1]         -> body0
    , \[v0, v1, v2, v3] -> body1
    , \[v0, v1, v2]     -> body2
    , \[]               -> body3
    ]

we still don't know how to convert a Data.Constr to a Term.Constr as per the "Typing issue" section of the previous issue.

Moreover, even if wanted to only use SOPs to pick the right branch, without applying the function in that branch, binding multiple variables etc -- just to pick the branch, we don't know how to give a type even to that built-in function. In a dependently typed system it would actually be easy:

constrTermFromConstrArgsData : (n : Nat) -> data -> SOP (replicate n [data]))

for example

constrTermFromConstrArgsData 0 : data -> SOP [[]]
constrTermFromConstrArgsData 1 : data -> SOP [[data]]
constrTermFromConstrArgsData 2 : data -> SOP [[data], [data]]
constrTermFromConstrArgsData 3 : data -> SOP [[data], [data], [data]]

At runtime this builtin would check that the given n matches the number of arguments of the given Data.Constr and either fail if it doesn't or return the SOP of the appopriate shape. This would be an entirely safe builtin to have, it would throw immediately on incorrect input and return a well-typed thing for correct input. The problem of course is that we don't have dependent types in Plutus.

We could emulate them using singletons as per the "Singletons" section of the previous issue like this:

constrTermFromConstrArgsData : forall sop. SopOfListData sop -> data -> sop

This would require us to extend the language with whatever SopOfListData is (a constructor of Type? A built-in type?), as well as teach the builtins machinery about it, at which point it's probably just easier to add arrays to the language and be done with it.

I therefore cannot think of any way to leverage SOPs for faster processing of Data objects. And it makes sense: SOPs are richly typed in that they reflect the structure of their values at the type level, but this is precisely what we don't want when we're dealing with the "untyped" Data objects that have unknown structure. Reflecting that unknown structure at the type level just to give up on it immediately after in a case expression is a wasted effort, matching directly without jumping through typing hoops is more natural and has to be more efficient -- and efficiency is what motivates this whole issue.

Summarizing, (2), (3) and (4) all seem to be necessary. (1) may or may not be replaceable with the singletons approach to constrTermFromConstrArgsData, maybe it's worth investigating that but if we're going to add spines (lists/arrays/whatever) for (3) anyway, then we can reuse them for (1) without bothering with SOPs at all.

@MicroProofs
Copy link

So switching gears to multi-apply and multi-lambda and multi-apply and also adding arrays as a plc term right? Just to summarize this PR.

@effectfully
Copy link
Contributor Author

So switching gears to multi-apply and multi-lambda and multi-apply and also adding arrays as a plc term right? Just to summarize this PR.

Yes, so far this seems like the most plausible solution to me (just to clarify, it's an issue, not a PR, I haven't written any code for that quite yet), but others may have better ideas and I'm not entirely confident the idea is gonna work out and be worth it.

@effectfully
Copy link
Contributor Author

@IntersectMBO/plutus-core please review this issue carefully and check whether there are any flaws in my reasoning or whether there's something missing. If I don't get any negative feedback, I'm going to proceed with shipping pattern matching builtins instead of the SOPs approach (see this issue for details on what the two approaches are).

@kwxm
Copy link
Contributor

kwxm commented Sep 16, 2024

If I don't get any negative feedback, I'm going to proceed with shipping pattern matching builtins

OK, go ahead. I don't think I fully appreciate all of the nuances of the different approaches, but we've talked about his quite a lot and I think you've convinced me that your plan is sensible. It may complicate the specification and the metatheory, but I suppose we'll just have to deal with that.

I suppose we'll have to show that this gives decent gains on realistic code, so we should probably try it out on some big examples. Maybe once you have a working implementation we can get some community members to try it out for us and see how much help it is. Do we know how complicated the objects are that people are working with? If you're trying to get at something buried deeply inside a Data object then you'd need multiple applications of this stuff. Thinking randomyl, would it make any sense to think about some atomic operation that takes path through a Data object and extracts the thing that it leads to? Maybe that wouldn't be helpful if you need to get at multiple things.

@effectfully
Copy link
Contributor Author

(this issue only tangentially concerns pattern matching builtins, that one is specifically about those, but I'll respond here)

It may complicate the specification and the metatheory,

I'm sure it'll complicate the former, but for the latter I feel like the difference isn't very substantial?

Maybe once you have a working implementation we can get some community members to try it out for us and see how much help it is.

That makes sense, but it'll further delay shipment. @zliu41 what do you think? We do know that pattern matching builtins speed up list-processing microbenchmarks by 25-30% and data-processing microbenchmarks by 15-20%, both in the presence of recursion, i.e. these are quite realistic microbenchmarks. And our "pickiness standards" were lower for, say, SOPs, for which we were fine with less pronounced microbenchmarking results and a 4% slowdown of existing code not using the feature, while pattern matching builtins don't introduce any measurable slowdown.

(BTW, if anybody's interested why lists are optimized more, my conjecture is that it's the cost of lists being a polymorphic built-in type, so calling list-handling builtins fewer times gives a larger speedup than calling data-handling builtins fewer times. And the reason why polymorphic built-in types are slower is because I designed the machinery the wrong way and can't get to fixing it).

Thinking randomyl, would it make any sense to think about some atomic operation that takes path through a Data object and extracts the thing that it leads to? Maybe that wouldn't be helpful if you need to get at multiple things.

... in which case we may just choose to return a list or even a subdata. The user will then still need to process the result, but at least they won't pay much for skipping the redundant bits with that "subdata" approach. Sounds interesting, we should perhaps look into it. It also sounds a bit like a specialized version of folding builtins (assuming we can make those lazy enough somehow).

@zliu41
Copy link
Member

zliu41 commented Sep 16, 2024

Have you considered updating all UnsafeFromData instances to use matchData? That should lead to cheaper decoding of things, especially ScriptContext, and it would be interesting to see how much difference it makes.

@effectfully
Copy link
Contributor Author

Have you considered updating all UnsafeFromData instances to use matchData? That should lead to cheaper decoding of things, especially ScriptContext, and it would be interesting to see how much difference it makes.

(I've done that in the pattern matching builtins PR)

@effectfully
Copy link
Contributor Author

effectfully commented Oct 25, 2024

I've been thinking about this and one thing I didn't consider before is that in

inspectConstr (Constr 2 [d0, d1, d2])
    [ \[v0, v1]         -> body0
    , \[v0, v1, v2, v3] -> body1
    , \[v0, v1, v2]     -> body2
    , \[]               -> body3
    ]

the array of functions must store them lazily, otherwise lookup is linear, which is pointless. But making arrays lazy is really weird, particularly given that we don't have call-by-need, just call-by-name.

On the other hand, case-expressions are already appropriately lazy...

UPD: a clarification from a Slack discussion:

How is lookup in a strict array linear?

That's because the array is inlined, so I should've said that the lookup is linear in the most common case, because the mandatory initialization phase is linear. But it's in fact an understatement, because the whole point of branching is that one doesn't evaluate all the branches. Some of the branches may not even start with lambdas and then you're evaluating arbitrary code for no reason (and that code may potentially throw), so strict arrays just don't work for branching.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants