
[Builtins] Add 'equalsCanonical.md' #5440

Closed

Conversation

effectfully
Contributor

@effectfully effectfully commented Jul 28, 2023

This proposes equality checking and serialization builtins for SOPs.

Rendered.

@effectfully effectfully added the Builtins and No Changelog Required labels Jul 28, 2023
@effectfully effectfully requested a review from a team July 28, 2023 23:47
Contributor

@michaelpj michaelpj left a comment

I guess this is possible. It seems pretty tricky.

  • It leans worryingly close to letting us break parametricity in various ways. I don't think we actually want type-case, and I would not be confident that someone won't turn up in a year or two and show how to use this to do bad things.
  • The costing problem seems really ugly. In principle I guess I can see how we would cost it, but it will be pretty ugly to specify and implement.
  • I'm not sure I understand the "canonical form" argument. Or more generally, what terms we should allow this on and why.

IDK. I guess it's probably possible, but I definitely don't love it.

Now we of course didn't resolve any problem by introducing `canonical`, we just moved it from `equalsCanonical` to whatever builtin is going to create `canonical` values. However moving the problem this way opens up some interesting opportunities. A reasonable way of constructing a `canonical` value is a builtin that takes some kind of a proof that the value can indeed be represented in canonical form, here's one option on how to achieve that:

```
mkCanonical : all a. (a -> data) -> a -> canonical a
```
Contributor

😱

Contributor

This function does not seem to provide any reassurance. Construct any (d : data), then
mkCanonical (const d) x constructs a canonical a for any x : a.

Contributor Author

> This function does not seem to provide any reassurance. Construct any (d : data), then
> mkCanonical (const d) x constructs a canonical a for any x : a.

I can construct unsafeCoerce : all a b. a -> b, I can even do it without using error (and without any extra builtins of course). Does it mean that our type system doesn't provide any reassurance?

Correct typing does not guarantee correct semantics and not every rule needs to be reflected at the type level. Those that aren't reflected at the type level are called "invariants" and "laws" and are very common in functional programming, including our own language. For a random example, users of verifyEcdsaSecp256k1Signature need to understand that they have to compute the hash themselves:

-- While this primitive /accepts/ a hash, any caller should only pass it hashes
-- that they computed themselves: specifically, they should receive the
-- /message/ from a sender and hash it, rather than receiving the /hash/ from
-- said sender. Failure to do so can be
-- [dangerous](https://bitcoin.stackexchange.com/a/81116/35586).

The good news is that the users of mkCanonical are compiler authors, not script writers. The compiler authors need to make sure that the user cannot easily forge a bogus a -> data function and feed it to the builtin, which is stupidly easy to achieve in our case: don't expose the builtin directly and instead expose

mkCanonical :: ToData a => a -> Canonical a

in Plutus Tx. Given that ToData is derived by our tooling, it's about as safe as it gets when it comes to developing smart contracts.
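
For illustration, here is a rough sketch of how such a Plutus Tx wrapper could be layered on top of the raw builtin. The module name, `unsafeMkCanonical` and the exact layering are hypothetical; only the `ToData` class (from plutus-tx) is assumed to be the real thing:

```
module PlutusTx.Canonical (Canonical, mkCanonical) where

import PlutusTx.IsData.Class (ToData)

-- Stand-ins for the low-level pieces: in reality 'Canonical' would wrap the
-- built-in 'canonical' type and 'unsafeMkCanonical' would call the raw builtin.
newtype Canonical a = Canonical a

unsafeMkCanonical :: a -> Canonical a
unsafeMkCanonical = Canonical

-- The only exported way to build a 'Canonical' value: the ToData constraint
-- (derived by our tooling) plays the role of the proof that the value has a
-- canonical form.
mkCanonical :: ToData a => a -> Canonical a
mkCanonical = unsafeMkCanonical
```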

I can assure you that smart contract developers don't worry about us not saving them enough from themselves, they worry about us not providing them with ways to write reasonable code that they need to write to achieve their goals. I'm not saying we should implement every request, but all risk is relative and in this case it's negligible compared to basically everything else that compiler and smart contract developers do daily.

And if we don't trust in ourselves and other compiler developers enough to be confident about being able to feed the right argument to a function in a single place, we should go into the woods and reunite with nature or something instead.

Contributor

The question is what is the point of the function (a -> data)?
It's not evidence of anything.
Why not make the type mkCanonical : all a. a -> canonical a directly?

Contributor

@mjaskelioff mjaskelioff Aug 1, 2023

> I can construct unsafeCoerce : all a b. a -> b, I can even do it without using error (and without any extra builtins of course). Does it mean that our type system doesn't provide any reassurance?

BTW, How can you construct this from within the language?

Contributor Author

> The question is what is the point of the function (a -> data)?
> It's not evidence of anything.
> Why not make the type mkCanonical : all a. a -> canonical a directly?

These are reasonable and important questions, I'll elaborate on them in the doc itself.

> BTW, How can you construct this from within the language?

fix id. You get to review the PR adding that example and the type dumping functionality (don't actually go through hundreds of golden files, only review the changes in .hs files). The full unsafeCoerce term is here and its type is the expected one.
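
For readers following along, the `fix id` trick can be reproduced in plain Haskell (a hedged illustration, not the PLC term from the linked PR): the resulting term has the type of `unsafeCoerce`, but it diverges when forced, so no `error` and no extra builtins are needed.

```
import Data.Function (fix)

-- A term with the type of unsafeCoerce, built only from fix and id.
-- It never returns: forcing it loops forever, which is exactly why the type
-- system is not contradicted even though the type looks too good to be true.
bogusCoerce :: a -> b
bogusCoerce = fix id
```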

Contributor

The problem that I have with mkCanonical is that canonical itself was introduced to avoid having a function
equalsCanonical : all a. a -> a -> bool, but that function is definable from mkCanonical with the expected parametricity-breaking semantics.

Contributor

I tend to agree with Mauro's argument here.

Contributor Author

@mjaskelioff

> The problem that I have with mkCanonical is that canonical itself was introduced to avoid having a function
> equalsCanonical : all a. a -> a -> bool

it wasn't. The original and the current text say

> Now we of course didn't resolve any problem by introducing canonical, we just moved it from equalsCanonical to whatever builtin is going to create canonical values. However moving the problem this way opens up some interesting opportunities.

```
equalsCanonical : all a. a -> a -> bool
```

but that would be a parametricity breaking lie: the type signature says that the builtin can compare two values of an arbitrary type for equality and return a conclusive `True` or `False`, but this isn't of course possible if the values are, say, functions.
Contributor

Why is it a lie given that we have error? The signature in fact says "I can compare any two values for equality, or possibly blow up". So blowing up for functions (and non-canonical things in general) seems fine?

Contributor

I guess if you're willing to accept the possibility of blowing up, then you can violate parametricity, e.g. you can write

isSOPOrConstant :: a -> bool
isSOPOrConstant a = a `eqCanonical` a

if that succeeds then you learned something about the type which you probably shouldn't be able to. But it seems like we retain some property like "parametricity so long as you don't blow up", or something. Not sure.

Contributor Author

> Why is it a lie given that we have error? The signature in fact says "I can compare any two values for equality, or possibly blow up". So blowing up for functions (and non-canonical things in general) seems fine?

It's not what the type signature says! It, very literally, says "give me two values of the same type and I'll give you a bool (or fail)". Now if we add parametricity into the picture, we'll get the following law out of this type signature:

forall (f :: a -> b) (x y :: a). equalsCanonical {a} x y = equalsCanonical {b} (f x) (f y)

In plain English, either all functions in your language are injective (it's a simplified law, in the actual one that would probably still not be sufficient) or equalsCanonical cannot use any of its arguments to influence the result of the operation.

I.e. by parametricity what equalsCanonical finishes with (successfully or not doesn't matter) must not be determined by any of the arguments, meaning equalsCanonical either always fails or always returns the same bool, irrespective of its arguments.

By allowing equalsCanonical to take arguments into account when computing the result we'd break parametricity thus making the type signature a lie.

"parametricity so long as you don't blow up"

I mean, even if we had a language where equality of any two values was decidable, that builtin would still be breaking parametricity, bottoms are irrelevant here. You're not supposed to be able to branch using the arguments to equalsCanonical regardless of whether you fail occasionally or not.

We could give up on parametricity, but I do not yet see a reason for that, given that with the current proposal we can keep it while still not constraining the efficiency of the new builtins as a result of imposing typing limitations.

Contributor

I don't like calling the type a lie. The type doesn't promise a parametric model. That you need to break parametricity in order to implement the function that you want is another thing. Also I don't like the way the text is written: it suggests that we are going to solve the problem, when we don't. Your alternative might make it more difficult to shoot yourself in the foot, but you can still do it.

VCon _ -> EqCanonicalFalse
_ -> EqCanonicalUnexpectedInput
eqCanonical (VCon con1) = \case
VCon con2 -> toEqCanonicalResult $ con1 == con2
Contributor

It seems that eqCanonical would subsume all our existing equalsInteger etc. primitives.

Contributor Author

Yup. Not sure if it would make sense to deprecate them, though.

ConsStack arg2 args2 -> eqCanonical arg1 arg2 <> eqCanonical args1 args2
_ -> EqCanonicalFalse

instance (GEq uni, Closed uni, uni `Everywhere` Eq) => EqCanonical (CekValue uni fun ann) where
Contributor

This is a pretty strong new constraint on all builtin types!

Contributor

Also, does this work fine for the polymorphic builtin types too?

Contributor Author

> This is a pretty strong new constraint on all builtin types!

It's only new in the sense that we used not to have it in the CEK machine, otherwise it existed from the start.

> Also, does this work fine for the polymorphic builtin types too?

Discovering the magic of builtins? :)

>>> :set -XTypeApplications
>>> someValue @_ @DefaultUni [(True,[42 :: Integer])] == someValue [(True,[42 :: Integer])]
True
>>> someValue @_ @DefaultUni [(True,[42 :: Integer])] == someValue True
False

```
instance (GEq uni, Closed uni, uni `Everywhere` Eq) => EqCanonical (CekValue uni fun ann) where
```

We may want to require ``uni `Everywhere` EqCanonical`` instead, in case some built-in types don't have decidable equality (currently all do).
Contributor

Aha, and the weakening is that EqCanonical can return "unknown", not definitely true or false.

Contributor Author

Yup. Not sure if it's worth it though, given that we do generally expect our terms to be testable for equality and we can always retroactively replace Eq with EqCanonical in a backwards-compatible manner if for some reason we decide to give up on

-- | Check equality of two 'Term's.
eqTermM
    :: (GEq uni, Closed uni, uni `Everywhere` Eq, Eq fun, Eq ann, HasUnique name TermUnique)
    => Term name uni fun ann -> Term name uni fun ann -> EqRename (Renaming TermUnique)


We may want to require ``uni `Everywhere` EqCanonical`` instead, in case some built-in types don't have decidable equality (currently all do).

Instead of having an `EqCanonical` class we may want to have a class for extracting either a SOP constructor (`VConstr` in `CekValue`) or a constant (`VCon` in `CekValue`) from a value, so that we can reuse it for both `equalsCanonical` and `serialiseCanonical`, as otherwise we'd have to introduce another `SerialiseCanonical` class or something. Adding a new type class for every operation definable on values in canonical form would be silly. Maybe we could even just expand today's `HasConstant` class, which currently only allows one to extract a constant out of a value? Whatever the answer to this question is going to be, we should add `equalsCanonical` and `serialiseCanonical` simultaneously in order to ensure we're not missing a part of the picture.
Contributor

Yes, that does seem better. I wonder if what we're groping towards is something like this:

-- evaluator-independent values, as they might appear in the semantics
data Value uni fun = VCon ... | VConstr ... | ???

class Reflect value uni fun where
   reflect :: value -> Value uni fun

since it seems to me that what we're saying is:

  • There is some evaluator-independent way of looking at a value
  • Builtins have the ability to inspect this form

As written, though, that really is too powerful. Then we could probably write

isFunction :: a -> bool

as a builtin, which seems bad. But it's hard to see how to avoid that if we're going down this path.

Contributor Author

@effectfully effectfully Jul 31, 2023

> -- evaluator-independent values, as they might appear in the semantics
> There is some evaluator-independent way of looking at a value
> Builtins have the ability to inspect this form

Hm, we could do that I guess. Except it should be

class Reflect value uni fun where
   reflect :: value -> EvaluationResult (InspectableValue uni fun)

or something with InspectableValue currently only containing equivalents of VConstr and VCon, because as you say we definitely don't want isFunction.

But it's hard to see how to avoid that if we're going down this path.

By only adding new constructors to InspectableValue when we're sure it's worth it? Something horrible needs to happen for us to decide that allowing the user to branch on whether the given value is a function or not is OK.
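
To make the shape of such an interface concrete, here is a rough, self-contained Haskell sketch. All names here are hypothetical (they are not the proposal's API), and the constant representation is a stand-in for the real universe machinery: a single view type covering only constants and SOP constructors, which both `equalsCanonical` and `serialiseCanonical` could be written against.

```
{-# LANGUAGE FunctionalDependencies #-}
{-# LANGUAGE MultiParamTypeClasses  #-}

import Data.Word (Word64)

-- Stand-in for an evaluated constant of some built-in type; in plutus-core
-- terms this would be something like @Some (ValueOf uni)@.
data SomeConstant uni = SomeConstant

-- The only two shapes a "canonical" value may take: an evaluated constant or
-- an evaluated SOP constructor (tag plus arguments). Functions, delays and
-- partially applied builtins deliberately have no representation here.
data CanonicalView uni val
    = ViewCon (SomeConstant uni)
    | ViewConstr Word64 [val]

class AsCanonical uni val | val -> uni where
    -- 'Nothing' plays the role of evaluation failure: the value is not in
    -- canonical form, so builtins like equalsCanonical must not inspect it.
    asCanonical :: val -> Maybe (CanonicalView uni val)
```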


It would be rather weird however to only allow for, say, checking equality of `Integer`s and `ByteString`s when they are wrapped in a constructor and fail within the same builtin if the constants are not wrapped, so it makes sense to handle constants and SOPs using the same builtin. We'll discuss later how to decide what constants to handle and what should cause evaluation failure regardless of whether they appear inside of a constructor or not.

A reasonable design then is one that only allows for checking equality of types whose _evaluated_ values are guaranteed to be in [canonical form](https://ncatlab.org/nlab/show/canonical+form):
Contributor

This seems a little weird to me, in that it seems like canonical form is relative to some set of constructors. Here we are restricting it to just particular value constructors (VCon and VConstr), but that doesn't seem entirely principled to me. Or am I missing something?

Or looking at it another way: if you're working in a system with functions and you have a type with constructors, then function values are clearly not built up of those constructors. But here our objects are values and terms in the language, and lambda values are perfectly good values in that context.

Contributor

I don't understand the meaning of "evaluated values". Can a value not be already evaluated?

Contributor

I agree with @michaelpj on the notion of canonical form. Perhaps you can define your own notion of types with equality?

Contributor

That is, a function type is in canonical form if it's built from the constructor of lambdas (as opposed to, say, a partially applied builtin function). Perhaps you could define a notion of ground value?

> For example, the terms of type `ℕ` (a natural numbers type) of canonical form are the numerals
`S(S(S(⋯(0)⋯)))`.

Hypothetically, we could allow for handling `VDelay`, `VLamAbs` and `VBuiltin` too by returning `Dunno` when checking, say, two `VLamAbs` for equality instead of error (or `True`/`False` at random), however that would be a very weird builtin to have, which we'll discuss more later.
Contributor

I'm not sure which bit later explains why this would be weird? It seems mildly better to me to have improper equalities result in an error rather than a value that's potentially inspectable, since it seems like it leaks less information to the program, but I'm not really sure I can justify that fully.

@@ -0,0 +1,203 @@
# Equality and serialization built-in functions for SOPs
Contributor

Nothing about costing? :lolsob:

Seriously though, costing seems like an absolute nightmare. We'd need to have an actual workable size measure on terms, and of course the equalities on constants should not be free either.

I think it's worth discussing since it might be a killer problem all by itself...

Contributor Author

> We'd need to have an actual workable size measure on terms

So 2022... :)

All you need to know is what it costs to check equality of two VConstr AST nodes and what it costs to check equality of two constants of the given type (all of which can be calibrated separately), then you just stream those individual costs to the outside as you process the arguments to equalsCanonical. The only non-obvious part are how we scale the individual costs (do we have a ton of parameters to scale costs for checking equality of integers, of bytestrings etc or do we hardcode a single parameter and pray that it scales all kinds of costs correctly?) and whether we should just reuse the already existing costing functions for equalsInteger, equalsByteString etc. None of which is new, because we have these same questions for equalsData too.

Really, tell me how equalsCanonical is that different from equalsData that it becomes an absolute nightmare (assuming you don't believe costing of equalsData is an absolute nightmare too, in which case we might have made a bit of a mistake by introducing it).

I really feel like the majority of "costing is complicated" arguments are caused merely by not having the right view of costing. I need to return to Operational semantics for builtins some day... Particularly given that lazy costing was a huge step in the right direction and we already seem to have all the necessary infrastructure, we just need to shift the view towards costing a bit.
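
To make the "stream the individual costs as you traverse" idea concrete, here is a toy, self-contained sketch. The type, the function and the cost numbers are all hypothetical placeholders, not the real cost model: each node inspected contributes its own charge, so the evaluator can charge incrementally rather than computing a size up front.

```
import Data.Word (Word8, Word64)

-- A toy stand-in for values in canonical form.
data CVal
    = CInteger Integer
    | CBytes [Word8]          -- stand-in for a ByteString constant
    | CConstr Word64 [CVal]   -- SOP constructor: tag and arguments

-- Compare two canonical values, returning the accumulated cost alongside the
-- result. A real implementation would emit the charges lazily and stop
-- traversing (and charging) as soon as the result is known.
eqCanonical :: CVal -> CVal -> (Integer, Bool)
eqCanonical (CInteger i)   (CInteger j)   = (costEqInteger, i == j)
eqCanonical (CBytes b)     (CBytes c)     = (costEqBytes (min (length b) (length c)), b == c)
eqCanonical (CConstr t xs) (CConstr u ys)
    | t == u && length xs == length ys =
        let (costs, results) = unzip (zipWith eqCanonical xs ys)
        in  (costConstrNode + sum costs, and results)
    | otherwise = (costConstrNode, False)
eqCanonical _ _ = (costConstrNode, False)   -- mismatched shapes

-- Placeholder per-node charges; in reality these would be calibrated the same
-- way as the existing equalsInteger / equalsByteString / equalsData parameters.
costEqInteger, costConstrNode :: Integer
costEqInteger  = 1
costConstrNode = 1

costEqBytes :: Int -> Integer
costEqBytes = fromIntegral
```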

Contributor

> The only non-obvious part are how we scale the individual costs (do we have a ton of parameters to scale costs for checking equality of integers, of bytestrings etc or do we hardcode a single parameter and pray that it scales all kinds of costs correctly?)

Right, and also "how do we describe the costing of this function?". At the moment, all of our costing functions can be described statically as one of a family of simple functions based on the sizes of the inputs. I think that's a nice property!

Contributor Author

> The only non-obvious part are how we scale the individual costs (do we have a ton of parameters to scale costs for checking equality of integers, of bytestrings etc or do we hardcode a single parameter and pray that it scales all kinds of costs correctly?)

> Right, and also "how do we describe the costing of this function?". At the moment, all of our costing functions can be described statically as one of a family of simple functions based on the sizes of the inputs. I think that's a nice property!

I seriously don't understand how costing of equalsCanonical is any different to costing of equalsData. It may be in some way, but costing-wise they look identical to me (particularly given that we literally presuppose that any canonical value can be converted to data which can subsequently be fed to equalsData), so if you can enlighten me on that, I'd appreciate it a lot.

The part of my comment that you quote ends with

> None of which is new, because we have these same questions for equalsData too.

btw.

Contributor

Ah right, so in fact the problematic part of equalsData is not the costing function itself, but the size measure. The costing function for any such equality function is probably always going to be "minimum of the two sizes". I do agree that our size measures are much more of a mess than our costing functions. The size measure for Data includes the sizes of the internal integers and bytestrings, which is a bit awkward but I think makes some sense for functions that uniformly operate over its structure like equality and serialization (it would be pretty bad if for some reason, say, serializing integers was non-linear!).

Now, coming up with a size measure for an arbitrary value is awkward (functions etc.), but at least somewhat makes sense for a canonical value. That does mean that we would need to rely semantically on lazy costing, since we would need to fail the evaluation if the term is not canonical, so we can't even in principle compute the size and cost up front (which we can still in principle do for equalsData, even if the implementation chooses to be more efficient).

The other way in which things are more complicated for equalsCanonical is that Data contains known other types at leaf nodes, whereas canonical values can contain leaf nodes of arbitrary builtin types. So while we might reason that the costing behaviour of equalsData is reasonable, based on the two other types it contains, it's much harder to reason about equalsCanonical.

@effectfully
Contributor Author

> I don't think we actually want type-case,

Neither do I, hence this proposal.

> I would not be confident that someone won't turn up in a year or two and show how to use this to do bad things.

That's why we have computer science folks! If they can be convinced that the builtin is fine, it is perhaps fine indeed. And if they find something wrong about it, then we know for sure it's not fine. Anything in-between is worrisome, I agree. We've done worrisome things though.

> The costing problem seems really ugly. In principle I guess I can see how we would cost it, but it will be pretty ugly to specify and implement.

How is it so much different from costing for equalsData? Anyways, I agree that

  1. we should definitely discuss costing issues
  2. we shouldn't commit to implementing the builtin before we properly lay out costing of the new builtins (including the implementation of the costing functions, not sure about specification, maybe we could do that one as a follow-up)

I just thought we should first focus on the type theory aspects of things. If we fail to settle on those, we won't need to discuss costing stuff.

> I'm not sure I understand the "canonical form" argument. Or more generally, what terms we should allow this on and why.

Whatever SOPs and constants compute to. It can be VConstr and VCon like in the machines or it can be some tagged memory location like in some imaginary evaluator. As long as you can distinguish an evaluated Constr/Constant from other values within your evaluator and extract relevant pieces of info from those evaluated constructors, you should be fine. Can you tell if such a strategy would work for the bytecode interpreter?

> IDK. I guess it's probably possible, but I definitely don't love it.

I should've probably discussed the alternatives more. Do you love the idea of encoding data types using Data, since it's apparently more efficient despite being, well, less efficient? Complicating the interface between Plutus and the ledger, so that the ledger can provide ScriptContext in both SOP and Data forms? Implementing various new fancy builtins (pattern matching, faster navigation within Data, whatever) to deal with Data, because it's what the community will keep requesting?

There's currently a split in the community. We recommend folks to use SOPs, but they don't, because they don't think it's a good idea and they have numbers to show. We need to address the split somehow. Either by evolving both ways to encode the ScriptContext, which would be the worst option in my opinion, or by making one of the ways a clear winner. This proposal is an attempt to make SOPs a clear winner, but I'd be perfectly fine with letting Data win instead.

So basically I don't love it either, but we need to pick one of the poisons.

@mjaskelioff
Contributor

I understand the motivation behind this, but I'm not happy with it.

SOPs have the advantage of dealing with a whole range of problems uniformly with a clear semantics and clear costs. By introducing a special builtin, it seems we are going backwards: for the special case of equality we want to do it this other way. Then in the future there are going to be other requests for making some "special" SOP processing cheaper.

I really hope we can find some other way out.

Now we of course didn't resolve any problem by introducing `canonical`, we just moved it from `equalsCanonical` to whatever builtin is going to create `canonical` values. However moving the problem this way opens up some interesting opportunities. A reasonable way of constructing a `canonical` value is a builtin that takes some kind of a proof that the value can indeed be represented in canonical form, here's one option on how to achieve that:

```
mkCanonical : all a. (a -> data) -> a -> canonical a
```

Moreover, once the grand universe split is complete (see [this](https://input-output.atlassian.net/browse/PLT-810) internal Jira ticket), we won't even need to add a runtime type tag for `Canonical` to the term-level universe, limiting its existence only to the type-level universe.

Long story short, `Canonical` and `canonical` are merely for ensuring that the type checker still prevents nonsensical programs from being created, they don't affect evaluation time (as long as `mkCanonical` invocations are properly erased) and are entirely opt-in (i.e. people who don't care about type safety don't need to know anything about the `canonical` type and can use `equalsCanonical` directly).

Contributor

If what we are interested in is things that can be compared, why bother with canonical and not have an Eq type constructor, and choose just the types that we want?

mkInteger : integer -> Eq Integer
mkPair : all a b. Eq a -> Eq b -> Eq (pair a b)
mkList : all a. list (Eq a) -> Eq (list a)
...
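
One possible Haskell reading of this suggestion (hypothetical names, ignoring the SOP case; in the actual proposal these would be builtins or language constructs rather than Haskell definitions): an `Eq`-style wrapper holds a value that is known to be comparable, and only a fixed set of constructors can produce one.

```
-- A value wrapped together with the knowledge that it can be compared;
-- only the constructors below can produce one.
newtype EqVal a = EqVal a

eqVal :: EqVal a -> a
eqVal (EqVal x) = x

mkInteger :: Integer -> EqVal Integer
mkInteger = EqVal

mkPair :: EqVal a -> EqVal b -> EqVal (a, b)
mkPair (EqVal x) (EqVal y) = EqVal (x, y)

mkList :: [EqVal a] -> EqVal [a]
mkList = EqVal . map eqVal

-- An equality builtin would then only be usable at wrapped types, e.g.
-- equalsCanonical :: EqVal a -> EqVal a -> Bool
```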

Contributor Author

Your examples are all built-in types, how would you choose all appropriate SOPs via built-in functions in addition to those? What types would such builtins even have?

Contributor

For SOPs, it can't be a builtin, since builtins can't handle SOPs. There are different alternatives. Perhaps the closest one to the proposed one above would need a language construct:
mkSOP : sop [(Eq a) (Eq b) ] ... [ Eq c] -> Eq (sop [a b] ... [c])
(I don't quite remember the concrete syntax for SOP types)
Note the informal `...`; these are the same as those in the description of the constr type.

Another alternative would be a constrEq construct that behaves like constr but only accepts Eq types.

Another alternative is to have implicit castings, by modifying the typing rules.

There is a whole range of possible solutions, each with their own advantages and disadvantages.

Contributor Author

> For SOPs, it can't be a builtin, since builtins can't handle SOPs.

Then it can't be fast, which defeats the purpose.

> There is a whole range of possible solutions,

If you're going to propose any of those, could you do that in the form of the "A bird's-eye view" section of the updated proposal?

Contributor

Why would a new language construct be slower than a builtin?

@effectfully
Contributor Author

> By introducing a special builtin, it seems we are going backwards: for the special case of equality we want to do it this other way.

Equality and serialization. They are indeed very special, because they are preventing folks in the community from moving from Data to SOPs, which is something that everybody would win from as it resolves the community split and makes contracts cheaper for everyone.

And that train of special-casing equality and serialization had left the station long before we started talking about SOPs, namely when we decided it was a good idea to introduce equalsData and serialiseData. Now we're merely playing catch-up with our own past selves.

> Then in the future there are going to be other requests for making some "special" SOP processing cheaper.

And I'll be keen to promote those too! Perhaps less than equality and serialization ones, because future ones hopefully won't divide the community into "us" and "them", but still, it's the point of any tool to satisfy the needs of its users. It's not that our users are asking for something unreasonable, I do believe we can save everyone time and money while still providing builtins with sensible semantics.

I'm happy to be convinced that what I outlined here isn't sensible, but so far I remain unconvinced.

@mjaskelioff
Contributor

> And I'll be keen to promote those too! Perhaps less than equality and serialization ones, because future ones hopefully won't divide the community into "us" and "them", but still, it's the point of any tool to satisfy the needs of its users. It's not that our users are asking for something unreasonable, I do believe we can save everyone time and money while still providing builtins with sensible semantics.

> I'm happy to be convinced that what I outlined here isn't sensible, but so far I remain unconvinced.

I'm not saying that what users are asking is unreasonable, the motivation for the request is clear. Also I'm not saying that you are proposing something that is not sensible, I pointed out some possible drawbacks.

My point is that I'd like to be convinced that this is the best viable approach to this problem, and so far I remain unconvinced (for example, the proposed mkCanonical is useless).

There is an underlying assumption about cost which is true now, but the cost relation between the builtin equalsData and the SOPs one may change over time. So I want to be careful about adding new stuff and discuss alternatives before committing to a new language feature that will have to be maintained forever.

@effectfully
Contributor Author

effectfully commented Aug 1, 2023

> My point is that I'd like to be convinced that this is the best viable approach to this problem, and so far I remain unconvinced

That is fair enough and I do agree we should investigate other approaches too.

> (for example, the proposed mkCanonical is useless).

I don't agree with that, I'll respond in the respective thread.

> There is an underlying assumption about cost which is true now, but the cost relation between the builtin equalsData and the SOPs one may change over time. So I want to be careful about adding new stuff and discuss alternatives before committing to a new language feature that will have to be maintained forever.

Absolutely. I'm not proposing to commit to this proposal right away and I expect it to be closed/merged with the conclusion that we're not going to do it for now. This is for example what happened to my ancient type checking PIR proposal, which was later implemented anyway for the lack of a better one.

I believe we need to address the community split. How we are going to do that is very much up in the air and I'm not even convinced myself that what I'm proposing here is the best way forward.

@effectfully effectfully marked this pull request as draft August 2, 2023 01:45
@effectfully
Contributor Author

> Also I'm not saying that you are proposing something that is not sensible, I pointed out some possible drawbacks.

Eek, sorry, I didn't express myself properly. To provide some context: when I was writing this whole thing I was worried that the moment I published it you or Michael would come up with an immediately convincing and straightforward argument for why it simply can't work, and then several days of work would go down the toilet. That is the kind of "isn't sensible" that I meant and it's what I'm not seeing so far.

Disagreeing on trade-offs and choosing not to pick the proposed solution in the end is completely fine of course, it's still useful to have stuff documented as long as it's not completely nonsensical.

I do appreciate your comments! I'm going to address them in the doc and come back to you and Michael once I've done that.

@michaelpj
Contributor

Yeah, to be clear, the goal here is to come up with a coherent proposal that isn't obviously broken, even if we hate it for other reasons. That's valuable to have in the bank even if we decide not to use it.

If we can also come up with a less hateful version that's also great obviously 😂

@mjaskelioff
Contributor

It does break parametricity, so I'd consider it broken. ☹️

@effectfully effectfully force-pushed the effectfully/builtins/propose-equalsCanonical branch 6 times, most recently from d786ce7 to eb01ed8 Compare August 24, 2023 16:18
@effectfully effectfully force-pushed the effectfully/builtins/propose-equalsCanonical branch from eb01ed8 to 42b4d82 Compare August 24, 2023 16:39
@effectfully
Contributor Author

effectfully commented Aug 24, 2023

I've amended the Preface section (mentioning that the proposal doesn't really address the community split properly) and added three new sections:

  • Parametricity issues
  • Security of the system as a whole
  • A bird's-eye view

Other parts of the proposal were only very slightly polished, so feel free not to review them.

I propose to merge the proposal to keep the doc checked in, but not to consider implementing it. It may come in handy in future, but at the moment it doesn't seem to help us resolve the Data-vs-SOP issue because of datums and redeemers having to stay Data.

Contributor

@michaelpj michaelpj left a comment

I'm convinced by your arguments that the breaking of parametricity is something we could live with. But then I'm very unconvinced by the motivation for "guarding" that with various easily-circumventable mechanisms. We can put guardrails in place in Plutus Tx or whatever, but I think we might as well just accept the raw equalsCanonical :: a -> a -> Bool in PIR and below.



## Security of the system as a whole

If `equalsCanonical : all a. a -> a -> bool` is still admissible, do we win anything at all by introducing the `mkCanonical` indirection? Yes: we introduce a hoop that the user needs to jump through to be able to use `equalsCanonical` and that hoop acts as explicit guidance.
Contributor

I don't find this convincing. Plutus Core is the low-level language, it's okay for people to have to know what they're doing. If we want to put in safer ways of using it at a higher level, we can do that.

Contributor Author

> I don't find this convincing. Plutus Core is the low-level language, it's okay for people to have to know what they're doing.

We assume that "people" here are compiler developers. In our case these are the same compiler developers who introduced the "pointless" Normalized newtype wrapper that easily can be and regularly does get bypassed. canonical is very much just like Normalized, it's too is a thin wrapper that one can create and remove willy-nilly if they wish so, except in the case of canonical there's also a trans-language ghost of a departed proof making it harder to shoot oneself in the foot. Do you think Normalized wasn't a good idea? Do you think canonical is different?

> it's okay for people to have to know what they're doing.

I mean, one can go pretty far with this argument. Why do we need static types if we expect people to know what they're doing? The answer is "explicit guidance is helpful". And it is helpful with Normalized the same way as it's helpful with canonical: it helps when you write code, it helps even more when you refactor it, it helps even more when you need to change code that you are not familiar with, etc etc. I really fail to understand how that is not completely obvious.

Even without the a -> data ghost, I'd still find canonical useful. Creating canonical values and using them are two different things and I'd rather have this distinction be explicit and annotate calls to mkCanonical with a comment on why it's safe there than do that to calls of equalsCanonical potentially deep down the road where all relevant context is lost.

Contributor

> Do you think Normalized wasn't a good idea? Do you think canonical is different?

Yes and yes. canonical is built into the language. If you (a compiler writer) are worried about getting it wrong in your generated code, you can very easily define such wrapper functions (or we can provide them in the PLC stdlib). I don't see what advantage we get from building it into the builtins themselves.

Contributor Author

> Do you think Normalized wasn't a good idea? Do you think canonical is different?

> Yes and yes.

I'm really shocked by the first "yes". I guess we won't be able to agree on anything here then. Which is fine, since we aren't planning to implement the proposal anyway.

Contributor

Er, sorry, I just read it backwards. I think it was a good idea!

Contributor

I agree with @michaelpj that the guard rail is not needed at the core level, but it's definitely a good idea to add it at higher levels. I don't see a reason to support it at the runtime level. Having canonical adds unnecessary complexity to the implementation, but more importantly it complicates the metatheory, which needs to explain it. Unlike this Normalized newtype, which I don't know anything about (meaning it's not needed for the metatheory). The trade-off of the little security we gain by adding a predicate to the theory of the language versus the complexity it adds doesn't look beneficial to me.


As an option we could introduce a `MkCanonical` type class whose purpose is to ensure that every call to `mkCanonical` is safe, instead of relying on `toBuiltinData` being injective for all types. Using `ToData` has the advantage that with a flag one can switch from structural equality to equality-via-`Data` to test that the two agree.

Whatever we choose in the end to help the user not to shoot themselves in the foot with the Plutus Tx version of `mkCanonical`, it needs to come with some sort of a "soft proof" like a satisfied instance constraint. By converting this "soft proof" to an `a -> data` function we create a true ghost of a departed proof, except rather than being a ghost within the same system like in the paper, in our case it is a ghost from a higher-level one.
Contributor

I don't really see the advantage of using a -> data over a "soft" satisfied instance constraint.

AFAICT, we basically want:

instance TypeError => MkCanonical (a -> b)
instance MkCanonical BuiltinInteger

plus ideally some kind of deriving that gives you an instance iff all your fields have one. This doesn't seem particularly easy to accidentally abuse, and I'm not sure we can or should stop people who really try to break it.
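
A minimal compilable sketch of this setup (hedged: the exact pragmas, instance set and the `Canonical` wrapper are illustrative, not a finished design):

```
{-# LANGUAGE DataKinds            #-}
{-# LANGUAGE UndecidableInstances #-}

import GHC.TypeLits (ErrorMessage (Text), TypeError)

newtype Canonical a = Canonical a

class MkCanonical a where
    mkCanonical :: a -> Canonical a
    mkCanonical = Canonical

-- Function types can never be canonical: reject them with a compile-time error.
instance TypeError ('Text "Functions cannot be turned into canonical values")
    => MkCanonical (a -> b)

-- Base instances for types that genuinely have canonical forms.
instance MkCanonical Integer
instance MkCanonical Bool

-- Compound instances exist exactly when all the components have one,
-- which gives the "instance iff all your fields have one" behaviour.
instance (MkCanonical a, MkCanonical b) => MkCanonical (a, b)
instance MkCanonical a => MkCanonical [a]
```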

Contributor

Or even further: on the Haskell side, since the class constraint is taking the place of the canonical argument, we could just have

class Canonical a where
   equalsCanonical :: a -> a -> Bool

with a similar setup

Contributor Author

> I don't really see the advantage of using a -> data over a "soft" satisfied instance constraint.

There's no advantage to a -> data when the satisfied instance constraint exists. The point of a -> data is exactly to be reminiscent of that instance down the language tower.

> instance TypeError => MkCanonical (a -> b)

Yes, but you can't catch, say, type variables.

> plus ideally some kind of deriving that gives you an instance iff all your fields have one.

It should be in the proposal, yes.

Contributor

> Yes, but you can't catch, say, type variables.

I'm not sure I see the problem there. MkCanonical a => ... is problematic if a can be a function type, but we're not going to allow instances for function types, so it can never be?

Contributor Author

> I'm not sure I see the problem there. MkCanonical a => ... is problematic if a can be a function type, but we're not going to allow instances for function types, so it can never be?

Nevermind, I got confused for a moment, I agree it's not a problem.

Contributor

@mjaskelioff mjaskelioff left a comment

I think you did a good job, but I don't like that you need to change the language (add the canonical type) for an ineffectual guard. IMHO the cost is too high for the benefits.

In my opinion the way forward is to admit from the beginning that we are going to break parametricity for PLC and use equalsCanonical. Provide your justification that the same decision (a mistake, I would say) was made in other languages, and that it can be guarded effectively in Plutus Tx (or other higher-level languages), and show how to implement it.
I think that would be a sound way to document a possible solution.

In any case, if you would still like to leave the guard (after all, "ghost of departed proof" is a hell of a phrase), please rewrite the text so that it is clear that you are choosing not to use equalsCanonical in order to add some optional guard, and not because of the "parametricity breaking lie".



This builtin says that as long as any value of type `a` can be converted to a `data`, it's safe to say the value is going to be in canonical form when we reach it during evaluation. Here's its implementation:

Contributor

This is what you would like the builtin to say, but it doesn't really say that.



> The existence of this function and several similar ones in OCaml, can break the guarantees provided by abstraction and parametricity. However, not all guarantees are broken, for instance the preservation of invariants is not affected by any of the built-in functions available in OCaml.
>
> In practice, this means that such functions should only be used at known types. Using them on abstract types produces non-parametric code whose correctness may rely on details of an implementation that are not exposed in its interface.

Contributor

What I read here is that it is going to be difficult to reason about PLC programs. Unlike more general languages like OCaml and Haskell, we have a formal specification. And being able to reason about the semantics of programs is far more important in PLC than in other general-purpose languages. So even though other languages made some decisions, our context is different. I think for us, breaking parametricity in this way could be very costly.


They use the same structural equality function as a "canonical example of run-time type analysis".

Overall, a structural equality operator does complicate metatheory, but it doesn't destroy it.
Contributor

I agree. It doesn't destroy it but it does complicate it a lot. The benefits should be rather marvelous in order to pay the price of this complexity.


@effectfully
Contributor Author

> to change the language (add the canonical type)

It's not a change to the language, just another built-in type (unless you consider that to be a change to the language).

> In my opinion the way forward is to admit from the beginning that we are going to break parametricity for PLC and use equalsCanonical. Provide your justification that the same decision (a mistake, I would say) was made in other languages, and that it can be guarded effectively in Plutus Tx (or other higher-level languages), and show how to implement it.
> I think that would be a sound way to document a possible solution.

I agree that having an actual implementation in Plutus Tx would be very helpful here, but that's a bunch more work and as outlined in the doc there are options there, plus I do provide pieces of the implementation of the new builtins in UPLC/TPLC (I wonder if I ever reference anywhere the actual code) and the Plutus Tx bits seem simple enough not to worry about writing them out before creating a proposal... All in all, I think that part of the proposal is fine, particularly given that it's going to be discarded. If we were to implement it, we of course wouldn't release a version of UPLC with support for equalsCanonical while there's no support for that builtin in Plutus Tx.

It's just a proposal after all, not a specification.

> In any case, if you would still like to leave the guard (after all, "ghost of departed proof" is a hell of a phrase), please rewrite the text so that it is clear that you are choosing not to use equalsCanonical in order to add some optional guard, and not because of the "parametricity breaking lie".

Fair enough. I do say this in the doc after "parametricity breaking lie":

> Now we of course didn't resolve any problem by introducing canonical, we just moved it from equalsCanonical to whatever builtin is going to create canonical values.

It's how I actually look at it: equalsCanonical is no longer a lie with canonical, but mkCanonical is if you wire TPLC/PIR incorrectly or don't account for the ghost of the departed proof in the semantics of the language. And any reference to mkCanonical is the exact place where the person implementing / formalizing the feature has to be careful, which is worse with equalsCanonical, because it's more suitable to ensure properties at creation site rather than use site.

> In any case, if you would still like to leave the guard

I think I'm too outnumbered here. With you and Michael not being fans, I don't think there would be any chance of settling on my design, even if we were to implement the proposal in some form or shape, which isn't the intention here (due to it not addressing datums and redeemers at all, which renders the whole thing nearly pointless).

Given the latter point, I'm going to close this PR without merging it (I'm still gonna leave some more comments in certain threads here).

If we ever want to implement something like what is described here, we'll need to address the reviews and rewrite the proposal not to have canonical stuff (unless the sentiment changes in future).

Thanks a lot @mjaskelioff and @michaelpj for your comments and patience!

@effectfully effectfully closed this Sep 8, 2023
@effectfully effectfully deleted the effectfully/builtins/propose-equalsCanonical branch September 8, 2023 21:17
@michaelpj
Contributor

I'd mildly prefer to merge it, I think it's easier to find notes like this if they're in-tree. Still useful even if it's incomplete (we have lots of incomplete stuff in there).

@effectfully
Contributor Author

> I'd mildly prefer to merge it, I think it's easier to find notes like this if they're in-tree. Still useful even if it's incomplete (we have lots of incomplete stuff in there).

Well maybe if we have a big disclaimer at the beginning pointing to the reviews and specifying that they need to be addressed... OK, I'll do that.

@effectfully effectfully restored the effectfully/builtins/propose-equalsCanonical branch September 11, 2023 12:47
@effectfully effectfully reopened this Sep 11, 2023
@effectfully effectfully self-assigned this Apr 22, 2024
@effectfully effectfully deleted the effectfully/builtins/propose-equalsCanonical branch June 11, 2024 13:35