FVM native calling convention proposal (FRC-0042) #382

anorth · 2022-06-01T02:35:41Z

anorth
Jun 1, 2022
Maintainer

Edit: In draft as FRC-0042

This is a proposal for a calling convention for user-programmed native WASM actors on the Filecoin VM.

A calling convention enables syntactic composability between actors: the general ability of components of a system to be recombined into larger structures and for the output of one to be the input of another (Aragon: What is Composability). We need such a convention for actors programmed by different teams to interoperate. A convention is also necessary for the adoption of standards such as token and wallet APIs to provide morphological composability—application-level interoperability. A calling convention can also provide a helpful abstraction for actor developers, ideally a familiar one. A convention that is widely adopted provides a base for a stable development environment and justifies investments in tooling.

Many ideas here are thanks to collaboration with @Kubuxu.

Background

The FVM dispatches messages to actors, both top-level invocations rooted in blockchain messages, and internal invocations between actors. The essential fields comprising a message include:

A method number (CBOR integer, variable length up to 8 bytes plus sign, u64 in WASM)
Method parameters (byte array expected to be IPLD, for now always CBOR)
Receiver address
Token quantity transferred from sender to receiver

The VM implements internal dispatch, which means there is a single entry point to each receiving actor that receives messages. It is expected (but not necessary) that an actor then uses information from each message to dispatch to some method implementing the requested functionality as a simple function call.

The built-in actors dispatch based on a message’s method number, and expect the parameters to decode into a fixed schema corresponding to each method. Methods are numbered sequentially, partly in order to enjoy a dense representation for top-level chain messages (method numbers <24 cost zero additional bytes).
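
To make the density point concrete, here is a small sketch of how many bytes a CBOR unsigned integer occupies. The function name is illustrative only and is not part of any FVM API; the size classes are the standard CBOR ones.

```python
def cbor_uint_size(value: int) -> int:
    """Total encoded size, in bytes, of a CBOR unsigned integer."""
    if value < 0 or value >= 1 << 64:
        raise ValueError("outside the CBOR unsigned integer range")
    if value < 24:
        return 1  # the value is packed into the initial byte itself
    for extra in (1, 2, 4, 8):
        if value < 1 << (8 * extra):
            return 1 + extra  # initial byte plus a fixed-width argument
    raise AssertionError("unreachable")


# A sequential method number below 24 needs only the initial byte,
# while a number using the full 32-bit range needs 1 + 4 bytes.
print(cbor_uint_size(5), cbor_uint_size(0xFFFFFFFF))
```

So moving from small sequential numbers to numbers spread over a 32-bit space costs a few extra bytes per message, which is the trade-off the rest of the proposal accepts.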

The FVM does not further constrain dispatch, and the development team wish for the FVM to remain neutral with respect to such conventions.

Proposal

This proposal uses method numbers, but provides a deterministic mapping from method names to those numbers.

Goals

Provide a familiar dispatch-by-name abstraction to programmers
Compact, uniform-size on-chain encoding
No change to blockchain message schema
Permit specification of standard actor interfaces, including retro-actively in recognition of widely-adopted patterns
Permit extending the interface of an extant actor to implement methods defined in a new standard
Independent of programming language

This proposal assumes that method names are defined statically, that all actors will have the interface definitions of any other actors they call available at compile-time. However, it can support dynamic resolution for future reflection capabilities.

Method number computation

The method number for a symbolic method name is calculated as the first four bytes of hash(salt+methodname) interpreted as an unsigned 32-bit integer.

Zero is an invalid method number while Filecoin and the FVM continue to treat it as a special-case bare send of the native Filecoin token.

The salt is chosen so that hash(salt + "Constructor") == 1, the method number currently reserved for construction by the built-in actors.

The hash function is blake2b, which is already implemented in the VM and available as a syscall. This permits easy dynamic method number calculation, though build tools will typically compute a method number statically at compile time.
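
A minimal sketch of the computation described above, using Python's standard hashlib. Several details are assumptions, since the text does not fix them: the salt value is a placeholder (the real one would have to be searched for), the digest length is hashlib's default of 64 bytes, the name is encoded as ASCII, and the four bytes are read big-endian.

```python
import hashlib

# Placeholder only: the proposal does not state the salt. A real salt would
# be chosen so that "Constructor" maps to 1.
HYPOTHETICAL_SALT = b""


def method_number(name: str, salt: bytes = HYPOTHETICAL_SALT) -> int:
    """First four bytes of blake2b(salt + name) as an unsigned 32-bit integer."""
    digest = hashlib.blake2b(salt + name.encode("ascii")).digest()
    # Byte order is an assumption; the text only says "unsigned 32-bit integer".
    return int.from_bytes(digest[:4], "big")


print(method_number("Transfer"))
```

Build tooling could evaluate the same function at compile time and emit the result as a constant.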

Method names

A method is exported when a method number is computed for it and the actor will recognise this number for internal dispatch. Conventions on exported method names are independent of any programming language conventions. Note that these conventions only apply to methods exported to the VM for inter-actor dispatch. Actor developers can continue to use relevant programming language conventions for simple internal function calls.

These conventions encode the loose guidelines set by the built-in actors.

Exported method names should:

Use only the ASCII characters in [a-zA-Z0-9_] (the same set as the C programming language). Other characters, including unicode beyond this set, are excluded in order to reduce the opportunity for misleading spelling of names in user interfaces.
Have an initial capital letter, and use CamelCase to identify word boundaries.
Capitalize all letters in acronyms.
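
The mechanical parts of these conventions (character set and initial capital) are easy to check in build tooling. The sketch below is one way to do it; the function name is made up for illustration, and the CamelCase and acronym rules are not checked because they depend on knowing where the word boundaries are.

```python
import re

# Initial capital letter, then only characters from [a-zA-Z0-9_].
_EXPORTED_NAME = re.compile(r"[A-Z][a-zA-Z0-9_]*")


def is_valid_exported_name(name: str) -> bool:
    """Check the character-set and initial-capital conventions only."""
    return _EXPORTED_NAME.fullmatch(name) is not None


for candidate in ("Constructor", "FRC20ReceiveToken", "balance_of", "Transfér"):
    print(candidate, is_valid_exported_name(candidate))
```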

Discussion

Collisions

Two different method names will collide on their method number with probability 1/(2^32). In the rare case of collision within a single actor, build tooling should identify the collision and prompt one of the methods to be renamed.

A harder-to-resolve collision will occur in case of a collision between names in two standard interfaces to be implemented by a single actor. The probability of such a collision grows with the product of the number of methods in the two interfaces. Two ten-method interfaces will find a collision with probability approximately (10/2^32)*10 ~= 1/43,000,000. This is judged to be rare enough that developer tooling will be able to detect collision with widely used standard names before significant adoption makes renaming one a significant burden.
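
The two-interface estimate can be checked numerically. This sketch assumes method numbers are uniform over the 32-bit space and treats the m*n cross pairs as independent, which is an approximation; the function name is illustrative.

```python
import math

SPACE = 2 ** 32  # number of possible 32-bit method numbers


def cross_collision_probability(m: int, n: int, space: int = SPACE) -> float:
    """Chance that some name in an m-method interface collides with some
    name in an n-method interface, treating the m*n pairs as independent."""
    pairs = m * n
    # 1 - (1 - 1/space)**pairs, computed in a numerically stable way.
    return -math.expm1(pairs * math.log1p(-1.0 / space))


p = cross_collision_probability(10, 10)
print(f"about 1 in {1 / p:,.0f}")
```

For two ten-method interfaces this comes out near 1 in 43 million, the same order of magnitude as the estimate in the text.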

Compatibility with built-in actors

This proposal is compatible with the existing built-in actors. All they need to do is compute the prescribed method number corresponding to each existing method and add these to their existing dispatch tables. Calls to the old, sequential method numbers can continue to be supported indefinitely.
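
As a rough illustration of what "add these to their existing dispatch tables" could look like, here is a toy dispatch table in which one handler is reachable under both its old sequential number and its hashed number. The method name, the legacy number 2, and the handler are all hypothetical, and the salt is omitted for brevity.

```python
import hashlib


def hashed_method_number(name: str) -> int:
    """Hashed number as in the proposal (salt omitted, big-endian assumed)."""
    digest = hashlib.blake2b(name.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big")


def propose(params: bytes) -> str:
    # Stand-in for a real method body.
    return "propose called"


LEGACY_PROPOSE = 2  # hypothetical sequential method number

# Both the old and the new number route to the same implementation.
DISPATCH = {
    LEGACY_PROPOSE: propose,
    hashed_method_number("Propose"): propose,
}


def invoke(method_num: int, params: bytes) -> str:
    handler = DISPATCH.get(method_num)
    if handler is None:
        raise LookupError(f"unknown method number {method_num}")
    return handler(params)


print(invoke(LEGACY_PROPOSE, b""), invoke(hashed_method_number("Propose"), b""))
```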

Compatibility with Ethereum

This scheme is inspired by the Solidity ABI for the Ethereum VM. It differs in the choice of hash function and exclusion of method parameter types (no overloading), so is not transparently compatible.

Note that removing both those differences would not automatically make this scheme compatible with the Solidity ABI, since method parameter types have fundamentally different schemas between the two environments. A future FVM/EVM embedding will need to translate method selectors along with the rest of the EVM ABI.

Network effects

While the FVM may remain independent of calling conventions, strong network effects are likely to result in a single convention dominating, with perhaps a few others for niche use cases. It is thus worthwhile establishing a good standard now.

Alternatives considered

Overloading

We could overload method names by including a description of their parameter type in the hash payload. This is rejected for simplicity. There is little evidence that method overloading has been of great utility in, e.g., Solidity. Since payloads are expected to be IPLD structures, overloading would require defining a standard serialization of an arbitrary IPLD type schema.

Excluding the parameter type from the method names means that methods may receive dynamically typed parameter payloads. This permits a kind of overloading if some part of the payload is used for dispatch in addition to the method number. A future convention could define a standard for this.

Interface prefix

In the case of defining contract API standards such as tokens, we could prefix the interface/standard name to the method name, thus preventing the collision of short/common names between different interfaces. E.g. hash("FRC20:" + methodname). This is rejected because it would make it impossible to retroactively declare some widely used method as such a standard without making all the existing implementations non-compliant with the standard they have created.

Where authors are designing a standard ahead of its wide adoption, they are encouraged to identify interface method names with such a prefix in any case, e.g. FRC20ReceiveToken.

Removing method numbers

There has previously been some enthusiasm for removing the method number from Filecoin, since it’s external to the VM and ABI. As this would require a change to the blockchain message structure, this is a difficult and hence unlikely change. This proposal instead exploits it, and for its original purpose.

An alternative that could promote the eventual removal of method numbers is to define a standard envelope IPLD structure for message parameters, embedding the method number there. Such an envelope would be a 2-item array, with the first item being the method number as an integer and the second item being a byte array encoding the resolved method’s parameters. Actors could extract the method number before decoding the remaining parameters, in order to determine the appropriate schema.
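
A sketch of what such an envelope could look like on the wire, using a tiny hand-written CBOR encoder so that no third-party library is needed. The helper names are made up for illustration; only the 2-item shape (method number, then a byte string of already-encoded parameters) comes from the description above.

```python
def _cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus argument for the given major type."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")


def encode_envelope(method_num: int, params: bytes) -> bytes:
    """[method_num, params] as a CBOR array of an unsigned int and a byte string."""
    return (
        _cbor_head(4, 2)               # major type 4: array of 2 items
        + _cbor_head(0, method_num)    # major type 0: unsigned integer
        + _cbor_head(2, len(params))   # major type 2: byte string header
        + params
    )


# params here is the already CBOR-encoded ["alice"] tuple.
print(encode_envelope(1, bytes.fromhex("8165616c696365")).hex())
```

A receiver would read the first item, pick the parameter schema for that method, and only then decode the inner byte string.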

References

The solidity ABI spec: https://docs.soliditylang.org/en/v0.8.13/abi-spec.html

Stebalien · 2022-06-01T03:08:52Z

Stebalien
Jun 1, 2022
Collaborator

I want to quickly draw a distinction between "dynamic" and "static" calling:

Dynamic: We may need to call a method on an actor without knowing the actor's exact type ahead of time.
Static: In many cases, we'll know the target actor's exact type before we deploy.

This proposal seems to be targeting the dynamic case (e.g., ERC20 tokens, etc.). For the static case, we (@raulk and I) have been discussing how we might use IDLs to describe actor interfaces. IDLs could also be used for runtime reflection, but I assume most users won't want to pay the cost of runtime reflection.

3 replies

anorth Jun 1, 2022
Maintainer Author

I think you're using dynamic and static differently to how I did in the proposal. Let me try to clarify:

Your static: the exact code of the called actor is known at compile time. Like C++ non-virtual dispatch
Your dynamic / my static: the interface definitions are known at compile time, but the concrete receiver code isn't. Like C++ virtual dispatch (still type-checked)
My dynamic: the callee can accept different types of parameters, behave differently depending on what it gets. Like using void* or Javascript (no type checking).

Yes this proposal targets the middle one, but notes that the third one is also possible.

I'm not sure conventions would be so useful for the first case. Reflection to discover the methods and interfaces implemented by an actor is a whole 'nother topic (also with prior art in other chains).

Stebalien Jun 1, 2022
Collaborator

Yep, agree with all points.

Kubuxu Aug 2, 2022
Collaborator

One possibility would be to reserve some method numbers for the non-virtual dispatch. We could easily expand the rejection sampling to the range of 0-128 or something like that.

Stebalien · 2022-06-01T03:11:45Z

Stebalien
Jun 1, 2022
Collaborator

There has previously been some enthusiasm for removing the method number from Filecoin, since it’s external to the VM and ABI. As this would require a change to the blockchain message structure, this is a difficult and hence unlikely change. This proposal instead exploits it, and for its original purpose.

It turns out this isn't that difficult, as long as we do it before M2.

Externally, we'd bump the message version number and start allowing version 2 messages with no method number. Eventually, we'd stop allowing version 1 messages (or just allow them indefinitely, it doesn't really matter all that much).
Internally, we'd "upgrade" version 1 messages to version 2 by wrapping the parameters in a CBOR tuple containing the method number and the params.

0 replies

Schwartz10 · 2022-06-01T18:00:14Z

Schwartz10
Jun 1, 2022

@anorth I like your proposal to compute method numbers based on human readable method names! It makes sense to me, and it also would ease the burden of trying to treat built-in actors separate from custom programmed actors when building something like a block explorer.

One question (that might be unrelated) would be how to handle getters. One of the nice composable aspects of solidity in the EVM world is being able to easily fetch storage data from a contract at an address. This is necessary to compute things like balanceOf in an ERC20 token.

Could (and should?) this proposal cover how composability would work for getters? Should that be handled separately and something we shouldn't care about when designing calling conventions for methods that mutate state?

2 replies

Kubuxu Jun 2, 2022
Collaborator

Handling of getters, as far as I know, is not undergoing any changes. The state of other actors will continue to be accessible only by sending a message to them. This proposal is adjacent to it by allowing us to specify stable interfaces for cross-actor communication.

Schwartz10 Jun 4, 2022

Hey @Kubuxu we had talked about this on the last FVM early builders call, and @raulk suggested I start a discussion around it. Here it is! #383

PhilippeMts · 2022-06-03T13:29:25Z

PhilippeMts
Jun 3, 2022

Thank you @anorth for this great and comprehensive proposal.

If we take a moment to reconsider the alternative approach to overloading method names, aren't you afraid that you are pushing for the creation of a new set of conventions, allowing such overloading at the edge, but a set that would be different from the one coming from the Solidity ABI and with no significant benefit?

All the arguments you mention are of course true, and one choice does not prevail over the other because they have no compatibility or performance implications.
But don't you think it would be worthwhile to put together the pros and cons of both choices, so that we can make an informed decision?

I fear that as it stands the ease of understanding provided by keeping a convention similar to the Solidity ABI's might be underestimated.

2 replies

anorth Jun 8, 2022
Maintainer Author

I would welcome a stronger argument about why we should take on the extra complexity (an e.g. from Solidity, plus the need for canonical serialisation of IPLD schemas).

Something that would be somewhat persuasive would be evidence that overloading is used widely in EVM/Solidity (which, as far as I know, is the only widely-used VM environment that permits it at all). If it's not widely used, the convention isn't worth much and there would be little problem understanding a non-overloading convention.

An example would be if overloads appeared in many EIP standards. But from my (admittedly limited) experience, it's rare. EIP-721 uses it to effect an optional parameter, but we can do that much better in IPLD/Rust already.

I would note that I don't think this will affect embedded EVM contracts running in FVM: they'll still use the Ethereum standard mechanism, including overloading, for calls between themselves and inbound from native actors. The translation/bridge between native actors and hosted contracts is still TBD (@raulk any insight whether there'd be problems here?)

PhilippeMts Jun 9, 2022

Thanks @anorth for these arguments. With this type of discussion where several options may compete, it can indeed prove constructive to crowd-evaluate the value of our pros and cons.

I personally probably underestimated the burden of defining a canonical serialization of IPLD schemas and overestimated the simplifying benefit of staying close to the Solidity conventions.

I agree, the most direct and common use of the overloading feature is to enable optional parameters and default values. As long as it is enabled in Solidity, one may not foresee the spread of its use in the future, but this certainly does not make for a solid case for integrating this feature today. 👍

PhilippeMts · 2022-06-03T13:30:00Z

PhilippeMts
Jun 3, 2022

I find the proposition around the salt pretty smart.

If the choice of salt so that hash(salt + "Constructor") == 1 is mandatory for this proposal to hold, perhaps we should add, at least for the record, that the practical feasibility of such a finding is intimately linked to the 32-bit size of method numbers?

Finding such a salt if at some point method numbers were to be longer in size could prove to be an altogether different story.
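
To illustrate how the search cost scales with the width of the method number, here is a brute-force salt search over a deliberately narrow prefix. Everything about the search (the counter-based 8-byte salt, the big-endian read, the function name) is an assumption for illustration. Matching k bytes takes about 256^k attempts on average, so the 2-byte demo below finishes quickly, a 4-byte match needs on the order of 2^32 hashes, and an 8-byte match would be out of reach.

```python
import hashlib
from itertools import count


def find_salt(name: str, target: int, width_bytes: int) -> bytes:
    """Try counter-based salts until the first `width_bytes` of
    blake2b(salt + name) equal `target` (read big-endian)."""
    encoded = name.encode("ascii")
    for i in count():
        salt = i.to_bytes(8, "big")
        digest = hashlib.blake2b(salt + encoded).digest()
        if int.from_bytes(digest[:width_bytes], "big") == target:
            return salt


# Demo with a 16-bit prefix only; expect roughly 65,000 attempts.
salt = find_salt("Constructor", target=1, width_bytes=2)
print(salt.hex())
```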

1 reply

Kubuxu Jun 3, 2022
Collaborator

Alternative to the salt, we could "define" a hash function Hw(x) using the underlying hash function H(x) (Blake, SHA, doesn't matter), which returns 1 when called with x == "Constructor" and H(x) otherwise. This is quite similar to what Ethereum is doing for hashing 0 IIRC.

anorth · 2022-07-07T02:52:26Z

anorth
Jul 7, 2022
Maintainer Author

I have drafted an FRC codifying this. #399

Changes since initial draft:

Adopted the suggestion to just define hash("Constructor") to be 1
Mapped hash values of 0 to 0xffffffff
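
A sketch of how the two changes listed above might combine, as I read them. The byte order, digest length, and ASCII encoding are assumptions here, and the draft FRC is the authority on the exact rules.

```python
import hashlib


def method_number(name: str) -> int:
    """Hash-derived method number with the two special cases applied."""
    if name == "Constructor":
        return 1  # defined directly rather than found via a salt
    digest = hashlib.blake2b(name.encode("ascii")).digest()
    number = int.from_bytes(digest[:4], "big")  # byte order is an assumption
    # Zero stays reserved for bare value sends, so remap it.
    return 0xFFFFFFFF if number == 0 else number


print(method_number("Constructor"), method_number("Transfer"))
```

Defining the constructor case directly removes the need for a salt search, and with it the dependence on the method number being only 32 bits wide.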

6 replies

anorth Jul 15, 2022
Maintainer Author

Thanks, I appreciate you engaging here with a concrete alternative.

Your proposal does not enjoy a compact, uniform-size on-chain encoding, because both the method name and parameter name strings appear in each invocation message. Message bytes are important. They cost gas now, and in the long run may become a fundamental limiting factor of the Filecoin L1's throughput. If the Ethereum community's scaling plans are any guide, then we can hope a large amount of execution will move off to Layer-2 execution environments like rollups. In order to benefit from the L1's security, rollups need to commit the transaction inputs back to L1. So the bandwidth of data becomes the limiting factor. Today, message bytes are underpriced (@Kubuxu can give more detail) – they will probably become more expensive. If they're expensive, this would exert a very unfortunate pressure on developers to use short method and parameter names for efficiency. This would be bad for developer experience but also make collisions dramatically more likely, which is counter to a fundamental goal of establishing a convention.

Your proposal has some things in common with earlier ideas I had but rejected, so let me walk through them:

Firstly, I don't think the VM has any plans to enforce something like read-only methods (but see https://github.com/filecoin-project/fvm-specs/issues/102) so this use of method numbers won't add anything. Let's just think about it as "stop using the method number", which is actually where I started too.
The built-in actors use so-called tuple-encoding for parameter lists: the parameter structs are serialised as a CBOR array. The parameter names are not part of the serialization, for space, bandwidth and encoding efficiency. So your example looks like {"balance_of": ["alice"]}
In fact Filecoin does not use CBOR maps anywhere at present. So a consistent format would be a tuple-encoded ["balance_of", ["alice"]]. This was in fact my first idea.
But the method name can be removed too. I came round to the point of view that Filecoin has method numbers built in, it would be a very long shot and minimal value to try to remove them, so let's exploit it. The method hashing idea works even better in Filecoin than Ethereum because the method number is already a concept for dispatch. All we need is some scheme for collision-avoidance between independently developed APIs. Which leads to the proposal posted and ["alice"] (gWVhbGljZQ==) (plus 4 bytes for the method number).
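
The size difference between the last two encodings can be reproduced with a few lines of hand-rolled CBOR. The helpers below are illustrative and only handle short text strings and short arrays, which is all this example needs.

```python
import base64


def cbor_text(s: str) -> bytes:
    """CBOR text string (major type 3) for strings shorter than 24 bytes."""
    data = s.encode("utf-8")
    if len(data) >= 24:
        raise ValueError("sketch only handles short strings")
    return bytes([0x60 | len(data)]) + data


def cbor_array(items: list) -> bytes:
    """CBOR array (major type 4) of already-encoded items, fewer than 24."""
    if len(items) >= 24:
        raise ValueError("sketch only handles short arrays")
    return bytes([0x80 | len(items)]) + b"".join(items)


params_only = cbor_array([cbor_text("alice")])
with_name = cbor_array([cbor_text("balance_of"), params_only])

print(base64.b64encode(params_only).decode())  # gWVhbGljZQ==
print(len(params_only), len(with_name))        # 7 19
```

The tuple-encoded parameters alone are 7 bytes, against 19 bytes once the method name travels with every message.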

Tom-OriginStorage Jul 15, 2022

Yep, I understand the point of invocation message bloating and the importance of keeping message size as small as possible. However, I do not see a problem with message size being the limiting factor of network resources; I think the success of the Filecoin Network is orthogonal to this message size bloating issue. While I admit there is a trade-off as the parameter message is now slightly more bloated, this is not something that is addressable even if we utilize the Ethereum convention. I can specify a string parameter for Ethereum functions and we will be back to square one with very long string params bloating the network.

And there are very obvious advantages to these kind of conventions:

Easy indexing of smart contracts interaction by block explorers (no decoding needed or requiring an ABI)
Lowers the barrier of entry for people to build infrastructure around Filecoin Network (similar to above point)
Friendlier Rust code (more readable and easier to learn)
People are way more familiar with JSON than CBOR
While the VM has no plans to enforce read-only methods, there will be people out there interested to implement such a functionality by themselves for obvious reasons (to make their DApps work the same way they expect to)
Open to extensions such as pre-paid function calls (similar to Biconomy)

Stebalien Jul 15, 2022
Collaborator

So, we can split the difference and use numbers when sending messages from off-chain, and strings on-chain. I've written the proposal in #409 so we can discuss it with threads instead of one big thread.

anorth Jul 17, 2022
Maintainer Author

I can specify a string parameter for Ethereum functions and we will be back to square one with very long strings params bloating the network

I'm not sure I quite get where you're coming from, but I think a reasonable assumption is that a parameter value is carrying useful, invocation-specific information, and if it's large then that's because a lot of information is needed. A rollup will probably submit long byte arrays (internally compressed method invocations). A parameter or method name is not carrying any information that's useful at execution time, and is wasted bytes.

I concur there would be some introspection advantages to a more verbose protocol, but I don't think they'll hold when actor authors are optimising gas usage to provide a better (cheaper) product to their users.

People are way more familiar with JSON than CBOR

This is a bit beside the point. UIs etc can translate IPLD to/from JSON if required. The whole chain and VM are already IPLD/CBOR and that isn't gunna change here (would require a core FIP, not just a recommendation).

Open to extensions such as pre-paid function calls

You'll have to give a more detailed idea of what you're describing here and what features of the original proposal lock it out.

Tom-OriginStorage Jul 19, 2022

Replied in #409

jennijuju · 2022-07-26T04:21:02Z

jennijuju
Jul 26, 2022
Maintainer

@anorth why do we want to keep method numbers?

1 reply

anorth Jul 26, 2022
Maintainer Author

The base protocol already has method numbers, and it's quite unlikely we'll remove them in the future.
They allow compression of arbitrary message names to a fixed size.

Even if we didn't have enshrined method numbers, I would probably propose adding them (as part of params payload) here.

The arguments against (more self-describing, transparent messages) are legitimate too.

anorth · 2022-08-18T09:25:35Z

anorth
Aug 18, 2022
Maintainer Author

From @Stebalien

Can we increase this even more? I.e., reserve 0x7FFFFFFF? (or even just 0xFFFF). There's really no reason to go small here

Well the reason is to preserve good collision resistance against a birthday "attack" of many independently-developed multi-method standard APIs

8 replies

Stebalien Aug 19, 2022
Collaborator

I'm happy any which way. I just don't want to regret not reserving a larger range later if there's no real reason not to.

Kubuxu Aug 19, 2022
Collaborator

Setting the high bit to 1 results in half as much space, which significantly increases the probability of collision. I would argue that there is no use case for 2^31 static method IDs. I can see us deciding on an alternative dynamic proposal in future, but there are ways we can squeeze that in using either CBOR encoding tricks or by going to 64-bit.

I agree that moving away from rejection sampling to flat bitmask would make the implementation marginally simpler but I would argue that rejection sampling gives us better tradeoffs.

Stebalien Aug 22, 2022
Collaborator

but I would argue that rejection sampling gives us better tradeoffs.

What's the tradeoff?

2^24 or 2^16 is probably enough anyways.

Kubuxu Aug 22, 2022
Collaborator

What's the tradeoff?

Space for static method IDs vs the expected number of interface methods before bday collision.
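
The trade-off can be put in numbers with the usual birthday approximation. This is a sketch assuming uniformly distributed method numbers; the function name is illustrative.

```python
import math


def names_for_collision(space: int, probability: float = 0.5) -> float:
    """Approximate count of uniformly hashed names at which the chance of
    at least one collision among them reaches `probability`."""
    return math.sqrt(2 * space * math.log(1 / (1 - probability)))


for bits in (32, 31, 24):
    print(f"{bits}-bit space: about {names_for_collision(2 ** bits):,.0f} names")
```

Halving the space (32 bits down to 31) cuts the 50% collision point by a factor of sqrt(2), from around 77 thousand names to around 55 thousand, whereas reserving only a small range leaves it essentially unchanged.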

anorth Aug 30, 2022
Maintainer Author

In #445 I've increased the rejection sampling to 24 bits, while requiring the larger Blake2b-512 digest, which keeps us crypto-safe from failing to find a viable method number in the hash digest.
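
A sketch of how rejection sampling over a Blake2b-512 digest could work. The stepping through non-overlapping 4-byte windows and the big-endian read are assumptions for illustration, and #445 is the authority on the actual rules. Under those assumptions each window lands in the reserved range with probability 2^24/2^32 = 1/256, so all 16 windows failing has probability 2^-128.

```python
import hashlib

FIRST_UNRESERVED = 1 << 24  # numbers below this are kept for static use


def method_number(name: str) -> int:
    """First 4-byte window of blake2b-512(name) that falls outside the
    reserved range; "Constructor" is defined to be 1."""
    if name == "Constructor":
        return 1
    digest = hashlib.blake2b(name.encode("ascii"), digest_size=64).digest()
    for offset in range(0, len(digest), 4):
        candidate = int.from_bytes(digest[offset:offset + 4], "big")
        if candidate >= FIRST_UNRESERVED:
            return candidate
    # Probability about 2**-128 under the uniformity assumption.
    raise ValueError(f"no usable method number in digest for {name!r}")


print(method_number("Transfer"))
```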
