You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This issue explains the background of a problem that we currently have about generating rust data types from a PortableRegistry.
Goals:
Deduplication
A PortableRegistry is part of a substrate node's metadata and contains all the types necessary to interact with the node. Only concrete types can be registered into a PortableRegistry. Each type registered gets its own type id. When registering a struct, the types of all its field are registered recursively as well. Imagine we register a struct Bar into the registry:
struct Bar {
a: Foo<u8>,
b: Foo<u32>
}
struct Foo<T>{
t: T
}
The types Foo<u8> and Foo<u32> will be distinct types in the registry with their own type ids. But: they can be represented with just one rust type: Foo<T>. So they are duplicates. When generating rust code, we want to deduplicate them and just generate one Foo<T> struct. We want to deduplicate all types that have the same type path and same structure.
Comparing the type path (e.g. my::module::Foo) is easy, comparing the structure not so much, more on that later.
Ensuring unique type paths
Another thing we can encounter is that the type registry contains two types that have the same type path but a different structure. In these cases we cannot generate a single rust type. Instead we can give each of the two types a distinct name (changing the ident of the type path) and generate distincts rust types. E.g. Foo -> Foo1 and Foo2.
There are 3 cases when this happens AFAIK:
feature flags that change the structure of a type. In this case the type path will still be the same.
manual implementation of the scale_info::TypeInfo trait, specifying the same type path for two types.
associated types: Imagine you have a type like this:
trait X{
type Assoc;
}
struct A;
impl X for A { type Accoc = u8; }
struct B;
impl X for B { type Accoc = u32; }
#[scale_info(skip_type_params(T))] // this is optional
struct Foo<T: X>{
field: X::Assoc,
}
Then if Foo<A> and Foo<B> get registered, there is no way for us to generate a single rust type Foo<T> anymore. The registry does not contain the trait bounds, and we cannot generic types with trait bounds and associated types as fields.
So because there is simply no way to generate struct Foo<T: X>{ field: X::Assoc } we generate Foo1 { field: u8 } and Foo2 { field: u32 } instead.
But: We do not want to tamper with the users PortableRegistry by default. So if a case like this is encountered we currently return an Error. The user can run the scale_typegen::utils::ensure_unique_type_paths fn on their PortableRegistry to remap duplicate type paths Foo -> Foo1 and Foo2 where the type structure differs. This makes the type generation succeed.
What is the problem?
The issue is that we are currently not very good at "deduplicating" types because it is not so easy to determine if the structure of two types is the same considering their type params. The logic for this is in the scale_typegen::utils::types_equal_extended_to_params function.
To see if two types A and B are equal all of their fields need to equal: For each field we compare the type ids. If they are the same, we consider the types equal. If they are not the same, but they are the type id of a type param for A and B, and this type params has the same name, then we also consider them equal.
There are a few problems with that: Imagine we have these two types:
Then when comparing the inner fields, their ids are 1 (Vec<u64>) and 4 (Vec<u32>) respectively, but the generic param T has ids 2 (u64) for Foo<T=u64> and 5 (u32) for Foo<T=u32>, so they are deemed to be different.
This means we cannot detect that Foo and Foo could be one rust struct. We deduplicate them into Foo1 and Foo2 or the codegen will return an error.
Solution: We need to recursively go down into the Vec and Vec types as well. Just calling types_equal_extended_to_params on them though is not enough, we need to pass down more information about type_id -> type param mapping from the parent struct.
Here we also run into the issue that only some fields with said type id might map to the generic and others are actually fixed:
Consider Foo<T=u32>{ a: T, b:u32 }.
Another case where we currently return false, is when comparing these two types, suppose they have the same type path:
struct Foo<T> {
inner: T
}
struct Foo<B> {
inner: B
}
This is because the type_name of the field inner does not match even though structurally they are the same.
A hashing solution?
One solution I could think of is that we do not compare two types directly. Instead we calculate a hash for each type and insert them into a bin for each hash. All types that land in the same bin can be considered equal and we can pick any one of them to generate the rust type.
This hashing just needs to be aware of type params, and hash the same type params in the same way.
Here the hash should be based on the position of the type params in the type definition and not on their name.
First: build up a table of params and their position: [(A, 0), (B,1), (C,2), ...]
When iterating over the fields we can check for each field if it maps to a type param or not:
if type's type_name and type_id match with a type param: -> hasher.hash(position of type param); hasher.hash(GENERIC_FLAG)
else if type's type_name contains angle brackets e.g. Vec<B>: calculate hash of Vec<B>:
here, we pass [(B, 1)] to Vec<B> and use it instead of [(B, 0)] that we would get by just looking at Vec<B> in a vacuum.
when recursion down done, hasher.hash(hash of Vec<B>).
else: type does not refer to type param: -> hasher.hash(type id); hasher.hash(NOT_GENERIC_FLAG);
The text was updated successfully, but these errors were encountered:
This issue explains the background of a problem that we currently have about generating rust data types from a
PortableRegistry
.Goals:
Deduplication
A
PortableRegistry
is part of a substrate node's metadata and contains all the types necessary to interact with the node. Only concrete types can be registered into aPortableRegistry
. Each type registered gets its own type id. When registering a struct, the types of all its field are registered recursively as well. Imagine we register a structBar
into the registry:The types
Foo<u8>
andFoo<u32>
will be distinct types in the registry with their own type ids. But: they can be represented with just one rust type:Foo<T>
. So they are duplicates. When generating rust code, we want to deduplicate them and just generate oneFoo<T>
struct. We want to deduplicate all types that have the same type path and same structure.Comparing the type path (e.g.
my::module::Foo
) is easy, comparing the structure not so much, more on that later.Ensuring unique type paths
Another thing we can encounter is that the type registry contains two types that have the same type path but a different structure. In these cases we cannot generate a single rust type. Instead we can give each of the two types a distinct name (changing the ident of the type path) and generate distincts rust types. E.g.
Foo -> Foo1 and Foo2
.There are 3 cases when this happens AFAIK:
scale_info::TypeInfo
trait, specifying the same type path for two types.Then if
Foo<A>
andFoo<B>
get registered, there is no way for us to generate a single rust typeFoo<T>
anymore. The registry does not contain the trait bounds, and we cannot generic types with trait bounds and associated types as fields.So because there is simply no way to generate struct
Foo<T: X>{ field: X::Assoc }
we generateFoo1 { field: u8 }
andFoo2 { field: u32 }
instead.But: We do not want to tamper with the users
PortableRegistry
by default. So if a case like this is encountered we currently return anError
. The user can run thescale_typegen::utils::ensure_unique_type_paths
fn on theirPortableRegistry
to remap duplicate type pathsFoo -> Foo1 and Foo2
where the type structure differs. This makes the type generation succeed.What is the problem?
The issue is that we are currently not very good at "deduplicating" types because it is not so easy to determine if the structure of two types is the same considering their type params. The logic for this is in the
scale_typegen::utils::types_equal_extended_to_params
function.To see if two types A and B are equal all of their fields need to equal: For each field we compare the type ids. If they are the same, we consider the types equal. If they are not the same, but they are the type id of a type param for A and B, and this type params has the same name, then we also consider them equal.
There are a few problems with that: Imagine we have these two types:
There are now 6 types in the type registry, let's give them arbitrary ids:
Then when comparing the
inner
fields, their ids are1 (Vec<u64>)
and4 (Vec<u32>)
respectively, but the generic param T has ids2 (u64)
forFoo<T=u64>
and5 (u32)
forFoo<T=u32>
, so they are deemed to be different.This means we cannot detect that Foo and Foo could be one rust struct. We deduplicate them into Foo1 and Foo2 or the codegen will return an error.
Solution: We need to recursively go down into the Vec and Vec types as well. Just calling
types_equal_extended_to_params
on them though is not enough, we need to pass down more information about type_id -> type param mapping from the parent struct.Here we also run into the issue that only some fields with said type id might map to the generic and others are actually fixed:
Consider
Foo<T=u32>{ a: T, b:u32 }
.Another case where we currently return false, is when comparing these two types, suppose they have the same type path:
This is because the
type_name
of the fieldinner
does not match even though structurally they are the same.A hashing solution?
One solution I could think of is that we do not compare two types directly. Instead we calculate a
hash
for each type and insert them into a bin for eachhash
. All types that land in the samebin
can be considered equal and we can pick any one of them to generate the rust type.This hashing just needs to be aware of type params, and hash the same type params in the same way.
Here the hash should be based on the position of the type params in the type definition and not on their name.
First: build up a table of params and their position:
[(A, 0), (B,1), (C,2), ...]
When iterating over the fields we can check for each field if it maps to a type param or not:
type_name
andtype_id
match with a type param: ->hasher.hash(position of type param); hasher.hash(GENERIC_FLAG)
type_name
contains angle brackets e.g.Vec<B>
: calculate hash ofVec<B>
:Vec<B>
and use it instead of[(B, 0)]
that we would get by just looking atVec<B>
in a vacuum.hasher.hash(hash of Vec<B>)
.hasher.hash(type id); hasher.hash(NOT_GENERIC_FLAG);
The text was updated successfully, but these errors were encountered: