Type deduplication and ensuring unique type paths #19

tadeohepperle · 2024-03-20T09:57:07Z

This issue explains the background of a problem that we currently have about generating rust data types from a PortableRegistry.

Goals:

Deduplication

A PortableRegistry is part of a substrate node's metadata and contains all the types necessary to interact with the node. Only concrete types can be registered into a PortableRegistry. Each type registered gets its own type id. When registering a struct, the types of all its field are registered recursively as well. Imagine we register a struct Bar into the registry:

struct Bar {
    a: Foo<u8>,
    b: Foo<u32>
}

struct Foo<T>{
    t: T
}

The types Foo<u8> and Foo<u32> will be distinct types in the registry with their own type ids. But: they can be represented with just one rust type: Foo<T>. So they are duplicates. When generating rust code, we want to deduplicate them and just generate one Foo<T> struct. We want to deduplicate all types that have the same type path and same structure.
Comparing the type path (e.g. my::module::Foo) is easy, comparing the structure not so much, more on that later.

Ensuring unique type paths

Another thing we can encounter is that the type registry contains two types that have the same type path but a different structure. In these cases we cannot generate a single rust type. Instead we can give each of the two types a distinct name (changing the ident of the type path) and generate distincts rust types. E.g. Foo -> Foo1 and Foo2.

There are 3 cases when this happens AFAIK:

feature flags that change the structure of a type. In this case the type path will still be the same.
manual implementation of the scale_info::TypeInfo trait, specifying the same type path for two types.
associated types: Imagine you have a type like this:

trait X{
     type Assoc;
}
struct A;  
impl X for A {  type Accoc = u8;  }
struct B;  
impl X for B {  type Accoc = u32;  }

#[scale_info(skip_type_params(T))] // this is optional
struct Foo<T: X>{
    field: X::Assoc,
}

Then if Foo<A> and Foo get registered, there is no way for us to generate a single rust type Foo<T> anymore. The registry does not contain the trait bounds, and we cannot generic types with trait bounds and associated types as fields.
So because there is simply no way to generate struct Foo<T: X>{ field: X::Assoc } we generate Foo1 { field: u8 } and Foo2 { field: u32 } instead.

But: We do not want to tamper with the users PortableRegistry by default. So if a case like this is encountered we currently return an Error. The user can run the scale_typegen::utils::ensure_unique_type_paths fn on their PortableRegistry to remap duplicate type paths Foo -> Foo1 and Foo2 where the type structure differs. This makes the type generation succeed.

What is the problem?

The issue is that we are currently not very good at "deduplicating" types because it is not so easy to determine if the structure of two types is the same considering their type params. The logic for this is in the scale_typegen::utils::types_equal_extended_to_params function.

To see if two types A and B are equal all of their fields need to equal: For each field we compare the type ids. If they are the same, we consider the types equal. If they are not the same, but they are the type id of a type param for A and B, and this type params has the same name, then we also consider them equal.

There are a few problems with that: Imagine we have these two types:

struct Foo<T = u64> {
    inner: Vec<u64>
}
struct Foo<T = u32> {
    inner: Vec<u32>
}

There are now 6 types in the type registry, let's give them arbitrary ids:

Foo<T=u64>: 0
Vec<u64>: 1
u64: 2
Foo<T=u32>: 3
Vec<u32>: 4
u32: 5

Then when comparing the inner fields, their ids are 1 (Vec<u64>) and 4 (Vec<u32>) respectively, but the generic param T has ids 2 (u64) for Foo<T=u64> and 5 (u32) for Foo<T=u32>, so they are deemed to be different.
This means we cannot detect that Foo and Foo could be one rust struct. We deduplicate them into Foo1 and Foo2 or the codegen will return an error.

Solution: We need to recursively go down into the Vec and Vec types as well. Just calling types_equal_extended_to_params on them though is not enough, we need to pass down more information about type_id -> type param mapping from the parent struct.
Here we also run into the issue that only some fields with said type id might map to the generic and others are actually fixed:
Consider Foo<T=u32>{ a: T, b:u32 }.

Another case where we currently return false, is when comparing these two types, suppose they have the same type path:

struct Foo<T> {
    inner: T
}
struct Foo<B> {
    inner: B
}

This is because the type_name of the field inner does not match even though structurally they are the same.

A hashing solution?

One solution I could think of is that we do not compare two types directly. Instead we calculate a hash for each type and insert them into a bin for each hash. All types that land in the same bin can be considered equal and we can pick any one of them to generate the rust type.

This hashing just needs to be aware of type params, and hash the same type params in the same way.
Here the hash should be based on the position of the type params in the type definition and not on their name.
First: build up a table of params and their position:
[(A, 0), (B,1), (C,2), ...]
When iterating over the fields we can check for each field if it maps to a type param or not:

if type's type_name and type_id match with a type param: -> hasher.hash(position of type param); hasher.hash(GENERIC_FLAG)
else if type's type_name contains angle brackets e.g. Vec: calculate hash of Vec:
- here, we pass [(B, 1)] to Vec and use it instead of [(B, 0)] that we would get by just looking at Vec in a vacuum.
- when recursion down done, hasher.hash(hash of Vec).
else: type does not refer to type param: -> hasher.hash(type id); hasher.hash(NOT_GENERIC_FLAG);

The text was updated successfully, but these errors were encountered:

tadeohepperle mentioned this issue Mar 21, 2024

Fix bug with ensure_unique_type_paths and prep for patch release. #22

Merged

This was referenced May 22, 2024

Types duplicated when upgrading to subxt 0.36.0 paritytech/subxt#1599

Closed

Recursive type equality check for de-duplicating #30

Merged

jsdw closed this as completed in #30 May 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Type deduplication and ensuring unique type paths #19

Type deduplication and ensuring unique type paths #19

tadeohepperle commented Mar 20, 2024 •

edited

Loading

Type deduplication and ensuring unique type paths #19

Type deduplication and ensuring unique type paths #19

Comments

tadeohepperle commented Mar 20, 2024 • edited Loading

Goals:

Deduplication

Ensuring unique type paths

What is the problem?

A hashing solution?

tadeohepperle commented Mar 20, 2024 •

edited

Loading