Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type deduplication and ensuring unique type paths #19

Closed
tadeohepperle opened this issue Mar 20, 2024 · 0 comments · Fixed by #30
Closed

Type deduplication and ensuring unique type paths #19

tadeohepperle opened this issue Mar 20, 2024 · 0 comments · Fixed by #30

Comments

@tadeohepperle
Copy link
Contributor

tadeohepperle commented Mar 20, 2024

This issue explains the background of a problem that we currently have about generating rust data types from a PortableRegistry.

Goals:

Deduplication

A PortableRegistry is part of a substrate node's metadata and contains all the types necessary to interact with the node. Only concrete types can be registered into a PortableRegistry. Each type registered gets its own type id. When registering a struct, the types of all its field are registered recursively as well. Imagine we register a struct Bar into the registry:

struct Bar {
    a: Foo<u8>,
    b: Foo<u32>
}

struct Foo<T>{
    t: T
}

The types Foo<u8> and Foo<u32> will be distinct types in the registry with their own type ids. But: they can be represented with just one rust type: Foo<T>. So they are duplicates. When generating rust code, we want to deduplicate them and just generate one Foo<T> struct. We want to deduplicate all types that have the same type path and same structure.
Comparing the type path (e.g. my::module::Foo) is easy, comparing the structure not so much, more on that later.

Ensuring unique type paths

Another thing we can encounter is that the type registry contains two types that have the same type path but a different structure. In these cases we cannot generate a single rust type. Instead we can give each of the two types a distinct name (changing the ident of the type path) and generate distincts rust types. E.g. Foo -> Foo1 and Foo2.

There are 3 cases when this happens AFAIK:

  1. feature flags that change the structure of a type. In this case the type path will still be the same.
  2. manual implementation of the scale_info::TypeInfo trait, specifying the same type path for two types.
  3. associated types: Imagine you have a type like this:
trait X{
     type Assoc;
}
struct A;  
impl X for A {  type Accoc = u8;  }
struct B;  
impl X for B {  type Accoc = u32;  }

#[scale_info(skip_type_params(T))] // this is optional
struct Foo<T: X>{
    field: X::Assoc,
}

Then if Foo<A> and Foo<B> get registered, there is no way for us to generate a single rust type Foo<T> anymore. The registry does not contain the trait bounds, and we cannot generic types with trait bounds and associated types as fields.
So because there is simply no way to generate struct Foo<T: X>{ field: X::Assoc } we generate Foo1 { field: u8 } and Foo2 { field: u32 } instead.

But: We do not want to tamper with the users PortableRegistry by default. So if a case like this is encountered we currently return an Error. The user can run the scale_typegen::utils::ensure_unique_type_paths fn on their PortableRegistry to remap duplicate type paths Foo -> Foo1 and Foo2 where the type structure differs. This makes the type generation succeed.

What is the problem?

The issue is that we are currently not very good at "deduplicating" types because it is not so easy to determine if the structure of two types is the same considering their type params. The logic for this is in the scale_typegen::utils::types_equal_extended_to_params function.

To see if two types A and B are equal all of their fields need to equal: For each field we compare the type ids. If they are the same, we consider the types equal. If they are not the same, but they are the type id of a type param for A and B, and this type params has the same name, then we also consider them equal.

There are a few problems with that: Imagine we have these two types:

struct Foo<T = u64> {
    inner: Vec<u64>
}
struct Foo<T = u32> {
    inner: Vec<u32>
}

There are now 6 types in the type registry, let's give them arbitrary ids:

Foo<T=u64>: 0
Vec<u64>: 1
u64: 2
Foo<T=u32>: 3
Vec<u32>: 4
u32: 5

Then when comparing the inner fields, their ids are 1 (Vec<u64>) and 4 (Vec<u32>) respectively, but the generic param T has ids 2 (u64) for Foo<T=u64> and 5 (u32) for Foo<T=u32>, so they are deemed to be different.
This means we cannot detect that Foo and Foo could be one rust struct. We deduplicate them into Foo1 and Foo2 or the codegen will return an error.

Solution: We need to recursively go down into the Vec and Vec types as well. Just calling types_equal_extended_to_params on them though is not enough, we need to pass down more information about type_id -> type param mapping from the parent struct.
Here we also run into the issue that only some fields with said type id might map to the generic and others are actually fixed:
Consider Foo<T=u32>{ a: T, b:u32 }.

Another case where we currently return false, is when comparing these two types, suppose they have the same type path:

struct Foo<T> {
    inner: T
}
struct Foo<B> {
    inner: B
}

This is because the type_name of the field inner does not match even though structurally they are the same.

A hashing solution?

One solution I could think of is that we do not compare two types directly. Instead we calculate a hash for each type and insert them into a bin for each hash. All types that land in the same bin can be considered equal and we can pick any one of them to generate the rust type.

This hashing just needs to be aware of type params, and hash the same type params in the same way.
Here the hash should be based on the position of the type params in the type definition and not on their name.
First: build up a table of params and their position:
[(A, 0), (B,1), (C,2), ...]
When iterating over the fields we can check for each field if it maps to a type param or not:

  • if type's type_name and type_id match with a type param: -> hasher.hash(position of type param); hasher.hash(GENERIC_FLAG)
  • else if type's type_name contains angle brackets e.g. Vec<B>: calculate hash of Vec<B>:
    • here, we pass [(B, 1)] to Vec<B> and use it instead of [(B, 0)] that we would get by just looking at Vec<B> in a vacuum.
    • when recursion down done, hasher.hash(hash of Vec<B>).
  • else: type does not refer to type param: -> hasher.hash(type id); hasher.hash(NOT_GENERIC_FLAG);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant