Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent bug when reusing type ids in hashing #1075

Merged

Conversation

tadeohepperle
Copy link
Contributor

@tadeohepperle tadeohepperle commented Jul 17, 2023

fixes #1066.

I first thought a simple restructuring approach like @jsdw mentioned in #1066 would work, but correct me if I'm wrong, I think it cannot solve the recursive type problem, because creating a new HashSet in get_type_id() will eventually run into an infinite loop.

Instead of using a HashSet for keeping track of the visited ids, we can use a HashMap that saves some info about if the type has been visited yet and also if a complete hash has been calculated yet.

I added a test to make sure it works (Without my code changes, it fails).

@tadeohepperle tadeohepperle requested a review from a team as a code owner July 17, 2023 15:07
@tadeohepperle tadeohepperle changed the title Tadeo hepperle prevent reuse of type ids in hashing Prevent bug when reusing type ids in hashing Jul 17, 2023
@jsdw
Copy link
Collaborator

jsdw commented Jul 17, 2023

because creating a new HashSet in get_type_id() will eventually run into an infinite loop.

What I meant was that get_type_id could be exposed to the rest of the module to get type IDs, and this would internally create a new HashSet or whatever.

And then "internally" (and not exposed to the rest of the crate) one could use a get_type_id_with_cache or whatever that looks like the current one does.

Maybe some stuff would end up in a different module or something to support that.

And the point of doing this would just be to guarantee that get_type_id always starts from a fresh state and doesn't re-use anything from other invocations which could change the result or make it worse :)

But anyways, maybe that's irrelevant now and I'll have a gander at the approach you took!

}

cache.insert(id, 0);
Copy link
Collaborator

@jsdw jsdw Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, I like the idea of caching the hashes like this in general; good thinking!

There's maybe a small chance that the first 4 bytes of type_hash are actually [0,0,0,0], which would make that type look like the special recursive type that leads to an early return.

I also wonder; is it better to store the full [u8; HASH_LEN] type hash in the cache? We use more memory with this, but then don't need to hash it, and it means that the cache is truly caching the computed hashes for various types rather than caching something that's not actually the original hash we computed for a type. It feels "safer" to reuse the cache in other places if we know that it is storing the actual computed hashes over something a bit different.

Combining the two thoughts above, I wonder whether the cache should look like:

type Cache = HashMap<u32, CachedValue>;

enum CachedValue {
    Recursive,
    Hash([u8; HASH_LEN])
}

Where we insert CachedValue::Recursive initially and update that to a final hash when done. (if get_type_hash looks in the cache and sees a CachedValue::Recursive it can return with eg MAGIC_RECURSIVE_HASH_VALUE, which is functionally the same as right now where it would return with hash(0)).

I'm suggesting a new enum above over using Option just because it would be a little more self-documenting, but either works really!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could definitely do that, thought about something like that as well, it is more secure at the cost of a little bit more memory. I will change it! :)

Copy link
Collaborator

@jsdw jsdw Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be great, and then perhaps also we could add a test which uses the cache across a couple of invocations of hashing the same type to comfirm that indeed you get the same hash back both times (first time caches the hash, second time just returns it straight off). This should even work if the type has a recursive definition in which is cool :)

Ultimately whether we pass a new cache or the same cache in to multiple calls, the results should all be the same.

// if the cache has an entry for the id, just return a hash derived from it.
// if the type has not been seen yet, mark it with `CachedHash::Recursive` in the cache and proceed to `get_type_def_hash()`.
// -> During the execution of get_type_def_hash() we might get into get_type_hash(id) again for the original id
// -> in this case the `CachedHash::Recursive` provokes an early return.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it's more like "in this case, the CachedHash::Recursive returns a fixed "magic" hash rather than recursing into the same type again" :)

Copy link
Collaborator

@jsdw jsdw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Just one tiny suggestion to add a test showing that the cache is cahcing actual hashes and thus can safely be reused across multiple calls to hash the same types or whatever, but otherwise looks good to me!

Comment on lines +813 to +817
let CachedHash::Hash(a_cache_hash) = cache[&a_type_id] else { panic!() };
let CachedHash::Hash(b_cache_hash) = cache[&b_type_id] else { panic!() };

assert_eq!(a_hash, a_cache_hash);
assert_eq!(b_hash, b_cache_hash);
Copy link
Collaborator

@jsdw jsdw Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also check that a_hash == b_hash; imo that's more iomportant than looking into the cached stuff :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want the hashes to be the same, because A and B are different structs, with different fields:

struct A { pub b: Box<B> }
struct B { pub a: Box<A> }

Maybe I understood you wrong, what did you mean?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry yeah, I mean, it'd be nice to have a test like:

let a_type_id = registry.register_type(&meta_type::<A>()).id;

let a_hash1 = get_type_hash(&registry, a_type_id, &mut cache);
let a_hash2 = get_type_hash(&registry, a_type_id, &mut cache);

assert_eq!(a_hash1, a_hash2);

ie, we want to check that sharing the cache across different uses doesn't change the hashes that are produced for some type.

@tadeohepperle tadeohepperle merged commit 475a141 into master Jul 19, 2023
@tadeohepperle tadeohepperle deleted the tadeo-hepperle-prevent-reuse-of-type-ids-in-hashing branch July 19, 2023 17:49
tadeohepperle added a commit that referenced this pull request Jul 20, 2023
* practice TDD

* implement a hashmap 2-phases approach

* use nicer types

* add test for cache filling

* adjust test

---------

Co-authored-by: James Wilson <[email protected]>
tadeohepperle added a commit that referenced this pull request Jul 20, 2023
* practice TDD

* implement a hashmap 2-phases approach

* use nicer types

* add test for cache filling

* adjust test

---------

Co-authored-by: James Wilson <[email protected]>
tadeohepperle added a commit that referenced this pull request Jul 20, 2023
* remove config, doc tests are expected to fail (book not adjusted yet)

* make doc tests pass

* Prevent bug when reusing type ids in hashing (#1075)

* practice TDD

* implement a hashmap 2-phases approach

* use nicer types

* add test for cache filling

* adjust test

---------

Co-authored-by: James Wilson <[email protected]>

* remove the unnecessary intos

---------

Co-authored-by: James Wilson <[email protected]>
@jsdw jsdw mentioned this pull request Jul 24, 2023
tadeohepperle added a commit that referenced this pull request Aug 2, 2023
* routing and signing example

* cliipy fix

* submitting extrinsics

* change order of lines

* Skip call variants if there aren't any (#980)

Co-authored-by: Niklas Adolfsson <[email protected]>

* Tidy up some metadata accessing (#978)

* Reduce some repetition when obtaining metadata pallets/runtime_traits

* make them pub

* fix docs and clippy

* Bump tokio from 1.28.1 to 1.28.2 (#984)

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.28.1 to 1.28.2.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](tokio-rs/tokio@tokio-1.28.1...tokio-1.28.2)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump regex from 1.8.2 to 1.8.3 (#986)

Bumps [regex](https://github.com/rust-lang/regex) from 1.8.2 to 1.8.3.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](rust-lang/regex@1.8.2...1.8.3)

---
updated-dependencies:
- dependency-name: regex
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump quote from 1.0.27 to 1.0.28 (#983)

Bumps [quote](https://github.com/dtolnay/quote) from 1.0.27 to 1.0.28.
- [Release notes](https://github.com/dtolnay/quote/releases)
- [Commits](dtolnay/quote@1.0.27...1.0.28)

---
updated-dependencies:
- dependency-name: quote
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump proc-macro2 from 1.0.58 to 1.0.59 (#985)

Bumps [proc-macro2](https://github.com/dtolnay/proc-macro2) from 1.0.58 to 1.0.59.
- [Release notes](https://github.com/dtolnay/proc-macro2/releases)
- [Commits](dtolnay/proc-macro2@1.0.58...1.0.59)

---
updated-dependencies:
- dependency-name: proc-macro2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* restrict sign_with_address_and_signature interface (#988)

* changing js bridge

* dryrunresult ok

* submitting extrinsic working

* tiny up code and ui

* formatting

* remove todos

* support tip and mortality

* Prevent bug when reusing type ids in hashing (#1075)

* practice TDD

* implement a hashmap 2-phases approach

* use nicer types

* add test for cache filling

* adjust test

---------

Co-authored-by: James Wilson <[email protected]>

* small adjustment

* Merge branch 'master' into tadeo-hepperle-browser-extension-signing-example

* fix lock file

* tell users how to add Alice account to run signing example

* adjust to PR comments

* fmt

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: James Wilson <[email protected]>
Co-authored-by: Niklas Adolfsson <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metadata Validation: prevent reuse of visited_type_ids
3 participants