-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Enhancement] Zero-copy DAG_CBOR into Vec<Cid> parser. #3379
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused - the title says zero-copy serialization of Vec<Cid>
, but I'd expect to see a lifetime in that case, deserializing into a &'serial [Cid]
or something.
Plus, the in-memory representation of a CID is different to the CBOR serialization, so I'd expect to see a new struct, but I don't - what am I missing?
The CIDs are copied (which can be done without any heap allocations.) Other data, such as bytes and strings, are not copied. |
The title does not say anything about serialization whatsoever. It's deserialization of a
|
Single-run results
|
Co-authored-by: David Himmelstrup <[email protected]>
Yeah, zero-copy here is a bit misleading, I'll make sure this is documented properly. |
I don't think it's inaccurate to call this a zero-copy parser. The PR is doing two things: a) parsing an ipld structure, b) extracting CIDs from |
Ahh I understand now!
Zero-copy in my head means holding references into buffers. Copying CIDs out of it is not zero copy ;) I think |
src/utils/encoding/cid_de_cbor.rs
Outdated
} | ||
|
||
#[inline] | ||
fn visit_str<E>(self, _value: &str) -> Result<Self::Value, E> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nonblocking: can you macro these away with a comment? Or at least lift the most relevant fns to the top?
"the default visitor errors when visiting these, but we want to skip over them, so return Ok(())
"
Out of curiosity why have you only done that for some of the visited types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only care about list, map and newtype_struct, because a list and a map could contain more of those and the only type that converts to a CID is the newtype_struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that introducing a macro here is going to help anything, it's just going to obfuscate the obvious. I'll re-arrange the order, however.
src/utils/encoding/cid_de_cbor.rs
Outdated
impl Deref for CidVec { | ||
type Target = Vec<Cid>; | ||
|
||
fn deref(&self) -> &Self::Target { | ||
&self.0 | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementing Deref for smart pointers makes accessing the data behind them convenient, which is why they implement Deref. On the other hand, the rules regarding Deref and DerefMut were designed specifically to accommodate smart pointers. Because of this, Deref should only be implemented for smart pointers to avoid confusion.
- https://doc.rust-lang.org/std/ops/trait.Deref.html (emphasis not mine)
Not a smart pointer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a smart pointer, but since we have a thin wrapper here strictly for deserialization that also explicitly states it is a ‘Vec’ - it’s not surprising that we want to use a ‘CidVec’ as it’s underlying ‘Vec’, without any explicit conversions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it’s a “choose your poison” kind of thing and not using helpful abstractions in this case seems wasteful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have gone with into_inner
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, into_inner
it is then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not using helpful abstractions in this case seems wasteful.
It's the wrong abstraction, and a commonly misused one at that, which is why it's called out in the docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the wrong abstraction, and a commonly misused one at that, which is why it's called out in the docs
Sure is, though I'm still of the opinion that thin wrappers could benefit a lot from this. I would go further and say that the need for such wrappers to override certain behaviours calls for something magic/transparent to make the wrapper adhere to the same interface as whatever is wrapped. I wish there was a more eloquent way of implementing custom ser/de for types, e.g. the ability to have an implementation for a type alias.
Personal opinions aside - this has been fixed.
Co-authored-by: Aatif Syed <[email protected]>
Imagine a zero-copy parser that returns an |
CHANGELOG.md
Outdated
- [#3379](https://github.com/ChainSafe/forest/pull/3379): Zero-copy DAG_CBOR | ||
into Vec<Cid> parser. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not user-visible addition. I don't think it should be in the CHANGELOG. If we want a CHANGELOG entry, it should be under Changed
, and it should say the performance was improved when walking the state graph.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
src/utils/encoding/cid_de_cbor.rs
Outdated
/// [`CidVec`] allows for efficient zero-copy de-serialization of `DAG_CBOR`-encoded nodes into a | ||
/// vector of [`Cid`]. | ||
#[derive(Default)] | ||
pub struct CidVec(Vec<Cid>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like this name -it's too general - it was the original name for what is now FrozenCids
, for example.
Can you think of a better name, or maybe different API? pub fn filter_cids(_: ???)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a good point. My idea behind this name is exactly that - this represents a Vec<Cid>
. If we had a better way of dealing with ser/de without a wrapper - this wouldn't be needed.
Perhaps a good middleground is to do something like: pub fn extract_cids(cbor_blob: &bytes): Vec<Cid>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this feature covered by unit tests?
Good point, it actually isn't. |
src/utils/encoding/cid_de_cbor.rs
Outdated
// Cleaning up Integer and Float in order to avoid parser mistakes that result | ||
// in tag detection and a subsequent Cid decoding failure. | ||
// See https://github.com/ipld/serde_ipld_dagcbor/blob/master/src/de.rs#L178 and | ||
// https://github.com/ipld/serde_ipld_dagcbor/blob/master/src/de.rs#L119 . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use more explanation, for someone fairly fresh to this subject it's a bit shady to replace all numeric occurrences with zeroes.
Also, let's use permalinks, otherwise those links will get quickly outdated and point to random pieces of code (or 404).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Follow-up: create an issue to tackle roundtrip ser/de issues mentioned here: forest/src/utils/encoding/cid_de_cbor.rs Lines 218 to 228 in ec7e65b
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rock-solid
Summary of changes
Changes introduced in this pull request:
Reference issue to close (if applicable)
Closes
Other information and links
Change checklist