-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement BCS deserialization from_reader #6
Conversation
** Stragety ** The commit aims to maintain a minimal diff over the previous implementation. To that end, the following changes have been made. - Make `Deserializer` generic over an inner type `R` - Factor as much of the implementation as possible into a new trait `BcsReader`, which implements BCS in terms of a few primitive operations. Implement `serde::Deserialize<'de> for Deserializer<R> where Deserializer<R>: BcsReader<'de>` - Implement `BcsReader<'de>` for `Deserializer<&'de [u8]>` using existing code - Implement `BcsReader` generically for (a version of) `Deserializer<R: Read>`. The existing implementation is very close to supporting deserialization from readers. There are only two places in which its behavior relied on access to a byte slice was in the implementation of `de::MapAccess`. The first case is `map` deserialization. `bcs` needs access to the serialized representation of map keys in order to enforce canonicity. When deserializing from a slice, this is straightforward - the implementation can simply deserialize a map key, and then "look back" at the original input slice to determine its serialized representation. When deserializing from a reader, however, this kind of rewinding is not generally possible. To solve this problem, I introduce a the `TeeReader` struct, which wraps a `Read`er and optionally copies its bytes into a `capture_buffer` for later retrieval. Then, I simply implement `BcsReader` for `Deserializer<TeeReader<...>>`. The second case is the `end` method, which checks that all input bytes have been consumed. To handle this case, I simply attempt to read one extra byte from the `Read`er after deserialization and assert that an EOF error is returned from the underlying `Read`er. ** Testing ** All existing unit tests except for `zero_copy_parse` have been modified to test that deserialization `from_bytes` and `from_reader` yield the same output. That test is not applicable since readers are not capable of zero-copy deserialization.
Thanks @preston-evans98 for the changes. Hopefully, @bmwill and I will find time for a proper code review soon. |
It looks like CI failed initially due to an outdated clippy |
Hmm, looks like the MSRV failing due to unrelated changes. Are you opposed to checking in a |
@preston-evans98 Thanks for investigating. I fixed the CI on the main branch. Can you rebase your PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank for the new feature. This is looking pretty good. Mostly style comments. I found an issue with a corner case and proposed a tentative fix (see other comment).
src/lib.rs
Outdated
pub use de::{from_bytes, from_bytes_seed, from_bytes_seed_with_limit, from_bytes_with_limit}; | ||
pub use de::{ | ||
from_bytes, from_bytes_seed, from_bytes_seed_with_limit, from_bytes_with_limit, from_reader, | ||
from_reader_seed, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't suppose anything is preventing from_reader_seed_with_limit
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that I can see, no. Will add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also added from_reader_with_limit
for completeness.
src/de.rs
Outdated
} | ||
|
||
impl<'de, R> TeeReader<'de, R> { | ||
/// Wrapse the provided reader in a new [`TeeReader`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: Wraps
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing.
src/de.rs
Outdated
&mut self, | ||
seed: K, | ||
) -> Result<(K::Value, Self::MaybeBorrowedBytes), Error> { | ||
self.input.capture_buffer = Some(Vec::new()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, this code won't work if a map is itself keyed by a map like BTreeMap<BTreeMap<u8, u8>, u32>
(which does compile, see https://gist.github.com/rust-play/4f4819775a567501e2dfd8e41699bbce )
We could easily return a error, say UnsupportedMapInKeyposition
in the case where self.input.capture_buffer.is_some()
before line 298, but I believe that there is a simple fix:
- Make
capture_buffer
a vector of bytes. Let's rename it in the process, saycaptured_keys: Vec<Vec<u8>>
- Add an empty vector at the end to start recording a new key.
- When the current key is finished, pop the vector to take the bytes, but do not forget to also append the bytes at the end of the previous (still ongoing) recorded key, if any.
I made a branch pursuing this idea here: #7
@preston-evans98 I didn't write a test so I could be missing something. Your help with finalizing this PR is still much appreciated! Can you review and pick the suggested changes, add a unit test for BTreeMap<BTreeMap<u8, u8>, u32>
, then update this PR?
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good catch! Here's a quick test case to detect the issue:
#[test]
fn test_nested_map() {
use std::collections::BTreeMap as Map;
let mut m = Map::new();
m.insert(Map::from_iter([(1u8, 0u8); 5]), 3);
let bytes = to_bytes(&m).unwrap();
assert_eq!(from_bytes(&bytes).as_ref(), Ok(&m));
assert_eq!(from_bytes_via_reader(&bytes), Ok(m));
}
Working on a fix now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your changes look good to me! I also tested that they worked with an additional layer of nesting. Pushing the changes for review 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this! There's been times in the past where i've wanted this but haven't ever had the time to implement it myself so thank you :)
@preston-evans98 @bmwill This was just released in bcs v0.1.6 |
This Pull Request implements BCS deserialization from implementers of
std::io::Read
.Strategy
The PR aims to maintain a minimal diff over the previous implementation. To that end, the following changes have been made.
Deserializer
generic over an inner typeR
BcsReader
, which implements BCS in terms of a few primitive operations. Implementserde::Deserialize<'de> for Deserializer<R> where Deserializer<R>: BcsReader<'de>
BcsReader<'de>
forDeserializer<&'de [u8]>
using existing codeBcsReader
generically for (a version of)Deserializer<R: Read>
.Implementing
BcsReader
genericallyThe existing implementation is very close to supporting deserialization from readers. There are only two places in which its behavior relied on access to a byte slice was in the implementation of
de::MapAccess
.The first case is
map
deserialization.bcs
needs access to the serialized representation of map keys in order to enforce canonicity. When deserializing from a slice, this is straightforward - the implementation can simply deserialize a map key, and then "look back" at the original input slice to determine its serialized representation. When deserializing from a reader, however, this kind of rewinding is not generally possible. To solve this problem, I introduce a theTeeReader
struct, which wraps aRead
er and optionally copies its bytes into acapture_buffer
for later retrieval. Then, I simply implementBcsReader
forDeserializer<TeeReader<...>>
.The second case is the
end
method, which checks that all input bytes have been consumed. To handle this case, I simply attempt to read one extra byte from theRead
er after deserialization and assert that an EOF error is returned from the underlyingRead
er.Testing
All existing unit tests except for
zero_copy_parse
have been modified to test that deserializationfrom_bytes
andfrom_reader
yield the same output. That test is not applicable since readers are not capable of zero-copy deserialization.