Add serialization using serde and bincode as well as macros #667
I automatically use |
Types should be fixed width if all values of that type can be serialized to a slice of the same width. More broadly though, I'll need to think a bit about whether allowing people to derive this is a good idea. A couple concerns I have: changing the name of a struct that uses this macro will now be a breaking change, due to the |
So I don't think redb should be treating bincode in any special way.
log = {version = "0.4.17", optional = true }
pyo3 = {version = "0.19.0", features=["extension-module", "abi3-py37"], optional = true }
log = { version = "0.4.17", optional = true }
pyo3 = { version = "0.19.0", features = [
?
The type names are for type safety. If you try to open a table using a different key or value type than the one with which it was created, it will fail.
I see. Hmmm... What if ... I lie? Change the type without changing the name? Does it technically need to be E.g store a |
I just added that because the procedural macro inserts the literal code, so without it, the user has to import bincode manually. I just thought it inconvenient, but I see your point.
BTW, I went down the same path in my code, and I wonder if the macros should be more generic. My code:
impl RedbValue for ItemValue {
    type SelfType<'a> = ItemValue;
    type AsBytes<'a> = Vec<u8>;

    // bincode output length is not tied to the in-memory size, so report variable width
    fn fixed_width() -> Option<usize> {
        None
    }

    fn from_bytes<'a>(data: &'a [u8]) -> Self::SelfType<'a>
    where
        Self: 'a,
    {
        bincode::deserialize(data).expect("bincode deserialization error")
    }

    fn as_bytes<'a, 'b: 'a>(value: &'a Self::SelfType<'b>) -> Self::AsBytes<'a>
    where
        Self: 'a,
        Self: 'b,
    {
        bincode::serialize(value).expect("bincode serialization error")
    }

    fn type_name() -> redb::TypeName {
        redb::TypeName::new("item-data")
    }
}
could be just:
So far I have 2 structs using bincode, and two newtypes (that just delegate to inner-value), so I was planning to |
So the arguments to derive macros the way you wrote them are apparently not possible. That's why I added those additional arguments like
#[derive(Clone, Debug, PartialEq, Deserialize, Serialize, RedbValue)]
#[fixed_width]
#[type_name("test-type")]
struct Test(usize);
I'd love it if someone could prove me wrong, as I'm worried that too many arguments would get crowded. Especially if I added |
Though like serde I'd do |
#[derive(Clone, Debug, PartialEq, Deserialize, Serialize, RedbValue)]
#[redb(fixed_width, type_name = "test-type", with = "bincode")]
struct Test(usize);
Is the destiny. ;)
type AsBytes<'a> = Vec<u8> where Self: 'a;

fn fixed_width() -> Option<usize> {
    Some(std::mem::size_of::<$t>())
This doesn't seem right. Does bincode::serialize guarantee that the result is the same length as size_of?
Nope, and the v2 RCs of bincode don't even have support for determining what the size of the value would be after serialization.
I see. Well, types that return Some from fixed_width() need to return a slice of the same length from as_bytes().
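To make the mismatch concrete, here is a minimal std-only sketch. It does not use redb or bincode; `Pair`, `encode_packed`, and `width_matches` are hypothetical stand-ins, and the packed encoding only approximates what a serializer might emit. It shows that an in-memory size with padding can differ from the encoded length, which is why `size_of` is not a safe value for `fixed_width()`.

```rust
use std::mem::size_of;

/// Hypothetical example type. With `repr(C)` the layout is: `a` at offset 0,
/// three padding bytes, then `b` at offset 4, so the in-memory size is 8.
#[repr(C)]
pub struct Pair {
    pub a: u8,
    pub b: u32,
}

/// In-memory size of `Pair`, padding included.
pub fn in_memory_size() -> usize {
    size_of::<Pair>()
}

/// Hypothetical field-by-field encoding without padding (1 + 4 bytes).
/// This is a stand-in, not bincode's actual wire format.
pub fn encode_packed(p: &Pair) -> Vec<u8> {
    let mut out = Vec::with_capacity(5);
    out.push(p.a);
    out.extend_from_slice(&p.b.to_le_bytes());
    out
}

/// Checks the contract from the comment above: if a width is advertised,
/// the encoded slice must have exactly that length.
pub fn width_matches(advertised: Option<usize>, encoded: &[u8]) -> bool {
    match advertised {
        Some(width) => width == encoded.len(),
        None => true, // variable-width types have nothing to check
    }
}
```

Under these assumptions, advertising `Some(in_memory_size())` for `Pair` would violate the contract, while returning `None` (as in the hand-written impl earlier in the thread) is always consistent.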
So one concern I have with this is that users who use this macro may expect it to handle schema changes for them, and I'm guessing it doesn't.
The existing types in redb are safe against this because the tuple type encodes all the fields in the TypeName. Is there a way to do that with these proc macros, so that all the types of the fields are included?
The case I have in mind is this. Someone writes code like:
#[derive(Clone, Debug, PartialEq, Deserialize, Serialize, RedbValue)]
#[fixed_width]
#[type_name("test-type")]
struct Test(usize);
and then changes it to: struct Test(isize). This should cause an error at runtime, when trying to read the previously stored data.
That is not something I would personally expect. Also not something I would necessarily want to waste bytes on. One application I immediately see for redb is involving billions of tiny records, where every byte adds up.
Ah, to clarify, I don't think it should support schema changes. But I do think there should be an error if the user tries to change the schema.
On Thu, Oct 12, 2023, 7:09 PM Dawid Ciężarkiewicz wrote:
So one concern I have with this is that users who use this macro may
expect it to handle schema changes for them, and I'm guessing it doesn't.
That is not something I would personally expect. Also not something I
would necessarily waste bytes on.
One application I immediately see for redb is involving billions of tiny
records, where every byte adds up.
}

fn type_name() -> redb::TypeName {
    redb::TypeName::new(#type_name)
I've never written a proc macro before. Is it possible to change this to include the types of all the fields in the struct? For example, the way I did it for tuples. This would resolve my concern about users changing the type of a field and then getting corrupted data instead of an error. Something like #type_name {#field_1_type, #field_2_type, #field_3_type...}
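As a rough illustration of that idea, the sketch below uses `macro_rules!` and `stringify!` rather than a real derive macro, since a proc macro needs its own crate. The macro name `named_tuple_struct` and the exact name format are assumptions made for this example; nothing here is redb's API. It only shows that the field types can be folded into the generated name so that `usize` and `isize` variants differ.

```rust
/// Hypothetical declarative macro: defines a tuple struct and a `type_name()`
/// that embeds the textual field types, e.g. `test-type {usize}`.
macro_rules! named_tuple_struct {
    ($name:ident, $base:literal, $($field:ty),+) => {
        pub struct $name($(pub $field),+);

        impl $name {
            /// Base name followed by the field types in declaration order.
            pub fn type_name() -> String {
                let fields: &[&str] = &[$(stringify!($field)),+];
                format!("{} {{{}}}", $base, fields.join(", "))
            }
        }
    };
}

// Same base name, different field type: the generated names differ.
named_tuple_struct!(TestV1, "test-type", usize);
named_tuple_struct!(TestV2, "test-type", isize);
```

A derive macro could presumably do the equivalent by iterating over the struct's fields and stringifying each field's type tokens, but that part is not shown here.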
Definitely possible.
BTW. This is stored only once per table, right? I guess my worries were needless if so.
Yes, once per table. And it's only checked once when the table is opened, so it shouldn't add much overhead if it's long.
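The once-per-table check can be sketched with a small in-memory stand-in. `Catalog`, `OpenError`, and `open_table` are hypothetical names invented for this example and do not mirror redb's internals; the point is only that the type name is recorded when a table is first created and compared on each later open, independent of how many records the table holds.

```rust
use std::collections::HashMap;

/// Hypothetical error returned when the requested type differs from the stored one.
#[derive(Debug, PartialEq)]
pub enum OpenError {
    TypeMismatch { stored: String, requested: String },
}

/// Hypothetical catalog: table name -> value type name recorded at creation.
#[derive(Default)]
pub struct Catalog {
    value_types: HashMap<String, String>,
}

impl Catalog {
    /// Records the type name on first open; on later opens, compares it once.
    pub fn open_table(&mut self, table: &str, type_name: &str) -> Result<(), OpenError> {
        match self.value_types.get(table) {
            Some(stored) if stored.as_str() != type_name => Err(OpenError::TypeMismatch {
                stored: stored.clone(),
                requested: type_name.to_string(),
            }),
            Some(_) => Ok(()),
            None => {
                self.value_types
                    .insert(table.to_string(), type_name.to_string());
                Ok(())
            }
        }
    }
}
```

With a name that includes field types, reopening a table created as `test-type {usize}` with `test-type {isize}` would return `TypeMismatch` in this sketch instead of silently decoding old bytes as the new type.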
In that case a must have.
I left some comments. Also, this needs tests :)
I'm going to be honest, it seems like I vastly underestimated the implications of this PR. I'd love for these changes to somehow be implemented, but I neither have the time nor the required skill level. Therefore, I will close this PR for now.
As described in #666, I added automatic serialization using serde and bincode. The simple option is the macro be_value_serialize!, available after activating the feature serialize. This, however, does not offer the same level of comfort as the derive macro in the feature serialize-derive. Derive macros require a second crate, though, so I created it. You would probably have to add it to crates.io yourself.