Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add serialization using serde and bincode as well as macros #667

Closed
wants to merge 5 commits into from

Conversation

ChHecker
Copy link

As described in #666, I added an automatic serialization using serde and bincode.

The simple option is the macro be_value_serialize! after activating the feature serialize. This, however, does not feature the same level of comfort as the derive macro in the feature serialize-derive. Derive macros require a second crate, though, so I create it. You would probably have to add it to crates.io yourself.

@ChHecker
Copy link
Author

I automatically use None in fixed_width(). This obviously doesn't work for types like Vec.
I added a macro #[fixed_width] that replaces this with Some(std::mem::size_of::<T>()). Should a variable width or a fixed width be the default? In any case, what should the macro be named?

@cberner
Copy link
Owner

cberner commented Sep 2, 2023

Types should be fixed width if all values of that type can be serialized to a slice of the same width.

More broadly though, I'll need to think a bit about whether allowing people to derive this is a good idea. A couple concerns I have: changing the name of a struct that uses this macro will now be a breaking change, due to the stringify use. And adding or removing a field will be also

@dpc
Copy link

dpc commented Sep 4, 2023

So width should return None? For a convenience functionality like this I think it's the safest route. A type_name should be provided by an argument to macro? What are these type names for anyway?

I don't think redb should be treating bincode in any special way. serialize and deserialize from bincode should not be re-exported as some default, I think.

log = {version = "0.4.17", optional = true }
pyo3 = {version = "0.19.0", features=["extension-module", "abi3-py37"], optional = true }
log = { version = "0.4.17", optional = true }
pyo3 = { version = "0.19.0", features = [
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

@cberner
Copy link
Owner

cberner commented Sep 4, 2023

So width should return None? For a convenience functionality like this I think it's the safest route. A type_name should be provided by an argument to macro? What are these type names for anyway?

I don't think redb should be treating bincode in any special way. serialize and deserialize from bincode should not be re-exported as some default, I think.

The type names are for type safety. If you try to open a table using a different key or value type that that with which it was created, it will fail

@dpc
Copy link

dpc commented Sep 5, 2023

I see. Hmmm...

What if ... I lie? Change the type without changing the name?

Does it technically need to be unsafe as it must uphold a contract or otherwise risk undefined behavior?

E.g store a u8 key with a value 0, then change the type to NonZeroU8 yet keep the name the same?

@ChHecker
Copy link
Author

ChHecker commented Sep 5, 2023

So width should return None? For a convenience functionality like this I think it's the safest route. A type_name should be provided by an argument to macro? What are these type names for anyway?

width is currently set to None unless a macro argument is added. The type name by argument is a good idea. I'm new to procedural macros, but I can look into it.

I don't think redb should be treating bincode in any special way. serialize and deserialize from bincode should not be re-exported as some default, I think.

I just added that because the procedural macro inserts the literal code, so without it, the user has to import bincode manually. I just thought it inconvenient, but I see your point.

@dpc
Copy link

dpc commented Sep 5, 2023

BTW. I have went with the same path in my code, and I wonder if the macros should be more generic, serde ones (with = ..., serialize_with = , deserialize_with = ) instead of coming up with a macro for every supported serde-backend.

My code:

impl RedbValue for ItemValue {
    type SelfType<'a> = ItemValue;

    type AsBytes<'a> = Vec<u8>;

    fn fixed_width() -> Option<usize> {
        None
    }

    fn from_bytes<'a>(data: &'a [u8]) -> Self::SelfType<'a>
    where
        Self: 'a,
    {
        bincode::deserialize(data).expect("bincode deserialization error")
    }

    fn as_bytes<'a, 'b: 'a>(value: &'a Self::SelfType<'b>) -> Self::AsBytes<'a>
    where
        Self: 'a,
        Self: 'b,
    {
        bincode::serialize(value).expect("bincode serialization error")
    }

    fn type_name() -> redb::TypeName {
        redb::TypeName::new("item-data")
    }
}

could be just:

#[derive(RedbValue(serialize_with=bincode::serialize, deserialize_with=bincode::deserialize, name="item-value"))]
struct ItemValue { /* ... */ }

So far I have 2 structs using bincode, and two newtypes (that just delegate to inner-value), so I was planning to macro_rules-them as it's easy. But derive macro would be even better.

@ChHecker
Copy link
Author

ChHecker commented Sep 5, 2023

So the arguments to derive macros the way you wrote them are apparently not possible. That's why I added those additional arguments like

#[derive(Clone, Debug, PartialEq, Deserialize, Serialize, RedbValue)]
#[fixed_width]
#[type_name("test-type")]
struct Test(usize);

I'd love if someone could prove me otherwise, as I'm worried if too many arguments would get crowded. Especially if I added serialize_with arguments as you mentioned.

@dpc
Copy link

dpc commented Sep 5, 2023

serde does it similiarly so you're probably right.

Though like serde I'd do #[redb(fixed_with)], etc. instead of "naked" versions.

@dpc
Copy link

dpc commented Sep 5, 2023

#[derive(Clone, Debug, PartialEq, Deserialize, Serialize, RedbValue)]
#[redb(fixed_width, type_name = "test-type", with="bincode")]
struct Test(usize);

Is the destiny. ;)

type AsBytes<'a> = Vec<u8> where Self: 'a;

fn fixed_width() -> Option<usize> {
Some(std::mem::size_of::<$t>())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. Does bincode::serialize guarantee that the result is the same length as size_of?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, and the v2 RCs of bincode don't even have support for determining what the size of the value would be after serialization.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Well types that return Some from fixed_width() need to return a slice of the same length from as_bytes()

Copy link
Owner

@cberner cberner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one concern I have with this is that users who use this macro may expect it to handle schema changes for them, and I'm guessing it doesn't.
The existing types in redb are safe against this because the tuple type encodes all the fields in the TypeName. Is there a way to do with these proc macros, so that all the types of the fields are included?

The case I have in mind is. Someone writes code like:

#[derive(Clone, Debug, PartialEq, Deserialize, Serialize, RedbValue)]
#[fixed_width]
#[type_name("test-type")]
struct Test(usize);

and then changes it to: struct Test(isize). This should cause an error at runtime, when trying to read the previously stored data.

@dpc
Copy link

dpc commented Oct 13, 2023

So one concern I have with this is that users who use this macro may expect it to handle schema changes for them, and I'm guessing it doesn't.

That is not something I would personally expect. Also not something I would necessarily want to waste bytes on.

One application I immediately see for redb is involving billions of tiny records, where every byte adds up.

@cberner
Copy link
Owner

cberner commented Oct 13, 2023 via email

}

fn type_name() -> redb::TypeName {
redb::TypeName::new(#type_name)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never written a proc macro before. Is it possible to change this to include the types of all the fields in the struct? For example, the way I did it for tuples. This would resolve my concern about users changing the type of a field and then getting corrupted data instead of an error. Something like #type_name {#field_1_type, #field_2_type, #field_3_type...}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely possible.

BTW. This is stored only once per table, right? I guess my worries were needless if so.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, once per table. And it's only checked once when the table is opened, so shouldn't add much overhead if it's long

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case a must have.

@cberner
Copy link
Owner

cberner commented Oct 13, 2023

I left some comments. Also, this needs tests :)

@ChHecker
Copy link
Author

I'm going to be honest, it seems like I vastly underestimated the implications of this PR. I'd love for these changes to somehow be implemented, but I neither have the time nor the required skill level. Therefore, I will close this PR for now.
Sorry if I wasted your time, I'm fairly new to contributing to open source code.

@ChHecker ChHecker closed this Oct 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants