Add serialization using serde and bincode as well as macros #667

ChHecker · 2023-08-29T20:12:39Z

As described in #666, I added an automatic serialization using serde and bincode.

The simple option is the macro be_value_serialize!, available after activating the feature serialize. It is, however, less convenient than the derive macro in the feature serialize-derive. Derive macros require a second crate, though, so I created one. You would probably have to publish it to crates.io yourself.

ChHecker · 2023-08-31T10:42:08Z

I automatically use None in fixed_width(). This obviously doesn't work for types like Vec.
I added a macro #[fixed_width] that replaces this with Some(std::mem::size_of::<T>()). Should a variable width or a fixed width be the default? In any case, what should the macro be named?

cberner · 2023-09-02T20:29:22Z

Types should be fixed width if all values of that type can be serialized to a slice of the same width.
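
To make the rule above concrete, here is a minimal sketch using only the standard library. The `Encode` trait and the `upholds_width_contract` helper are hypothetical stand-ins invented for illustration; they are not redb's API.

```rust
/// Hypothetical stand-in for the width-related part of a value trait.
trait Encode {
    /// `Some(n)` only if every value of the type encodes to exactly `n` bytes.
    fn fixed_width() -> Option<usize>;
    fn encode(&self) -> Vec<u8>;
}

impl Encode for u32 {
    // Every u32 encodes to exactly 4 bytes, so a fixed width is correct.
    fn fixed_width() -> Option<usize> {
        Some(4)
    }
    fn encode(&self) -> Vec<u8> {
        self.to_le_bytes().to_vec()
    }
}

impl Encode for String {
    // The encoded length depends on the content, so the width is variable.
    fn fixed_width() -> Option<usize> {
        None
    }
    fn encode(&self) -> Vec<u8> {
        self.as_bytes().to_vec()
    }
}

/// Checks the contract for a single value: if a fixed width is declared,
/// the encoded length must match it.
fn upholds_width_contract<T: Encode>(value: &T) -> bool {
    match T::fixed_width() {
        Some(width) => value.encode().len() == width,
        None => true,
    }
}
```

A check like `upholds_width_contract` could be run over sample values in a test to catch a type that declares a fixed width but does not always honor it.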

More broadly though, I'll need to think a bit about whether allowing people to derive this is a good idea. A couple of concerns I have: changing the name of a struct that uses this macro will now be a breaking change, due to the stringify use. And adding or removing a field will be one as well.

dpc · 2023-09-04T04:04:19Z

So width should return None? For a convenience functionality like this I think it's the safest route. A type_name should be provided by an argument to macro? What are these type names for anyway?

I don't think redb should be treating bincode in any special way. serialize and deserialize from bincode should not be re-exported as some default, I think.

dpc · 2023-09-04T04:04:49Z

Cargo.toml

-log = {version = "0.4.17", optional = true }
-pyo3 = {version = "0.19.0", features=["extension-module", "abi3-py37"], optional = true }
+log = { version = "0.4.17", optional = true }
+pyo3 = { version = "0.19.0", features = [


cberner · 2023-09-04T17:54:01Z

> So width should return None? For a convenience functionality like this I think it's the safest route. A type_name should be provided by an argument to macro? What are these type names for anyway?

> I don't think redb should be treating bincode in any special way. serialize and deserialize from bincode should not be re-exported as some default, I think.

The type names are for type safety. If you try to open a table using a different key or value type than the one with which it was created, it will fail.
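
As a rough illustration of that check, the sketch below records a type name when a table is first opened and rejects later opens that request a different name. The `Catalog` type and its `open_table` method are hypothetical and only model the idea; redb's real implementation is not shown in this thread.

```rust
use std::collections::HashMap;

/// Hypothetical catalog: table name -> value type name recorded at creation.
struct Catalog {
    value_types: HashMap<String, String>,
}

impl Catalog {
    fn new() -> Self {
        Catalog {
            value_types: HashMap::new(),
        }
    }

    /// Opens a table, creating it on first use. Fails if the requested type
    /// name differs from the one recorded when the table was created.
    fn open_table(&mut self, table: &str, type_name: &str) -> Result<(), String> {
        match self.value_types.get(table) {
            Some(stored) if stored.as_str() != type_name => Err(format!(
                "table '{}' holds '{}', but '{}' was requested",
                table, stored, type_name
            )),
            Some(_) => Ok(()),
            None => {
                self.value_types
                    .insert(table.to_string(), type_name.to_string());
                Ok(())
            }
        }
    }
}
```

Note that the check compares names only, so it cannot detect a type whose definition changed while its name stayed the same.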

dpc · 2023-09-05T05:50:35Z

I see. Hmmm...

What if ... I lie? Change the type without changing the name?

Does it technically need to be unsafe as it must uphold a contract or otherwise risk undefined behavior?

E.g., store a u8 key with a value 0, then change the type to NonZeroU8 yet keep the name the same?
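
A small standard-library sketch of why this case is dangerous: the byte written for a u8 of 0 is not a valid NonZeroU8, so a decoder has to either validate it or risk undefined behavior through an unchecked constructor. The `decode_non_zero` function is a hypothetical helper, not part of redb.

```rust
use std::num::NonZeroU8;

/// Decodes the first stored byte as a NonZeroU8.
/// Returns None if the slice is empty or the byte is 0, instead of
/// constructing an invalid value (which `NonZeroU8::new_unchecked(0)`
/// would do, and that is undefined behavior).
fn decode_non_zero(data: &[u8]) -> Option<NonZeroU8> {
    let byte = *data.first()?;
    NonZeroU8::new(byte)
}
```

With a checked decoder like this, data written under the old u8 type surfaces as a decoding failure rather than as an invalid value.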

ChHecker · 2023-09-05T06:09:44Z

> So width should return None? For a convenience functionality like this I think it's the safest route. A type_name should be provided by an argument to macro? What are these type names for anyway?

width is currently set to None unless a macro argument is added. The type name by argument is a good idea. I'm new to procedural macros, but I can look into it.

> I don't think redb should be treating bincode in any special way. serialize and deserialize from bincode should not be re-exported as some default, I think.

I just added that because the procedural macro inserts the literal code, so without it, the user has to import bincode manually. I just thought it inconvenient, but I see your point.

dpc · 2023-09-05T06:40:14Z

BTW, I went down the same path in my code, and I wonder if the macros should be more generic, serde-style ones (with = ..., serialize_with = ..., deserialize_with = ...) instead of coming up with a macro for every supported serde backend.

My code:

impl RedbValue for ItemValue {
    type SelfType<'a> = ItemValue;

    type AsBytes<'a> = Vec<u8>;

    fn fixed_width() -> Option<usize> {
        None
    }

    fn from_bytes<'a>(data: &'a [u8]) -> Self::SelfType<'a>
    where
        Self: 'a,
    {
        bincode::deserialize(data).expect("bincode deserialization error")
    }

    fn as_bytes<'a, 'b: 'a>(value: &'a Self::SelfType<'b>) -> Self::AsBytes<'a>
    where
        Self: 'a,
        Self: 'b,
    {
        bincode::serialize(value).expect("bincode serialization error")
    }

    fn type_name() -> redb::TypeName {
        redb::TypeName::new("item-data")
    }
}

could be just:

#[derive(RedbValue(serialize_with=bincode::serialize, deserialize_with=bincode::deserialize, name="item-value"))]
struct ItemValue { /* ... */ }

So far I have 2 structs using bincode, and two newtypes (that just delegate to inner-value), so I was planning to macro_rules-them as it's easy. But derive macro would be even better.
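
The macro_rules approach for delegating newtypes could look roughly like the sketch below. It uses a simplified, hypothetical `Value` trait rather than redb's actual trait, and `delegate_value!` and `ItemId` are made-up names for illustration.

```rust
use std::convert::TryInto;

/// Hypothetical, simplified value trait (not redb's actual API).
trait Value: Sized {
    fn to_bytes(&self) -> Vec<u8>;
    fn from_bytes(data: &[u8]) -> Option<Self>;
}

impl Value for u64 {
    fn to_bytes(&self) -> Vec<u8> {
        self.to_le_bytes().to_vec()
    }
    fn from_bytes(data: &[u8]) -> Option<Self> {
        // Fails (returns None) unless exactly 8 bytes are supplied.
        Some(u64::from_le_bytes(data.try_into().ok()?))
    }
}

/// Implements `Value` for a single-field tuple struct by delegating
/// to the implementation of its inner type.
macro_rules! delegate_value {
    ($newtype:ident, $inner:ty) => {
        impl Value for $newtype {
            fn to_bytes(&self) -> Vec<u8> {
                self.0.to_bytes()
            }
            fn from_bytes(data: &[u8]) -> Option<Self> {
                <$inner as Value>::from_bytes(data).map($newtype)
            }
        }
    };
}

#[derive(Debug, PartialEq)]
struct ItemId(u64);
delegate_value!(ItemId, u64);
```

Each additional newtype then costs one macro invocation, which is the convenience a derive macro would also provide.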

ChHecker · 2023-09-05T06:51:43Z

So the arguments to derive macros the way you wrote them are apparently not possible. That's why I added those additional arguments like

#[derive(Clone, Debug, PartialEq, Deserialize, Serialize, RedbValue)]
#[fixed_width]
#[type_name("test-type")]
struct Test(usize);

I'd love for someone to prove me wrong, as I'm worried that this would get crowded with too many arguments, especially if I added serialize_with arguments as you mentioned.

dpc · 2023-09-05T06:59:24Z

serde does it similarly, so you're probably right.

Though like serde I'd do #[redb(fixed_width)], etc. instead of "naked" versions.

dpc · 2023-09-05T07:00:41Z

#[derive(Clone, Debug, PartialEq, Deserialize, Serialize, RedbValue)]
#[redb(fixed_width, type_name = "test-type", with="bincode")]
struct Test(usize);

Is the destiny. ;)

cberner · 2023-10-13T00:21:50Z

src/types.rs

+            type AsBytes<'a> = Vec<u8> where Self: 'a;
+
+            fn fixed_width() -> Option<usize> {
+                Some(std::mem::size_of::<$t>())


This doesn't seem right. Does bincode::serialize guarantee that the result is the same length as size_of?

brxken128 · 2023-10-13

Nope, and the v2 RCs of bincode don't even have support for determining what the size of the value would be after serialization.

cberner · 2023-10-13

I see. Well, types that return Some from fixed_width() need to return a slice of the same length from as_bytes().
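
To illustrate why size_of is not a reliable proxy for the serialized length, here is a standard-library sketch with two hand-rolled encoders. `Record`, `encode_record`, and `encode_bytes` are hypothetical; the encoders are generic compact and length-prefixed formats, not bincode itself.

```rust
/// Hypothetical record. Its in-memory size can include alignment padding
/// (commonly 8 bytes on mainstream targets), which an encoder need not write.
struct Record {
    tag: u8,
    count: u32,
}

/// Compact encoding: 1 byte for `tag` followed by 4 bytes for `count`.
fn encode_record(r: &Record) -> Vec<u8> {
    let mut out = vec![r.tag];
    out.extend_from_slice(&r.count.to_le_bytes());
    out
}

/// Length-prefixed encoding: an 8-byte length followed by the data.
/// The output length depends on the content, while
/// `std::mem::size_of::<Vec<u8>>()` is the same constant for every vector.
fn encode_bytes(v: &[u8]) -> Vec<u8> {
    let mut out = (v.len() as u64).to_le_bytes().to_vec();
    out.extend_from_slice(v);
    out
}
```

In both cases the encoded length is determined by the encoder, not by the in-memory layout, so returning Some(size_of::<T>()) from fixed_width() can break the contract described above.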

cberner

So one concern I have with this is that users who use this macro may expect it to handle schema changes for them, and I'm guessing it doesn't.
The existing types in redb are safe against this because the tuple type encodes all the fields in the TypeName. Is there a way to do that with these proc macros, so that all the types of the fields are included?

The case I have in mind is this. Someone writes code like:

#[derive(Clone, Debug, PartialEq, Deserialize, Serialize, RedbValue)]
#[fixed_width]
#[type_name("test-type")]
struct Test(usize);

and then changes it to: struct Test(isize). This should cause an error at runtime, when trying to read the previously stored data.

dpc · 2023-10-13T02:08:58Z

> So one concern I have with this is that users who use this macro may expect it to handle schema changes for them, and I'm guessing it doesn't.

That is not something I would personally expect. Also not something I would necessarily want to waste bytes on.

One application I immediately see for redb is involving billions of tiny records, where every byte adds up.

cberner · 2023-10-13T02:50:08Z

Ah, to clarify, I don't think it should support schema changes. But I do think there should be an error if the user tries to change the schema.


cberner · 2023-10-13T03:35:22Z

redb-derive/src/lib.rs

+            }
+
+            fn type_name() -> redb::TypeName {
+                redb::TypeName::new(#type_name)


I've never written a proc macro before. Is it possible to change this to include the types of all the fields in the struct? For example, the way I did it for tuples. This would resolve my concern about users changing the type of a field and then getting corrupted data instead of an error. Something like #type_name {#field_1_type, #field_2_type, #field_3_type...}

dpc · 2023-10-13

Definitely possible.

BTW. This is stored only once per table, right? I guess my worries were needless if so.

cberner · 2023-10-13

Yes, once per table. And it's only checked once when the table is opened, so it shouldn't add much overhead if it's long.

dpc · 2023-10-13

In that case a must have.
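
The idea of folding field types into the type name can be sketched without a proc macro. The example below uses macro_rules and stringify! as a stand-in for what a derive macro would generate; `tuple_value!`, `TestV1`, and `TestV2` are hypothetical names, and the returned String stands in for redb::TypeName.

```rust
/// Declares a tuple struct and a `type_name()` that embeds the field types,
/// following the "#type_name {#field_1_type, ...}" shape suggested above.
macro_rules! tuple_value {
    ($name:expr, $ty:ident($($field:ty),+)) => {
        #[allow(dead_code)]
        struct $ty($($field),+);

        impl $ty {
            fn type_name() -> String {
                // stringify! turns each field type into its source text.
                let fields = [$(stringify!($field)),+];
                format!("{} {{{}}}", $name, fields.join(", "))
            }
        }
    };
}

// Same user-chosen name, different field type: the full names differ,
// so a name comparison at table-open time would detect the change.
tuple_value!("test-type", TestV1(usize));
tuple_value!("test-type", TestV2(isize));
```

Because stringify! works on source text, a type alias or a renamed field type with an identical layout would also change the name; whether that is desirable is a design decision this sketch does not settle.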

cberner · 2023-10-13T16:57:49Z

I left some comments. Also, this needs tests :)

ChHecker · 2023-10-14T15:30:35Z

I'm going to be honest, it seems like I vastly underestimated the implications of this PR. I'd love for these changes to somehow be implemented, but I neither have the time nor the required skill level. Therefore, I will close this PR for now.
Sorry if I wasted your time, I'm fairly new to contributing to open source code.

ChHecker added 3 commits August 29, 2023 22:07

Add serialization and macros

10ebe89

Fix derive macro with generics

a6975c6

Add fixed_width macro to support types with unknown size

62ac558

ChHecker force-pushed the serialize branch from 9cc82d2 to 62ac558 Compare August 31, 2023 10:41

Use local path to redb-derive

d0bfb57

dpc reviewed Sep 4, 2023

View reviewed changes

Add type_name macro

052cdd4

cberner reviewed Oct 13, 2023

View reviewed changes

cberner requested changes Oct 13, 2023

View reviewed changes

cberner reviewed Oct 13, 2023

View reviewed changes

ChHecker closed this Oct 14, 2023
