Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoiding unnecessary memory allocations in visitors #135

Open
ibokuri opened this issue Sep 10, 2023 · 8 comments
Open

Avoiding unnecessary memory allocations in visitors #135

ibokuri opened this issue Sep 10, 2023 · 8 comments
Assignees
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@ibokuri
Copy link
Contributor

ibokuri commented Sep 10, 2023

Problem

There is no way for a visitor to know if a pointer value it has received is:

  1. Allocated.
  2. Safe to use as part of the return value.

Thus, visitors are forced to play it safe and always make copies, which can result in unnecessary allocations.

Proposal

To fix this, the following things need to be added to Getty:

  1. A way for visitors to know if the pointer value they received from a deserializer is safe to use as part of their return value.

    • There are two ways for a visitor to receive a pointer value from a deserializer: the value parameter in visitString and the return value of access methods (e.g., nextKeySeed, nextElementSeed).
  2. A way for deserializers to know if visitString is using the slice as part of the final value, and how much of that slice is being used.

Part One: The Visitor

How can visitors know if the pointer value they received from a deserializer is safe to use as part of their return value?


To solve this, we can do the following:

  1. Define the following type:

    ⚠️ Edit: See this comment for new Lifetime design. ⚠️

    pub const Lifetime = enum {
        Stack,
        Heap,
        Owned,
    }
    • The type will indicate the lifetime and ownership properties of pointer values passed to visitors:

      1. Stack: The value lives on the stack and its lifetime is shorter than the deserialization process.
        • The value must be copied by the visitor.
      2. Heap: The value lives on the heap and its lifetime is longer than the deserialization process and is independent of any entity.
        • The value can either be copied or returned directly.
      3. Owned: The value lives on the stack or heap and its lifetime is managed by some entity.
        • The value can either be copied by the visitor or returned directly if the visitor understands and deems the value's lifetime as safe.
        • Since Getty's default visitors won't have enough info to determine whether an Owned value's lifetime is safe, they must always copy such values.
    • When should visitors free the pointer values they receive?

      1. Stack or Owned values should never be freed by the visitor.
        • Stack values will be automatically cleaned up by the compiler, obviously.
        • Owned values will be cleaned up eventually after deserialization is finished by the entity that owns them.
      2. Heap values passed to visitString should never be freed by the visitor. This is b/c the value is a Getty value and so the deserializer is responsible for freeing it.
      3. Heap values returned from an access method should be freed by the visitor upon error or if it's not part of the final value. The deserializer will never see these values again, so it's the visitor's responsibility to free them.
  2. Add a lifetime parameter to visitString that specifies the Lifetime of input.

  3. Remove the is*Allocated methods from access interfaces. With Lifetime, we don't need them anymore.

  4. Modify the successful return type of access methods to be:

    struct {
        data: @TypeOf(seed).Value, // This may be optional, depending on the access method.
        lifetime: Lifetime,
    }

With these changes, visitors can do the following:

// in visitString...
switch (lifetime) {
  .Stack => // Make a copy of `input`
  .Owned, .Heap => // Make a copy of `input` or return it directly
}

// in visitMap...
while (try map.nextKey(ally, Key)) |key| {
    switch (key.lifetime) {
      .Stack => // Make a copy of `key.data`
      .Owned => // Make a copy of `key.data` or return it directly
      .Heap  => // Make a copy of `key.data` or return it directly & free it as necessary
    }
}

Part Two: The Deserializer

How does a deserializer know if visitString is using the slice as part of the final value, and how much of that slice is being used?


Before diving in, there are a few things to keep in mind:

  1. Access methods are irrelevant for this part. Deserializers will never see them again so no need to worry about them.
  2. This part only apply to Heap values.
    • Stack values are obviously managed automatically by the compiler.
    • Owned values are managed outside the deserialization process, so functions like deserializeString don't need to worry about them.
  3. The return value of visitString might not be a string at all, so we shouldn't rely solely on visitString's return value. Besides, even if it is a string it'll be very tedious using it in the deserializer to figure out what to free and what to keep.

In any case, to solve this, we can do the following:

⚠️ Edit: See this comment for new solution. ⚠️

  1. Change the return type of visitString to the following:

    const Return = struct {
        value: Value,
        indices: ?struct { start: usize, end: usize } = null,
    };
    
    fn visitString(
        self: Self,
        ally: ?std.mem.Allocator,
        comptime Deserializer: type,
        input: anytype,
    ) Deserializer.Error!Return
    • If indices is null, then that means visitString did not use input as part of its return value. In which case, the deserializer should free value afterwards.
    • If indices is not null, then that means visitString did use input as part of its return value. start and end specifies the starting and ending indices in input at which visitString's return value begins and ends.
  2. With this new indices field, the deserializer now knows 1) if the visitor is using input directly in its return value, and 2) how much of input is being used.

    • If the entirety of input is being used, then the deserializer should not free input after calling visitString.
    • If only a partial slice of input is being used, then the deserializer can use start and end to determine the remaining parts of input that should be freed.
@ibokuri ibokuri added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Sep 10, 2023
@ibokuri ibokuri added this to the 0.5.0 milestone Sep 10, 2023
@ibokuri ibokuri self-assigned this Sep 10, 2023
@ibokuri ibokuri added the accepted This proposal is planned. label Sep 10, 2023
@ibokuri
Copy link
Contributor Author

ibokuri commented Sep 13, 2023

New Lifetimes

After a bit of thought, I've come to the following design for the Lifetime declaration.

For Visitor Methods: StringLifetime

  • Static: The value lives for the lifetime of the program.
  • Stack: The value lives on the stack.
  • Heap: The value lives on the heap.
  • Owned: The value lives on the stack or heap and is managed by some entity.

For Access Methods: ValueLifetime

  • Static: The value lives for the lifetime of the program.
  • Heap: The value lives on the heap.
  • Owned: The value lives on the stack or heap and is managed by some entity.

Visitor Methods

Usage

  • Stack values must be copied.
  • Static and Heap values can be copied or used directly.
  • Owned values must be copied unless the visitor knows that the value's lifetime is safe, in which case it may be used directly.

Deallocation

  • Static and Stack values never need to be deallocated.
  • In the visitor, Heap values never have to be deallocated. In the deserializer, Heap values must be deallocated if the visitor didn't use it.
  • In the visitor, Owned values never have to be deallocated. In the deserializer, Owned values can:
    • Be leaked if the visitor used it.
    • Be deallocated immediately if the visitor did not use it and the deserializer manages it.
    • Be leaked if the visitor did not use it and either the deserializer manages it but wants to defer cleanup until later or some other entity manages the value.

Access Methods

Usage

Same as "Visitor Methods - Usage".

Deallocation

Same as "Visitor Methods - Deallocation" except that in the visitor, Heap values must be deallocated upon error or if the value is not returned.


Edit: Static messes up getty.de.free, so I'm removing it from the proposal. How often does one return a static string directly for deserialization anyways...

@ibokuri
Copy link
Contributor Author

ibokuri commented Sep 13, 2023

New Indices

One problem with the current solution for the deserializer portion of the issue is that it only enables visitors to use a single, contiguous slice from visitString's input. If the visitor wants to use non-contiguous portions of input as part of its final value, there needs to be a way for the visitor to specify the starting and ending positions of each slice it uses.

To fix this, we can modify the return type of visitString to become the following:

// The starting and ending positions of a string.
pub const Range = struct {
    start: usize,
    end: usize,
};

// Indicates to the deserializer which parts of `input` were used directly by
// the visitor.
//
// If a visitor uses the entirety of `input` as part of its final value, then the
// `Single` variant can be used, which doesn't require any extra allocations.
// Otherwise, the visitor will need to allocate a slice for `Multiple`, which the
// deserializer will clean up afterwards.
pub const Used = union(enum) {
    Single: Range,
    Multiple: []const Range,
};

// The new return type of `visitString`.
//
// If `used` is `null`, then the visitor did not use `input` as part of its final
// return value.
pub fn Return(comptime T: type) type {
    return struct {
        value: T,
        used: ?Used = null,
    };
}

For example, suppose a user wants to deserialize "John Doe" into the type struct { first: []u8, last: []u8 }, where first is John and last is Doe. As it is now, they'd be force to either make a copy of both John and Doe, or use one of the names directly and copy the other. However, with this new return type, they can assign input[0..4] to first and input[5..] to last, allocate a slice for the used field in the return value, populate it appropriately, and they're done!

@ibokuri
Copy link
Contributor Author

ibokuri commented Sep 16, 2023

As a sort of sanity check (and to give me a bit of motivation to work on this), I did some very simple deserialization benchmarking with some JSON data that had lots of strings. I deserialized the input data into a struct 100,000 times with both Getty JSON and std.json, and as expected Getty JSON was much slower.

const std = @import("std");
const json = @import("json");

const c_ally = std.heap.c_allocator;

const T = []struct {
    id: []const u8,
    type: []const u8,
    name: []const u8,
    ppu: f64,
    batters: struct {
        batter: []struct {
            id: []const u8,
            type: []const u8,
        },
    },
    topping: []struct {
        id: []const u8,
        type: []const u8,
    },
};

const input =
    \\[
    \\    {
    \\        "id": "0001",
    \\        "type": "donut",
    \\        "name": "Cake",
    \\        "ppu": 0.55,
    \\        "batters": {
    \\            "batter": [
    \\                {
    \\                    "id": "1001",
    \\                    "type": "Regular"
    \\                },
    \\                {
    \\                    "id": "1002",
    \\                    "type": "Chocolate"
    \\                },
    \\                {
    \\                    "id": "1003",
    \\                    "type": "Blueberry"
    \\                },
    \\                {
    \\                    "id": "1004",
    \\                    "type": "Devil's Food"
    \\                }
    \\            ]
    \\        },
    \\        "topping": [
    \\            {
    \\                "id": "5001",
    \\                "type": "None"
    \\            },
    \\            {
    \\                "id": "5002",
    \\                "type": "Glazed"
    \\            },
    \\            {
    \\                "id": "5005",
    \\                "type": "Sugar"
    \\            },
    \\            {
    \\                "id": "5007",
    \\                "type": "Powdered Sugar"
    \\            },
    \\            {
    \\                "id": "5006",
    \\                "type": "Chocolate with Sprinkles"
    \\            },
    \\            {
    \\                "id": "5003",
    \\                "type": "Chocolate"
    \\            },
    \\            {
    \\                "id": "5004",
    \\                "type": "Maple"
    \\            }
    \\        ]
    \\    },
    \\    {
    \\        "id": "0002",
    \\        "type": "donut",
    \\        "name": "Raised",
    \\        "ppu": 0.55,
    \\        "batters": {
    \\            "batter": [
    \\                {
    \\                    "id": "1001",
    \\                    "type": "Regular"
    \\                }
    \\            ]
    \\        },
    \\        "topping": [
    \\            {
    \\                "id": "5001",
    \\                "type": "None"
    \\            },
    \\            {
    \\                "id": "5002",
    \\                "type": "Glazed"
    \\            },
    \\            {
    \\                "id": "5005",
    \\                "type": "Sugar"
    \\            },
    \\            {
    \\                "id": "5003",
    \\                "type": "Chocolate"
    \\            },
    \\            {
    \\                "id": "5004",
    \\                "type": "Maple"
    \\            }
    \\        ]
    \\    },
    \\    {
    \\        "id": "0003",
    \\        "type": "donut",
    \\        "name": "Old Fashioned",
    \\        "ppu": 0.55,
    \\        "batters": {
    \\            "batter": [
    \\                {
    \\                    "id": "1001",
    \\                    "type": "Regular"
    \\                },
    \\                {
    \\                    "id": "1002",
    \\                    "type": "Chocolate"
    \\                }
    \\            ]
    \\        },
    \\        "topping": [
    \\            {
    \\                "id": "5001",
    \\                "type": "None"
    \\            },
    \\            {
    \\                "id": "5002",
    \\                "type": "Glazed"
    \\            },
    \\            {
    \\                "id": "5003",
    \\                "type": "Chocolate"
    \\            },
    \\            {
    \\                "id": "5004",
    \\                "type": "Maple"
    \\            }
    \\        ]
    \\    }
    \\]
;

fn stdJson() !void {
    for (0..100_000) |_| {
        const output = try std.json.parseFromSlice(T, c_ally, input, .{});
        defer output.deinit();
    }
}

fn gettyJson() !void {
    for (0..100_000) |_| {
        const output = try json.fromSlice(c_ally, T, input);
        defer json.de.free(c_ally, output, null);
    }
}

fn gettyJsonArena() !void {
    for (0..100_000) |_| {
        var arena = std.heap.ArenaAllocator.init(c_ally);
        const arena_ally = arena.allocator();
        defer arena.deinit();

        _ = try json.fromSlice(arena_ally, T, input);
    }
}

pub fn main() !void {
    //try gettyJson();
    //try gettyJsonArena();
    //try stdJson();
}
$ hyperfine --warmup 5 ./getty ./getty-arena ./std
Benchmark 1: ./getty
  Time (mean ± σ):     718.4 ms ±   3.7 ms    [User: 713.4 ms, System: 3.5 ms]
  Range (minmax):   715.6 ms727.9 ms    10 runs

Benchmark 2: ./getty-arena
  Time (mean ± σ):     628.7 ms ±   1.5 ms    [User: 622.5 ms, System: 4.6 ms]
  Range (minmax):   626.6 ms631.3 ms    10 runs

Benchmark 3: ./std
  Time (mean ± σ):     482.7 ms ±   1.5 ms    [User: 476.2 ms, System: 4.9 ms]
  Range (minmax):   481.2 ms486.4 ms    10 runs

Summary
  ./std ran
    1.30 ± 0.01 times faster than ./getty-arena
    1.49 ± 0.01 times faster than ./getty

The unnecessary allocations are, surely, the main factor for Getty's slowness. However, I should note that std.json.parseFromSlice uses an arena allocator implicitly whereas json.fromSlice does not, and you can see from the results that using an arena does indeed make a difference.

With or without an arena, though, Getty's still much slower. So it's time to get started on this issue!

@ibokuri ibokuri removed the accepted This proposal is planned. label Sep 17, 2023
@ibokuri
Copy link
Contributor Author

ibokuri commented Sep 17, 2023

Removed accepted label for now since, as pointed out by fredi, there are major issues with implementing this kind of thing, which I've unfortunately had the chance to run into in my own branch.

For one thing, the multiple ranges idea was a total non-starter. I think my brain farted when I came up with that. The input for visitString is always a slice, which is a single allocation. Ya can't just free individual pieces of a single allocation. So now we have no way of letting visitors use only parts of input.

Another issue is getty.de.free. How will it be able to know if the strings of a value, especially if they have different lifetimes, should be freed or not?

@ibokuri ibokuri mentioned this issue Sep 21, 2023
@ibokuri
Copy link
Contributor Author

ibokuri commented Sep 21, 2023

Before I can implement the lifetime optimizations, I had to do a bit of general allocation work beforehand. The above, merged PR implements that allocation work.

In summary, Getty now uses an arena internally for all allocations. This simplifies visitors and deserialization blocks as they no longer have to worry about freeing values and allows end users to free everything whenever they want. Additionally, the arena is passed to the methods of Deserializer implementations so they're simplified a tad as well.

Big thanks to fredi for discussing all this with me and steering me in the right direction :D

@ibokuri
Copy link
Contributor Author

ibokuri commented Sep 21, 2023

The last half consists of the lifetime work, which consists of two parts:

  • Add StringLifetime and ValueLifetime types.
  • Update visitString to return not only the produced value but also an indicator of whether or not the input string was used directly in the final, returned value.

Lifetimes

The lifetime types will be more or less the same as what I've already proposed:

  • For visitString, StringLifetime will have the following variants:

    • Stack: The value is on the stack and its lifetime is shorter than the deserialization process.
      • These values must be copied.
    • Heap: The value is on the heap and its lifetime is tied to the arena allocator returned to the end user.
      • These values can be copied or used directly (Getty's default behavior will be to use them directly).
      • If there's an error or the value is copied, do not free the value in the visitor. Either the deserializer will free it or it will be freed by the arena at the end.
    • Managed: The value is on the stack or heap and its lifetime is managed by an entity that is not the arena allocator returned to the end user.
      • These values should be copied, but can be used directly if the user knows that the value's lifetime is safe or acceptable.
      • If there's an error or the value is copied, do not free the value in the visitor. The managing entity will free it later on.
  • For access methods (e.g., nextKeySeed), ValueLifetime will have the following variants:

    • Heap: The value is on the heap and its lifetime is tied to the arena allocator returned to the end user.
      • These values can be copied or used directly (Getty's default behavior will be to use them directly).
      • If there's an error or the value is copied, free the value in the visitor if you want. It's part of the arena returned to the user so you could also just leak it and it'll be cleaned up at the end.
    • Managed: The value is on the stack or heap and its lifetime is managed by an entity that is not the arena allocator returned to the end user.
      • These values should be copied, but can be used directly if the user knows that the value's lifetime is safe or acceptable.
      • If there's an error or the value is copied, do not free the value in the visitor. The managing entity will free it later on.

visitString's Return Type

Before, I proposed returning from visitString the produced value and a slice indicating what part of the input parameter was used.

With the arena, the slice is no longer necessary. A simple bool will suffice, where true means that input was used directly.

@ibokuri
Copy link
Contributor Author

ibokuri commented Sep 21, 2023

Performance updates after arena changes (using same benchmarking code):

$ hyperfine --warmup 5 ./getty ./std
Benchmark 1: ./getty
  Time (mean ± σ):     688.8 ms ±   1.9 ms    [User: 684.7 ms, System: 3.0 ms]
  Range (minmax):   685.8 ms692.3 ms    10 runs

Benchmark 2: ./std
  Time (mean ± σ):     484.0 ms ±   2.7 ms    [User: 480.7 ms, System: 2.2 ms]
  Range (minmax):   481.3 ms489.1 ms    10 runs

Summary
  ./std ran
    1.42 ± 0.01 times faster than ./getty

Slightly slower than getty-arena was, but faster than the old getty version. Since there's no lifetimes right now, Getty doesn't free anything at all, including struct keys. Perhaps keeping around all that cruft all the time is slowing things down?

@ibokuri
Copy link
Contributor Author

ibokuri commented Sep 22, 2023

Performance update after some optimizations in getty-json (no peeking, always allocating strings, heap branch first).

Note that std's runtime has increased overall b/c we're now correctly passing in .alloc_always. Beforehand, it was returning a slice into the scanner's input which, if we hadn't used a static string input, would be completely invalid.

$ hyperfine --warmup 5 ./getty ./std
Benchmark 1: ./getty
  Time (mean ± σ):     678.8 ms ±   1.3 ms    [User: 675.0 ms, System: 2.9 ms]
  Range (minmax):   676.3 ms680.7 ms    10 runs

Benchmark 2: ./std
  Time (mean ± σ):     573.4 ms ±   2.8 ms    [User: 567.5 ms, System: 4.6 ms]
  Range (minmax):   569.5 ms578.8 ms    10 runs

Summary
  ./std ran
    1.18 ± 0.01 times faster than ./getty

We shaved off around 10ms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests

1 participant