Avoiding unnecessary memory allocations in visitors #135
Comments
New Lifetimes
After a bit of thought, I've come to the following design for the For Visitor Methods:
|
New Indices
One problem with the current solution for the deserializer portion of the issue is that it only enables visitors to use a single, contiguous slice from To fix this, we can modify the return type of
// The starting and ending positions of a string.
pub const Range = struct {
    start: usize,
    end: usize,
};

// Indicates to the deserializer which parts of `input` were used directly by
// the visitor.
//
// If a visitor uses the entirety of `input` as part of its final value, then the
// `Single` variant can be used, which doesn't require any extra allocations.
// Otherwise, the visitor will need to allocate a slice for `Multiple`, which the
// deserializer will clean up afterwards.
pub const Used = union(enum) {
    Single: Range,
    Multiple: []const Range,
};

// The new return type of `visitString`.
//
// If `used` is `null`, then the visitor did not use `input` as part of its final
// return value.
pub fn Return(comptime T: type) type {
    return struct {
        value: T,
        used: ?Used = null,
    };
}
For example, suppose a user wants to deserialize |
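A minimal sketch of how a visitor might fill in `Return(T)`, assuming the types defined above. The `visitString` function here is hypothetical (it is not Getty's actual visitor signature): it borrows the part of `input` before the first `;` and reports that range through the `Single` variant.

```zig
const std = @import("std");

pub const Range = struct { start: usize, end: usize };

pub const Used = union(enum) {
    Single: Range,
    Multiple: []const Range,
};

pub fn Return(comptime T: type) type {
    return struct {
        value: T,
        used: ?Used = null,
    };
}

// Hypothetical visitor method (an assumption for illustration): keeps the
// bytes of `input` before the first ';' without copying them, and tells the
// deserializer which range of `input` ended up in `value`.
fn visitString(input: []const u8) Return([]const u8) {
    const end = std.mem.indexOfScalar(u8, input, ';') orelse input.len;
    return .{
        .value = input[0..end],
        .used = .{ .Single = .{ .start = 0, .end = end } },
    };
}

test "visitor reports the borrowed range" {
    const ret = visitString("abc;def");
    try std.testing.expectEqualStrings("abc", ret.value);
    try std.testing.expectEqual(@as(usize, 0), ret.used.?.Single.start);
    try std.testing.expectEqual(@as(usize, 3), ret.used.?.Single.end);
}
```

Because only one contiguous range is borrowed, no slice of ranges has to be allocated; a visitor that borrowed several disjoint pieces would need the `Multiple` variant instead.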
As a sort of sanity check (and to give me a bit of motivation to work on this), I did some very simple deserialization benchmarking with some JSON data that had lots of strings. I deserialized the input data into a struct 100,000 times with both Getty JSON and
const std = @import("std");
const json = @import("json");

const c_ally = std.heap.c_allocator;

const T = []struct {
    id: []const u8,
    type: []const u8,
    name: []const u8,
    ppu: f64,
    batters: struct {
        batter: []struct {
            id: []const u8,
            type: []const u8,
        },
    },
    topping: []struct {
        id: []const u8,
        type: []const u8,
    },
};
const input =
\\[
\\ {
\\ "id": "0001",
\\ "type": "donut",
\\ "name": "Cake",
\\ "ppu": 0.55,
\\ "batters": {
\\ "batter": [
\\ {
\\ "id": "1001",
\\ "type": "Regular"
\\ },
\\ {
\\ "id": "1002",
\\ "type": "Chocolate"
\\ },
\\ {
\\ "id": "1003",
\\ "type": "Blueberry"
\\ },
\\ {
\\ "id": "1004",
\\ "type": "Devil's Food"
\\ }
\\ ]
\\ },
\\ "topping": [
\\ {
\\ "id": "5001",
\\ "type": "None"
\\ },
\\ {
\\ "id": "5002",
\\ "type": "Glazed"
\\ },
\\ {
\\ "id": "5005",
\\ "type": "Sugar"
\\ },
\\ {
\\ "id": "5007",
\\ "type": "Powdered Sugar"
\\ },
\\ {
\\ "id": "5006",
\\ "type": "Chocolate with Sprinkles"
\\ },
\\ {
\\ "id": "5003",
\\ "type": "Chocolate"
\\ },
\\ {
\\ "id": "5004",
\\ "type": "Maple"
\\ }
\\ ]
\\ },
\\ {
\\ "id": "0002",
\\ "type": "donut",
\\ "name": "Raised",
\\ "ppu": 0.55,
\\ "batters": {
\\ "batter": [
\\ {
\\ "id": "1001",
\\ "type": "Regular"
\\ }
\\ ]
\\ },
\\ "topping": [
\\ {
\\ "id": "5001",
\\ "type": "None"
\\ },
\\ {
\\ "id": "5002",
\\ "type": "Glazed"
\\ },
\\ {
\\ "id": "5005",
\\ "type": "Sugar"
\\ },
\\ {
\\ "id": "5003",
\\ "type": "Chocolate"
\\ },
\\ {
\\ "id": "5004",
\\ "type": "Maple"
\\ }
\\ ]
\\ },
\\ {
\\ "id": "0003",
\\ "type": "donut",
\\ "name": "Old Fashioned",
\\ "ppu": 0.55,
\\ "batters": {
\\ "batter": [
\\ {
\\ "id": "1001",
\\ "type": "Regular"
\\ },
\\ {
\\ "id": "1002",
\\ "type": "Chocolate"
\\ }
\\ ]
\\ },
\\ "topping": [
\\ {
\\ "id": "5001",
\\ "type": "None"
\\ },
\\ {
\\ "id": "5002",
\\ "type": "Glazed"
\\ },
\\ {
\\ "id": "5003",
\\ "type": "Chocolate"
\\ },
\\ {
\\ "id": "5004",
\\ "type": "Maple"
\\ }
\\ ]
\\ }
\\]
;
fn stdJson() !void {
    for (0..100_000) |_| {
        const output = try std.json.parseFromSlice(T, c_ally, input, .{});
        defer output.deinit();
    }
}

fn gettyJson() !void {
    for (0..100_000) |_| {
        const output = try json.fromSlice(c_ally, T, input);
        defer json.de.free(c_ally, output, null);
    }
}

fn gettyJsonArena() !void {
    for (0..100_000) |_| {
        var arena = std.heap.ArenaAllocator.init(c_ally);
        const arena_ally = arena.allocator();
        defer arena.deinit();
        _ = try json.fromSlice(arena_ally, T, input);
    }
}

// Uncomment exactly one call per build to produce each benchmark binary.
pub fn main() !void {
    //try gettyJson();
    //try gettyJsonArena();
    //try stdJson();
}
$ hyperfine --warmup 5 ./getty ./getty-arena ./std
Benchmark 1: ./getty
Time (mean ± σ): 718.4 ms ± 3.7 ms [User: 713.4 ms, System: 3.5 ms]
Range (min … max): 715.6 ms … 727.9 ms 10 runs
Benchmark 2: ./getty-arena
Time (mean ± σ): 628.7 ms ± 1.5 ms [User: 622.5 ms, System: 4.6 ms]
Range (min … max): 626.6 ms … 631.3 ms 10 runs
Benchmark 3: ./std
Time (mean ± σ): 482.7 ms ± 1.5 ms [User: 476.2 ms, System: 4.9 ms]
Range (min … max): 481.2 ms … 486.4 ms 10 runs
Summary
./std ran
1.30 ± 0.01 times faster than ./getty-arena
1.49 ± 0.01 times faster than ./getty
The unnecessary allocations are, surely, the main factor for Getty's slowness. However, I should note that With or without an arena, though, Getty's still much slower. So it's time to get started on this issue! |
Removed accepted label for now since, as pointed out by fredi, there are major issues with implementing this kind of thing, which I've unfortunately had the chance to run into in my own branch. For one thing, the multiple ranges idea was a total non-starter. I think my brain farted when I came up with that. The input for Another issue is |
Before I could implement the lifetime optimizations, I had to do a bit of general allocation work. The merged PR above implements that work. In summary, Getty now uses an arena internally for all allocations. This simplifies visitors and deserialization blocks, since they no longer have to worry about freeing values, and it lets end users free everything whenever they want. Additionally, the arena is passed to the methods of Deserializer implementations, so they're simplified a tad as well. Big thanks to fredi for discussing all this with me and steering me in the right direction :D |
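The arena pattern described above can be sketched as follows. This is a generic illustration using `std.heap.ArenaAllocator`, not Getty's internal code; `buildNames` is a hypothetical stand-in for a deserialization routine.

```zig
const std = @import("std");

// Hypothetical stand-in for a deserialization routine: every allocation goes
// through the allocator it is handed, and nothing is freed individually.
fn buildNames(ally: std.mem.Allocator) ![]const []const u8 {
    const names = try ally.alloc([]const u8, 2);
    names[0] = try ally.dupe(u8, "Cake");
    names[1] = try ally.dupe(u8, "Raised");
    return names;
}

test "one deinit frees every allocation" {
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    // Releases the outer slice and both duplicated strings in one call.
    defer arena.deinit();

    const names = try buildNames(arena.allocator());
    try std.testing.expectEqual(@as(usize, 2), names.len);
    try std.testing.expectEqualStrings("Raised", names[1]);
}
```

The routine itself contains no cleanup code at all, which is the simplification the comment describes; the caller decides when the whole result goes away.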
The second half is the lifetime work, which consists of two parts:
Lifetimes
The lifetime types will be more or less the same as what I've already proposed:
|
Performance updates after arena changes (using same benchmarking code):
$ hyperfine --warmup 5 ./getty ./std
Benchmark 1: ./getty
Time (mean ± σ): 688.8 ms ± 1.9 ms [User: 684.7 ms, System: 3.0 ms]
Range (min … max): 685.8 ms … 692.3 ms 10 runs
Benchmark 2: ./std
Time (mean ± σ): 484.0 ms ± 2.7 ms [User: 480.7 ms, System: 2.2 ms]
Range (min … max): 481.3 ms … 489.1 ms 10 runs
Summary
./std ran
1.42 ± 0.01 times faster than ./getty
Slightly slower than |
Performance update after some optimizations in getty-json (no peeking, always allocating strings, heap branch first). Note that std's runtime has increased overall b/c we're now correctly passing in
$ hyperfine --warmup 5 ./getty ./std
Benchmark 1: ./getty
Time (mean ± σ): 678.8 ms ± 1.3 ms [User: 675.0 ms, System: 2.9 ms]
Range (min … max): 676.3 ms … 680.7 ms 10 runs
Benchmark 2: ./std
Time (mean ± σ): 573.4 ms ± 2.8 ms [User: 567.5 ms, System: 4.6 ms]
Range (min … max): 569.5 ms … 578.8 ms 10 runs
Summary
./std ran
1.18 ± 0.01 times faster than ./getty
We shaved off around 10ms. |
Problem
There is no way for a visitor to know if a pointer value it has received is:
Thus, visitors are forced to play it safe and always make copies, which can result in unnecessary allocations.
Proposal
To fix this, the following things need to be added to Getty:
- A way for visitors to know if the pointer value they received from a deserializer is safe to use as part of their return value. This applies to the `value` parameter in `visitString` and the return value of access methods (e.g., `nextKeySeed`, `nextElementSeed`).
- A way for deserializers to know if `visitString` is using the slice as part of the final value, and how much of that slice is being used.
Part One: The Visitor
How can visitors know if the pointer value they received from a deserializer is safe to use as part of their return value?
To solve this, we can do the following:
Define the following type:
`Lifetime` design.
The type will indicate the lifetime and ownership properties of pointer values passed to visitors:
- `Stack`: The value lives on the stack and its lifetime is shorter than the deserialization process.
- `Heap`: The value lives on the heap, and its lifetime is longer than the deserialization process and independent of any entity.
- `Owned`: The value lives on the stack or heap and its lifetime is managed by some entity. Since visitors cannot know whether an `Owned` value's lifetime is safe, they must always copy such values.
When should visitors free the pointer values they receive?
- `Stack` or `Owned` values should never be freed by the visitor.
  - `Stack` values will be automatically cleaned up by the compiler, obviously.
  - `Owned` values will be cleaned up eventually after deserialization is finished by the entity that owns them.
- `Heap` values passed to `visitString` should never be freed by the visitor. This is b/c the value is a Getty value and so the deserializer is responsible for freeing it.
- `Heap` values returned from an access method should be freed by the visitor upon error or if they're not part of the final value. The deserializer will never see these values again, so it's the visitor's responsibility to free them.
Add a `lifetime` parameter to `visitString` that specifies the `Lifetime` of `input`.
Remove the `is*Allocated` methods from access interfaces. With `Lifetime`, we don't need them anymore.
Modify the successful return type of access methods to be:
With these changes, visitors can do the following:
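A minimal sketch of the visitor-side decision in Part One, assuming `Lifetime` is a plain enum with the three variants named above. The `Visited` struct and this `visitString` signature are hypothetical and only serve the illustration; they are not Getty's API.

```zig
const std = @import("std");

// Assumed shape of the proposed type; the variant names come from the proposal.
pub const Lifetime = enum { Stack, Heap, Owned };

// Hypothetical result type so the caller can tell whether `value` is a fresh
// allocation that it must free.
const Visited = struct {
    value: []const u8,
    copied: bool,
};

fn visitString(ally: std.mem.Allocator, input: []const u8, lifetime: Lifetime) !Visited {
    return switch (lifetime) {
        // Outlives deserialization and is not managed by anyone else: use it directly.
        .Heap => Visited{ .value = input, .copied = false },
        // Too short-lived, or managed by some other entity: play it safe and copy.
        .Stack, .Owned => Visited{ .value = try ally.dupe(u8, input), .copied = true },
    };
}

test "only Stack and Owned inputs are copied" {
    const ally = std.testing.allocator;

    const heap_input = try ally.dupe(u8, "donut");
    defer ally.free(heap_input);

    const borrowed = try visitString(ally, heap_input, .Heap);
    try std.testing.expect(!borrowed.copied);
    try std.testing.expect(borrowed.value.ptr == heap_input.ptr);

    const copied = try visitString(ally, "donut", .Stack);
    defer ally.free(copied.value);
    try std.testing.expect(copied.copied);
    try std.testing.expectEqualStrings("donut", copied.value);
}
```

The `.Heap` branch is the one that saves an allocation; the other two branches behave like a visitor that always copies.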
Part Two: The Deserializer
How does a deserializer know if `visitString` is using the slice as part of the final value, and how much of that slice is being used?
Before diving in, there are a few things to keep in mind:
- The deserializer only needs to worry about `Heap` values. `Stack` values are obviously managed automatically by the compiler. `Owned` values are managed outside the deserialization process, so functions like `deserializeString` don't need to worry about them.
- The return value of `visitString` might not be a string at all, so we shouldn't rely solely on `visitString`'s return value. Besides, even if it is a string, it'll be very tedious using it in the deserializer to figure out what to free and what to keep.
In any case, to solve this, we can do the following:
Change the return type of `visitString` to the following:
- If `indices` is `null`, then that means `visitString` did not use `input` as part of its return value. In that case, the deserializer should free `value` afterwards.
- If `indices` is not `null`, then that means `visitString` did use `input` as part of its return value. `start` and `end` specify the starting and ending indices in `input` at which `visitString`'s return value begins and ends.
With this new `indices` field, the deserializer now knows 1) if the visitor is using `input` directly in its return value, and 2) how much of `input` is being used.
- If all of `input` is being used, then the deserializer should not free `input` after calling `visitString`.
- If only part of `input` is being used, then the deserializer can use `start` and `end` to determine the remaining parts of `input` that should be freed.
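The bookkeeping in Part Two can be sketched as below. The `Indices` struct and `Return` shape are assumptions based on the `indices`, `start`, and `end` names in the text, and `unusedParts` is a hypothetical helper that only computes which parts of `input` the visitor left unused.

```zig
const std = @import("std");

// Assumed shape of the proposed return type; `indices` marks the part of
// `input` that the visitor kept in `value`.
pub const Indices = struct { start: usize, end: usize };

pub fn Return(comptime T: type) type {
    return struct {
        value: T,
        indices: ?Indices = null,
    };
}

// Hypothetical helper: the leading and trailing parts of an input of length
// `len` that the visitor did not use.
const Unused = struct { head: Indices, tail: Indices };

fn unusedParts(len: usize, indices: ?Indices) Unused {
    // No indices: nothing was borrowed, so the whole input is unused.
    const used = indices orelse return .{
        .head = .{ .start = 0, .end = len },
        .tail = .{ .start = len, .end = len },
    };
    return .{
        .head = .{ .start = 0, .end = used.start },
        .tail = .{ .start = used.end, .end = len },
    };
}

test "unused parts around a borrowed sub-slice" {
    const input = "0123456789";
    const ret = Return([]const u8){
        .value = input[2..5],
        .indices = .{ .start = 2, .end = 5 },
    };
    const parts = unusedParts(input.len, ret.indices);
    try std.testing.expectEqual(@as(usize, 2), parts.head.end);
    try std.testing.expectEqual(@as(usize, 5), parts.tail.start);
    try std.testing.expectEqual(@as(usize, 10), parts.tail.end);
}
```

The sketch stops at computing the ranges. `std.mem.Allocator.free` expects the slice that was originally allocated, so releasing only the head or tail of an allocation would need support from the allocator in use.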