Result-based error handling optimizes very poorly for large payloads #42047
Comments
For reference, this is basically a simplified version of what webrender's display-list parsing with bincode looks like. |
I expect the bottleneck is that, without a full picture, the indirect and direct versions appear to have very different error semantics: direct preserves the old value on Err, while indirect partially overwrites the old value on Err. In our case we simply don't care what the value is on Err, so the two are semantically identical. However, knowing this requires fairly strong alias information. |
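A minimal sketch (not code from the issue; Huge, parse_direct, and parse_indirect are illustrative names) of the two shapes being compared, and of what the destination holds when the call returns Err:

struct Huge {
    data: [u8; 200],
}

// Direct: the new value only exists in the return slot, so on Err the
// caller's previous Huge is untouched.
fn parse_direct(src: &mut &[u8]) -> Result<Huge, ()> {
    if src.len() < 200 {
        return Err(());
    }
    let mut out = Huge { data: [0; 200] };
    out.data.copy_from_slice(&src[..200]);
    *src = &src[200..];
    Ok(out)
}

// Indirect: the callee writes through `dest` as it goes, so on Err the old
// value may already be partially overwritten. If the caller never reads
// `dest` after an Err, the two are equivalent -- but proving that needs
// fairly strong alias information.
fn parse_indirect(src: &mut &[u8], dest: &mut Huge) -> Result<(), ()> {
    let n = src.len().min(200);
    dest.data[..n].copy_from_slice(&src[..n]); // clobbers part of `dest`...
    if n < 200 {
        return Err(()); // ...and then bails out
    }
    *src = &src[200..];
    Ok(())
}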
On what hardware were the benchmark results obtained? |
Reverting #31545 improves things for me from: to: test bench_direct ... bench: 92,552 ns/iter (+/- 4,365) |
Assuming we can write directly in the return destination, I came up with different benchmarks:

#![feature(test, box_syntax)]
extern crate test;
use std::io::{Write, Error, ErrorKind};
const DATA_SIZE: usize = 200;
#[derive(Copy, Clone)]
struct Huge {
data: [u8; DATA_SIZE],
}
struct HugeIter<'a> {
cur_val: Huge,
buf: &'a [u8],
}
impl<'a> HugeIter<'a> {
fn next(&mut self) -> Option<&Huge> {
if let Ok(()) = parse_huge(&mut self.buf, &mut self.cur_val) {
Some(&self.cur_val)
} else {
None
}
}
// Round trip through a by-value Result: parse into a stack temporary, wrap it,
// then copy the payload into cur_val on success.
fn next_roundtrip_caller(&mut self) -> Option<&Huge> {
unsafe {
let mut val = Huge { data: ::std::mem::uninitialized() };
let result = match parse_huge(&mut self.buf, &mut val) {
Ok(()) => Ok(val),
Err(e) => Err(e)
};
if let Ok(ref val) = result {
self.cur_val = *val;
Some(&self.cur_val)
} else {
None
}
}
}
}
struct HugeResultIter<'a> {
cur_val: Result<Huge, Error>,
buf: &'a [u8],
}
impl<'a> HugeResultIter<'a> {
fn next(&mut self) -> Option<&Huge> {
parse_huge_result(&mut self.buf, &mut self.cur_val);
self.cur_val.as_ref().ok()
}
fn next_copy(&mut self) -> Option<&Huge> {
parse_huge_result_copy(&mut self.buf, &mut self.cur_val);
self.cur_val.as_ref().ok()
}
fn next_copy_return(&mut self) -> Option<&Huge> {
self.cur_val = parse_huge_result_copy_return(&mut self.buf);
self.cur_val.as_ref().ok()
}
}
// Writes directly into caller-provided storage; only the status is returned by value.
fn parse_huge(src: &mut &[u8], dest: &mut Huge) -> Result<(), Error> {
if src.len() < DATA_SIZE { return Err(Error::new(ErrorKind::UnexpectedEof, "OH NO")) }
(&mut dest.data[..]).write_all(&src[..DATA_SIZE])?;
*src = &src[DATA_SIZE..];
Ok(())
}
// Writes through the Ok payload already sitting in `dest`, in place; on failure
// it replaces `dest` with the error.
fn parse_huge_result(src: &mut &[u8], dest: &mut Result<Huge, Error>) {
if src.len() < DATA_SIZE {
*dest = Err(Error::new(ErrorKind::UnexpectedEof, "OH NO"));
return;
}
let mut result = Ok(());
if let Ok(ref mut val) = *dest {
result = (&mut val.data[..]).write_all(&src[..DATA_SIZE]);
*src = &src[DATA_SIZE..];
}
if let Err(e) = result {
*dest = Err(e);
}
}
// Parses into a stack temporary, then copies the whole value into `dest`.
fn parse_huge_result_copy(src: &mut &[u8], dest: &mut Result<Huge, Error>) {
if src.len() < DATA_SIZE {
*dest = Err(Error::new(ErrorKind::UnexpectedEof, "OH NO"));
return;
}
unsafe {
let mut val = Huge { data: ::std::mem::uninitialized() };
let result = (&mut val.data[..]).write_all(&src[..DATA_SIZE]);
if let Err(e) = result {
*dest = Err(e);
} else {
*dest = Ok(val);
}
*src = &src[DATA_SIZE..];
}
}
// Builds and returns the whole Result by value; the caller stores it.
fn parse_huge_result_copy_return(src: &mut &[u8]) -> Result<Huge, Error> {
if src.len() < DATA_SIZE { return Err(Error::new(ErrorKind::UnexpectedEof, "OH NO")); }
unsafe {
let mut val = Huge { data: ::std::mem::uninitialized() };
(&mut val.data[..]).write_all(&src[..DATA_SIZE])?;
*src = &src[DATA_SIZE..];
Ok(val)
}
}
#[bench]
fn bench_huge(b: &mut test::Bencher) {
let data = test::black_box(vec![0; 1_000_000]);
b.iter(|| {
let mut iter = HugeIter { cur_val: Huge { data: [0; 200] }, buf: &data };
let mut total: u8 = 0;
while let Some(val) = iter.next() {
total += val.data[..].iter().cloned().sum::<u8>();
}
total
});
}
#[bench]
fn bench_huge_roundtrip_caller(b: &mut test::Bencher) {
let data = test::black_box(vec![0; 1_000_000]);
b.iter(|| {
let mut iter = HugeIter { cur_val: Huge { data: [0; 200] }, buf: &data };
let mut total: u8 = 0;
while let Some(val) = iter.next_roundtrip_caller() {
total += val.data[..].iter().cloned().sum::<u8>();
}
total
});
}
#[bench]
fn bench_huge_result(b: &mut test::Bencher) {
let data = test::black_box(vec![0; 1_000_000]);
b.iter(|| {
let mut iter = HugeResultIter { cur_val: Ok(Huge { data: [0; 200] }), buf: &data };
let mut total: u8 = 0;
while let Some(val) = iter.next() {
total += val.data[..].iter().cloned().sum::<u8>();
}
total
});
}
#[bench]
fn bench_huge_result_copy(b: &mut test::Bencher) {
let data = test::black_box(vec![0; 1_000_000]);
b.iter(|| {
let mut iter = HugeResultIter { cur_val: Ok(Huge { data: [0; 200] }), buf: &data };
let mut total: u8 = 0;
while let Some(val) = iter.next_copy() {
total += val.data[..].iter().cloned().sum::<u8>();
}
total
});
}
#[bench]
fn bench_huge_result_copy_return(b: &mut test::Bencher) {
let data = test::black_box(vec![0; 1_000_000]);
b.iter(|| {
let mut iter = HugeResultIter { cur_val: Ok(Huge { data: [0; 200] }), buf: &data };
let mut total: u8 = 0;
while let Some(val) = iter.next_copy_return() {
total += val.data[..].iter().cloned().sum::<u8>();
}
total
});
}
The difference between …
EDIT: For doing the …
EDIT2: I've added … |
Yes, that matches what I'd been thinking the solution would be (not sure on the precise MIR details). To be more concrete:

fn callee() -> Result<T, E>;

would lower to (using an &uninit to represent a pointer that doesn't have a destructor):

fn callee_abi(ok: &uninit T, err: &uninit E) -> bool;

so that the callee knows it can freely clobber both output slots. However, the hard part isn't the callee, but rather the caller. It needs to prove that it's semantically valid to feed its locals into such an ABI. Consider for instance:

fn caller() -> Result<(), E> {
let mut x = vec![1, 2, 3];
println!("{:?}", x);
x = callee()?;
println!("{:?}", x);
}

Naively this desugars to:

fn caller_abi(err: &uninit E) -> bool {
let mut x = vec![1, 2, 3];
println!("{:?}", x);
let temp_ok;
if !callee_abi(&uninit temp_ok, err) {
return false;
}
drop(x); // Drop the old value *after* the call!
x = temp_ok;
println!("{:?}", x);
drop(x);
return true;
}

But what we really want is:

fn caller_abi(err: &uninit E) -> bool {
let mut x = vec![1, 2, 3];
println!("{:?}", x);
drop(x); // Drop the old value *before* the call
if !callee_abi(&uninit x, err) {
return false;
}
println!("{:?}", x);
drop(x);
return true;
}

Specifically, Rust isn't within its rights to arbitrarily reorder destructors between calls, especially when those destructors can have observable side effects. Currently the real-world use-case that's motivating this issue (webrender's display-list deserializing iterator) has several nice properties that make this transform easier:
But the last case isn't something I can fully guarantee in the future. In particular we might add a (rare) case to our top-level enum that has a destructor, making the whole thing have a destructor. It's possible we could do a match to check if the enum is "currently POD" ("dynamically needs_drop") and then take a fast no-copy path or a slow path with a temporary? But that's a really sketchy application-specific optimization. |
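A rough sketch of that "dynamically needs_drop" idea; DisplayItem and its variants are made up here, not webrender's actual types. A match decides, per value, whether the currently stored variant owns anything, and picks the fast in-place path or the slow path with a temporary accordingly:

// Hypothetical item type: mostly POD variants, plus one rare variant that
// owns memory and therefore gives the whole enum a destructor.
enum DisplayItem {
    Rect([u8; 64]),
    Glyphs([u8; 64]),
    Rare(Vec<u8>),
}

impl DisplayItem {
    // "Dynamically needs_drop": does the *currently stored* variant own anything?
    fn currently_pod(&self) -> bool {
        match *self {
            DisplayItem::Rect(_) | DisplayItem::Glyphs(_) => true,
            DisplayItem::Rare(_) => false,
        }
    }
}

// Illustrative parsers; real ones would decode from `src`.
fn parse_item_into(_src: &mut &[u8], dest: &mut DisplayItem) -> Result<(), ()> {
    *dest = DisplayItem::Rect([0; 64]); // writes straight into the caller's slot
    Ok(())
}

fn parse_item(_src: &mut &[u8]) -> Result<DisplayItem, ()> {
    Ok(DisplayItem::Rect([0; 64]))
}

fn refill(slot: &mut DisplayItem, src: &mut &[u8]) -> Result<(), ()> {
    if slot.currently_pod() {
        // Fast path: the old value has nothing to drop, so the parser may
        // clobber `slot` in place, even if it errors out part-way through.
        parse_item_into(src, slot)
    } else {
        // Slow path: parse into a temporary and assign, so the owning old
        // value is dropped exactly once, and only after a successful parse.
        *slot = parse_item(src)?;
        Ok(())
    }
}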
Oh, you will also need to answer the question of whether the two output destinations are allowed to alias. E.g. given

fn foo() -> Result<T, E>;

do we want

let result = box foo();

to be lowered to

let result = malloc();
let is_ok = foo_abi(&uninit result.repr.ok_payload, &uninit result.repr.err_payload);
result.repr.tag = if is_ok { Ok } else { Err };

On balance I think the answer should be "no" (allowing aliasing would probably pessimize the callee too much?). |
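To make the aliasing question concrete: Rust doesn't guarantee enum layout, but a quick check (not from the thread; the 200-byte payload mirrors the benchmarks above) shows that a Result's Ok and Err payloads share storage in practice, since the whole Result is barely larger than its largest payload. Two out-pointers into a boxed Result, as in the lowering above, would therefore overlap:

use std::io::Error;
use std::mem::size_of;

fn main() {
    println!("payload: {}", size_of::<[u8; 200]>());
    println!("error:   {}", size_of::<Error>());
    println!("Result:  {}", size_of::<Result<[u8; 200], Error>>());
}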
Potential steps that we could take to achieve this:

pub fn foo() -> u64 {
let mut x = 0;
// Doesn't currently propagate destination because of the borrow.
drop(&mut x);
x
}

pub struct Newtype<T>(T);
pub fn foo() -> Newtype<u64> {
let mut x = 0;
drop(&mut x);
// Does not currently propagate the destination because of the field projection.
Newtype(x)
}

The last example should generalize to all enums (random note: deaggregation doesn't work on tuples for some reason? maybe it doesn't work on arrays either). EDIT: wait, I said "doesn't currently", but we don't seem to actually have any kind of destination-propagation MIR optimization. |
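Purely as a hand-written, source-level analogue of what destination propagation would mean for the Newtype example (foo_into is a made-up name; the real transform would happen on MIR, not in source):

pub struct Newtype<T>(T); // same shape as above

// The local `x` never exists separately: it is materialized directly in the
// `.0` field of the caller-provided return slot, so building `Newtype(x)`
// costs no extra copy at the end.
pub fn foo_into(ret: &mut Newtype<u64>) {
    ret.0 = 0;
    drop(&mut ret.0); // the borrow now points into the return slot itself
}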
@gankro If they can't alias, we can't actually use this with a regular Result. |
So I've been working on a |
So here's the best counter-argument I can come up with for aliasing T and E:

fn foo() -> Result<Vec<T>, E> {
let mut result = vec![];
for x in values() {
// I forget how placers work but I want to use them for p e r f
result.place_back() <- try_process(x)?;
}
Ok(result)
}

lowers to:

fn foo_abi(ok: &uninit Vec<T>, err: &uninit E) -> bool {
*ok = vec![];
for x in values() {
let place = (*ok).place_back();
let temp_err;
if !try_process_abi(&place, &temp_err, x) {
// Can't construct in-place because we can't reorder drop of `ok`
let temp_err2 = convert_into_other_type(temp_err);
drop_in_place(ok);
*err = temp_err2;
return false;
}
}
return true;
}

In this example we see … |
FWIW, I hope not to expose the separate locations, which would indeed hamper that optimization. However, that's closer to today than guaranteeing the variants don't overlap, and I don't mind being conservative while trying such a scheme out. |
EDIT: I just realized that the measurements in that comment aren't what I remember, that's weird. |
The overhead seems to mostly be extra memcopies.
Method
I tested three cases (sketched below):
direct() -> Result<Huge, ()>
direct_boxed() -> Result<Box<Huge>, ()>
indirect(&mut Huge) -> Result<(), ()>
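The collapsed code isn't captured here; the following is only a guess at the three shapes, based on the signatures above and the rest of the thread, not the issue's actual benchmark code:

const DATA_SIZE: usize = 200;

#[derive(Copy, Clone)]
struct Huge {
    data: [u8; DATA_SIZE],
}

// Case 1: build the payload and return it by value inside the Result.
fn direct() -> Result<Huge, ()> {
    Ok(Huge { data: [0; DATA_SIZE] })
}

// Case 2: same, but the payload lives behind a Box.
fn direct_boxed() -> Result<Box<Huge>, ()> {
    Ok(Box::new(Huge { data: [0; DATA_SIZE] }))
}

// Case 3: write into a caller-provided Huge; only the status is returned.
fn indirect(dest: &mut Huge) -> Result<(), ()> {
    dest.data = [0; DATA_SIZE];
    Ok(())
}

In a bench loop, the two direct variants have to move the ~200-byte payload (or a fresh allocation) out of the callee, which is presumably where the extra memcopies show up; the indirect variant writes into storage the caller already owns.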
Bench Results:
ASM
ASM for bench_direct_boxed's closure
ASM for bench_direct's closure
ASM for bench_indirect's closure