Split suboptimal_flops into smaller lints #6867
Comments
Anything other than |
@camsteffen You mean e.g. changing
to
Those also seem like readability wins to me, or at least not making readability worse. In any case I think that It is true though that it's a bit subjective to single out |
|
Besides readability, |
Sure. I'm not arguing that it's hard to understand what it does. That's pretty easy to find, and just from the name it's pretty obvious. But there's a reason people usually prefer infix operators over function calls: they are usually more readable, especially once you start nesting them.
I'd argue that |
I took the liberty of writing a small test case. Running this with

    use std::time::Instant;

    #[allow(clippy::float_cmp)]
    fn main() {
        let mut data = vec![];
        for i in 0..1000000 {
            // Just some arbitrary data
            data.push(((i ^ 243423) as f32, (i as f32 * 242.0) as f32, (i ^ 123953) as f32));
        }
        for _ in 0..10 {
            let t0 = Instant::now();
            let a = with_mul_add(&data);
            let t1 = Instant::now();
            let b = without_mul_add(&data);
            let t2 = Instant::now();
            println!("Time with mul_add: {:.1}ms", (t1 - t0).as_secs_f64() * 1000.0);
            println!("Time without mul_add: {:.1}ms", (t2 - t1).as_secs_f64() * 1000.0);
            println!("mul_add result: {}, without mul_add: {}", a, b);
        }
    }

    #[inline(never)]
    pub fn with_mul_add(data: &[(f32, f32, f32)]) -> f32 {
        let mut s = 0.0;
        for &(x, y, z) in data {
            s += x.mul_add(y, z);
        }
        s
    }

    #[inline(never)]
    pub fn without_mul_add(data: &[(f32, f32, f32)]) -> f32 {
        let mut s = 0.0;
        for &(x, y, z) in data {
            s += x * y + z;
        }
        s
    }

However, running with the
I think (@termhn correct me if I'm wrong) |
Yep, you're correct. FMAs are generally not inherently slower (in fact, in most cases, they are faster). However, like you said, the Rust compiler is conservative and doesn't assume. There definitely are cases where you can get in trouble by only using FMAs even when you do turn on |
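To make the point about compile-time assumptions concrete, here is a minimal sketch that uses the fused form only when the build itself enables hardware FMA. It assumes an x86-64 target, where the feature is named `fma` and can be enabled with, for example, `-C target-feature=+fma`; the function name `mul_add_if_enabled` is hypothetical and not something proposed in this thread.

```rust
/// Computes `x * y + z`.
/// Uses the fused form only when the build enables the `fma` target feature
/// (an x86 feature name; on other targets this falls back to the plain form).
pub fn mul_add_if_enabled(x: f32, y: f32, z: f32) -> f32 {
    if cfg!(target_feature = "fma") {
        // One rounding step; maps to a hardware FMA instruction here.
        x.mul_add(y, z)
    } else {
        // Two rounding steps; avoids a possible software fallback for fma.
        x * y + z
    }
}

fn main() {
    println!("fma enabled at compile time: {}", cfg!(target_feature = "fma"));
    println!("2 * 3 + 4 = {}", mul_add_if_enabled(2.0, 3.0, 4.0));
}
```

Note that the two branches can differ in the last bit for some inputs, so a helper like this trades bit-for-bit reproducibility across builds for speed. Whether that trade is acceptable depends on the code base.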
Interesting analysis. Thank you! I believe we may want to add a note to the lint description that says people should measure carefully when applying this lint for speed. |
Yes thanks for the additional analysis! Here is a comparison of all the functions with godbolt: https://godbolt.org/z/9qvads Different instructions are used for So I think it is out of scope for Clippy to sub-categorize these cases. Maybe we could have a config like |
I ran some additional tests on the other functions:
So while most of these operations are faster (some by a lot, some by a tiny amount), the optimizer usually cannot make these substitutions itself, because the result would not be exactly the same due to rounding and approximations. One thing that stands out is that log10 is so slow, and definitely not faster than log(10.0).
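The rounding difference mentioned above is easy to reproduce. The sketch below picks inputs (my own choice, not from the thread) where `a * a` is not exactly representable in `f32`: the plain expression rounds the product before subtracting, while `mul_add` rounds only once at the end.

```rust
/// `a * a - c` with two rounding steps (multiply, then subtract).
pub fn unfused_residual() -> f32 {
    let a = 1.0_f32 + f32::EPSILON; // 1 + 2^-23
    let c = 1.0_f32 + 2.0 * f32::EPSILON; // 1 + 2^-22, i.e. a * a rounded to f32
    a * a - c
}

/// The same expression with a single rounding step via fused multiply-add.
pub fn fused_residual() -> f32 {
    let a = 1.0_f32 + f32::EPSILON;
    let c = 1.0_f32 + 2.0 * f32::EPSILON;
    a.mul_add(a, -c)
}

fn main() {
    // The exact product is 1 + 2^-22 + 2^-46; the 2^-46 term is lost when
    // the product is rounded on its own, but survives the fused operation.
    println!("unfused: {:e}", unfused_residual());
    println!("fused:   {:e}", fused_residual());
}
```

Because the two forms give different answers for the same inputs, a compiler that preserves IEEE 754 semantics cannot silently swap one for the other, which is why the choice is left to the programmer (or to a lint).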
suboptimal_flops suggests rewriting code in a style that is less readable (e.g. with mul_add) for a very thin benefit (better rounding error *on some platforms*). See for instance EmbarkStudios/puffin@9bd18e6. I find it a net loss, so I'm opening this PR to solicit other opinions. Related: rust-lang/rust-clippy#6867
I'm in agreement here that If you're used to mathematical notation, then
is tolerably close to ax³ + bx² + cx + d, but
isn't, really. |
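For readers who want to see the contrast, here is an illustrative pair for the cubic ax³ + bx² + cx + d. The commenter's original snippets did not survive in this copy of the thread, so both functions below are my own sketch and the names are hypothetical.

```rust
/// Infix form: reads close to the mathematical notation.
pub fn cubic_infix(a: f32, b: f32, c: f32, d: f32, x: f32) -> f32 {
    a * x * x * x + b * x * x + c * x + d
}

/// Horner's scheme written as chained fused multiply-adds:
/// ((a * x + b) * x + c) * x + d
pub fn cubic_mul_add(a: f32, b: f32, c: f32, d: f32, x: f32) -> f32 {
    a.mul_add(x, b).mul_add(x, c).mul_add(x, d)
}

fn main() {
    let (a, b, c, d, x) = (1.0, 2.0, 3.0, 4.0, 2.0);
    println!("infix:   {}", cubic_infix(a, b, c, d, x));
    println!("mul_add: {}", cubic_mul_add(a, b, c, d, x));
}
```

The two agree exactly for small integer-valued inputs like these, but in general they may differ in the last bits, since the chained form rounds fewer times.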
FWIW It's interesting that But really, given the age-old advice to always benchmark your use case and not rely on micro-benchmarks, I think it's inappropriate for this lint to be making blanket statements about the relative performance of these functions, and the right thing to do is have configuration for them. That would also allow breaking up both Footnotes |
suboptimal_flops is very useful. Most of its suggestions are wins for both readability and performance; however, some significantly reduce readability in my opinion. E.g.
is a lot more readable than
in particular for new developers who may not be familiar with mul_add.
We'd love to enable suboptimal_flops for our code-base, but there is definitely some friction where it makes suggestions that just make the code less readable in code that doesn't need the absolute maximum performance.
I propose splitting suboptimal_flops into two smaller lint groups, such that one group contains all lints that improve both performance and readability, and the other contains those that improve performance but may hurt readability.
Drawbacks
More lint types
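For context, with the lint as it stands today, one way to get part of the way there is to enable it broadly and opt out where readability matters more. This is a minimal sketch of that workaround, not something proposed in the thread; the module and function names are hypothetical, and I am assuming the lint would flag the `a + (b - a) * t` form.

```rust
// Enable the lint for everything in this module.
#[warn(clippy::suboptimal_flops)]
pub mod geometry {
    /// Hot path: written in the fused style the lint prefers.
    pub fn lerp_fast(a: f32, b: f32, t: f32) -> f32 {
        (b - a).mul_add(t, a)
    }

    /// Not performance critical: keep the infix form and silence the lint here.
    #[allow(clippy::suboptimal_flops)]
    pub fn lerp_readable(a: f32, b: f32, t: f32) -> f32 {
        a + (b - a) * t
    }
}

fn main() {
    println!("{}", geometry::lerp_fast(0.0, 10.0, 0.5));
    println!("{}", geometry::lerp_readable(0.0, 10.0, 0.5));
}
```

The drawback is that the opt-out is all-or-nothing per item: it also silences the suggestions that would have improved readability, which is the gap a split lint would close.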