
stdlib benchmarks regression: DataflowError with stack #10702

Closed
Akirathan opened this issue Jul 29, 2024 · 12 comments · Fixed by #11153
Assignees: Akirathan
Labels: --low-performance, --regression (Important: regression), -compiler, -libs (Libraries: New libraries to be implemented), p-high (Should be completed in the next sprint)

Comments

@Akirathan (Member)

There are big regressions in the *...Map_Id_All_Errors benchmarks, in some cases as much as 500%.

Screenshots: (three benchmark charts omitted)

https://enso-org.github.io/engine-benchmark-results/stdlib-benchs.html#org_enso_benchmarks_generated_Map_Error_Benchmark_Vector_ignore_Map_Id_All_Errors


@Akirathan Akirathan added p-high Should be completed in the next sprint -compiler -libs Libraries: New libraries to be implemented --low-performance --regression Important: regression labels Jul 29, 2024
@Akirathan Akirathan self-assigned this Jul 29, 2024
@Akirathan (Member, Author)

Blocked by #10706. Running the Enso-only benchmarks gives nonsensical values. For example:

$ enso --run test/Benchmarks Map_Id_All_Errors
Benchmarking 'Map_Error_Benchmark_Vector_ignore.Map_Id_All_Errors' with configuration: [warmup={3 iterations, 3 seconds each}, measurement={3 iterations, 3 seconds each}]
Warmup duration:    18805 ms
Warmup invocations: 1
Warmup avg time:    18725.603 ms (+-NaN)
Measurement duration:    17741 ms
Measurement invocations: 1
Measurement avg time:    17734.925 ms (+-NaN)
Benchmark 'Map_Error_Benchmark_Vector_ignore.Map_Id_All_Errors' finished in 36563.609 ms
Benchmarking 'Map_Error_Benchmark_Vector_report_warning.Map_Id_All_Errors' with configuration: [warmup={3 iterations, 3 seconds each}, measurement={3 iterations, 3 seconds each}]
Warmup duration:    15681 ms
Warmup invocations: 1
Warmup avg time:    15674.661 ms (+-NaN)
Measurement duration:    15361 ms
Measurement invocations: 1
Measurement avg time:    15354.849 ms (+-NaN)
Benchmark 'Map_Error_Benchmark_Vector_report_warning.Map_Id_All_Errors' finished in 31045.643 ms
Benchmarking 'Map_Error_Benchmark_Array_ignore.Map_Id_All_Errors' with configuration: [warmup={3 iterations, 3 seconds each}, measurement={3 iterations, 3 seconds each}]
Warmup duration:    16347 ms
Warmup invocations: 1
Warmup avg time:    16342.969 ms (+-NaN)
Measurement duration:    15964 ms
Measurement invocations: 1
Measurement avg time:    15960.464 ms (+-NaN)
Benchmark 'Map_Error_Benchmark_Array_ignore.Map_Id_All_Errors' finished in 32313.34 ms
Benchmarking 'Map_Error_Benchmark_Array_report_warning.Map_Id_All_Errors' with configuration: [warmup={3 iterations, 3 seconds each}, measurement={3 iterations, 3 seconds each}]
Warmup duration:    16355 ms
Warmup invocations: 1
Warmup avg time:    16352.555 ms (+-NaN)
Measurement duration:    16418 ms
Measurement invocations: 1
Measurement avg time:    16414.449 ms (+-NaN)
...

We first need to fix the JMH stdlib benchmark invocation.
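A likely reason for the "+-NaN" readings above: each invocation takes roughly 16–19 s, longer than the entire 3-second warmup/measurement window, so the harness fits only a single invocation per phase, and a sample standard deviation over one sample is 0/0. A minimal sketch of that arithmetic (the helper name is hypothetical, not the engine's actual harness code):

```java
// Hypothetical sketch: a sample standard deviation divides by (n - 1),
// which yields 0.0 / 0.0 == NaN when only one invocation fits in the window.
public class StddevNaN {
    static double sampleStddev(double[] xs) {
        double mean = 0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double sumSq = 0;
        for (double x : xs) sumSq += (x - mean) * (x - mean);
        return Math.sqrt(sumSq / (xs.length - 1)); // 0.0 / 0.0 for n == 1
    }

    public static void main(String[] args) {
        // One invocation, as in the broken runs above:
        System.out.println(sampleStddev(new double[] {18725.603})); // NaN
        // Two invocations would give a finite deviation:
        System.out.println(sampleStddev(new double[] {15681.0, 15361.0}));
    }
}
```

This is why fixing the invocation setup (so that several invocations actually run per phase) makes the "+-" column meaningful again.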

@Akirathan (Member, Author)

However, the PR that most likely caused the regression is #9625. I cannot verify it at the moment, though.

@JaroslavTulach (Member)

> However, the PR that most likely caused the regression is #9625. I cannot verify it at the moment, though.

Let's make sure @GregoryTravis is aware of that.

@Akirathan (Member, Author)

To me, it seems that #9625 changed the behavior of dataflow errors so that stack traces are now always attached again. They used to be attached only if JVM assertions were enabled.
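For context, the standard Java idiom for detecting whether the JVM was started with -ea (the condition the old stack-trace behavior was gated on) is an assert with a side effect. This is a generic sketch of the idiom, not the engine's actual code:

```java
public class AssertionsCheck {
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment only executes when assertions are enabled (-ea);
        // the assert itself never fails because the assigned value is true.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

Gating on this flag makes the expensive behavior free in production runs, at the cost of coupling a language feature to a JVM debugging switch.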

@GregoryTravis (Contributor)

@Akirathan it actually isn't the stack trace that is the problem. As @JaroslavTulach suggested, it is the overhead of calling hasContextEnabled each time in DataflowError.withDefaultTrace. In the affected benchmarks ("Map_Id_All_Errors") it is called 50M times, because the benchmark maps over a 100k-element array/vector with an error on every element. The other benchmarks are not affected in the same way, so the regression only shows up in this extreme exercise of the code.

I reproduced the regression locally, and it was not adding stack traces at all.

(Stack traces do, of course, add overhead, but they are only attached when the user explicitly requests them via Context.Dataflow_Stack_Trace.with_enabled, which is an advanced feature.)

A possible temporary fix would be to again require the "-ea" flag even to check the Execution_Environment settings; we could then introduce a custom flag to use instead of "-ea".

However, we might consider leaving it as-is, since this only affects error paths, and seems to only be noticeable in the case of a large number of errors. (In regular usage, we restrict the errors to 100, I believe.)

@JaroslavTulach (Member)

> it is the overhead of calling hasContextEnabled each time in DataflowError.withDefaultTrace

Speed it up!

@JaroslavTulach (Member)

Pavel, please take a look and speed the hasContextEnabled check up.

@JaroslavTulach JaroslavTulach changed the title stdlib benchmarks regression 2024-07-29 stdlib benchmarks regression: DataflowError with stack Aug 23, 2024
@Akirathan (Member, Author)

Investigating the regression in PR #11153 on latest develop (03369b9) reveals that the problem is not in ExecutionEnvironment.hasContextEnabled, but in EnsoContext.getExecutionEnvironment. To prove this theory, apply the following patch:

diff --git a/engine/runtime/src/main/java/org/enso/interpreter/runtime/EnsoContext.java b/engine/runtime/src/main/java/org/enso/interpreter/runtime/EnsoContext.java
index 1f44b2329..bbb8d8b1a 100644
--- a/engine/runtime/src/main/java/org/enso/interpreter/runtime/EnsoContext.java
+++ b/engine/runtime/src/main/java/org/enso/interpreter/runtime/EnsoContext.java
@@ -874,8 +874,7 @@ public final class EnsoContext {
   }
 
   public ExecutionEnvironment getExecutionEnvironment() {
-    ExecutionEnvironment env = language.getExecutionEnvironment();
-    return env == null ? getGlobalExecutionEnvironment() : env;
+    return ExecutionEnvironment.LIVE;
   }
 
   /** Set the runtime execution environment of this context. */

With the patch applied, the score of the Map_Error_Benchmark_Vector_ignore.Map_Id_All_Errors benchmark goes from 1.2 to 0.132.

TL;DR: The most important part to optimize is EnsoContext.getExecutionEnvironment, not ExecutionEnvironment.hasContextEnabled.

@enso-bot commented Oct 15, 2024

Pavel Marek reports a new STANDUP for today (2024-10-15):

Progress: - Merging and restarting jobs on #11321 and #11217

@JaroslavTulach (Member)

> TL;DR: The most important part to optimize is EnsoContext.getExecutionEnvironment, not ExecutionEnvironment.hasContextEnabled.

I am not completely sure.

   public ExecutionEnvironment getExecutionEnvironment() {
-    ExecutionEnvironment env = language.getExecutionEnvironment();
-    return env == null ? getGlobalExecutionEnvironment() : env;
+    return ExecutionEnvironment.LIVE;
   }

By returning a compilation constant from getExecutionEnvironment you can eliminate the cost of other computations based on that constant (including hasContextEnabled), but the problem is that getExecutionEnvironment cannot be a compilation constant all the time; that was the whole point of #11173.

We can speculate, with an Assumption, that the getExecutionEnvironment value never changes during execution.

We can go the Assumption route, but we probably also need to think about optimizing hasContextEnabled.
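The Assumption idea can be sketched in plain Java as follows. This is a hypothetical analogue only; the real engine would use Truffle's com.oracle.truffle.api.Assumption, whose invalidation also deoptimizes compiled code so the cached value really behaves as a compilation constant on the fast path. The environment names "live" and "design" are Enso's actual execution environments; everything else here is illustrative:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpeculatedEnv {
    // Plain-Java stand-in for a Truffle Assumption: valid until invalidated.
    static final class EnvAssumption {
        final String cachedEnv;
        private final AtomicBoolean valid = new AtomicBoolean(true);
        EnvAssumption(String env) { this.cachedEnv = env; }
        boolean isValid() { return valid.get(); }
        void invalidate() { valid.set(false); }
    }

    private static volatile String currentEnv = "live";
    private static volatile EnvAssumption assumption = new EnvAssumption("live");

    // Fast path: while the assumption holds, the environment is effectively
    // a constant, so checks derived from it can fold away.
    static String getExecutionEnvironment() {
        EnvAssumption a = assumption;
        if (a.isValid()) {
            return a.cachedEnv;
        }
        // Slow path: re-speculate on the new value.
        EnvAssumption fresh = new EnvAssumption(currentEnv);
        assumption = fresh;
        return fresh.cachedEnv;
    }

    static void setExecutionEnvironment(String env) {
        currentEnv = env;
        assumption.invalidate(); // forces callers off the fast path once
    }

    public static void main(String[] args) {
        System.out.println(getExecutionEnvironment()); // live
        setExecutionEnvironment("design");
        System.out.println(getExecutionEnvironment()); // design
    }
}
```

The cost of changing the environment is paid once, at the (rare) setExecutionEnvironment call, instead of on every one of the 50M reads.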

@enso-bot commented Oct 17, 2024

Pavel Marek reports a new STANDUP for yesterday (2024-10-16):

Progress: - Managed to improve the perf a little bit.

@enso-bot commented Oct 17, 2024

Pavel Marek reports a new STANDUP for today (2024-10-17):

Progress: - Context handling functionality converted to nodes.

Linked pull request #11153: "Fixes #10702. Improves the speed of ExecutionEnvironment.hasContextEnabled. Important Notes: Local speedup of Map_Error_Benchmark_Vector_ignore.Map_Id_All_Errors benchmark is..."

@mergify mergify bot closed this as completed in #11153 Oct 24, 2024
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Oct 24, 2024
Projects
Status: 🟢 Accepted