Add scheduler exit-early feature to avoid long tail with sparse activity #6959

Closed
derekbruening opened this issue Aug 30, 2024 · 0 comments · Fixed by #7018
@derekbruening

In PR #6955:

I found a code path where, if the runqueue is empty and the current thread is supposed to go unscheduled, it would be run again instead. Fixed that here; it is unclear whether this has ever happened.

This fix is causing everyone-unscheduled issues at the end of some runs. That "bug" was not all that unreasonable if there are no live threads on other cores, in which case it's similar to today's all-unscheduled mechanism. This issue covers addressing the bug (going to revert the fix for now) along with adding an early exit feature.

For the early exit, if we had the remaining record count in the unscheduled threads and knew there wasn't much left, it would be an easier decision. Probably for now we'll put it under a flag.

@derekbruening derekbruening self-assigned this Aug 30, 2024
derekbruening added a commit that referenced this issue Aug 30, 2024
Reverts a fix for a bug in the scheduler where it let a thread going
unscheduled continue running if there are no other non-running-now
schedulable inputs.  This triggered too-frequent all-unscheduled
cases and the current timeout for those is too high, causing tail
delays.  We'll reinstate the fix once we add an early exit feature
for that scenario.

Issue: #6959
@derekbruening derekbruening changed the title Add exit early feature if all remaining threads are unscheduled Add scheduler exit-early feature to avoid long tail with sparse activity Sep 26, 2024
derekbruening added a commit that referenced this issue Sep 27, 2024
Fixes a bug where an input that just went unscheduled indefinitely
will be resumed if there are no other inputs to run.

Adds a unit test that fails without the fix.

Issue: #6959, #6822
derekbruening added a commit that referenced this issue Sep 28, 2024
Fixes a bug where an input that just went unscheduled indefinitely will
be resumed if there are no other inputs to run.

Adds a unit test that fails without the fix.

Issue: #6959, #6822
derekbruening added a commit that referenced this issue Oct 1, 2024
When using the drmemtrace scheduler in an analyzer or other tool that
does not track simulated time, the scheduler used to use wall-clock
time.  Here we change that to use the instruction count plus a scaled
idle count.  An idle counter is added and a new scale option
scheduler_options_t.time_units_per_idle (and CLI
-sched_time_units_per_idle) defaulting to 5.

The time_units_per_us and sched_time_units_per_us defaults are set to
1000, reflecting a GHz-class machine with IPC=0.5.

Using counters provides a more reproducible result across different
runs and machines.
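
A minimal sketch of the counter-based clock described above, assuming "scaled idle count" means each idle record contributes time_units_per_idle units and each instruction contributes one unit. The function names are hypothetical and are not the scheduler's actual code; only the option names and defaults come from the text above.

```python
def simulated_time_units(instr_count: int, idle_count: int,
                         time_units_per_idle: float = 5) -> float:
    """Time in scheduler units: one unit per instruction plus a scaled
    contribution from each idle record (assumed interpretation)."""
    return instr_count + idle_count * time_units_per_idle


def time_units_to_us(units: float, time_units_per_us: float = 1000) -> float:
    """Convert scheduler time units to simulated microseconds."""
    return units / time_units_per_us


# 1000 instructions and 10 idles give 1000 + 10 * 5 = 1050 units.
units = simulated_time_units(1000, 10)
# With 1000 units per microsecond, that is 1.05 simulated microseconds.
micros = time_units_to_us(units)
```

Because both inputs are counters rather than wall-clock samples, the same trace yields the same time values on every run, which is the reproducibility benefit the commit message points to. Under the one-unit-per-instruction assumption, 1000 units per microsecond at IPC=0.5 corresponds to 2000 cycles per microsecond.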

Adds a test of the new option.

The default values of the options were tested on a large trace and
found to produce a representative level of idle time during the main
execution (and the whole run when combined with the forthcoming
exit-early feature for #6959).

This means that the clock going backward problem (#6966) is no longer
seen in default runs.  The analyzer still supports wall-clock with the
-sched_time option so a check to avoid underflow is added.

Fixes #6971
Fixes #6966
derekbruening added a commit that referenced this issue Oct 2, 2024
Adds a new scheduler feature and CLI option exit_if_fraction_left.
This applies to -core_sharded and -core_serial modes.  When an input
reaches EOF, if the number of non-EOF inputs left as a fraction of the
original inputs is equal to or less than this value then the scheduler
exits (sets all outputs to EOF) rather than finishing off the final
inputs.  This helps avoid long sequences of idles during staggered
endings with fewer inputs left than cores and only a small fraction of
the total instructions left in those inputs.
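
The exit condition above can be sketched as follows. This is an illustrative model only: the class and method names are hypothetical, and the real scheduler's bookkeeping is not shown in this issue. Only the comparison (non-EOF inputs as a fraction of the original inputs, equal to or less than the threshold) is taken from the description.

```python
class ExitEarlyTracker:
    """Hypothetical model of the exit_if_fraction_left check."""

    def __init__(self, total_inputs: int, exit_if_fraction_left: float = 0.0):
        if total_inputs <= 0:
            raise ValueError("total_inputs must be positive")
        self.total_inputs = total_inputs
        self.live_inputs = total_inputs
        self.fraction = exit_if_fraction_left

    def on_input_eof(self) -> bool:
        """Record that one input reached EOF.

        Returns True if the scheduler should now exit, i.e. set all
        outputs to EOF instead of finishing the remaining inputs.
        """
        self.live_inputs -= 1
        return self.live_inputs / self.total_inputs <= self.fraction


# With 40 inputs and a threshold of 0.05, the exit triggers once only
# 2 inputs remain (2 / 40 = 0.05).
tracker = ExitEarlyTracker(total_inputs=40, exit_if_fraction_left=0.05)
results = [tracker.on_input_eof() for _ in range(38)]
```

With a threshold of 0 the check only fires when every input has reached EOF, which matches the "finish everything" behavior.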

The default value in scheduler_options_t is 0 as simulators are
typically already choosing to stop at some even point.  For analyzers,
however, via the command-line option, the default is 0.05 (i.e., 5%),
which when tested on a large internal trace helps eliminate much of
the final idle time from the cores (just about any value over 0.05
works well: it is not overly sensitive).

Compare the numbers below for today's default with a long idle time
and so distinct differences between the "cpu busy by time" and "cpu
busy by time, ignoring idle past last instr" stats on a 39-core
schedule-stats run of a moderately large trace, with key stats and the
1st 2 cores (for brevity) shown here:

  1567052521 instructions
   878027975 idles
       64.09% cpu busy by record count
       82.38% cpu busy by time
       96.81% cpu busy by time, ignoring idle past last instr
Core #0 schedule: CccccccOXHhUuuuuAaSEOGOWEWQqqqFffIiTETENWwwOWEeeeeeeACMmTQFfOWLWVvvvvFQqqqqYOWOooOWOYOYQOWO_O_W_O_W_O_W_O_WO_WO_O_O_O_O_O_OR_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_RY_YyyyySUuuOSISO_S_S_SOPpSOKO_KO_KCcDKWDB_B_____________________________________________
Core #1 schedule: KkLWSFUQPDddddddddXxSUSVRJWKkRNJBWUWwwTttGgRNKkkRWNTtFRWKkRNWUuuGULRFSRSYKkkkRYAYFffGSRYHRYHNWMDddddddddRYGgggggYHNWK_YAHYNnGYSNHWwwwwSWSNKSYyyWKNNWKNNGAKWGggNnNW_NNWE_E_EF__________________________________________________

And now with -exit_if_fraction_left 0.05, where we lose (1567052521 -
1564522227)/1567052521 = 0.16% of the instructions but drastically
reduce the tail from about 14% of the time to about 1% of the time:

  1564522227 instructions
   120512812 idles
       92.85% cpu busy by record count
       96.39% cpu busy by time
       97.46% cpu busy by time, ignoring idle past last instr
766.85user 6.33system 1:15.88elapsed 1018%CPU (0avgtext+0avgdata 4947364maxresident)k
Core #0 schedule: CccccccOXHKYEGGETRARrrPRTVvvvRrrNWwwOOKWVRRrPBbbXUVvvvvvOWKVLWVvvJjSOWKVUuTIiiiFPpppKAaaMFfffAHOKWAaGNBOWKAPPOABCWKPWOKWPCXxxxZOWKCccJSOSWKJUYRCOWKCcSOSUKkkkOROK_O_O_O_O_O
Core #1 schedule: KkLWSMmmFLSFffffffJjWBbGBUuuuuuuuuuuBDBJJRJWKkRNJWMBKkkRNWKkRNWKkkkRNWXxxxxxZOooAaUIiTHhhhSDNnnnHZzQNnnRNWXxxxxxRNWUuuRNWKXUuXRNKRWKNXxxRWKONNHRKWONURKWXRKXRKNW_KR_KkRK_KRKR_R_R_R_R_R_R_R_R_R_R_R__R__R__R___R___R___R___R___R

Fixes #6959
derekbruening added a commit that referenced this issue Oct 4, 2024
Adds a new scheduler feature and CLI option exit_if_fraction_inputs_left. This
applies to -core_sharded and -core_serial modes. When an input reaches
EOF, if the number of non-EOF inputs left as a fraction of the original
inputs is equal to or less than this value then the scheduler exits
(sets all outputs to EOF) rather than finishing off the final inputs.
This helps avoid long sequences of idles during staggered endings with
fewer inputs left than cores and only a small fraction of the total
instructions left in those inputs.

The default value in scheduler_options_t and the CLI option is 0.05 (i.e., 5%),
which when tested on a large internal trace helps eliminate much of the
final idle time from the cores without losing many instructions.

Compare the numbers below for today's default with a long idle time and
so distinct differences between the "cpu busy by time" and "cpu busy by
time, ignoring idle past last instr" stats on a 39-core schedule-stats
run of a moderately large trace, with key stats and the 1st 2 cores (for
brevity) shown here:

```
  1567052521 instructions
   878027975 idles
       64.09% cpu busy by record count
       82.38% cpu busy by time
       96.81% cpu busy by time, ignoring idle past last instr
Core #0 schedule: CccccccOXHhUuuuuAaSEOGOWEWQqqqFffIiTETENWwwOWEeeeeeeACMmTQFfOWLWVvvvvFQqqqqYOWOooOWOYOYQOWO_O_W_O_W_O_W_O_WO_WO_O_O_O_O_O_OR_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_R_RY_YyyyySUuuOSISO_S_S_SOPpSOKO_KO_KCcDKWDB_B_____________________________________________ 
Core #1 schedule: KkLWSFUQPDddddddddXxSUSVRJWKkRNJBWUWwwTttGgRNKkkRWNTtFRWKkRNWUuuGULRFSRSYKkkkRYAYFffGSRYHRYHNWMDddddddddRYGgggggYHNWK_YAHYNnGYSNHWwwwwSWSNKSYyyWKNNWKNNGAKWGggNnNW_NNWE_E_EF__________________________________________________
```

And now with -exit_if_fraction_inputs_left 0.05, where we lose (1567052521 -
1564522227)/1567052521 = 0.16% of the instructions but drastically
reduce the tail from about 14% of the time to about 1% of the time:

```
  1564522227 instructions
   120512812 idles
       92.85% cpu busy by record count
       96.39% cpu busy by time
       97.46% cpu busy by time, ignoring idle past last instr
Core #0 schedule: CccccccOXHKYEGGETRARrrPRTVvvvRrrNWwwOOKWVRRrPBbbXUVvvvvvOWKVLWVvvJjSOWKVUuTIiiiFPpppKAaaMFfffAHOKWAaGNBOWKAPPOABCWKPWOKWPCXxxxZOWKCccJSOSWKJUYRCOWKCcSOSUKkkkOROK_O_O_O_O_O 
Core #1 schedule: KkLWSMmmFLSFffffffJjWBbGBUuuuuuuuuuuBDBJJRJWKkRNJWMBKkkRNWKkRNWKkkkRNWXxxxxxZOooAaUIiTHhhhSDNnnnHZzQNnnRNWXxxxxxRNWUuuRNWKXUuXRNKRWKNXxxRWKONNHRKWONURKWXRKXRKNW_KR_KkRK_KRKR_R_R_R_R_R_R_R_R_R_R_R__R__R__R___R___R___R___R___R
```
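
The arithmetic behind the two runs can be checked with a short script. The helper names are hypothetical. Two readings are assumptions rather than something the tool output states: that "cpu busy by record count" is instructions divided by instructions plus idles (this reproduces the reported 64.09% and 92.85%), and that the tail is the gap between the two "busy by time" figures.

```python
def pct_instructions_lost(before: int, after: int) -> float:
    """Percentage of instructions dropped by exiting early."""
    return 100.0 * (before - after) / before


def busy_by_record_count(instructions: int, idles: int) -> float:
    """Assumed definition: instruction share of all instruction + idle records."""
    return 100.0 * instructions / (instructions + idles)


def tail_pct(busy_ignoring_tail: float, busy_by_time: float) -> float:
    """Assumed definition: time share lost to idle after the last instruction."""
    return busy_ignoring_tail - busy_by_time


lost = pct_instructions_lost(1567052521, 1564522227)      # about 0.16
busy_before = busy_by_record_count(1567052521, 878027975)  # about 64.09
busy_after = busy_by_record_count(1564522227, 120512812)   # about 92.85
tail_before = tail_pct(96.81, 82.38)                       # about 14.43
tail_after = tail_pct(97.46, 96.39)                        # about 1.07
```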

Fixes #6959