Add core-oriented drmemtrace iterator #5694
derekbruening added a commit that referenced this issue on May 12, 2023
Adds replay of the as-traced schedule using the cpu_schedule.zip file written by raw2trace. That file is converted into the record+replay format for dynamic schedules and the existing replay code leveraged. This implements as-traced cpu-oriented parallel iteration in the scheduler which is part of #5694. Adding analysis tool support for this will be done separately. Updates all replays (whether as-traced or as-previously) to consider DEPENDENCY_TIMESTAMPS to indicate whether to have one output wait if it gets ahead of another's timestamp. Adds a missing check for output count mismatches on as-previously replay. Issue: #5843, #5694
derekbruening added a commit that referenced this issue on May 15, 2023
Adds replay of the as-traced schedule using the cpu_schedule.zip file written by raw2trace. That file is converted into the record+replay format for dynamic schedules and the existing replay code leveraged. This implements as-traced cpu-oriented parallel iteration in the scheduler which is part of #5694. Adding analysis tool support for this will be done separately. Updates all replays (whether as-traced or as-previously) to consider DEPENDENCY_TIMESTAMPS to indicate whether to have one output wait if it gets ahead of another's timestamp. Removes STATUS_IDLE since we always use STATUS_WAIT. We're between releases so we can delete from the public enum for this still-under-development component without worrying about binary compatibility. Renames schedule_entry_t.instr_count to start_instruction to make it clearer it is an ordinal and not a duration. Adds a missing check for output count mismatches on as-previously replay. Adds unit tests of synthetic and from-file replay. Issue: #5843, #5694
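The conversion of as-traced entries into per-output replay sequences can be pictured with the standalone sketch below. It assumes a simplified entry holding a tid, timestamp, cpuid, and start_instruction. The types traced_entry_t and replay_segment_t and the function build_replay_sequences are hypothetical: they do not reflect the real cpu_schedule.zip format or the scheduler's code, and the DEPENDENCY_TIMESTAMPS waiting is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical shape of one as-traced schedule entry (illustrative only).
struct traced_entry_t {
    int64_t tid;
    uint64_t timestamp;
    uint64_t cpuid;
    uint64_t start_instruction; // Ordinal within the thread, not a duration.
};

// One replay segment: run input `tid` starting at `start_instruction`.
struct replay_segment_t {
    int64_t tid;
    uint64_t start_instruction;
};

// Groups entries by cpuid, orders each core's entries by timestamp, and
// emits one segment list per output, with outputs ordered by cpuid.
std::vector<std::vector<replay_segment_t>>
build_replay_sequences(std::vector<traced_entry_t> entries)
{
    std::map<uint64_t, std::vector<traced_entry_t>> by_cpu;
    for (const traced_entry_t &e : entries)
        by_cpu[e.cpuid].push_back(e);
    std::vector<std::vector<replay_segment_t>> outputs;
    for (auto &kv : by_cpu) {
        std::vector<traced_entry_t> &list = kv.second;
        std::stable_sort(list.begin(), list.end(),
                         [](const traced_entry_t &a, const traced_entry_t &b) {
                             return a.timestamp < b.timestamp;
                         });
        std::vector<replay_segment_t> seq;
        for (const traced_entry_t &e : list)
            seq.push_back({ e.tid, e.start_instruction });
        outputs.push_back(std::move(seq));
    }
    return outputs;
}
```

Mapping cpuids to dense output ordinals in sorted order is just one simple choice for this sketch.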
derekbruening added a commit that referenced this issue on Aug 25, 2023
Adds a new type of sharding for drmemtrace analysis tools: by core instead of by thread. Introduces a shard_type_t enum (SHARD_BY_THREAD and SHARD_BY_CORE) passed to a new analysis_tool_t::initialize_shard_type() function to inform tools of the shard type (this cannot be easily added to the stream interface as the scheduler is not aware of the shard type). Adds a new memtrace_stream_t::get_output_cpuid() query to get the output cpu ordinal, or for replaying as-traced the original traced cpuid (#6262). Implements this for the scheduler. Generalizes analyzer_t to take in scheduler options for SHARD_BY_CORE to support analysis tools using the full range of schedules. In this mode, the core count is the worker count. Updates the shard index to be the core ordinal. Adds time-based scheduling support with analyzer_t using wall-clock time as the current time. Adds a number of options to set sharding mode (-core_sharding, -core_serial (not yet implemented)) and control the schedule (-sched_quantum, -sched_time, -sched_order_time, -record_file, -replay_file, -cpu_schedule_file). Updates the basic_counts tool to support core sharding. Adds a new test core_sharded_test which leverages the analyzer_multi and option parsing to test the top-level options in a framework that can capture the output and run multiple tests sequentially, which is simpler than having a separate test with an output file for each parameter being tested. Left as future work:
+ Convert scheduler_launcher into a new schedule_stats tool
+ Add a new record to indicate STATUS_WAIT
+ Add -core_serial support
+ Convert drcachesim default and -cpu_scheduling to use get_output_cpuid()
Issue: #5694
derekbruening added a commit that referenced this issue on Aug 31, 2023
Adds a new type of sharding for drmemtrace analysis tools: by core instead of by thread. Introduces a shard_type_t enum (SHARD_BY_THREAD and SHARD_BY_CORE) passed to a new analysis_tool_t::initialize_shard_type() function to inform tools of the shard type (this cannot be easily added to the stream interface as the scheduler is not aware of the shard type). Adds a new memtrace_stream_t::get_output_cpuid() query to get the output cpu ordinal, or for replaying as-traced the original traced cpuid (#6262). Implements this for the scheduler. This addresses #6262. Generalizes analyzer_t to take in scheduler options for SHARD_BY_CORE to support analysis tools using the full range of schedules. In this mode, the core count is the worker count. Updates the shard index to be the core ordinal. Adds time-based scheduling support with analyzer_t using wall-clock time as the current time. Adds a number of options to set sharding mode (-core_sharding, -core_serial (not yet implemented)) and control the schedule (-sched_quantum, -sched_time, -sched_order_time, -record_file, -replay_file, -cpu_schedule_file). Updates the basic_counts tool to support core sharding. Adds a new test core_sharded_test which leverages the analyzer_multi and option parsing to test the top-level options in a framework that can capture the output and run multiple tests sequentially, which is simpler than having a separate test with an output file for each parameter being tested. Left as future work:
+ Convert scheduler_launcher into a new schedule_stats tool
+ Add a new record to indicate STATUS_WAIT
+ Add -core_serial support
+ Convert drcachesim default and -cpu_scheduling to use get_output_cpuid()
Issue: #5694, #6262
Fixes #6262
derekbruening added a commit that referenced this issue on Sep 21, 2023
Adds first-class counting of threads per shard in the basic_counts tool, for use with core-sharded operation. Updates the core-sharded tests with sanity checks. Issue: #5694
derekbruening added a commit that referenced this issue on Sep 22, 2023
Adds first-class counting of threads per shard in the basic_counts tool, for use with core-sharded operation. Updates the core-sharded tests with sanity checks. Issue: #5694
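Counting threads per shard matters because, in core-sharded mode, one shard (a core) can run several threads, whereas a thread-sharded shard sees exactly one. A minimal sketch, assuming each record carries its shard index and tid; record_t, shard_counts_t, and count_threads_per_shard are hypothetical and are not the basic_counts implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

// Hypothetical minimal record: the shard it was delivered to and the thread
// that produced it.
struct record_t {
    int shard_index;
    int64_t tid;
};

// Per-shard counters: total records plus the distinct threads observed.
struct shard_counts_t {
    uint64_t records = 0;
    std::set<int64_t> tids;
};

std::unordered_map<int, shard_counts_t>
count_threads_per_shard(const std::vector<record_t> &records)
{
    std::unordered_map<int, shard_counts_t> shards;
    for (const record_t &r : records) {
        shard_counts_t &s = shards[r.shard_index];
        ++s.records;
        s.tids.insert(r.tid); // Duplicate tids are ignored by the set.
    }
    return shards;
}
```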
derekbruening added a commit that referenced this issue on Sep 29, 2023
Adds several routines to the memtrace_stream_t interface for drmemtrace analysis tools in core-sharded mode:
+ get_workload_id()
+ get_input_id()
+ get_input_interface()
Adds a new analysis_unit_tests executable with some sanity tests. Splits out the mock_reader_t and helpers used by scheduler_unit_tests to share them with the new test. Documents the additions. Fixes a bug with core-sharded analysis tools where parallel_shard_exit was called for every thread, resulting in use-after-frees when more than one thread was on a core and the tool deletes its shard data structure in the parallel_shard_exit routine (most of our tools do not, which is why this was not noticed before). Issue: #5694
derekbruening added a commit that referenced this issue on Oct 2, 2023
Adds several routines to the memtrace_stream_t interface for drmemtrace analysis tools in core-sharded mode:
+ get_workload_id()
+ get_input_id()
+ get_input_interface()
Adds a new analysis_unit_tests executable with some sanity tests. Splits out the mock_reader_t and helpers used by scheduler_unit_tests to share them with the new test. Documents the additions. Fixes a bug with core-sharded analysis tools where parallel_shard_exit was called for every thread, resulting in use-after-frees when more than one thread was on a core and the tool deletes its shard data structure in the parallel_shard_exit routine (most of our tools do not, which is why this was not noticed before). Issue: #5694
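The use-after-free described above comes from running a per-shard teardown once per thread. The invariant the fix restores, one exit call per shard, can be sketched with a stand-in callback. Everything here (shard_data_t, shard_table_t, exit_each_shard_once) is hypothetical and is not the analyzer_t or analysis_tool_t interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical per-shard tool data that the tool frees in its exit hook.
struct shard_data_t {
    uint64_t instrs = 0;
};
using shard_table_t = std::unordered_map<int, std::unique_ptr<shard_data_t>>;

// Given (tid, shard) assignments observed while scheduling, invokes
// `shard_exit` once per distinct shard. Calling it once per *thread* would
// tear down the same shard's data repeatedly when a core ran >1 thread.
void
exit_each_shard_once(const std::vector<std::pair<int64_t, int>> &assignments,
                     const std::function<void(int)> &shard_exit)
{
    std::set<int> exited;
    for (const auto &a : assignments) {
        // insert().second is true only the first time a shard is seen.
        if (exited.insert(a.second).second)
            shard_exit(a.second);
    }
}
```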
derekbruening added a commit that referenced this issue on Nov 7, 2023
Adds a new TRACE_MARKER_TYPE_WAIT marker which is a synthetic marker inserted in core-sharded drmemtrace analysis tool mode when the scheduler returns STATUS_WAIT. This is meant for tools which analyze schedules themselves. Adds a unit test. Issue: #5694
derekbruening added a commit that referenced this issue on Nov 8, 2023
Adds a new TRACE_MARKER_TYPE_CORE_WAIT marker which is a synthetic marker inserted in core-sharded drmemtrace analysis tool mode when the scheduler returns STATUS_WAIT. This is meant for tools which analyze schedules themselves. Adds a unit test. Issue: #5694
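How a synthetic wait marker can surface idle time to a tool is sketched below with simplified stand-ins. sched_status_t, rec_kind_t, and to_tool_records are hypothetical and only mirror the roles of the scheduler's STATUS_WAIT and TRACE_MARKER_TYPE_CORE_WAIT; they are not the real definitions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for scheduler statuses and tool-visible records.
enum class sched_status_t { OK, WAIT, END_OF_STREAM };
enum class rec_kind_t { INSTR, CORE_WAIT_MARKER };

// Converts the statuses one core's output received, step by step, into the
// record sequence a tool would see: each WAIT becomes a synthetic wait marker
// so schedule-analyzing tools can observe idle time on that core.
std::vector<rec_kind_t>
to_tool_records(const std::vector<sched_status_t> &statuses)
{
    std::vector<rec_kind_t> out;
    for (sched_status_t s : statuses) {
        if (s == sched_status_t::END_OF_STREAM)
            break;
        out.push_back(s == sched_status_t::WAIT ? rec_kind_t::CORE_WAIT_MARKER
                                                : rec_kind_t::INSTR);
    }
    return out;
}
```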
derekbruening added a commit that referenced this issue on Dec 20, 2023
Adds a new scheduler option single_lockstep_output which multiplexes the virtual core output streams onto a single global stream. This is simple to implement as the existing scheduler_t::stream_t class already multiplexes inputs onto an output. Hooks up the drcachesim launcher -core_serial option to this new scheduler mode. Updates the schedule_stats, basic_counts, and cache_simulator tools to support core_serial. For cache_simulator, the existing thread-to-core mapping code for round-robin and for -cpu_scheduling is kept for when in thread-sharded mode; in core-sharded mode, the scheduler's cpuid is mapped to a core index. Adds a core_serial test of schedule_stats and basic_counts and a test of cache_simulator using the scheduler's -cpu_schedule_file as-traced mode. Adds some dr$sim unit tests for cpuid to core mapping and error modes. Issue: #5694
derekbruening added a commit that referenced this issue on Dec 21, 2023
Adds a new scheduler option single_lockstep_output which multiplexes the virtual core output streams onto a single global stream. This is simple to implement as the existing scheduler_t::stream_t class already multiplexes inputs onto an output. Hooks up the drcachesim launcher -core_serial option to this new scheduler mode. Updates the schedule_stats, basic_counts, and cache_simulator tools to support core_serial. For cache_simulator, the existing thread-to-core mapping code for round-robin and for -cpu_scheduling is kept for when in thread-sharded mode; in core-sharded mode, the scheduler's cpuid is mapped to a core index. Adds a core_serial test of schedule_stats and basic_counts and a test of cache_simulator using the scheduler's -cpu_schedule_file as-traced mode. Adds some dr$sim unit tests for cpuid to core mapping and error modes. Issue: #5694
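Multiplexing several virtual core streams onto one global stream can be pictured as a lockstep round-robin. This sketch rests on a simplifying assumption, one record per core per round, whereas the real single_lockstep_output interleaving may be driven by other criteria. lockstep_multiplex is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Interleaves per-core record lists onto one serial stream: each round takes
// the next record from every core that still has one, in core order. Each
// output element is (core ordinal, record).
template <typename T>
std::vector<std::pair<int, T>>
lockstep_multiplex(const std::vector<std::vector<T>> &cores)
{
    std::vector<std::pair<int, T>> serial;
    std::vector<size_t> pos(cores.size(), 0);
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (size_t c = 0; c < cores.size(); ++c) {
            if (pos[c] < cores[c].size()) {
                serial.emplace_back(static_cast<int>(c), cores[c][pos[c]++]);
                progressed = true;
            }
        }
    }
    return serial;
}
```

Tagging each record with its core ordinal lets a serial consumer (such as a cache simulator) still attribute records to cores.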
derekbruening added a commit that referenced this issue on Jan 19, 2024
Adds 2 new memtrace_stream_t interfaces to simplify generalizing tools to handle either thread or core sharded operation:
+ get_shard_index() returns a 0-based shard ordinal regardless of whether core-sharded or thread-sharded.
+ get_input_tid() returns the thread id of the current input. This is a convenience method for use in parallel_shard_init_stream() prior to access to any memref_t records.
Changes an existing interface:
+ Guarantees that the shard_index passed to parallel_shard_init_stream() is a 0-based ordinal.
Implements the 2 new interfaces in the scheduler and adds two new interfaces there:
+ get_output_stream_ordinal() to get the underlying output when using single_lockstep_output.
+ get_output_cpuid(ord) taking in an ordinal so the analyzer or other user can get the cpuids statically when using single_lockstep_output.
Removes dr$sim's manual mapping of cpuid to core index in favor of using the new get_shard_index(). Updates all the analysis tools to use the new interfaces and to generalize their code to either handle both thread and core shards (reuse_time, reuse_distance, basic_counts, histogram, opcode_mix, syscall_mix, record_filter) or explicitly return an error for core-sharded modes (func_view, invariant_checker). (schedule_stats and record_filter needed no changes.) Adds some sanity tests on the new interfaces. Adds a new end-to-end test running the newly-updated tools as -core_sharded. Issue: #5694
derekbruening added a commit that referenced this issue on Jan 25, 2024
Adds 2 new memtrace_stream_t interfaces to simplify generalizing tools to handle either thread or core sharded operation:
+ get_shard_index() returns a 0-based shard ordinal regardless of whether core-sharded or thread-sharded.
+ get_tid() returns the thread id of the current input. This is a convenience method for use in parallel_shard_init_stream() prior to access to any memref_t records.
For online analysis where there's a single input, the scheduler remembers and returns the last memref.data.tid for get_tid() and uses the dynamic tid discovery order for get_shard_index(). Changes an existing interface:
+ Guarantees that the shard_index passed to parallel_shard_init_stream() is a 0-based ordinal.
Implements the 2 new interfaces in the scheduler and adds two new interfaces there:
+ get_output_stream_ordinal() to get the underlying output when using single_lockstep_output.
+ get_output_cpuid(ord) taking in an ordinal so the analyzer or other user can get the cpuids statically when using single_lockstep_output.
Analysis tools must dynamically discover the cpuids (stopped short of making this a memtrace_stream_t interface, as analysis tools in general must dynamically discover most things already). Removes dr$sim's manual mapping of cpuid to core index in favor of using the new get_shard_index(). Updates all the analysis tools to use the new interfaces and to generalize their code to either handle both thread and core shards (reuse_time, reuse_distance, basic_counts, histogram, opcode_mix, syscall_mix, record_filter) or explicitly return an error for core-sharded modes (func_view, invariant_checker). (schedule_stats and record_filter needed no changes.) Updates several unit tests to handle these changes:
+ Expands the default_memtrace_stream_t to be suitable as a mock stream for unit tests with the new interfaces.
+ Skips invariant stream checks for the mock stream by checking its input interface, since the stream itself is no longer null.
+ Fixes drcachesim unit tests which were not initializing tid.
Adds some sanity tests on the new interfaces. Adds a new end-to-end test running the newly-updated tools as -core_sharded. Limits the reuse_time histogram printing output to avoid hanging CMake's regex matcher in this test. Issue: #5694
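For online analysis, the commit describes assigning shard indices in dynamic tid discovery order. A minimal sketch of such a dense, 0-based mapping; shard_index_map_t is a hypothetical helper and not the scheduler's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Assigns 0-based ordinals to thread ids in the order they are first seen.
class shard_index_map_t {
public:
    // Returns the existing ordinal for `tid`, or assigns the next one.
    int
    index_for(int64_t tid)
    {
        auto it = ordinals_.find(tid);
        if (it != ordinals_.end())
            return it->second;
        int next = static_cast<int>(ordinals_.size());
        ordinals_.emplace(tid, next);
        return next;
    }
    // Number of distinct threads discovered so far.
    int
    count() const
    {
        return static_cast<int>(ordinals_.size());
    }

private:
    std::unordered_map<int64_t, int> ordinals_;
};
```

Dense ordinals let a tool index per-shard state with a plain vector instead of keying it by tid.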
When drmemtrace tools want to analyze a trace using the recorded schedule, they may want to look at hardware threads rather than software threads. An iterator over each core would be useful for this purpose. For interactions with fast seeking (#5538), the plan is to generate a schedule summary file that contains
<tid, timestamp, cpuid, instr-count>
tuples to make it possible to compute instruction counts on a core across multiple software threads.
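Computing per-core instruction counts from such tuples needs an assumption made explicit here. The sketch assumes instr-count is the thread's cumulative instruction ordinal when it begins running on that cpuid, and that each thread's total instruction count is known. Under that assumption, a segment lasts until the same thread's next tuple. sched_tuple_t and instrs_per_core are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// One summary tuple <tid, timestamp, cpuid, instr-count>.
struct sched_tuple_t {
    int64_t tid;
    uint64_t timestamp;
    uint64_t cpuid;
    uint64_t instr_count; // Assumed: thread's instruction ordinal at start.
};

// Returns instructions executed per cpuid. Each tuple's segment runs until
// the same thread's next tuple; a thread's final segment ends at its total
// instruction count from `thread_totals`.
std::map<uint64_t, uint64_t>
instrs_per_core(std::vector<sched_tuple_t> tuples,
                const std::map<int64_t, uint64_t> &thread_totals)
{
    // Order by thread, then by time, so each thread's segments are adjacent.
    std::sort(tuples.begin(), tuples.end(),
              [](const sched_tuple_t &a, const sched_tuple_t &b) {
                  return a.tid != b.tid ? a.tid < b.tid : a.timestamp < b.timestamp;
              });
    std::map<uint64_t, uint64_t> per_core;
    for (size_t i = 0; i < tuples.size(); ++i) {
        const sched_tuple_t &cur = tuples[i];
        bool has_next = i + 1 < tuples.size() && tuples[i + 1].tid == cur.tid;
        uint64_t end = has_next ? tuples[i + 1].instr_count : thread_totals.at(cur.tid);
        per_core[cur.cpuid] += end - cur.instr_count;
    }
    return per_core;
}
```

For example, a thread that starts on cpu 0 at ordinal 0 and migrates to cpu 1 at ordinal 100, with 150 instructions total, contributes 100 instructions to cpu 0 and 50 to cpu 1.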