drmemtrace scheduler should synthesize headers for cores that start idle? #6703
derekbruening added a commit that referenced this issue on Mar 12, 2024
Multiple changes to allow the record filter to operate in core-sharded fashion:
- Makes the pc2encoding table per-input, as one input can migrate across multiple core shards and thus one core can see a later instruction without ever having seen its encoding. To handle synchronization, since C++11 has no standard rwlock, we use mutexes -- but we limit the added global lock to once per context switch, and we assume there is no contention on the per-input lock as only one shard operates on a given input at any one time.
- Sets the memref counter reader to core_sharded_ to avoid asserts.
- Appends footer records to cores whose output ends in an idle record.
- Adds an error check ensuring a single workload, as multiple workloads will require expanding the keys used in some tables.
- Renames the output files to include "core.<shard_index>" rather than the tid. This is surprisingly complex, as an input filename is needed to determine the output filename compression type, yet not all shards are guaranteed to have an input at the start. A condition variable and mutex are used to coordinate this among shards.
- Adds support for start-idle cores by synthesizing headers in record_filter; #6703 covers having the scheduler do this for all analyzers.
- Adds the version as another field available up front from the scheduler, and adds an idle-tid sentinel that must be distinct from INVALID_THREAD_ID.
- Adds two end-to-end tests: one with a single-threaded app scheduled onto 4 cores to test start-idle cores, and one to test multiple threads. Adds a macro to share code with the existing end-to-end test.
- Updates the unit test mock classes.

Issue: #6635, #6703
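A rough sketch of the locking scheme that commit message describes, using only standard-library types; the names here (input_entry_t, lookup_input) are illustrative assumptions, not the actual record_filter members. The global lock is taken only at a context switch to locate the per-input table, and the per-input lock guards the table itself but is expected to be uncontended since only one shard processes a given input at any one time.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical per-input state: the encoding table plus the lock guarding it.
struct input_entry_t {
    // Expected to be uncontended: only one core shard operates on a given
    // input at a time, so this only matters when the input migrates.
    std::mutex lock;
    std::unordered_map<uint64_t, std::vector<uint8_t>> pc2encoding;
};

class encoding_state_t {
public:
    // Called once per context switch, keeping the global lock off the
    // per-record fast path.
    input_entry_t &
    lookup_input(int input_id)
    {
        std::lock_guard<std::mutex> guard(global_lock_);
        return inputs_[input_id]; // Creates the entry on first use.
    }

private:
    std::mutex global_lock_;
    std::unordered_map<int, input_entry_t> inputs_;
};
```

In this sketch a shard would call lookup_input() once when it switches to a new input and then take the returned entry's lock around its pc2encoding reads and writes.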
derekbruening added a commit that referenced this issue on Mar 13, 2024
The drmemtrace framework stores key information in headers at the start of each trace file. This is meant to be accessible to parallel worker threads in analysis tools and simulators. However, when there are more output streams than inputs, some outputs start idle and so have no headers at all. (These outputs might later host inputs, so they will not necessarily always be idle.) This causes problems for analyzers that need to know the version, filetype, cache line size, chunk size, or other global parameters in every shard. One solution could be for the scheduler to always read ahead to the first timestamp, store global values of the common header records, and synthesize headers in outputs that start idle.
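A minimal sketch of that read-ahead-and-synthesize idea: cache the global values seen in the first input, then replay them onto any output stream that begins idle. The record and marker names follow drmemtrace's trace_entry.h; the cached_headers_t struct and synthesize_headers() helper are hypothetical, not scheduler code.

```cpp
#include <cstdint>
#include <vector>

#include "trace_entry.h" // drmemtrace record and marker definitions.

using namespace dynamorio::drmemtrace;

// Hypothetical cache of global values read ahead from the first input.
struct cached_headers_t {
    uint64_t version = 0;
    uint64_t filetype = 0;
    uint64_t cache_line_size = 0;
    uint64_t chunk_instr_count = 0;
};

static trace_entry_t
make_marker(trace_marker_type_t marker, uint64_t value)
{
    trace_entry_t entry = {};
    entry.type = TRACE_TYPE_MARKER;
    entry.size = static_cast<unsigned short>(marker);
    entry.addr = static_cast<addr_t>(value);
    return entry;
}

// Produce the header records an idle-start output would otherwise lack, so
// every shard's analyzer sees the same global metadata.
static std::vector<trace_entry_t>
synthesize_headers(const cached_headers_t &hdr)
{
    std::vector<trace_entry_t> out;
    trace_entry_t header = {};
    header.type = TRACE_TYPE_HEADER;
    header.addr = static_cast<addr_t>(hdr.version);
    out.push_back(header);
    out.push_back(make_marker(TRACE_MARKER_TYPE_VERSION, hdr.version));
    out.push_back(make_marker(TRACE_MARKER_TYPE_FILETYPE, hdr.filetype));
    out.push_back(make_marker(TRACE_MARKER_TYPE_CACHE_LINE_SIZE, hdr.cache_line_size));
    out.push_back(make_marker(TRACE_MARKER_TYPE_CHUNK_INSTR_COUNT, hdr.chunk_instr_count));
    return out;
}
```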
Even this is not enough for record_filter to operate in core-sharded mode (#6635), as it also needs the input filename extension to know how to compress its outputs; that particular detail seems reasonable to leave as a burden on record_filter. A sketch of how it might be handled follows.
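Hypothetically, record_filter could coordinate the extension among shards the same way the commit above describes for output naming: shards that start with an input publish the extension they see, and start-idle shards block on a condition variable until one does. The extension_broker_t class below is purely illustrative, not the actual record_filter code.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>

// Hypothetical broker: the first shard with an input publishes the output
// compression extension; shards that start idle wait until it is known.
class extension_broker_t {
public:
    void
    publish(const std::string &extension)
    {
        {
            std::lock_guard<std::mutex> guard(lock_);
            if (ready_)
                return; // First publisher wins.
            extension_ = extension;
            ready_ = true;
        }
        ready_cond_.notify_all();
    }

    std::string
    wait_for_extension()
    {
        std::unique_lock<std::mutex> guard(lock_);
        ready_cond_.wait(guard, [this] { return ready_; });
        return extension_;
    }

private:
    std::mutex lock_;
    std::condition_variable ready_cond_;
    bool ready_ = false;
    std::string extension_;
};
```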