-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trilinos: Enable build statistics #7376
Comments
This is going to be quite nice. |
Nice! |
PR #7377 has merged, so the tools used to collect the stats are in the repo. The next step is sorting out how to get CMake to use the statistic gathering compiler wrappers. If anyone has comments, this is what I've outlined to go over w/Ross.
|
…nos#7376) I also changed the logic in how the compiler wrappers are generated a little.
@jjellio, I talked with Zack Galbreath at Kitware today and he mentioned that there is another option for uploading files from and downloading files from CDash. That is the For example, you could define a URL like: (the Zack is going to add a strong automated test to CDash to make sure that you can download those files, one at a time. To support this, I could add a hook to We will need to do some testing of this to work this out, but if this works, then any build that is run with the But even if we use Anyway, I have some more work to do before PR #7508 is ready to merge so I will get to it. |
…s disable logic (trilinos#7376) Now the package TrilinosBuildStats will get forced set to OFF if <Project>_USE_BUILD_STATS_WRAPPERS=ON is not set.
I actually tried to pass the cdash file to my github.io page: It fails due to security policies that block javascript from loading files from a domain other than the script... I'm not browser savvy enough to know what to do about it.... it would be nice if I could work around that. But I guess if all else failed, the webpage could get hosted inside SNL (maybe that would avoid the security issue). Even if I work around that security stuff, I'll still need to figure out how to decode a tarball (that should be doable, I see javascript libraries for it) |
@jjellio, worst-case scenario, developers could just download the 'build_stats.csv' file off of CDash and then upload it to your site when they are doing deeper analysis. Otherwise, we can ask Kitware for help with the web issues. But developers are not going to bother looking at any data unless they think there is a problem. That is what we can address with filling out the test
Such a tool needs to know how to map file names to TriBITS packages. There is already code in TriBITS that can do that. Are you okay with me taking a crack at writing an initial version of What do you think? |
So it turns out that CDash does not currently support downloading files from CDash uploaded using the with the files (and URL) viewed at: However, it does look like CDash supports downloading files uploaded to a test using the if you get the JSON from: you see (pretty printed):
So it looks like you can get that data in Python (converted to recursive list/dict datastructure) and you can loop over the dicts in
That dict is Given that 'fileid' field value of '1', you can then download the data using the URL: You can fund your way to this test, for example, by knowing the CDash Group, Site, Build Name, and Build Start Time and plug those into this query: The JSON for that is shown at: which has the element:
which has So there you have it. If you know the following fields:
you can find the test So we can do what we need by attaching the file to a test. But it is a bit of a run-around to find what we need. It would be more straightforward to to upload with So for now, I would suggest that we just go with uploading the 'build_stats.csv' file to the test |
FYI: Further discussion about CDash upload and download options should occur in newly created issue: so we can get some direct help/advice from Kitware. |
Ross, I think the summarize would be better (for maintenance/extensibility) implemented as script CMake calls. That (optionally promising to generate a file if needed) If there was a dummy script, I can conceive how to implement I do think there would be value in showing package-level aggregates:
Size in particular is helpful, as it indicates how much filesystem servers need. All of the above can be implemented via BASH I think, just a few loops + cut/grep/awk (which are coreutils so will always be present on machines) |
@jjellio, yes, that is exactly what I was suggesting.
Such a tool would be very hard to write, test, and maintain in bash. Do you have something against Python? Just to get this started, I will add simple Python script:
that will just provide project-level stats:
That will avoid needing to deal with the package logic for now. We can always add package-level stats later when we have the time (and that will require using some TriBITS utilities to convert from file paths to package names). That way, we can turn this on for PR testing now and merge PR #7508. Okay? |
I have no issues with Python other than you have to be aware of 2.x vs 3.x stuff. I love python's regex library. Python 3.x with 'format strings' is awesome. e.g., Tangents below: Another issue to consider is how to interact with developers. I'll need to improve the webpage (better explanations, and styling for sure). Yet another issue: can you use the info here, to feedback into Ninja or CMake to improve our build system. This could be an interesting question for Kitware. E.g., if we could provide a list of targets (file.o things) plus a weight. Could Kitware use that orchestrate a good Ninja file. Or perhaps we could do that ourselves (I already have dome something similar). Given weights + num_parallel_procs, coerce the existing build.ninja such that a certain memory highwater mark is avoided. (it's effectively a variant of the knapsack packing problem I believe) @rmmilewi (CC Reed, this may be something he'd like to be abreast of) |
…sable logic (trilinos#7376) Now, the package TrilinosBuildStats will get enabled by default if <Project>_ENABLE_BUILD_STATS=ON is set but the package TrilinosBuildStatus will not get disabled if <Project>_ENABLE_BUILD_STATS=ON is not set. But the test TrilinosBuildStats_Results will only get enabled if <Project>_ENABLE_BUILD_STATS=ON.
Also factored out small file BuildStatsSharedVars.cmake to avoid duplication. I did this for two reasons: 1. This code is really quite independent from the code that creates the wrappers. 2. Future projects that use these build stats suuport code (once this gets pulled out of Trilinos and put into its own repo) may want to generate the build stats but not bother with a gather-build-stats target.
…nos#7376) This makes the 'gather-build-stat' target completely quiet. This responds to feedback from @jjellio that the make command was a bit verbose (and I agree). But I added the -v option and I used that in the TrilinosBuildStats_Results test so that you can see the statistics so it will be shown on CDash.
…y default in PR builds (trilinos#7376) Now that gather_build_stats.py is super robust, it should be fine to pick up old *.timing files with different sets of headers and be in all types of messed up states.
…ilds (trilinos#7376) With the updated magic_wrapper.py, this should be safe to do. Individual drivers can still set Trilinos_ENABLE_BUILD_STATS=OFF if they want. This is just the default. I also remove an obsolete Trilinos_CTEST_DO_ALL_AT_ONCE=TRUE since that has been the default for many years.
) In commit 6f2afd5 Matt Bettencourt <[email protected]> tried to update this to clang-10 but that does not magically update what is actaully being tested on CDash and therefore does not make this a supported build. Someone needs to actually add the driver scripts and update the Jenkins jobs (and clean up any failing Trilinos tests). Making this change allows one to run 'ctest-s-local-test-driver.sh all' properly (as I was trying to do with trilinos#7376).
) In commit 36b53f6 jmgate tried to update this to clang-10 but that does not automatically update what is actaully being run on jenkins and displaysed on CDash and therefore listed builds in this file along does not make them supported builds. Someone needs to actually add the driver scripts for these builds under cmake/ctest/drivers/atdm/sems-rhel7/drivers/ and update the Jenkins jobs, and triage any new failing Trilinos tests. Making this change allows one to run 'ctest-s-local-test-driver.sh all' properly (as I was trying to do with trilinos#7376).
…7376) This should result in the full test results to be uploaded and displayed on CDash, even for Trilinos PR testing that does not use tribits_ctest_driver().
This merges in the state of Trilinos 'develop' from 'atdm-nightly' from testing day 2021-06-10.
…ilds (trilinos#7376) With the updated magic_wrapper.py, this should be safe to do. Individual drivers can still set Trilinos_ENABLE_BUILD_STATS=OFF if they want. This is just the default. I also remove an obsolete Trilinos_CTEST_DO_ALL_AT_ONCE=TRUE since that has been the default for many years.
) In commit 36b53f6 jmgate tried to update this to clang-10 but that does not automatically update what is actaully being run on jenkins and displaysed on CDash and therefore listed builds in this file along does not make them supported builds. Someone needs to actually add the driver scripts for these builds under cmake/ctest/drivers/atdm/sems-rhel7/drivers/ and update the Jenkins jobs, and triage any new failing Trilinos tests. Making this change allows one to run 'ctest-s-local-test-driver.sh all' properly (as I was trying to do with trilinos#7376).
…7376) This should result in the full test results to be uploaded and displayed on CDash, even for Trilinos PR testing that does not use tribits_ctest_driver().
This merges in the state of Trilinos 'develop' from 'atdm-nightly' from testing day 2021-06-10.
…ilinos#7376) Commenting out these builds just avoids people running them with: ./ctest-s-local-test-driver.sh all It does not impact what currently runs on jenkins and submits to CDash. (No sense beating a dead horse.)
Update build stats and turn on in all ATDM Trilinos builds!
CC: @jjellio A glorious day. The PR #8638 has finally been merged! This turns on the build stats wrappers in all of the ATDM Trilinos builds (when running test ctest -S driver) and in all of the Trilinos PR builds. We can see 141 submissions of the test And we are starting to see new PRs running this looking at this query. We need to keep an eye on the PR builds for a few days. It would be nice to break the build-stats summary reported in the |
…s:develop' (16c177b). * trilinos-develop: (71 commits) Tpetra: Remove some output from the Bug7758 test MueLu Stratimikos adapter: Enable half precision for factory-style PLs Tpetra: remove some deprecated usage ROL: implement the apply function for Thyra Vector Piro: changes to ROL adapters comply with ROL changes Piro: bug-fix in Piro::NOX_Solver Ifpack2: disabling tests causing build errors with extended scalar types (see issue trilinos#9280). Ifpack2: cleaning up unused variables in tests. Ctest: Adding Amesos2/Belos tests Ctest: Stuff failing on ride that worked on ascicgpu Ctest: Enabling non-UVM Ifpack2 tests Ifpack2: changing GO to the one in Tpetra_Details_DefaultTypes.hpp. Disable support for Makefile.export.* files (trilinos#8498) Tpetra: remove unused variable (copied too many times when breaking up a function) ats2: Comment out listing of long-broken XL builds (trilinos#9270, trilinos#7376) Ifpack2: adding missing logic for new tests. Belos: writing tests for 'long double' and 'float128' ScalarType. STK: Snapshot 06-11-21 17:50 Tpetra: remove comments that don't apply to HIP Tpetra: Use HIPSpace for HIPWrapperNode ...
…s:develop' (16c177b). * trilinos-develop: (71 commits) Tpetra: Remove some output from the Bug7758 test MueLu Stratimikos adapter: Enable half precision for factory-style PLs Tpetra: remove some deprecated usage ROL: implement the apply function for Thyra Vector Piro: changes to ROL adapters comply with ROL changes Piro: bug-fix in Piro::NOX_Solver Ifpack2: disabling tests causing build errors with extended scalar types (see issue trilinos#9280). Ifpack2: cleaning up unused variables in tests. Ctest: Adding Amesos2/Belos tests Ctest: Stuff failing on ride that worked on ascicgpu Ctest: Enabling non-UVM Ifpack2 tests Ifpack2: changing GO to the one in Tpetra_Details_DefaultTypes.hpp. Disable support for Makefile.export.* files (trilinos#8498) Tpetra: remove unused variable (copied too many times when breaking up a function) ats2: Comment out listing of long-broken XL builds (trilinos#9270, trilinos#7376) Ifpack2: adding missing logic for new tests. Belos: writing tests for 'long double' and 'float128' ScalarType. STK: Snapshot 06-11-21 17:50 Tpetra: remove comments that don't apply to HIP Tpetra: Use HIPSpace for HIPWrapperNode ...
CC: @prwolfe, @jwillenbring @jjellio, it occurred to me that adding begin and end time stamp fields to the Having the build start and end time stamps also has other uses as well. For example, when doing a rebuild with old targets lying around, if you only want to report build stats for targets that got built on a rebuild, you could add an argument to |
…develop' (7591b32). * trilinos/develop: (77 commits) zoltan2: fix memory leak when sizeof(SCOTCH_Num) == sizeof(lno_t) trilinos#9312 Tpetra: Remove some output from the Bug7758 test MueLu Stratimikos adapter: Enable half precision for factory-style PLs Tpetra: remove some deprecated usage Fixed some deprecated code MueLu Thyra adapter: Allow construction of half precision operator ROL: implement the apply function for Thyra Vector Piro: changes to ROL adapters comply with ROL changes Piro: bug-fix in Piro::NOX_Solver MueLu: Print Scalar in MG Summary for high and extreme verbosity Ifpack2: disabling tests causing build errors with extended scalar types (see issue trilinos#9280). Ifpack2: cleaning up unused variables in tests. Ctest: Adding Amesos2/Belos tests Ctest: Stuff failing on ride that worked on ascicgpu Ctest: Enabling non-UVM Ifpack2 tests Ifpack2: changing GO to the one in Tpetra_Details_DefaultTypes.hpp. Disable support for Makefile.export.* files (trilinos#8498) Tpetra: remove unused variable (copied too many times when breaking up a function) ats2: Comment out listing of long-broken XL builds (trilinos#9270, trilinos#7376) Ifpack2: adding missing logic for new tests. ...
CC: @jjellio, @jwillenbring So I have a Trilinos PR #9894 that is stuck in loop of failed builds due to the compiler crashing running out of memory. Following on from the discussion above, it occurred to me that if you store the beginning and end time stamps for each target in the
This would show how close a build is to running out of memory on a given machine and we could plot that number as a function of time. In fact, we could have the CTest test that runs
using XML in the STDOUT like:
Then you could see a graph of these measurements over time right on CDash! |
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. |
FYI: Kitware is adding build stats support to native CMake, CTest, and CDash. See: Therefore, I think there will be no need for a separate compiler wrapper tool to gather build stats or scripts to manage that and submit it to CDash. |
Enhancement
This Issue is for tracking any effects from a PR I am submitting with tools for collecting very detailed build statistics, that seems to have near zero cost. A goal of this work is to enable packages/product owners to understand how their packages impact compile time, memory usage, and file size (among others).
The impact of using the tool seems to be zero. That is, the tool's overhead sits entirely inside the noise of the build. I did two builds on Rzansel, both use Cuda + Serial, which is the standard ATDM Trilinos settings (I actually built EMPIRE using the tool as well)
Updated Data (for the more complicated python wrappers + NM usage)
NM ON
NM OFF
Clearly using python has a price... but this is still on pretty tiny.
A pass through the code for efficiency is planned (maybe move to in-memory files for the temporaries)
Path forward
The scripts work by wrapper '$MPICC' inside the ATDM trilinos ENV. CMake then uses these 'wrapped' compilers. The wrapped compilers emit copious data in the build tree along side the object file/library/executable that is created. After the build is complete, these 'timing' files are then aggregated into one massive CSV file. On Rzansel, the CSV is about 1.8MB, and it has lines equal to the number of things built.
To prevent the wrappers from tampering with CMake's configure phase, I've added a single line to CMakeLists.txt which sets an ENV
CMAKE_IS_IN_CONFIGURE_MODE
, this allows the wrappers to toggle on/off based on whether a real build is happening, versus the configuration phase.One idea for making this work, is to have CTest post the resulting build statistics files directory to CDash along with any testing data. For customers not posting, I can help provide a script that will aggregate the data manually.
Once the CSV data is posted, then others can develop tools for tracking this over time. I also have some tools that operate directly on the files (using Javascript).
The tool tracks:
This issue is also tracked in CDOFA-119.
@bartlettroscoe @jwillenbring
Links
Related to
Tasks:
Trilinos_ENABLE_BUILD_STATS=ON
in all of the Trilinos PR builds?magic_wrapper.py
script for the broken intel builds (see Build stats compiler wrappers and summary (#7376, CDOFA-119) #7508 (comment) and See merged PR Trilinos: Fix/Update build stats tools #8638).magic_wrapper.py
script for wrapping AR so that we can get build statistics for static libraries as well (and put in hooks in Trilinos CMake build system to use this for static builds, see See merged PR Trilinos: Fix/Update build stats tools #8638).find_program
commands for tools being wrapped if they are undefined (applies to AR/Ranlib/LD) ... See See merged PR Trilinos: Fix/Update build stats tools #8638*.timing
files into thebuild_stats.csv
file to instead merge by the column names so it could be robust to when the set (and location) of fields changes in the*.timing
files for older versions of themagic_wrapper.py
tool. See merged PR Trilinos: Fix/Update build stats tools #8638export Trilinos_ENABLE_BUILD_STATS=ON
for all of the remaining ATDM Trilinos builds (ctest -S driver only by default). See merged PR Trilinos: Fix/Update build stats tools #8638summarize_build_stats.py
into libraries (i.e.*.a
or*.so.*
files), executables (i.e.*.exe
) and other files (mostly*.o
object files). That way, we can see where the big*.o
files are coming from that are creating big*.a
and.so
files and therefore big*.exe
files (and where the greatest expense lies). Also, this will allow you to see the expense of building just libraries compared to building tests and examples to some extent.The text was updated successfully, but these errors were encountered: