
Trilinos: Enable build statistics #7376

Open · 9 of 14 tasks
jjellio opened this issue May 15, 2020 · 26 comments
Labels

  • ATDM DevOps (Issues that will be worked by the Coordinated ATDM DevOps teams)
  • client: ATDM (Any issue primarily impacting the ATDM project)
  • DO_NOT_AUTOCLOSE (This issue should be exempt from auto-closing by the GitHub Actions bot.)
  • type: enhancement (Issue is an enhancement, not a bug)

Comments

jjellio (Contributor) commented May 15, 2020

Enhancement

This issue is for tracking any effects from a PR I am submitting with tools for collecting very detailed build statistics that seem to have near-zero cost. A goal of this work is to enable package/product owners to understand how their packages impact compile time, memory usage, and file size (among other metrics).

The impact of using the tool seems to be zero; that is, the tool's overhead sits entirely inside the noise of the build. I did two builds on Rzansel, both using CUDA + Serial, which are the standard ATDM Trilinos settings (I actually built EMPIRE using the tool as well):

Without build_stats: 53:34.98 (3214.98s)
With build_stats: 53:22.07 (3202.07s) -  0.5%

Updated data (for the more complicated Python wrappers + NM usage); times are in seconds and the percentages are the overhead of enabling build_stats:

NM ON

                  configure   build (elapsed)   build (user time)
build_stats ON    45.08       1216.21 (2.5%)    58257.09 (2%)
build_stats OFF   43.51       1186              57089.23

NM OFF

                  configure   build (elapsed)   build (user time)
build_stats ON    45.82       1203.11 (1.2%)    57687.75 (1.1%)
build_stats OFF   44.55       1189.23           57062.09

Clearly using Python has a price... but the overhead is still pretty tiny.
A pass through the code for efficiency is planned (maybe moving to in-memory files for the temporaries).

Path forward

The scripts work by wrapping '$MPICC' inside the ATDM Trilinos environment. CMake then uses these 'wrapped' compilers. The wrapped compilers emit copious data in the build tree alongside the object file/library/executable being created. After the build is complete, these 'timing' files are aggregated into one large CSV file. On Rzansel, the CSV is about 1.8 MB, and it has one line per thing built.

To prevent the wrappers from tampering with CMake's configure phase, I've added a single line to CMakeLists.txt which sets an environment variable, CMAKE_IS_IN_CONFIGURE_MODE; this allows the wrappers to toggle themselves on/off based on whether a real build is happening versus the configure phase.
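For illustration, here is a minimal sketch of what such a wrapper could look like (the WRAPPED_CC variable and the .time temporary file are made up here, and only a few of the CSV columns listed later in this issue are shown; this is not the actual magic_wrapper.py implementation):

#!/usr/bin/env python3
# Minimal sketch of a build-stats compiler wrapper (illustrative only).
# During CMake's configure phase it acts as a transparent pass-through;
# otherwise it runs the real compiler under /usr/bin/time and writes a small
# CSV (*.timing) next to the output file.
import os, subprocess, sys

REAL_COMPILER = os.environ.get("WRAPPED_CC", "mpicc")  # assumed env var

def main(argv):
    if os.environ.get("CMAKE_IS_IN_CONFIGURE_MODE"):
        return subprocess.call([REAL_COMPILER] + argv)

    # Locate the output file so the .timing file sits alongside it.
    output = argv[argv.index("-o") + 1] if "-o" in argv else "a.out"

    # GNU time format: %M = max resident set size (KB), %e = elapsed seconds.
    time_file = output + ".time"
    rc = subprocess.call(["/usr/bin/time", "-f", "%M,%e", "-o", time_file,
                          REAL_COMPILER] + argv)

    if rc == 0 and os.path.exists(output):
        max_rss_kb, elapsed_sec = open(time_file).read().strip().split(",")
        with open(output + ".timing", "w") as f:
            f.write("FileName,FileSize,max_resident_size_Kb,elapsed_real_time_sec\n")
            f.write(f"{output},{os.path.getsize(output)},{max_rss_kb},{elapsed_sec}\n")
    return rc

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))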

One idea for making this work is to have CTest post the resulting build statistics file directly to CDash along with any testing data. For customers not posting to CDash, I can help provide a script that will aggregate the data manually.

Once the CSV data is posted, others can develop tools for tracking it over time. I also have some tools that operate directly on the files (using JavaScript).

The tool tracks:

FileSize

# data from /usr/bin/time regarding memory use and build time
# this is the memory highwater mark
max_resident_size_Kb
# this is the actual time the file took to compile
elapsed_real_time_sec

# more data from /usr/bin/time
avg_total_memory_used_Kb
num_major_page_faults
num_filesystem_inputs
exit_status
perc_cpu_used
avg_size_unshared_data_area_Kb
num_waits
avg_size_unshared_text_area_Kb
cpu_sec_user_mode
num_swapped
num_signals
num_involuntary_context_switch
num_minor_page_faults
num_socket_msg_sent
cpu_sec_kernel_mode
num_socket_msg_recv
num_filesystem_outputs

# data from nm -aS about symbols
symbol_stack_unwind
symbol_ro_data_local
symbol_unique_global
symbol_ro_data_global
symbol_text_global
symbol_text_local
symbol_debug
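As an illustration of how the symbol_* columns could come from nm, here is a rough sketch that buckets nm -aS type letters into those column names (the letter-to-column mapping below is my reading of the nm man page, not necessarily what the wrapper actually does):

import subprocess
from collections import Counter

# Assumed nm type-letter -> column-name mapping (T/t = global/local text,
# R/r = global/local read-only data, u = unique global, p = stack unwind,
# N = debugging symbol).
BUCKETS = {
    "T": "symbol_text_global",
    "t": "symbol_text_local",
    "R": "symbol_ro_data_global",
    "r": "symbol_ro_data_local",
    "u": "symbol_unique_global",
    "p": "symbol_stack_unwind",
    "N": "symbol_debug",
}

def count_symbols(obj_file):
    counts = Counter()
    out = subprocess.run(["nm", "-aS", obj_file],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        sym_type = fields[-2]  # lines look like "[address [size]] type name"
        if sym_type in BUCKETS:
            counts[BUCKETS[sym_type]] += 1
    return counts

# Example (hypothetical object file): count_symbols("Tpetra_CrsMatrix.cpp.o")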

This issue is also tracked in CDOFA-119.

@bartlettroscoe @jwillenbring

Links

Related to

  • SEPW-203

Tasks:

jjellio added the 'type: enhancement' label May 15, 2020
jjellio self-assigned this May 15, 2020
bartlettroscoe added the 'ATDM Config', 'ATDM DevOps', and 'client: ATDM' labels May 15, 2020
bartlettroscoe (Member) commented:

This is going to be quite nice.

csiefer2 (Member) commented:

Nice!

jjellio (Contributor, Author) commented May 16, 2020

PR #7377 has merged, so the tools used to collect the stats are in the repo.

The next step is sorting out how to get CMake to use the statistic gathering compiler wrappers.

If anyone has comments, this is what I've outlined to go over w/Ross.

  1. Wrapper creation.

    • We discussed using an ENV variable to enable/disable the feature
    • If done via CMake, then whatever changes are made need to be propagatable to downstream clients (Sierra, EMPIRE, SPARC, …)
  2. Wrapper preservation

    • The wrappers make sense at build time, but Trilinos is a library consumed by customers - we potentially want to enable this capability for those customers as well
    • If the wrappers get installed, then the installed CMake packages need to have their associated variables reset to match the installed wrappers.
      That is, we told CMake that CXX is ./build_dir/build_stat_wrappers/wrapper_cxx,
      but now we need the installed CMake files to have CXX as $install_dir/bin/build_stat_wrappers/wrapper_cxx
      (because the installed Trilinos should never depend on the build dir)
  3. Iron out how downstream customers can interact with this (or defer this, and focus on showing the capability with Trilinos)

  4. Data aggregation

    • The wrappers leave *.timing files for everything a compiler creates (libraries, object code, executables)
    • After building (make all finishes), we need to aggregate all *.timing files into a single file. (The data are just CSV files; see the sketch after this list)
    • The aggregation can be done outside CMake (what I’ve done), by just adding 2 lines of bash.
      a. A more elegant solution would be to use a rule of some sort that fires after make all by default (or explicitly, make gather_build_stats)
  5. Sort out what to do with aggregated data

    • I think a good idea is to simply install this CSV file along with Trilinos
      a. If installing, this would fit naturally with the ‘rule’ to aggregate the data. That rule provides the stats.csv. (this rule just works w/install)
      b. Make sure this doesn’t break things if stats aren’t enabled!
    • Another excellent idea is to have it posted to CDash. Ideally as a CSV file (not lumped inside Stdout, but as some file that is web-accessible, e.g., http://..../stats.csv).
      Optionally, we can also compress it:
      s992398:html jjellio$ du -hs trilinos.csv*
      1.7M trilinos.csv
      256K trilinos.csv.tar.bz2
      336K trilinos.csv.tar.gz
      256K trilinos.csv.tar.xz
    • CDash posting is a dark art to me; I have no idea what this entails
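As a rough sketch of the aggregation step in item 4 (the real gather step may need to handle mismatched headers and partially written files more carefully), the idea is just:

# Walk the build tree and concatenate every *.timing file into one CSV.
import csv, glob, os

def gather(build_dir=".", out_csv="build_stats.csv"):
    rows, header = [], []
    for path in glob.glob(os.path.join(build_dir, "**", "*.timing"),
                          recursive=True):
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            for row in reader:
                rows.append(row)
            for col in (reader.fieldnames or []):
                if col not in header:
                    header.append(col)
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=header, restval="")
        writer.writeheader()
        writer.writerows(rows)

# Run from the top of the build tree after 'make all':
# gather(".")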

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jun 9, 2020
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jun 9, 2020
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jun 9, 2020
…nos#7376)

I also changed the logic in how the compiler wrappers are generated a little.
bartlettroscoe (Member) commented:

@jjellio, I posted WIP PR #7508 that gets the basic build wrappers in place. See the task list in that PR for next steps. I think with a few hours of work, we will have a very nice Minimal Viable Product that we can deploy and start using in a bunch of builds that post to CDash.

bartlettroscoe (Member) commented:

@jjellio, I talked with Zack Galbreath at Kitware today and he mentioned that there is another option for uploading files to and downloading files from CDash: the ctest_upload() command. That command also allows you to define URLs that will be listed for the build. With that, I think we could provide the data and the hooks that are needed for your tool with the prototype at:

For example, you could define a URL like:

(the ??? part yet to be filled in).

Zack is going to add a strong automated test to CDash to make sure that you can download those files, one at a time.

To support this, I could add a hook to tribits_ctest_driver() to call ctest_upload() with a custom list of files (and URLs).

We will need to do some testing to work this out, but if this works, then any build that is run with the tribits_ctest_driver() function would automatically support uploading the build stats file and the associated URL links to look at it in more detail. So we could easily support this for all of the ATDM Trilinos builds and all other builds that use the tribits_ctest_driver() function. But this would not work for Trilinos PR builds, since those don't use the tribits_ctest_driver() function, so we could not get them to post build stats this way. (But they could implement a call to ctest_upload().)

But even if we use ctest_upload() to upload the build_stats.csv file, I think we still want a runtime test (TrilinosBuildStats_Results) that will summarize the most important stats in text form (like shown in #7508 (comment)) so we can search them with the "Test Output" filter field on the cdash/queryTests.php page, and so we can put strong checks on these max values and fail the test if the numbers get too high. With that latter part, you could even fail PR builds if the numbers get too high.

Anyway, I have some more work to do before PR #7508 is ready to merge so I will get to it.

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jun 10, 2020
…s disable logic (trilinos#7376)

Now the package TrilinosBuildStats will get forced set to OFF if
<Project>_USE_BUILD_STATS_WRAPPERS=ON is not set.
jjellio (Contributor, Author) commented Jun 11, 2020

I actually tried to pass the cdash file to my github.io page:

https://jjellio.github.io/build_stats/index.html?csv_file=https://testing-dev.sandia.gov/cdash/api/v1/testDetails.php?buildtestid=18733695&fileid=1

It fails due to cross-origin security policies that block JavaScript from loading files from a domain other than the one serving the script... I'm not browser-savvy enough to know what to do about it; it would be nice if I could work around that. But I guess if all else fails, the webpage could be hosted inside SNL (maybe that would avoid the security issue).

Even if I work around that security issue, I'll still need to figure out how to decode a tarball (that should be doable; I see JavaScript libraries for it).

bartlettroscoe (Member) commented:

@jjellio, worst-case scenario, developers could just download the 'build_stats.csv' file off of CDash and then upload it to your site when they are doing deeper analysis. Otherwise, we can ask Kitware for help with the web issues.

But developers are not going to bother looking at any data unless they think there is a problem. That is what we can address by filling out the test TrilinosBuildStats_Results to run a tool that summarizes the critical build stats. I suggested that in #7508 (comment). What I propose is to write a Python tool called summarize_build_stats.py that will read in the 'build_stats.csv' file and then produce, to STDOUT, a report like:

Full Project: max max_resident_size = <max_resident_size> (<file-name>)
Full Project: max elapsed_time = <elapsed_time> (<file-name>)
Full Project: max file_size = <file_size> (<file-name>)

Kokkos: max max_resident_size = <max_resident_size> (<file-name>)
Kokkos: max elapsed_time = <elapsed_time> (<file-name>)
Kokkos: max file_size = <file_size> (<file-name>)

Teuchos: max max_resident_size = <max_resident_size> (<file-name>)
Teuchos: max elapsed_time = <elapsed_time> (<file-name>)
Teuchos: max file_size = <file_size> (<file-name>)

...

Panzer: max max_resident_size = <max_resident_size> (<file-name>)
Panzer: max elapsed_time = <elapsed_time> (<file-name>)
Panzer: max file_size = <file_size> (<file-name>)

...

Such a tool needs to know how to map file names to TriBITS packages. There is already code in TriBITS that can do that.
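For example (a naive sketch; the real mapping would come from the TriBITS package metadata rather than from guessing at paths), the file-to-package mapping could look like:

def guess_package(file_name):
    # Guess the Trilinos package from the path component after 'packages/'.
    parts = file_name.replace("\\", "/").split("/")
    if "packages" in parts and parts.index("packages") + 1 < len(parts):
        return parts[parts.index("packages") + 1].capitalize()
    return "Full Project"

# guess_package("packages/panzer/disc-fe/src/Panzer_Workset.cpp.o")  # -> 'Panzer'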

Are you okay with me taking a crack at writing an initial version of summarize_build_stats.py? It would be better to write that as a TriBITS utility because then I could use MockTrilinos to write strong unit tests for it.

What do you think?

bartlettroscoe (Member) commented:

@jjellio

So it turns out that CDash does not currently support downloading files from CDash uploaded using the ctest_upload() command like you see here:

with the files (and URL) viewed at:

However, it does look like CDash supports downloading files uploaded to a test using the ATTACH_FILES ctest property. For example, for the trial build and submit shown at:

if you get the JSON from:

you see (pretty printed):

{
   ...
   test: {
      id: 8313,
      buildid: 5522160,
      build: "Linux-gnu-openmp-shared-dbg-pt",
      buildstarttime: "2020-06-09 15:45:48",
      site: "crf450.srn.sandia.gov",
      siteid: "187",
      test: "TrilinosBuildStats_Results",
      time: " 50ms",
      ...
      measurements: [
         {
            name: "Pass Reason",
            type: "text/string",
            value: "Required regular expression found.Regex=[OVERALL FINAL RESULT: TEST PASSED .TrilinosBuildStats_Results.<br />\n]"
         },
         {
            name: "Processors",
            type: "numeric/double",
            value: "1"
         },
         {
            name: "build_stats.csv",
            type: "file",
            fileid: 1,
            value: ""
         }
      ]
   },
   generationtime: 0.04
}

So it looks like you can get that data in Python (converted to a recursive list/dict data structure) and you can loop over the dicts in data['test']['measurements'] and find the file as:

         {
            name: "build_stats.csv",
            type: "file",
            fileid: 1,
            value: ""
         }

That dict is data['test']['measurements'][2] in this case.
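A small sketch of that lookup in Python (the URL below is a placeholder; the real testDetails.php endpoint and buildtestid depend on the CDash instance, as shown above):

import json, urllib.request

def find_attached_fileid(test_details_url):
    # Fetch the testDetails JSON and return the 'fileid' of the attached file.
    with urllib.request.urlopen(test_details_url) as resp:
        data = json.load(resp)
    for measurement in data["test"]["measurements"]:
        if measurement.get("type") == "file":
            return measurement["fileid"]
    return None

# fileid = find_attached_fileid(
#     "https://some-cdash-site/api/v1/testDetails.php?buildtestid=<id>")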

Given that 'fileid' field value of '1', you can then download the data using the URL:

You can find your way to this test, for example, by knowing the CDash Group, Site, Build Name, and Build Start Time and plugging those into this query:

The JSON for that is shown at:

which has the element:

{
   ...
   builds: [
      {
         testname: "TrilinosBuildStats_Results",
         site: "crf450.srn.sandia.gov",
         buildName: "Linux-gnu-openmp-shared-dbg-pt",
         buildstarttime: "2020-06-09T15:47:22 MDT",
         time: 0.05,
         prettyTime: " 50ms",
         details: "Completed\n",
         siteLink: "viewSite.php?siteid=187",
         buildSummaryLink: "build/5522163",
         testDetailsLink: "test/18733864",
         status: "Passed",
         statusclass: "normal",
         nprocs: 1,
         procTime: 0.05,
         prettyProcTime: " 50ms"
      }
   ],
   ...
}

which has testDetailsLink: "test/18733864".

So there you have it. If you know the following fields:

  • Group
  • Site
  • Build Name
  • Build Start Time

you can find the test TrilinosBuildStats_Results for that build and download its attached 'build_stats.csv' file (as a tarred and gzipped file 'build_stats.csv.tgz').

So we can do what we need by attaching the file to a test. But it is a bit of a run-around to find what we need.

It would be more straightforward to upload with ctest_upload() and then directly download the file from CDash. But, again, CDash does not currently support that and the Trilinos PR ctest -S driver does not support that.

So for now, I would suggest that we just go with uploading the 'build_stats.csv' file to the test TrilinosBuildStats_Results and then downloading it from there for any automated tools.

bartlettroscoe (Member) commented:

FYI: Further discussion about CDash upload and download options should occur in newly created issue:

so we can get some direct help/advice from Kitware.

jjellio (Contributor, Author) commented Jun 11, 2020

Ross, I think the summarize step would be better (for maintenance/extensibility) implemented as a script that CMake calls (optionally promising to generate a file if needed).

If there were a dummy script, commonTools/build_stats/summarize_build_stats.py (or wherever), then you could build the CMake stuff now, and later changes would just need to fiddle with that file.

I can conceive how to implement summarize_build_stats.py as just plain bash. Since the file is CSV, you'd head -n1 to get the header and store that as an array variable. Then search the header for the indexes of the metrics you want. Next, you'd grep the file for FileName = packages/foo/. From that subset of the file, cut -fN, where N is the number from the header array. Pipe that through awk or bc to sum it up. Additionally, you could sort the subset matching the package and select the top file for each metric. This could be a fairly simple /bin/bash script. Python + CSV makes sense if you want complex analysis, but for simple package summaries it might be easier via bash.

I do think there would be value in showing package-level aggregates:

Panzer: max max_resident_size = <max_resident_size> (<file-name>)
Panzer: max elapsed_time = <elapsed_time> (<file-name>)
Panzer: max file_size = <file_size> (<file-name>)

Panzer: Total Time:
Panzer: Total Memory:
Panzer: Total Size:

All Files: Total Time (this is effectively total build time)
All Files: Total Memory (to be consistent)
All Files: Total Size (roughly how much storage this build required)

Size in particular is helpful, as it indicates how much storage the filesystem servers need.

All of the above can be implemented via bash I think, just a few loops + cut/grep/awk (standard tools that will always be present on these machines).

bartlettroscoe (Member) commented Jun 11, 2020

Ross, I think the summarize step would be better (for maintenance/extensibility) implemented as a script that CMake calls (optionally promising to generate a file if needed).

@jjellio, yes, that is exactly what I was suggesting.

I can conceive how to implement summarize_build_stats.py as just plain bash

Such a tool would be very hard to write, test, and maintain in bash. Do you have something against Python?

Just to get this started, I will add simple Python script:

Trilinos/commonTools/build_stats/summarize_build_stats.py

that will just provide project-level stats:

Full Project: sum(max_resident_size_mb) = <sum_max_resident_size_mb> (<num-entries> entries)
Full Project: max(max_resident_size_mb) = <max_max_resident_size_mb> (<file-name>)
Full Project: max(elapsed_real_time_sec) = <max_elapsed_time_sec> (<file-name>)
Full Project: sum(elapsed_real_time_sec) = <sum_elapsed_time_sec> (<num-entries> entries)
Full Project: sum(file_size_mb) = <sum_file_size_mb> (<num-entries> entries)
Full Project: max(file_size_mb) = <max_file_size_mb> (<file-name>)

That will avoid needing to deal with the package logic for now. We can always add package-level stats later when we have the time (and that will require using some TriBITS utilities to convert from file paths to package names). That way, we can turn this on for PR testing now and merge PR #7508.
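A minimal sketch of what those project-level stats could look like in code (the column names follow the proposal above; the actual summarize_build_stats.py may differ):

import csv

FIELDS = ["max_resident_size_mb", "elapsed_real_time_sec", "file_size_mb"]

def summarize(csv_path="build_stats.csv"):
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    for field in FIELDS:
        vals = [(float(r[field]), r.get("FileName", "?"))
                for r in rows if r.get(field)]
        if not vals:
            continue
        total = sum(v for v, _ in vals)
        max_val, max_file = max(vals)
        print(f"Full Project: sum({field}) = {total:.2f} ({len(vals)} entries)")
        print(f"Full Project: max({field}) = {max_val:.2f} ({max_file})")

# summarize("build_stats.csv")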

Okay?

jjellio (Contributor, Author) commented Jun 11, 2020

I have no issues with Python other than you have to be aware of 2.x vs 3.x stuff.

I love Python's regex library. Python 3.x with f-strings is awesome, e.g., f'A variable in scope: {some_var}'.

Tangents below:

Another issue to consider is how to interact with developers. I'll need to improve the webpage (better explanations, and styling for sure).

Yet another issue: can you use the info here to feed back into Ninja or CMake to improve our build system? This could be an interesting question for Kitware. E.g., if we could provide a list of targets (file.o things) plus a weight, could Kitware use that to orchestrate a good Ninja file? Or perhaps we could do that ourselves (I have already done something similar). Given weights + num_parallel_procs, coerce the existing build.ninja such that a certain memory high-water mark is not exceeded (it's effectively a variant of the knapsack packing problem, I believe). A toy sketch of the idea is below.
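Here is that toy sketch of the scheduling constraint (nothing here touches Ninja; the job names, memory weights, and cap are made up, and a real scheduler would also have to respect build dependencies):

import heapq

def schedule(jobs, mem_cap_kb, nprocs):
    """jobs: list of (name, mem_kb, duration_sec); returns (finish_time, start order)."""
    running, clock, in_use, order = [], 0.0, 0.0, []  # running heap: (end_time, mem)
    pending = sorted(jobs, key=lambda j: -j[1])       # big-memory jobs first
    while pending or running:
        # Start jobs that fit under the memory cap and the process count;
        # if nothing is running, force-start one job so we always make progress.
        i = 0
        while i < len(pending) and len(running) < nprocs:
            name, mem, dur = pending[i]
            if in_use + mem <= mem_cap_kb or not running:
                heapq.heappush(running, (clock + dur, mem))
                in_use += mem
                order.append((clock, name))
                del pending[i]
            else:
                i += 1
        # Advance time to the next completion and release its memory.
        end_time, mem = heapq.heappop(running)
        clock, in_use = end_time, in_use - mem
    return clock, order

# schedule([("big.o", 8e6, 300.0), ("small.o", 5e5, 20.0)], mem_cap_kb=16e6, nprocs=8)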

@rmmilewi (CC Reed, this may be something he'd like to be abreast of)

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jun 11, 2020
…sable logic (trilinos#7376)

Now, the package TrilinosBuildStats will get enabled by default if
<Project>_ENABLE_BUILD_STATS=ON is set, but the package TrilinosBuildStats
will not get disabled if <Project>_ENABLE_BUILD_STATS=ON is not set.  But the
test TrilinosBuildStats_Results will only get enabled if
<Project>_ENABLE_BUILD_STATS=ON.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Jun 11, 2020
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue May 19, 2021
Also factored out small file BuildStatsSharedVars.cmake to avoid duplication.

I did this for two reasons:

  1. This code is really quite independent from the code that creates the
     wrappers.

  2. Future projects that use these build stats support code (once this gets
     pulled out of Trilinos and put into its own repo) may want to generate
     the build stats but not bother with a gather-build-stats target.
bartlettroscoe added a commit to jjellio/Trilinos that referenced this issue May 20, 2021
…nos#7376)

This makes the 'gather-build-stats' target completely quiet.  This responds to
feedback from @jjellio that the make command was a bit verbose (and I agree).

But I added the -v option and I used that in the TrilinosBuildStats_Results
test so that you can see the statistics and they will be shown on CDash.
bartlettroscoe added a commit to jjellio/Trilinos that referenced this issue May 20, 2021
…y default in PR builds (trilinos#7376)

Now that gather_build_stats.py is super robust, it should be fine to pick up
old *.timing files with different sets of headers and be in all types of
messed up states.
bartlettroscoe added a commit to jjellio/Trilinos that referenced this issue Jun 8, 2021
…ilds (trilinos#7376)

With the updated magic_wrapper.py, this should be safe to do.  Individual
drivers can still set Trilinos_ENABLE_BUILD_STATS=OFF if they want.  This is
just the default.

I also remove an obsolete Trilinos_CTEST_DO_ALL_AT_ONCE=TRUE since that has
been the default for many years.
bartlettroscoe added a commit to jjellio/Trilinos that referenced this issue Jun 8, 2021
)

In commit 6f2afd5 Matt Bettencourt <[email protected]> tried to update this
to clang-10 but that does not magically update what is actually being tested
on CDash and therefore does not make this a supported build.  Someone needs to
actually add the driver scripts and update the Jenkins jobs (and clean up any
failing Trilinos tests).

Making this change allows one to run 'ctest-s-local-test-driver.sh all'
properly (as I was trying to do with trilinos#7376).
bartlettroscoe added a commit to jjellio/Trilinos that referenced this issue Jun 8, 2021
)

In commit 36b53f6 jmgate tried to update this to clang-10 but that does not
automatically update what is actually being run on Jenkins and displayed on
CDash, and therefore listing builds in this file alone does not make them
supported builds.  Someone needs to actually add the driver scripts for these
builds under cmake/ctest/drivers/atdm/sems-rhel7/drivers/ and update the
Jenkins jobs, and triage any new failing Trilinos tests.

Making this change allows one to run 'ctest-s-local-test-driver.sh all'
properly (as I was trying to do with trilinos#7376).
bartlettroscoe added a commit to jjellio/Trilinos that referenced this issue Jun 9, 2021
…7376)

This should result in the full test results to be uploaded and displayed on
CDash, even for Trilinos PR testing that does not use tribits_ctest_driver().
bartlettroscoe added a commit to jjellio/Trilinos that referenced this issue Jun 11, 2021
This merges in the state of Trilinos 'develop' from 'atdm-nightly' from
testing day 2021-06-10.
bartlettroscoe added a commit to jjellio/Trilinos that referenced this issue Jun 11, 2021
…ilds (trilinos#7376)

With the updated magic_wrapper.py, this should be safe to do.  Individual
drivers can still set Trilinos_ENABLE_BUILD_STATS=OFF if they want.  This is
just the default.

I also remove an obsolete Trilinos_CTEST_DO_ALL_AT_ONCE=TRUE since that has
been the default for many years.
bartlettroscoe added a commit to jjellio/Trilinos that referenced this issue Jun 11, 2021
)

In commit 36b53f6 jmgate tried to update this to clang-10 but that does not
automatically update what is actually being run on Jenkins and displayed on
CDash, and therefore listing builds in this file alone does not make them
supported builds.  Someone needs to actually add the driver scripts for these
builds under cmake/ctest/drivers/atdm/sems-rhel7/drivers/ and update the
Jenkins jobs, and triage any new failing Trilinos tests.

Making this change allows one to run 'ctest-s-local-test-driver.sh all'
properly (as I was trying to do with trilinos#7376).
bartlettroscoe added a commit to jjellio/Trilinos that referenced this issue Jun 11, 2021
…7376)

This should result in the full test results to be uploaded and displayed on
CDash, even for Trilinos PR testing that does not use tribits_ctest_driver().
bartlettroscoe added a commit to jjellio/Trilinos that referenced this issue Jun 11, 2021
This merges in the state of Trilinos 'develop' from 'atdm-nightly' from
testing day 2021-06-10.
bartlettroscoe added a commit to jjellio/Trilinos that referenced this issue Jun 14, 2021
…ilinos#7376)

Commenting out these builds just avoids people running them with:

  ./ctest-s-local-test-driver.sh all

It does not impact what currently runs on jenkins and submits to CDash.  (No
sense beating a dead horse.)
bartlettroscoe added a commit that referenced this issue Jun 14, 2021
Update build stats and turn on in all ATDM Trilinos builds!
bartlettroscoe (Member) commented:

CC: @jjellio

A glorious day. PR #8638 has finally been merged! This turns on the build stats wrappers in all of the ATDM Trilinos builds (when running the ctest -S driver) and in all of the Trilinos PR builds.

We can see 141 submissions of the test TrilinosBuildStats_Results in the ATDM Trilinos builds showing the new gather script gather_build_stats.py in this query.

And, looking at this query, we are starting to see new PRs running this as well.

We need to keep an eye on the PR builds for a few days.

It would be nice to break the build-stats summary reported in the TrilinosBuildStats_Results test into libraries, executables and object files separately before we close this. But that could really be a separate story.

jmgate pushed a commit to tcad-charon/Trilinos that referenced this issue Jun 19, 2021
…s:develop' (16c177b).

* trilinos-develop: (71 commits)
  Tpetra: Remove some output from the Bug7758 test
  MueLu Stratimikos adapter: Enable half precision for factory-style PLs
  Tpetra: remove some deprecated usage
  ROL: implement the apply function for Thyra Vector
  Piro: changes to ROL adapters comply with ROL changes
  Piro: bug-fix in Piro::NOX_Solver
  Ifpack2: disabling tests causing build errors with extended scalar types (see issue trilinos#9280).
  Ifpack2: cleaning up unused variables in tests.
  Ctest: Adding Amesos2/Belos tests
  Ctest: Stuff failing on ride that worked on ascicgpu
  Ctest: Enabling non-UVM Ifpack2 tests
  Ifpack2: changing GO to the one in Tpetra_Details_DefaultTypes.hpp.
  Disable support for Makefile.export.* files (trilinos#8498)
  Tpetra: remove unused variable (copied too many times when breaking up a function)
  ats2: Comment out listing of long-broken XL builds (trilinos#9270, trilinos#7376)
  Ifpack2: adding missing logic for new tests.
  Belos: writing tests for 'long double' and 'float128' ScalarType.
  STK: Snapshot 06-11-21 17:50
  Tpetra: remove comments that don't apply to HIP
  Tpetra: Use HIPSpace for HIPWrapperNode
  ...
bartlettroscoe removed the 'AT: WIP' and 'ATDM Config' labels Jun 23, 2021
bartlettroscoe (Member) commented Jul 15, 2021

CC: @prwolfe, @jwillenbring

@jjellio, it occurred to me that adding begin and end time stamp fields to the *.timing file generated by magic_wrapper.py could help to debug out-of-memory problems like the one reported in #9432. If you know the start and end time for when each target is getting built and you know the max RAM usage for each target, you can compute, at any moment in time, the max possible RAM being used on the machine, and you will know which targets are involved. That will tell you where you need to put in effort to reduce the RAM usage of building specific targets and get around a build bottleneck that consumes all the RAM.

Having the build start and end time stamps has other uses as well. For example, when doing a rebuild with old targets lying around, if you only want to report build stats for targets that got built in the rebuild, you could add an argument summarize_build_stats.py --after=<start-time> that filters to targets built only after the start of the last rebuild <start-time> (which you know at configure time and can put into the definition of the test). This would also automatically filter out build stats for targets that no longer exist in the build system, left over from rebuilds months (or years) old. This may also be useful for other purposes that I don't even realize yet, but these are the obvious ones. A quick sketch of the high-watermark computation is below.
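As a quick illustration of that high-watermark computation (the per-target start/end time stamps assumed here do not exist in the *.timing files yet; that is exactly what this comment proposes adding):

def max_concurrent_rss_kb(targets):
    """targets: list of (start_time_sec, end_time_sec, max_resident_size_kb)."""
    events = []
    for start, end, rss in targets:
        events.append((start, +rss))  # target begins: its memory comes into play
        events.append((end, -rss))    # target finishes: its memory is released
    high_water = in_use = 0
    # Sort by time; at equal times, process releases (negative) before starts.
    for _, delta in sorted(events, key=lambda e: (e[0], e[1])):
        in_use += delta
        high_water = max(high_water, in_use)
    return high_water

# max_concurrent_rss_kb([(0, 120, 8_000_000), (30, 200, 6_000_000)])  # -> 14_000_000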

seamill pushed a commit to seamill/Trilinos that referenced this issue Jul 28, 2021
…develop' (7591b32).

* trilinos/develop: (77 commits)
  zoltan2:  fix memory leak when sizeof(SCOTCH_Num) == sizeof(lno_t) trilinos#9312
  Tpetra: Remove some output from the Bug7758 test
  MueLu Stratimikos adapter: Enable half precision for factory-style PLs
  Tpetra: remove some deprecated usage
  Fixed some deprecated code
  MueLu Thyra adapter: Allow construction of half precision operator
  ROL: implement the apply function for Thyra Vector
  Piro: changes to ROL adapters comply with ROL changes
  Piro: bug-fix in Piro::NOX_Solver
  MueLu: Print Scalar in MG Summary for high and extreme verbosity
  Ifpack2: disabling tests causing build errors with extended scalar types (see issue trilinos#9280).
  Ifpack2: cleaning up unused variables in tests.
  Ctest: Adding Amesos2/Belos tests
  Ctest: Stuff failing on ride that worked on ascicgpu
  Ctest: Enabling non-UVM Ifpack2 tests
  Ifpack2: changing GO to the one in Tpetra_Details_DefaultTypes.hpp.
  Disable support for Makefile.export.* files (trilinos#8498)
  Tpetra: remove unused variable (copied too many times when breaking up a function)
  ats2: Comment out listing of long-broken XL builds (trilinos#9270, trilinos#7376)
  Ifpack2: adding missing logic for new tests.
  ...
bartlettroscoe (Member) commented Nov 4, 2021

CC: @jjellio, @jwillenbring

So I have a Trilinos PR #9894 that is stuck in a loop of failed builds due to the compiler crashing after running out of memory. Following on from the discussion above, it occurred to me that if you store the beginning and end time stamps for each target in the *.timing file, then the summarize_build_stats.py tool can sort the build stats by start time and end time and compute the memory high watermark on the machine due to building at any point in time. For example, if 10 object files are currently being built, then you just add up max_resident_size_mb for each of these targets and that gives you the max high-water mark at that time for that build, as the build stat:

Full Project: max_sum_over_active_targets(max_resident_size_mb)

This would show how close a build is to running out of memory on a given machine and we could plot that number as a function of time. In fact, we could have the CTest test that runs summarize_build_stats.py create CTest test measurements for:

Full Project: max_sum_over_active_targets(max_resident_size_mb)
Full Project: sum(max_resident_size_mb)
Full Project: max(max_resident_size_mb)
Full Project: sum(elapsed_real_time_sec)
Full Project: max(elapsed_real_time_sec)
Full Project: sum(file_size_mb)
Full Project: max(file_size_mb)

using XML in the STDOUT like:

<DartMeasurement type="numeric/double" name="Full Project: sum(max_resident_size_mb)">4667989.73</DartMeasurement>

Then you could see a graph of these measurements over time right on CDash!
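A short sketch of how the test could print those measurements, following the <DartMeasurement> example above (the values here are placeholders; the first one reuses the example number above):

def report_measurement(name, value):
    # CTest picks these XML fragments up from the test's STDOUT and CDash plots them.
    print(f'<DartMeasurement type="numeric/double" name="{name}">{value}</DartMeasurement>')

# Placeholder values for illustration only:
report_measurement("Full Project: sum(max_resident_size_mb)", 4667989.73)
report_measurement("Full Project: max(max_resident_size_mb)", 14225.57)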

github-actions bot commented Nov 5, 2022

This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.

github-actions bot added the 'MARKED_FOR_CLOSURE' label Nov 5, 2022
bartlettroscoe added the 'DO_NOT_AUTOCLOSE' label and removed the 'MARKED_FOR_CLOSURE' label Nov 5, 2022
bartlettroscoe (Member) commented:

FYI: Kitware is adding build stats support to native CMake, CTest, and CDash. See:

Therefore, I think there will be no need for a separate compiler wrapper tool to gather build stats or scripts to manage that and submit it to CDash.
