
tests: add experimental-report-test-result-info option to the test goal #21171

Conversation

@AlexTereshenkov (Member) commented Jul 14, 2024

A new option --report-test-result-source is added to the [test] config section. Enabling this option
produces a file on disk recording where the results of each test came from. The tests might
have been executed locally or remotely, or their results might have been retrieved from the local or remote cache,
or memoized. Knowing where test results come from is useful when evaluating the efficiency of the
caching strategy and the nature of source code changes that may lead to frequent cache invalidations.

Why?

After running tests, one can see how the tests were executed, but this information is available only in plain text:

$ pants --no-pantsd test --report ::
23:13:22.00 [INFO] Completed: Run Pytest - helloworld/translator/translator_test.py:tests - succeeded.
23:13:22.20 [INFO] Completed: Run Pytest - helloworld/greet/greeting_test.py:tests - succeeded.

✓ helloworld/greet/greeting_test.py:tests succeeded in 0.20s.
✓ helloworld/translator/translator_test.py:tests succeeded in 0.35s (cached locally).

Wrote test reports to dist/test/reports
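Today, recovering this information programmatically means scraping the log text. A minimal sketch of what such fragile parsing might look like (the log lines are copied from the output above; the regex and the "executed" fallback label are illustrative assumptions, not part of Pants):

```python
import re

# Plain-text summary lines as printed by `pants test` (copied from above).
log = """\
✓ helloworld/greet/greeting_test.py:tests succeeded in 0.20s.
✓ helloworld/translator/translator_test.py:tests succeeded in 0.35s (cached locally)."""

sources = {}
for line in log.splitlines():
    # Capture the target address and the optional parenthesised source note.
    m = re.match(r"✓ (\S+) succeeded in [\d.]+s(?: \((.+)\))?\.", line)
    if m:
        sources[m.group(1)] = m.group(2) or "executed"

# sources -> {"helloworld/greet/greeting_test.py:tests": "executed",
#             "helloworld/translator/translator_test.py:tests": "cached locally"}
```

A structured report avoids depending on the exact phrasing of these log lines.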

It would be helpful to store, in a parseable form, that helloworld/greet/greeting_test.py was executed and helloworld/translator/translator_test.py was cached locally. I would want this produced in every CI build and stored in a metrics database, to be able to see how effectively the cache is used.

For instance, imagine you have a nice remote cache setup, but every PR modifies a few files that are transitive dependencies of the majority of tests; the cache won't really help. If one can see which tests are executed most often (instead of being picked up from the cache), those are good candidates for refactoring.

Some of the caching stats are available via the [stats] subsystem, and I've recently added support for JSON Lines to the stats subsystem (see #20579), but it has no information about the test modules themselves.

The workunit-logger backend does provide some information, such as which tests were executed, but it doesn't say anything about the other modules (were they memoized? cached locally? cached remotely?)

@sureshjoshi also had an idea:

I'd be suuuuper interested in any pass/fail/memoized/cached introspection we could offer - as I could also plug that into an IDE Test Explorer (in VSCode for example). Right now, the extent of it is mostly "here are the test files" and "here's how we run them" and then parsing text

so this is another use case, too.

@AlexTereshenkov force-pushed the pants/test-result-source-report branch 2 times, most recently from 8e68f9b to d1cf132, July 14, 2024 20:16
"""Save a JSON file with the information about the source of test result."""
timestamp = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
obj = json.dumps({"timestamp": timestamp, "run_id": run_id, "sources": results})
with safe_open(f"test_result_source_report_runid{run_id}_{timestamp}.json", "w") as fh:
@AlexTereshenkov (Member, Author) commented Jul 14, 2024

This should be concurrency-safe, since every invocation of the test goal has a unique id, so multiple files can be created in parallel without racing. It closely follows the naming pattern of the workunit-logger files, e.g. pants_run_2024_05_11_23_39_48_438_cf61f6cd0266404bb307a9ff3103cea4. I don't think a GUID is justified here; the run id should suffice.

Also, relying on the run id provides a hint about the order in which the goals were run (in case there are multiple test goal invocations as part of the build without a pantsd restart).

After running the test goal a few times one after another:

$ ls test_result* -1
test_result_source_report_runid7_2024_07_14_23_30_02.json
test_result_source_report_runid8_2024_07_14_23_30_12.json
test_result_source_report_runid9_2024_07_14_23_30_17.json
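Because the run id is embedded in the file name, the order of test goal invocations can be recovered by parsing it back out. A small sketch, assuming the naming pattern shown above stays stable (the `run_id` helper is illustrative, not part of Pants):

```python
import re

# File names as produced above, deliberately out of order.
files = [
    "test_result_source_report_runid9_2024_07_14_23_30_17.json",
    "test_result_source_report_runid7_2024_07_14_23_30_02.json",
    "test_result_source_report_runid8_2024_07_14_23_30_12.json",
]

def run_id(name: str) -> int:
    """Extract the numeric run id from a report file name."""
    m = re.match(r"test_result_source_report_runid(\d+)_", name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    return int(m.group(1))

# Sorting by run id recovers the order of the test goal invocations.
ordered = sorted(files, key=run_id)
# ordered run ids: [7, 8, 9]
```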

@AlexTereshenkov (Member, Author) commented Jul 14, 2024

A few example files produced:

$ cat test_result_source_report_runid9_2024_07_14_23_30_17.json | jq
{
  "timestamp": "2024_07_14_23_30_17",
  "run_id": 9,
  "sources": {
    "src/python/pants/core/goals/test_test.py:tests": "memoized"
  }
}

@AlexTereshenkov (Member, Author) commented Jul 14, 2024

$ cat test_result_source_report_runid0_2024_07_14_23_29_40.json | jq
{
  "timestamp": "2024_07_14_23_29_40",
  "run_id": 0,
  "sources": {
    "helloworld/greet/greeting_test.py:tests": "hit_locally",
    "helloworld/translator/translator_test.py:tests": "hit_locally"
  }
}
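With reports of this shape, aggregating cache behaviour across runs is straightforward. A sketch using the two example payloads shown above (in a real CI setup they would be loaded from the report files on disk and fed into a metrics database):

```python
import json
from collections import Counter

# The two example reports shown above, as JSON strings.
raw_reports = [
    '{"timestamp": "2024_07_14_23_30_17", "run_id": 9, "sources": '
    '{"src/python/pants/core/goals/test_test.py:tests": "memoized"}}',
    '{"timestamp": "2024_07_14_23_29_40", "run_id": 0, "sources": '
    '{"helloworld/greet/greeting_test.py:tests": "hit_locally", '
    '"helloworld/translator/translator_test.py:tests": "hit_locally"}}',
]

# Count how often each result source occurs across all runs.
counts = Counter(
    source
    for report in map(json.loads, raw_reports)
    for source in report["sources"].values()
)
# counts -> Counter({"hit_locally": 2, "memoized": 1})
```

Tests that keep appearing under an "executed" source rather than a cache hit would be the refactoring candidates mentioned in the description.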

@AlexTereshenkov AlexTereshenkov marked this pull request as ready for review July 14, 2024 21:37
@huonw (Contributor) left a comment:

Interesting!

The workunit-logger backend does provide some information such as which tests were executed, but doesn't tell anything about the other modules (was it memoized? cached locally? cached remotely?)

I wonder if we could also be trying to extract even more information into the workunit logger? For instance, cache look-ups take time, so in theory those could/should appear as spans too?

While I imagine tests are usually the slowest part of a build and so most interesting, I could imagine linting/type checking/packaging can also take a while in some cases, so having a "more general" infrastructure for tracking processes and their runs seems like it may be useful?

Review thread on src/python/pants/core/goals/test.py (outdated, resolved)
@AlexTereshenkov (Member, Author)

I wonder if we could also be trying to extract even more information into the workunit logger? For instance, cache look-ups take time, so in theory those could/should appear as spans too?

While I imagine tests are usually the slowest part of a build and so most interesting, I could imagine linting/type checking/packaging can also take a while in some cases, so having a "more general" infrastructure for tracking processes and their runs seems like it may be useful?

I've also been torn on whether this should be part of the generic framework or just an extension to the test goal. My primary concerns are:

  • there's a performance hit when running with workunit-logger enabled, as there's more data to track and write
  • this is a "debugging" tool, not something you'd want to run on a regular basis in all CI builds just to see whether your test results were fetched from the cache or not

Making the workunit-logger subsystem flexible in what's being tracked might require significant effort: e.g. a user may only be interested in tracking the test result sources, nothing else, and specifying this may be rather tedious. However, the idea of making workunit-logger more sophisticated in what it tracks is great. I just can't see how it would be a great fit for storing the test result sources.

So perhaps there's a need for some generic framework to track everything that happens during a Pants run. The workunit-logger is great, and maybe that's the right place for this type of information (similar to what Bazel does with the "execution log", see https://bazel.build/remote/cache-remote). However, it feels like information about the test result sources is more of a "user-friendly" kind of metadata that can be emitted by the test goal in the same way we emit coverage or JUnit XML reports. And then, if you want to find out whether your linting operation on a file is cached or not (most likely because you want to confirm your cache operates correctly and Pants is bug-free), workunit-logger makes sense.

But I also see how having this option in the test goal feels a bit special, as none of the other "user-friendly" goals emit metadata about cache interaction (apart from the stats subsystem). On the other hand, as you said, tests are where it makes most sense to explore how good the cache hit rates are, so maybe it's actually okay?

@huonw (Contributor) commented Jul 16, 2024

there's a performance hit when running with workunit-logger enabled, as there's more data to track and write

Oh, I hadn't realised. And I just started using it in my work repo... 🤔

Do you happen to have some numbers about the perf impact?

this is a "debugging" tool, not something you'd want to run on a regular basis in all CI builds to be able to see whether your tests results were fetched from the cache or not

Ah interesting. I was imagining that someone might have this running constantly and ingested into some sort of dashboard/metrics.

But I also see how having this option in the test goal feels a bit like it has a special nature as none of the other "user-friendly" goals really emit metadata about the cache interaction (apart from the stats subsystem), but also for tests as you said it's when it makes most sense to explore how good the cache hit rates are, so maybe it's actually okay?

Yeah, maybe it is! Along that line of thinking, it'd be cool to get this in and allow people to start using it/us to learn how people use it.

To unblock getting that, I think I'd be more than happy with the changes we're discussing in #21171 (comment)

  • adjusting to have the more extensible structure (including renaming the file and option, good catch 👍 ), but no need to add any more info other than source for now
  • adding experimental into the option name

What do you think?

@AlexTereshenkov force-pushed the pants/test-result-source-report branch from d1cf132 to 06044f9, July 17, 2024 15:56
@AlexTereshenkov force-pushed the pants/test-result-source-report branch from 06044f9 to 4fbe116, July 17, 2024 15:59
@AlexTereshenkov (Member, Author) commented Jul 17, 2024

there's a performance hit when running with workunit-logger enabled, as there's more data to track and write

Oh, I hadn't realised. And I just started using it in my work repo... 🤔

Do you happen to have some numbers about the perf impact?

That shouldn't be a problem in small/medium-size repos. I've seen only ~10% performance degradation in local experiments, so I wouldn't worry too much. But in a huge repo like mine, I just can't afford to collect debugging information when I only care about the caching efficiency of the test runs.

this is a "debugging" tool, not something you'd want to run on a regular basis in all CI builds to be able to see whether your tests results were fetched from the cache or not

Ah interesting. I was imagining that someone might have this running constantly and ingested into some sort of dashboard/metrics.

Sorry, my wording was poor. Collecting the source of test results into a dashboard/metrics database: yes! But no to running with workunit-logger enabled by default in CI, as it would collect too much data of little value to me. In other words, I don't want to collect lots of logs just to get the test result info. I thought one would normally only enable workunit-logger while debugging. Of course, if the build completes in under a minute, one can collect everything, and should there be issues, the logs are readily available. I just don't think that's a strategy we can expect everyone to follow, particularly with massive, slow builds.

But I also see how having this option in the test goal feels a bit like it has a special nature as none of the other "user-friendly" goals really emit metadata about the cache interaction (apart from the stats subsystem), but also for tests as you said it's when it makes most sense to explore how good the cache hit rates are, so maybe it's actually okay?

Yeah, maybe it is! Along that line of thinking, it'd be cool to get this in and allow people to start using it/us to learn how people use it.

To unblock getting that, I think I'd be more than happy with the changes we're discussing in #21171 (comment)

  • adjusting to have the more extensible structure (including renaming the file and option, good catch 👍), but no need to add any more info other than `source` for now
  • adding `experimental` into the option name

What do you think?

Sure, this is what I've now done :) The option and the report file have been renamed, and I've updated the docs accordingly.

@AlexTereshenkov AlexTereshenkov requested a review from huonw July 17, 2024 16:58
@AlexTereshenkov AlexTereshenkov changed the title tests: add report-test-result-source option to the test goal tests: add experimental-report-test-result-info option to the test goal Jul 17, 2024
@huonw (Contributor) commented Jul 18, 2024

That shouldn't be a problem in small/medium-size repos. I've seen only ~10% performance degradation in local experiments, so I wouldn't worry too much. But in a huge repo like mine, I just can't afford to collect debugging information when I only care about the caching efficiency of the test runs.

👍

Total tangent from this PR, but maybe workunit logger could/should support filtering by level, so that it can just log the INFO and higher units, and skip the DEBUG ones.

But no to running with workunit-logger enabled by default in CI, as it would collect too much data of little value to me

Ah, right!

@huonw (Contributor) left a comment:

Nice

@AlexTereshenkov AlexTereshenkov merged commit 0ccf0cf into pantsbuild:main Jul 18, 2024
25 checks passed
@AlexTereshenkov AlexTereshenkov deleted the pants/test-result-source-report branch July 18, 2024 10:25
@AlexTereshenkov (Member, Author)

Total tangent from this PR, but maybe workunit logger could/should support filtering by level, so that it can just log the INFO and higher units, and skip the DEBUG ones.

@huonw it seems there's already a setting to do that:

pants --workunit-logger-enabled --workunit-logger-logdir=artifacts --streaming-workunits-level=debug test ::
pants --workunit-logger-enabled --workunit-logger-logdir=artifacts --streaming-workunits-level=info test ::

I've confirmed the verbosity of the messages differs between the files. See https://www.pantsbuild.org/2.23/reference/global-options#streaming_workunits_level

tobni pushed a commit to tobni/pants that referenced this pull request Jul 22, 2024
…al (pantsbuild#21171)
