
tests: add experimental-report-test-result-info option to the test goal #21171

Conversation

@AlexTereshenkov (Member) commented Jul 14, 2024

A new option --report-test-result-source is added to the [test] config section. Enabling this option
produces a file on disk recording where the results of each test came from. The tests might
have been executed locally or remotely, or their results might have been retrieved from the local or remote cache,
or memoized. Knowing where test results come from is useful when evaluating the efficiency of the
caching strategy and the nature of source code changes that may lead to frequent cache invalidations.

Why?

After running tests, one can see how the tests were executed, but this information is available only in plain text:

$ pants --no-pantsd test --report ::
23:13:22.00 [INFO] Completed: Run Pytest - helloworld/translator/translator_test.py:tests - succeeded.
23:13:22.20 [INFO] Completed: Run Pytest - helloworld/greet/greeting_test.py:tests - succeeded.

✓ helloworld/greet/greeting_test.py:tests succeeded in 0.20s.
✓ helloworld/translator/translator_test.py:tests succeeded in 0.35s (cached locally).

Wrote test reports to dist/test/reports
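Today, recovering this information programmatically means scraping the log text. A minimal sketch of what such fragile parsing might look like (the log lines are copied from the output above; the regex and the "executed" fallback label are illustrative assumptions, not part of Pants):

```python
import re

# Plain-text summary lines as printed by `pants test` (copied from above).
log = """\
✓ helloworld/greet/greeting_test.py:tests succeeded in 0.20s.
✓ helloworld/translator/translator_test.py:tests succeeded in 0.35s (cached locally)."""

sources = {}
for line in log.splitlines():
    # Capture the target address and the optional parenthesised source note.
    m = re.match(r"✓ (\S+) succeeded in [\d.]+s(?: \((.+)\))?\.", line)
    if m:
        sources[m.group(1)] = m.group(2) or "executed"

# sources -> {"helloworld/greet/greeting_test.py:tests": "executed",
#             "helloworld/translator/translator_test.py:tests": "cached locally"}
```

A structured report avoids depending on the exact phrasing of these log lines.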

It would be helpful to store, in a parseable form, that helloworld/greet/greeting_test.py was executed and helloworld/translator/translator_test.py was cached locally. I would want this produced in every CI build and stored in a metrics database, to be able to see how effectively the cache is used.

For instance, imagine you have a nice remote cache setup, but every PR modifies a few files that are transitive dependencies of the majority of tests; the cache won't really help. If one can see which tests are executed most often (instead of being picked up from the cache), those are good candidates for refactoring.

Some of the caching stats are available via the [stats] subsystem, and I've recently added support for JSON Lines to the stats subsystem (see #20579), but it has no information about the test modules themselves.

The workunit-logger backend does provide some information, such as which tests were executed, but it doesn't say anything about the other modules (were they memoized? cached locally? cached remotely?)

@sureshjoshi also had an idea:

I'd be suuuuper interested in any pass/fail/memoized/cached introspection we could offer - as I could also plug that into an IDE Test Explorer (in VSCode for example). Right now, the extent of it is mostly "here are the test files" and "here's how we run them" and then parsing text

so this is another use case, too.

@AlexTereshenkov force-pushed the pants/test-result-source-report branch 2 times, most recently from 8e68f9b to d1cf132, July 14, 2024 20:16
"""Save a JSON file with the information about the source of test result."""
timestamp = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
obj = json.dumps({"timestamp": timestamp, "run_id": run_id, "sources": results})
with safe_open(f"test_result_source_report_runid{run_id}_{timestamp}.json", "w") as fh:
@AlexTereshenkov (Member, Author) commented Jul 14, 2024

This should be concurrency-safe, since every invocation of the test goal has a unique id, so multiple files can be created in parallel without racing. It closely follows the naming pattern of the workunit-logger files, e.g. pants_run_2024_05_11_23_39_48_438_cf61f6cd0266404bb307a9ff3103cea4. I don't think a GUID is justified here; the run id should suffice.

Also, relying on the run id provides a hint about the order in which the goals were run (in case there are multiple test goal invocations as part of the build without a pantsd restart).

After running the test goal a few times one after another:

$ ls test_result* -1
test_result_source_report_runid7_2024_07_14_23_30_02.json
test_result_source_report_runid8_2024_07_14_23_30_12.json
test_result_source_report_runid9_2024_07_14_23_30_17.json
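Because the run id is embedded in the file name, the order of test goal invocations can be recovered by parsing it back out. A small sketch, assuming the naming pattern shown above stays stable (the `run_id` helper is illustrative, not part of Pants):

```python
import re

# File names as produced above, deliberately out of order.
files = [
    "test_result_source_report_runid9_2024_07_14_23_30_17.json",
    "test_result_source_report_runid7_2024_07_14_23_30_02.json",
    "test_result_source_report_runid8_2024_07_14_23_30_12.json",
]

def run_id(name: str) -> int:
    """Extract the numeric run id from a report file name."""
    m = re.match(r"test_result_source_report_runid(\d+)_", name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    return int(m.group(1))

# Sorting by run id recovers the order of the test goal invocations.
ordered = sorted(files, key=run_id)
# ordered run ids: [7, 8, 9]
```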

@AlexTereshenkov (Member, Author) commented Jul 14, 2024

A few example files produced:

$ cat test_result_source_report_runid9_2024_07_14_23_30_17.json | jq
{
  "timestamp": "2024_07_14_23_30_17",
  "run_id": 9,
  "sources": {
    "src/python/pants/core/goals/test_test.py:tests": "memoized"
  }
}

@AlexTereshenkov (Member, Author) commented Jul 14, 2024

$ cat test_result_source_report_runid0_2024_07_14_23_29_40.json | jq
{
  "timestamp": "2024_07_14_23_29_40",
  "run_id": 0,
  "sources": {
    "helloworld/greet/greeting_test.py:tests": "hit_locally",
    "helloworld/translator/translator_test.py:tests": "hit_locally"
  }
}
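With reports of this shape, aggregating cache behaviour across runs is straightforward. A sketch using the two example payloads shown above (in a real CI setup they would be loaded from the report files on disk and fed into a metrics database):

```python
import json
from collections import Counter

# The two example reports shown above, as JSON strings.
raw_reports = [
    '{"timestamp": "2024_07_14_23_30_17", "run_id": 9, "sources": '
    '{"src/python/pants/core/goals/test_test.py:tests": "memoized"}}',
    '{"timestamp": "2024_07_14_23_29_40", "run_id": 0, "sources": '
    '{"helloworld/greet/greeting_test.py:tests": "hit_locally", '
    '"helloworld/translator/translator_test.py:tests": "hit_locally"}}',
]

# Count how often each result source occurs across all runs.
counts = Counter(
    source
    for report in map(json.loads, raw_reports)
    for source in report["sources"].values()
)
# counts -> Counter({"hit_locally": 2, "memoized": 1})
```

Tests that keep appearing under an "executed" source rather than a cache hit would be the refactoring candidates mentioned in the description.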

@AlexTereshenkov AlexTereshenkov marked this pull request as ready for review July 14, 2024 21:37
@huonw (Contributor) left a comment:

Interesting!

The workunit-logger backend does provide some information such as which tests were executed, but doesn't tell anything about the other modules (was it memoized? cached locally? cached remotely?)

I wonder if we could also be trying to extract even more information into the workunit logger? For instance, cache look-ups take time, so in theory those could/should appear as spans too?

While I imagine tests are usually the slowest part of a build and so most interesting, I could imagine linting/type checking/packaging can also take a while in some cases, so having a "more general" infrastructure for tracking processes and their runs seems like it may be useful?

Review thread on src/python/pants/core/goals/test.py (outdated, resolved)
@AlexTereshenkov (Member, Author)

I wonder if we could also be trying to extract even more information into the workunit logger? For instance, cache look-ups take time, so in theory those could/should appear as spans too?

While I imagine tests are usually the slowest part of a build and so most interesting, I could imagine linting/type checking/packaging can also take a while in some cases, so having a "more general" infrastructure for tracking processes and their runs seems like it may be useful?

I've also been torn on whether this should be part of the generic framework or just an extension to the test goal. My primary concerns are:

  • there's a performance hit when running with workunit-logger enabled, as there's more data to track and write
  • this is a "debugging" tool, not something you'd want to run on a regular basis in all CI builds just to see whether your test results were fetched from the cache or not

Making the workunit-logger subsystem flexible in what's being tracked might require significant effort: e.g. a user may only be interested in tracking the test result sources, nothing else, and specifying this may be rather tedious. However, the idea of making workunit-logger more sophisticated in what it tracks is great. I just can't see how it would be a great fit for storing the test result sources.

So perhaps there's a need for some generic framework to track everything that happens during a Pants run. The workunit-logger is great, and maybe that's the right place for this type of information (similar to what Bazel does with the "execution log", see https://bazel.build/remote/cache-remote). However, it feels like information about the test result sources is more of a "user-friendly" kind of metadata that can be emitted by the test goal in the same way we emit coverage or JUnit XML reports. And then, if you want to find out whether your linting operation on a file is cached or not (most likely because you want to confirm your cache operates correctly and Pants is bug-free), workunit-logger makes sense.

But I also see how having this option in the test goal feels a bit special, as none of the other "user-friendly" goals emit metadata about cache interaction (apart from the stats subsystem). On the other hand, as you said, tests are where it makes most sense to explore how good the cache hit rates are, so maybe it's actually okay?

@huonw (Contributor) commented Jul 16, 2024

there's a performance hit when running with workunit-logger enabled, as there's more data to track and write

Oh, I hadn't realised. And I just started using it in my work repo... 🤔

Do you happen to have some numbers about the perf impact?

this is a "debugging" tool, not something you'd want to run on a regular basis in all CI builds to be able to see whether your tests results were fetched from the cache or not

Ah interesting. I was imagining that someone might have this running constantly and ingested into some sort of dashboard/metrics.

But I also see how having this option in the test goal feels a bit like it has a special nature as none of the other "user-friendly" goals really emit metadata about the cache interaction (apart from the stats subsystem), but also for tests as you said it's when it makes most sense to explore how good the cache hit rates are, so maybe it's actually okay?

Yeah, maybe it is! Along that line of thinking, it'd be cool to get this in and allow people to start using it/us to learn how people use it.

To unblock getting that, I think I'd be more than happy with the changes we're discussing in #21171 (comment)

  • adjusting to have the more extensible structure (including renaming the file and option, good catch 👍 ), but no need to add any more info other than source for now
  • adding experimental into the option name

What do you think?

@AlexTereshenkov force-pushed the pants/test-result-source-report branch from d1cf132 to 06044f9, July 17, 2024 15:56
@AlexTereshenkov force-pushed the pants/test-result-source-report branch from 06044f9 to 4fbe116, July 17, 2024 15:59
@AlexTereshenkov (Member, Author) commented Jul 17, 2024

there's a performance hit when running with workunit-logger enabled, as there's more data to track and write

Oh, I hadn't realised. And I just started using it in my work repo... 🤔

Do you happen to have some numbers about the perf impact?

That shouldn't be a problem in small/medium-size repos. I've seen only ~10% performance degradation in local experiments, so I wouldn't worry too much. But in a huge repo like mine, I just can't afford to collect debugging information when I only care about the caching efficiency of the test runs.

this is a "debugging" tool, not something you'd want to run on a regular basis in all CI builds to be able to see whether your tests results were fetched from the cache or not

Ah interesting. I was imagining that someone might have this running constantly and ingested into some sort of dashboard/metrics.

Sorry, my wording was poor. Collecting the source of test results into a dashboard/metrics database: yes! But no to running with workunit-logger enabled by default in CI, as it would collect too much data of little value to me. In other words, I don't want to collect lots of logs just to get the test result info. I thought one would normally only enable workunit-logger while debugging. Of course, if the build completes in under a minute, one can collect everything, and should there be issues, the logs are readily available. I just don't think that's a strategy we can expect everyone to follow, particularly with massive, slow builds.

But I also see how having this option in the test goal feels a bit like it has a special nature as none of the other "user-friendly" goals really emit metadata about the cache interaction (apart from the stats subsystem), but also for tests as you said it's when it makes most sense to explore how good the cache hit rates are, so maybe it's actually okay?

Yeah, maybe it is! Along that line of thinking, it'd be cool to get this in and allow people to start using it/us to learn how people use it.

To unblock getting that, I think I'd be more than happy with the changes we're discussing in #21171 (comment)

  • adjusting to have the more extensible structure (including renaming the file and option, good catch 👍), but no need to add any more info other than `source` for now
  • adding `experimental` into the option name

What do you think?

Sure, this is what I've now done :) The option and the report file have been renamed, and I've updated the docs accordingly.

@AlexTereshenkov AlexTereshenkov requested a review from huonw July 17, 2024 16:58
@AlexTereshenkov AlexTereshenkov changed the title tests: add report-test-result-source option to the test goal tests: add experimental-report-test-result-info option to the test goal Jul 17, 2024
@huonw (Contributor) commented Jul 18, 2024

That shouldn't be a problem in small/medium-size repos. I've seen only ~10% performance degradation in local experiments, so I wouldn't worry too much. But in a huge repo like mine, I just can't afford to collect debugging information when I only care about the caching efficiency of the test runs.

👍

Total tangent from this PR, but maybe workunit logger could/should support filtering by level, so that it can just log the INFO and higher units, and skip the DEBUG ones.

But no to running with workunit-logger enabled by default in CI, as it would collect too much data of little value to me

Ah, right!

@huonw (Contributor) left a comment:

Nice

@AlexTereshenkov AlexTereshenkov merged commit 0ccf0cf into pantsbuild:main Jul 18, 2024
25 checks passed
@AlexTereshenkov AlexTereshenkov deleted the pants/test-result-source-report branch July 18, 2024 10:25
@AlexTereshenkov (Member, Author)

Total tangent from this PR, but maybe workunit logger could/should support filtering by level, so that it can just log the INFO and higher units, and skip the DEBUG ones.

@huonw it seems there's already a setting to do that:

pants --workunit-logger-enabled --workunit-logger-logdir=artifacts --streaming-workunits-level=debug test ::
pants --workunit-logger-enabled --workunit-logger-logdir=artifacts --streaming-workunits-level=info test ::

I've confirmed the verbosity of the messages differs between the files. See https://www.pantsbuild.org/2.23/reference/global-options#streaming_workunits_level

tobni pushed a commit to tobni/pants that referenced this pull request Jul 22, 2024
…al (pantsbuild#21171)
