[Proposal] Index Environment Metadata for every benchmark #298

whitleykeith · 2021-07-14T17:56:19Z

Environment Fields

Problem

benchmark-wrapper is meant to run in many different environments. However, the results often don't include environment information, or when they do, it comes from a user-defined CLI arg (e.g. cluster-name) that may or may not be accurate. This makes analysis hard: we can't query on properties of the environment, and instead have to find the uuids/run_ids of the runs and work out how they map to environments.

How we can solve

This is a proposal to add a one-time, non-blocking step that collects environment metadata after a benchmark finishes running. We could either make the wrapper take a new top-level flag (--environment), which is prone to user error, or define methods in the wrapper that detect the environment, which is more usable but trickier to implement.
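A minimal sketch of how the two options could coexist, assuming an argparse-based entry point: an explicit --environment flag wins, and otherwise the wrapper falls back to detection. The detector registry, the "kubernetes"/"generic" names, and `resolve_environment` are hypothetical; only the KUBERNETES_SERVICE_HOST variable, which Kubernetes injects into pod containers, is a real convention.

```python
import argparse
import os
from typing import Callable, Dict, List, Optional

# Hypothetical detectors, checked in order; each returns True if the
# runtime looks like that environment.
DETECTORS: Dict[str, Callable[[], bool]] = {
    # Kubernetes injects KUBERNETES_SERVICE_HOST into every pod's containers.
    "kubernetes": lambda: "KUBERNETES_SERVICE_HOST" in os.environ,
}
FALLBACK = "generic"  # hypothetical name used when nothing else is detected


def resolve_environment(argv: Optional[List[str]] = None) -> str:
    """Return the environment name from --environment, else from detection."""
    parser = argparse.ArgumentParser(add_help=False)
    # Restricting choices catches typos, one way an explicit flag is error-prone.
    parser.add_argument("--environment", choices=[*DETECTORS, FALLBACK])
    # parse_known_args leaves the benchmark's own arguments alone.
    args, _ = parser.parse_known_args(argv)
    if args.environment is not None:
        return args.environment
    for name, detect in DETECTORS.items():
        if detect():
            return name
    return FALLBACK
```

Checking the flag first keeps an escape hatch for environments the detectors get wrong, while detection covers the common case where the user passes nothing.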

This would probably be a new package in snafu, snafu/environments, where each module corresponds to one Environment definition. Each definition should contain the fields to index, methods that grab the field values from the environment, and a method that detects whether or not the runtime is in that environment.
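One possible shape for such a module, sketched under the assumption that each Environment is a Python class: `fields`, `detect()` and `collect()` mirror the three parts described above. The class names, field names, and the `detect_environment` helper are hypothetical; the values come only from the standard library and environment variables.

```python
import os
import platform
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple, Type


class Environment(ABC):
    """Hypothetical base class for one module under snafu/environments."""

    # Names of the fields this environment adds to each indexed document.
    fields: Tuple[str, ...] = ()

    @classmethod
    @abstractmethod
    def detect(cls) -> bool:
        """Return True if the current runtime is in this environment."""

    @abstractmethod
    def collect(self) -> Dict[str, str]:
        """Return a value for every name in ``fields``."""


class HostEnvironment(Environment):
    """Fallback that matches anywhere, using only the standard library."""

    fields = ("hostname", "kernel_release", "architecture")

    @classmethod
    def detect(cls) -> bool:
        return True

    def collect(self) -> Dict[str, str]:
        return {
            "hostname": platform.node(),
            "kernel_release": platform.release(),
            "architecture": platform.machine(),
        }


class KubernetesEnvironment(Environment):
    """Matches when running inside a Kubernetes pod."""

    fields = ("k8s_service_host", "pod_hostname")

    @classmethod
    def detect(cls) -> bool:
        # Kubernetes sets this variable in every pod's containers.
        return "KUBERNETES_SERVICE_HOST" in os.environ

    def collect(self) -> Dict[str, str]:
        return {
            "k8s_service_host": os.environ["KUBERNETES_SERVICE_HOST"],
            # A pod's hostname is normally its pod name.
            "pod_hostname": platform.node(),
        }


# Checked in order, most specific first; HostEnvironment always matches.
ENVIRONMENTS: List[Type[Environment]] = [KubernetesEnvironment, HostEnvironment]


def detect_environment() -> Environment:
    """Instantiate the first environment whose detect() returns True."""
    for env_cls in ENVIRONMENTS:
        if env_cls.detect():
            return env_cls()
    raise RuntimeError("no environment matched")
```

Ordering the list from most to least specific is one simple way to resolve overlaps, since a Kubernetes pod is also a generic host.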

Each benchmark would then gather the environment metadata after a run and index it alongside the results.
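Because collection is one-time, the metadata can be gathered once and copied into every result document before indexing. A minimal sketch, assuming results are plain dicts and that flat keys with an `env_` prefix (a hypothetical naming choice) suit the index mapping:

```python
from typing import Any, Dict, Iterable, Iterator


def with_environment(
    results: Iterable[Dict[str, Any]],
    env_metadata: Dict[str, str],
    prefix: str = "env_",
) -> Iterator[Dict[str, Any]]:
    """Yield a copy of each result document with environment fields merged in.

    The prefix keeps environment keys from colliding with benchmark keys,
    unless a benchmark already emits the same prefixed name.
    """
    env_fields = {f"{prefix}{key}": value for key, value in env_metadata.items()}
    for doc in results:
        merged = dict(doc)  # leave the caller's document untouched
        merged.update(env_fields)
        yield merged
```

Flattened keys such as env_kernel_release could then be filtered directly in a query, instead of first looking up uuids/run_ids and mapping them to environments.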

Thoughts?
@learnitall
@jtaleric
@rsevilla87


dry923 · 2021-07-14T19:57:30Z

Talked about this with @whitleykeith on Slack. Since backpack already collects all this data before any run, we could simply have it write its JSON output to shared storage (or write just the bits we care about to Redis) and then have snafu read it in. That would be a pretty trivial change to both backpack and snafu. Thoughts?
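On the snafu side, reading backpack's output from shared storage could be as small as the sketch below. It assumes backpack writes a single flat JSON object to a file; the function name is hypothetical, and the path and key names would depend on backpack's actual output layout, which isn't described here. A missing or unreadable file returns an empty dict so the step stays non-blocking.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable


def load_backpack_metadata(path: str, wanted: Iterable[str]) -> Dict[str, Any]:
    """Return the requested top-level keys from a JSON file on shared storage.

    Any read or parse failure yields {} so metadata collection never fails
    the benchmark run.
    """
    try:
        data = json.loads(Path(path).read_text())
    except (OSError, ValueError):  # ValueError covers JSON and Unicode decode errors
        return {}
    if not isinstance(data, dict):
        return {}
    # Keep only the bits we care about; silently skip keys that are absent.
    return {key: data[key] for key in wanted if key in data}
```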

learnitall · 2021-07-14T21:54:24Z

My only concern with backpack is that it seems like a pretty heavy dependency. Would the automotive team be able to use backpack for environment information?

Also, what sort of information are we looking to collect?

I like the idea of having snafu/environments as a location for Python modules that pull environment metadata. Another idea that might work, since some of the environment collection could be done in straight-up bash, is the following:

Create snafu/environments as a collection of tiny executable scripts that take no arguments and output environment data as KEY=VALUE, one key-value pair per line. If a script outputs no key-value pairs, or exits with a non-zero return code, assume it isn't applicable to the environment.
Create a module called environment.py which runs each of these bash scripts one by one, collects the output, and saves the key-value pairs to a dataclass which we can then export.
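A minimal sketch of that environment.py, assuming the scripts live in a single directory and are run in sorted filename order. `EnvironmentMetadata`, `parse_key_values` and `collect` are hypothetical names, and the timeout is an added assumption so a hanging script can't block the benchmark.

```python
import os
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class EnvironmentMetadata:
    """Key-value pairs gathered from every applicable collector script."""

    values: Dict[str, str] = field(default_factory=dict)
    applied: List[str] = field(default_factory=list)  # scripts that produced data


def parse_key_values(output: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; lines without '=' or with an empty key are ignored."""
    pairs = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            pairs[key.strip()] = value.strip()
    return pairs


def collect(scripts_dir: str, timeout: float = 30.0) -> EnvironmentMetadata:
    """Run each executable in scripts_dir and merge the pairs it prints."""
    meta = EnvironmentMetadata()
    for script in sorted(Path(scripts_dir).iterdir()):
        if not (script.is_file() and os.access(script, os.X_OK)):
            continue  # skip non-executables such as READMEs
        try:
            proc = subprocess.run(
                [str(script)], capture_output=True, text=True, timeout=timeout
            )
        except (OSError, subprocess.TimeoutExpired):
            continue  # an unrunnable or hanging script counts as not applicable
        pairs = parse_key_values(proc.stdout)
        if proc.returncode != 0 or not pairs:
            continue  # the proposed rule: this script doesn't apply here
        meta.values.update(pairs)
        meta.applied.append(script.name)
    return meta
```

Because scripts run in sorted order, a later script silently overrides an earlier one on key collisions; numeric filename prefixes (e.g. 10-host, 20-k8s) would make that precedence explicit.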

With this approach, we can do metadata collection using bash, perl, python, whatever. It'll be super lightweight, easy to maintain, easy to test, and have minimal external dependencies.

The disadvantage is that we would then have to maintain code that does the same thing as backpack.

jtaleric · 2021-07-15T17:27:19Z

> My only concern with backpack is that it seems like a pretty heavy dependency. Would the automotive team be able to use backpack for environment information?

They would never use backpack; backpack is specific to OCP (OpenShift Container Platform). They would use the underlying Ansible roles, i.e. Stockpile.

> Also, what sort of information are we looking to collect?

> I like the idea of having snafu/environments for a location of Python modules that pull env metadata. Another idea that might work, since some env collection could be done in straight-up bash, is the following:

> Create snafu/environments as a collection of tiny executable scripts which take no arguments and output environment data in the form of KEY=VALUE, with each key-value pair on a separate line. If no key-value pair was found in the output, or if the return-code is non-zero, then assume the script isn't applicable for the environment.

> Create a module called environment.py which runs each of these bash scripts one by one, collects the output, and saves the key-value pairs to a dataclass which we can then export.

> From this perspective, we can do metacollection using bash, perl, python, whatever. It'll be super lightweight, easy to maintain, easy to test and have minimal external dependencies.

> But at the disadvantage that we now have to maintain code that does the same thing as backpack.

IMHO this is a step backwards. We already do this today; check out Stockpile and Scribe.

learnitall · 2021-07-15T20:06:35Z

My concern with Scribe was that it seemed like a large dependency, which is where the bash idea came from, but if we are willing to use it, then all the better. My bad on the confusion.

learnitall mentioned this issue on Jul 14, 2021 in ElasticSearch Unified Index #288 (Open)

whitleykeith closed this as not planned on May 16, 2023
