[Proposal] Index Environment Metadata for every benchmark #298

Closed
whitleykeith opened this issue Jul 14, 2021 · 4 comments

@whitleykeith

Environment Fields

Problem

benchmark-wrapper is meant to run in many different environments. However, results often carry no environment information, or carry only a user-defined CLI arg (e.g. cluster-name) that may or may not be accurate. This makes analysis hard: we can't query on properties of the environment and instead have to find uuids/run_ids and figure out how they map to one.

How we can solve

This is a proposal to implement a one-time, non-blocking step that collects environment metadata after a benchmark run. We can either make the wrapper take a new top-level flag (--environment), which is prone to error, or define methods in the wrapper that detect the environment, which is more usable but trickier.

This would probably be a new package in snafu, at snafu/environments, where each module corresponds to an Environment definition. Each definition should contain the fields to index, methods that grab the field values from the environment, and a method that detects whether or not the runtime is in that environment.

Each benchmark would then gather the environment metadata after a run and index it alongside the results.
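As a rough illustration of the shape such a module could take, here is a minimal sketch. Every name in it (`Environment`, `KubernetesPod`, `GenericHost`, `detect_environment`, `attach_environment`, and the `env_*` field names) is hypothetical and not existing snafu code. The Kubernetes check relies on the `KUBERNETES_SERVICE_HOST` variable that is normally set inside pods, and using `HOSTNAME` as the pod name is an assumption.

```python
import os
import platform
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Type


class Environment(ABC):
    """Hypothetical base class for one module under snafu/environments."""

    name: str = "unknown"

    @classmethod
    @abstractmethod
    def detect(cls) -> bool:
        """Return True if the wrapper appears to be running in this environment."""

    @abstractmethod
    def collect(self) -> Dict[str, str]:
        """Return the fields to index for this environment."""


class KubernetesPod(Environment):
    name = "kubernetes"

    @classmethod
    def detect(cls) -> bool:
        # Assumption: this variable is present when running inside a pod.
        return "KUBERNETES_SERVICE_HOST" in os.environ

    def collect(self) -> Dict[str, str]:
        # Assumption: HOSTNAME holds the pod name; empty string if unset.
        return {"env_name": self.name, "env_pod": os.environ.get("HOSTNAME", "")}


class GenericHost(Environment):
    """Fallback that always matches, so some metadata is always available."""

    name = "generic"

    @classmethod
    def detect(cls) -> bool:
        return True

    def collect(self) -> Dict[str, str]:
        return {
            "env_name": self.name,
            "env_hostname": platform.node(),
            "env_os": platform.system(),
        }


def detect_environment(candidates: List[Type[Environment]]) -> Optional[Environment]:
    """Return an instance of the first candidate whose detect() succeeds."""
    for candidate in candidates:
        if candidate.detect():
            return candidate()
    return None


def attach_environment(result: dict, env: Optional[Environment]) -> dict:
    """Return a copy of a result document with environment fields added."""
    merged = dict(result)
    if env is not None:
        merged["environment"] = env.collect()
    return merged
```

Ordering the candidates from most to least specific means the fallback only applies when nothing else matches.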

Thoughts?
@learnitall
@jtaleric
@rsevilla87

@dry923
Member

dry923 commented Jul 14, 2021

Talked about this with @whitleykeith on Slack. Since backpack already collects all the data prior to any run, we could simply have it write its JSON output to shared storage, or write just the bits we care about to Redis, and then have snafu read it in. It would be a pretty trivial change to backpack and snafu. Thoughts?
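A minimal sketch of the snafu side of the shared-storage variant, assuming the metadata arrives as a single JSON object in a file. The function names, the `environment` key, and the idea of filtering to a list of wanted keys are all hypothetical; this does not depict backpack's actual output format or any existing snafu API.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, Optional


def load_metadata(path: Path, keep: Optional[Iterable[str]] = None) -> Dict[str, Any]:
    """Read a JSON metadata file written before the run.

    Returns an empty dict if the file is missing, unreadable, or not a
    JSON object, so a metadata problem never blocks the benchmark.
    """
    try:
        data = json.loads(path.read_text())
    except (OSError, ValueError):
        return {}
    if not isinstance(data, dict):
        return {}
    if keep is None:
        return data
    wanted = set(keep)
    # Keep only "the bits we care about".
    return {key: value for key, value in data.items() if key in wanted}


def with_metadata(
    results: Iterable[Dict[str, Any]], metadata: Dict[str, Any]
) -> Iterator[Dict[str, Any]]:
    """Yield each result document with the metadata attached under one key."""
    for doc in results:
        yield {**doc, "environment": metadata}
```

Attaching the metadata under a single key keeps it from colliding with fields the benchmark itself produces.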

@learnitall
Collaborator

My only concern with backpack is that it seems like a pretty heavy dependency. Would the automotive team be able to use backpack for environment information?

Also, what sort of information are we looking to collect?

I like the idea of having snafu/environments for a location of Python modules that pull env metadata. Another idea that might work, since some env collection could be done in straight-up bash, is the following:

  • Create snafu/environments as a collection of tiny executable scripts which take no arguments and output environment data in the form of KEY=VALUE, with each key-value pair on a separate line. If no key-value pair was found in the output, or if the return-code is non-zero, then assume the script isn't applicable for the environment.
  • Create a module called environment.py which runs each of these bash scripts one by one, collects the output, and saves the key-value pairs to a dataclass which we can then export.

With this approach, we can do metadata collection using bash, perl, python, whatever. It'll be super lightweight, easy to maintain, easy to test and have minimal external dependencies.

The disadvantage is that we would then have to maintain code that does the same thing as backpack.
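The two bullets above could be sketched roughly as follows. This is only an illustration of the idea: `parse_key_values`, `EnvironmentData`, `collect`, and the timeout value are hypothetical names and choices, not existing snafu code.

```python
import os
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict


def parse_key_values(output: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines; lines without '=' or with an empty key are ignored."""
    pairs: Dict[str, str] = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            pairs[key.strip()] = value.strip()
    return pairs


@dataclass
class EnvironmentData:
    """Exportable container for everything the scripts reported."""

    fields: Dict[str, str] = field(default_factory=dict)


def collect(script_dir: Path, timeout: float = 10.0) -> EnvironmentData:
    """Run each executable in script_dir with no arguments and merge its output.

    A script that fails to start, times out, exits non-zero, or prints no
    KEY=VALUE pair is treated as not applicable to this environment.
    """
    data = EnvironmentData()
    for script in sorted(script_dir.iterdir()):
        if not (script.is_file() and os.access(script, os.X_OK)):
            continue
        try:
            proc = subprocess.run(
                [str(script)], capture_output=True, text=True, timeout=timeout
            )
        except (OSError, subprocess.TimeoutExpired):
            continue
        if proc.returncode != 0:
            continue
        data.fields.update(parse_key_values(proc.stdout))
    return data
```

Because scripts run in sorted order, a later script overwrites a key set by an earlier one; whether that is desirable would need to be decided.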

@jtaleric
Member

> My only concern with backpack is that it seems like a pretty heavy dependency. Would the automotive team be able to use backpack for environment information?

They would never use backpack; backpack is specific to OCP. They would use the underlying Ansible roles, which is Stockpile.

> Also, what sort of information are we looking to collect?
>
> I like the idea of having snafu/environments for a location of Python modules that pull env metadata. Another idea that might work, since some env collection could be done in straight-up bash, is the following:
>
>   • Create snafu/environments as a collection of tiny executable scripts which take no arguments and output environment data in the form of KEY=VALUE, with each key-value pair on a separate line. If no key-value pair was found in the output, or if the return-code is non-zero, then assume the script isn't applicable for the environment.
>   • Create a module called environment.py which runs each of these bash scripts one by one, collects the output, and saves the key-value pairs to a dataclass which we can then export.
>
> From this perspective, we can do metacollection using bash, perl, python, whatever. It'll be super lightweight, easy to maintain, easy to test and have minimal external dependencies.
>
> But at the disadvantage that we now have to maintain code that does the same thing as backpack.

IMHO this is a step backwards. We already do this today. Check out Stockpile and Scribe.

@learnitall
Collaborator

My concern with Scribe was that it seemed like a large dependency, which was where that bash idea came from, but if we are willing to use it then all the merrier. My bad on the confusion.

@whitleykeith closed this as not planned on May 16, 2023