Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addition of Collectors and Sample Abstraction Request #290

Open
Maxusmusti opened this issue Jul 5, 2021 · 2 comments
Open

Addition of Collectors and Sample Abstraction Request #290

Maxusmusti opened this issue Jul 5, 2021 · 2 comments

Comments

@Maxusmusti
Copy link

Maxusmusti commented Jul 5, 2021

Hi! I'm using this space to write about my current progress, updates and plans. The biggest piece right now is the addition of "collectors to the benchmark wrapper":

COLLECTORS

  • Introduces the ability to run data collection tools alongside benchmark runs

  • For example, running pbench data collection alongside a fio or uperf benchmark

  • Structure:

    • Added --collector and --collector-config options to run_snafu (as well as --upload)
    • Abstract collector class specifies need for config file, as well as:
      • set_config_vars(config): Parses through the config file and sets any specified vars / defaults, makes sure everything that needs to be in the file is.
      • startup(): Performs env setup and initialization of necessary processes, as well as begins any persistent data collection processes (for continuous data collection over the course of the benchmark).
      • start_sample(): Starts a data collection sample (for sample-based data collection tools, synced with benchmark samples). Return the sample archive dir.
      • stop_sample(): Stops a data collection sample.
      • shutdown(): Stops persistent data collcetion and other processes, performs cleanup and any desired post-processing.
      • upload(): Uploads a collector's archives using a specified procedure to a desired location.
    • The necessary collector_factory, _load_collectors, __init__ files, and changes to registry have also been made
    • In run_snafu, if a collector has been specified with a config, it will create/initialize the desired collector class and run that startup tasks. Then, before each benchmark sample, it will start a data collection sample which will be stopped once the benchmark sample ends. After the benchmark is complete, the collector shutdown/post-process tasks are run.
    • The upload option:
      • A user can also run snafu with the --upload option while still including collector + config
      • This will then upload the created archives for a specific collector to a location specified/authorized through a combination of the upload() method and the config file.
      • For example, this option is used for moving pbench archives to the pbench server, and auth info/locations can be specified through the config file
    • Adding a Collector:
      • A really simple process
      • Just create a collector_name dir under the main collector dir
      • Add an init file, a sample config, and the collector_name.py file that creates a Collector_name class based on the abstract collector.
      • Robert Krawitz is also currently interested in implementing his prom_extract tool as another collector

PBENCH

  • Note that all of this structure already exists and has been written, and pbench has been fully implemented using it (upload option still needs a bit of work).
  • To see all of this work, go to: my pbench integration PR
  • This has all been built off of upstream branch: learnitall/feature-extend-oo-structure
  • I also have a small demo in the latest tools meeting recording, and can do another whenever desired

ISSUES/TO-DO

  • Add support to specify/run multiple collectors for a given benchmark run
  • SAMPLE ISSUES (described in more detail below)
  • How to handle dockerfiles to have environments with both necessary collector bits and benchmark bits

SAMPLE ISSUE

  • Currently no universal notion of a sample, samples are defined within the benchmark, with each benchmark having its own for-loop iterating through the number of samples and yielding results.
  • No way to sync collector samples with benchmark samples (or do any other work revolving around benchmark samples) without hard-coding hacks into each individual benchmark.
  • Instead, would like to request that the new benchmark rewrite fix this problem by abstracting the main sample loop and introducing a universal definition of sample:
    • Basically, instead of having a collect() method in the benchmarks where a for-loop is run over the number of samples with each iteration yielding one sample result, simply have a collect_sample() method that collects one sample of the benchmark (essentially the same code minus the for-loop), then have that called in the new sample loop in run_snafu at each iteration instead.
    • That way, in the main sample loop, I could start a collector sample, run a benchmark sample, then stop the collector sample, and then the rest of the benchmark sample processing/indexing would stay the same for each iteration.
    • PSEUDO-CODE:
      • OLD:
        for data in benchmark.run():
          (current indexing stuff)
          ...and then inside the .run() there would be a for loop for sample in range(samples)....(code to collect sample)
      • NEW:
        for sample in range(samples):
          collector.start_sample()
          data = benchmark.collect_sample() ...now just has (code to collect sample), no for sample in range(samples)
          collector.stop_sample()
          (current indexing stuff)
    • Would help a lot with current/future work, and also would likely help with unit testing and whatnot.
    • This is more of an example of a solution rather than the "one perfect solution", so I'm sure people may think of better ones. Rather, I'm just looking to add this level of abstraction to allow for proper integration of stuff like collectors, as well as new features in the future.
    • Currently, what collectors are ACTUALLY doing are just running one collection sample over the course of the whole benchmark run process, because that's all that is currently possible.
@jtaleric
Copy link
Member

jtaleric commented Jul 9, 2021

Hi! I'm using this space to write about my current progress, updates and plans. The biggest piece right now is the addition of "collectors to the benchmark wrapper":

COLLECTORS

  • Introduces the ability to run data collection tools alongside benchmark runs

SNAFU/Benchmark-wrapper is to run benchmarks, not orchestrate outside tools.

  • For example, running pbench data collection alongside a fio or uperf benchmark

I think if we want pbench collection in a K8s deployment, we orchestrate the pbench agents via Rispaw/Benchmark-Operator.

  • Structure:

    • Added --collector and --collector-config options to run_snafu (as well as --upload)

    • Abstract collector class specifies need for config file, as well as:

      • set_config_vars(config): Parses through the config file and sets any specified vars / defaults, makes sure everything that needs to be in the file is.
      • startup(): Performs env setup and initialization of necessary processes, as well as begins any persistent data collection processes (for continuous data collection over the course of the benchmark).
      • start_sample(): Starts a data collection sample (for sample-based data collection tools, synced with benchmark samples). Return the sample archive dir.
      • stop_sample(): Stops a data collection sample.
      • shutdown(): Stops persistent data collcetion and other processes, performs cleanup and any desired post-processing.
      • upload(): Uploads a collector's archives using a specified procedure to a desired location.
    • The necessary collector_factory, _load_collectors, __init__ files, and changes to registry have also been made

    • In run_snafu, if a collector has been specified with a config, it will create/initialize the desired collector class and run that startup tasks. Then, before each benchmark sample, it will start a data collection sample which will be stopped once the benchmark sample ends. After the benchmark is complete, the collector shutdown/post-process tasks are run.

Most of this functionality could be implemented in the Operator.

  • The upload option:

    • A user can also run snafu with the --upload option while still including collector + config
    • This will then upload the created archives for a specific collector to a location specified/authorized through a combination of the upload() method and the config file.
    • For example, this option is used for moving pbench archives to the pbench server, and auth info/locations can be specified through the config file

Should the PBench agents simply upload the data to the pbench-server, or pbench-controller after the collection is completed?

  • Adding a Collector:

    • A really simple process
    • Just create a collector_name dir under the main collector dir
    • Add an init file, a sample config, and the collector_name.py file that creates a Collector_name class based on the abstract collector.
    • Robert Krawitz is also currently interested in implementing his prom_extract tool as another collector

I discussed this with Robert and the CNV team. Ripsaw/Benchmark-Operator already has a task to collect system metrics and send to ES. https://github.com/cloud-bulldozer/benchmark-operator/blob/master/roles/system-metrics/tasks/main.yml

The role here could be used for the PBench collection work.

PBENCH

  • Note that all of this structure already exists and has been written, and pbench has been fully implemented using it (upload option still needs a bit of work).
  • To see all of this work, go to: my pbench integration PR
  • This has all been built off of upstream branch: learnitall/feature-extend-oo-structure
  • I also have a small demo in the latest tools meeting recording, and can do another whenever desired

ISSUES/TO-DO

  • Add support to specify/run multiple collectors for a given benchmark run
  • SAMPLE ISSUES (described in more detail below)
  • How to handle dockerfiles to have environments with both necessary collector bits and benchmark bits

SAMPLE ISSUE

  • Currently no universal notion of a sample, samples are defined within the benchmark, with each benchmark having its own for-loop iterating through the number of samples and yielding results.

  • No way to sync collector samples with benchmark samples (or do any other work revolving around benchmark samples) without hard-coding hacks into each individual benchmark.

  • Instead, would like to request that the new benchmark rewrite fix this problem by abstracting the main sample loop and introducing a universal definition of sample:

    • Basically, instead of having a collect() method in the benchmarks where a for-loop is run over the number of samples with each iteration yielding one sample result, simply have a collect_sample() method that collects one sample of the benchmark (essentially the same code minus the for-loop), then have that called in the new sample loop in run_snafu at each iteration instead.

    • That way, in the main sample loop, I could start a collector sample, run a benchmark sample, then stop the collector sample, and then the rest of the benchmark sample processing/indexing would stay the same for each iteration.

    • PSEUDO-CODE:

      • OLD:
        for data in benchmark.run():
          (current indexing stuff)
          ...and then inside the .run() there would be a for loop for sample in range(samples)....(code to collect sample)
      • NEW:
        for sample in range(samples):
          collector.start_sample()
          data = benchmark.collect_sample() ...now just has (code to collect sample), no for sample in range(samples)
          collector.stop_sample()
          (current indexing stuff)
    • Would help a lot with current/future work, and also would likely help with unit testing and whatnot.

    • This is more of an example of a solution rather than the "one perfect solution", so I'm sure people may think of better ones. Rather, I'm just looking to add this level of abstraction to allow for proper integration of stuff like collectors, as well as new features in the future.

    • Currently, what collectors are ACTUALLY doing are just running one collection sample over the course of the whole benchmark run process, because that's all that is currently possible.

@learnitall
@dry923
@rsevilla87
@mohit-sheth
@whitleykeith ;)

Any additional thoughts?

@whitleykeith
Copy link

Hi! I'm using this space to write about my current progress, updates and plans. The biggest piece right now is the addition of "collectors to the benchmark wrapper":

COLLECTORS

Ultimately I'm in agreement with @jtaleric on this. The benchmark-wrapper shouldn't be responsible for integrating with other in-process tools. I'm not against maybe supporting a different output format/target but the data collection shouldn't be tied at the process level.

I think it makes sense that any "collector" would be done in a sidecar similar to how Jaeger works https://github.com/jaegertracing/jaeger-kubernetes

I'm not even sure I want prom_extract to be a feature of SNAFU, and it definitely shouldn't be a collector. We have kube-burner for that.

SAMPLE ISSUE

  • Currently no universal notion of a sample, samples are defined within the benchmark, with each benchmark having its own for-loop iterating through the number of samples and yielding results.

  • No way to sync collector samples with benchmark samples (or do any other work revolving around benchmark samples) without hard-coding hacks into each individual benchmark.

  • Instead, would like to request that the new benchmark rewrite fix this problem by abstracting the main sample loop and introducing a universal definition of sample:

    • Basically, instead of having a collect() method in the benchmarks where a for-loop is run over the number of samples with each iteration yielding one sample result, simply have a collect_sample() method that collects one sample of the benchmark (essentially the same code minus the for-loop), then have that called in the new sample loop in run_snafu at each iteration instead.

    • That way, in the main sample loop, I could start a collector sample, run a benchmark sample, then stop the collector sample, and then the rest of the benchmark sample processing/indexing would stay the same for each iteration.

    • PSEUDO-CODE:

      • OLD:
        for data in benchmark.run():
          (current indexing stuff)
          ...and then inside the .run() there would be a for loop for sample in range(samples)....(code to collect sample)
      • NEW:
        for sample in range(samples):
          collector.start_sample()
          data = benchmark.collect_sample() ...now just has (code to collect sample), no for sample in range(samples)
          collector.stop_sample()
          (current indexing stuff)
    • Would help a lot with current/future work, and also would likely help with unit testing and whatnot.

    • This is more of an example of a solution rather than the "one perfect solution", so I'm sure people may think of better ones. Rather, I'm just looking to add this level of abstraction to allow for proper integration of stuff like collectors, as well as new features in the future.

    • Currently, what collectors are ACTUALLY doing are just running one collection sample over the course of the whole benchmark run process, because that's all that is currently possible.

I completely agree that there should be one universal implementation for a "sample" in SNAFU, and I think your example makes perfect sense (minus wrapping it with a collector 😄 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants