
e2e tests for Kepler, estimator, and model server #456

Closed
2 tasks
rootfs opened this issue Dec 9, 2022 · 7 comments
Labels
wontfix This will not be worked on

Comments

@rootfs
Contributor

rootfs commented Dec 9, 2022

Is your feature request related to a problem? Please describe.
Have all of the components e2e tested on bare metal and VMs (especially in CI).

Describe the solution you'd like
The tests should verify that:

  • all the components are configured correctly, up and running
  • the models (eBPF, cgroup, etc.) can be trained and updated online
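The first check above (components up and exporting) could be sketched as a scrape-and-verify step against Kepler's Prometheus endpoint. The helper below is a minimal illustration, not Kepler's actual test code, and the metric names in the sample are assumptions (real Kepler metric families may differ):

```python
# Sketch: verify expected metric families appear in Prometheus
# exposition text scraped from the Kepler exporter endpoint.
# Metric names here are illustrative, not authoritative.

def metric_families(exposition_text: str) -> set[str]:
    """Collect metric family names from Prometheus exposition text."""
    families = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        # A sample line looks like: name{labels} value [timestamp]
        name = line.split("{", 1)[0].split(" ", 1)[0]
        families.add(name)
    return families

def missing_metrics(exposition_text: str, expected: list[str]) -> list[str]:
    """Return the expected metric families that are absent."""
    present = metric_families(exposition_text)
    return [m for m in expected if m not in present]

# Hypothetical scrape output for demonstration.
sample = """\
# HELP kepler_container_joules_total Aggregated container energy
# TYPE kepler_container_joules_total counter
kepler_container_joules_total{pod="a"} 12.5
kepler_node_core_joules_total{node="n1"} 42.0
"""

print(missing_metrics(sample, [
    "kepler_container_joules_total",
    "kepler_node_platform_joules_total",
]))  # → ['kepler_node_platform_joules_total']
```

In CI the `sample` text would come from an HTTP GET of the exporter's metrics port rather than a literal string.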
@jichenjc
Collaborator

some proposal

@sunya-ch
Collaborator

> some proposal

Yes, they are mutually exclusive.
We can have the deployment scenarios as described here: https://sustainable-computing.io/design/power_estimation/#deployment-scenarios.

  • minimum (no sidecar, no model server) -> the Kepler local estimator uses LR model weights downloaded from the initial URL.
  • with sidecar only -> the Kepler exporter requests estimated power from the sidecar; the sidecar estimator uses the archived (currently GBR) model downloaded from the initial URL.
  • with model server only -> the Kepler local estimator requests LR model weights from the model server.
  • with both (full deployment) -> the Kepler exporter requests estimated power from the sidecar; the sidecar estimator requests the archived model from the model server.
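The four scenarios above amount to a two-flag decision table. A minimal sketch, assuming illustrative option names rather than Kepler's actual configuration keys:

```python
# Sketch: the deployment-scenario decision table from the comment above.
# The two booleans mirror the ESTIMATOR_SIDECAR_DEPLOY / MODEL_SERVER_DEPLOY
# options; return strings are descriptive, not real config values.

def model_source(sidecar: bool, model_server: bool) -> str:
    if sidecar and model_server:
        return "sidecar estimator, archived model fetched from model server"
    if sidecar:
        return "sidecar estimator, archived model from initial URL"
    if model_server:
        return "local estimator, LR weights fetched from model server"
    return "local estimator, LR weights from initial URL"

for sc, ms in [(False, False), (True, False), (False, True), (True, True)]:
    print(f"sidecar={sc}, model_server={ms} -> {model_source(sc, ms)}")
```

An e2e suite could iterate over these four combinations and assert the expected model source in each deployment.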

@rootfs
Contributor Author

rootfs commented Dec 12, 2022

@jichenjc Pod metrics are OK, but node metrics are currently all zero; we'll deploy a node power model in the CI.

@sunya-ch
Collaborator

sunya-ch commented Dec 13, 2022

Idea to create e2e tests for integration.

  • make build-manifest
  • make build-manifest OPTS="ESTIMATOR_SIDECAR_DEPLOY"
    To check whether the sidecar is properly set, per [estimator sidecar integration] model config not applied. #461, check for the log line:
    Model DynComponentPower initiated (true)
  • make build-manifest OPTS="MODEL_SERVER_DEPLOY"
    To check whether the model-server is connected, per [model server integration] model server not connected #463, check for the log line:
    LR Model (AbsComponentModelWeight): getWeightFromServer: map[...
  • make build-manifest OPTS="ESTIMATOR_SIDECAR_DEPLOY MODEL_SERVER_DEPLOY"
    To check that both are connected:
    1. check that the sidecar is properly set
    2. check that the model server is connected to the sidecar (TBD)

btw, the above log lines only appear at log verbosity level 3. We may add another OPT like DEBUG to patch the command with -v=3 or 5 in the manifest kustomize.
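The log-line checks above could be sketched as a small helper that takes captured pod logs and reports which expected lines are absent for the deployed options. This is only an illustration: the option names come from the comment, the expected substrings are the two log lines quoted there, and in CI the `logs` string would come from `kubectl logs`:

```python
# Sketch: map each deploy OPT to the log line(s) that should appear,
# then report unmet expectations given captured logs.
# Expected substrings are taken verbatim from the comment above.

EXPECTED_LOG_LINES = {
    "ESTIMATOR_SIDECAR_DEPLOY": [
        "Model DynComponentPower initiated (true)",
    ],
    "MODEL_SERVER_DEPLOY": [
        "LR Model (AbsComponentModelWeight): getWeightFromServer:",
    ],
}

def unmet_expectations(opts: list[str], logs: str) -> list[tuple[str, str]]:
    """Return (option, expected-line) pairs not found in the logs."""
    missing = []
    for opt in opts:
        for line in EXPECTED_LOG_LINES.get(opt, []):
            if line not in logs:
                missing.append((opt, line))
    return missing

# Hypothetical captured logs: sidecar line present, model-server line absent.
logs = "... Model DynComponentPower initiated (true) ..."
print(unmet_expectations(["ESTIMATOR_SIDECAR_DEPLOY", "MODEL_SERVER_DEPLOY"], logs))
```

The combined scenario (both OPTS) would simply pass both option names and expect an empty result.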

@jichenjc
Collaborator

> check log line:

Even though we can check the logs, my understanding is that an integration test mostly focuses on functions rather than logs, so the system should work like a black box. Instead of checking those logs, is there any way we can expose some endpoint (like a debug endpoint) to curl and see whether the functions work well?

@sunya-ch
Collaborator

> even though we can check the logs but seems my understanding is integration test is mostly a focus on functions instead of logs ,so the system should work like a black box.. so instead of checking those logs, is there anyway we can expose some endpoint (like debug endpoint) to curl and see whether the functions works well?

I think it is a good idea. We can expose the full status of Kepler: not only the success/failure of the connection and the model config, but also the available metrics from its discovery. Then we can let the operator use these endpoints to update the Kepler CR status.
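A status endpoint along those lines might return a JSON document covering connection state, model config, and discovered metrics. The payload shape below is entirely hypothetical (Kepler does not define these field names); it just sketches what a curl-able `/status` response could carry:

```python
# Sketch: a hypothetical /status debug payload exposing connection
# state, model state, and discovered metrics as JSON, so tests (or the
# operator) can curl it instead of inspecting logs. All field names are
# illustrative assumptions, not Kepler's actual API.
import json

def build_status(sidecar_connected: bool,
                 model_server_connected: bool,
                 models: dict[str, str],
                 metrics: set[str]) -> dict:
    return {
        "estimator_sidecar": {"connected": sidecar_connected},
        "model_server": {"connected": model_server_connected},
        "models": models,  # e.g. {"DynComponentPower": "initiated"}
        "discovered_metrics": sorted(metrics),
    }

status = build_status(
    sidecar_connected=True,
    model_server_connected=False,
    models={"DynComponentPower": "initiated"},
    metrics={"kepler_container_joules_total"},
)
print(json.dumps(status, indent=2))
```

Served over HTTP (e.g. alongside the metrics port), this would let the e2e suite assert on structured fields rather than grepping verbose logs, and would give the operator a source for the Kepler CR status.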

@stale

stale bot commented May 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label May 17, 2023
@stale stale bot closed this as completed May 24, 2023