
difference between kepler-model-server and kepler-estimator #375

Closed
jichenjc opened this issue Nov 9, 2022 · 15 comments
Labels: kind/feature (New feature or request), urgent-priority (Issues that are part of the next release), wontfix (This will not be worked on)
@jichenjc
Collaborator

jichenjc commented Nov 9, 2022

Is your feature request related to a problem? Please describe.

I am seeing all pod energy values as 0. Checking the code, it seems I need to enable at least one of kepler-model-server and kepler-estimator.
What's the difference between them, and which one should I use? The sidecar (https://github.com/sustainable-computing-io/kepler-estimator)?


@rootfs
Contributor

rootfs commented Nov 9, 2022

@sunya-ch @KaiyiLiu1234 can you provide a high level usage description? Preferably here

@sunya-ch
Collaborator

sunya-ch commented Nov 9, 2022

I put the general explanation about power estimation and deployment here.

With the minimum deployment (no estimator and no model server), Kepler uses offline model weights with an embedded linear regression model, defined by the PodComponentPowerModelConfig variable (which is currently hard-coded to point to this file):

PodComponentPowerModelConfig types.ModelConfig = types.ModelConfig{UseEstimatorSidecar: false, InitModelURL: dynCompURL}

The high-level usage description in the document will be coming soon. I will remove the hard-coded variable and do a local test with full integration against the current version of Kepler first.
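For illustration, the offline-weight approach boils down to a linear regression: the predicted power is the dot product of the collected metric values and the trained weights, plus an intercept. A minimal sketch, assuming a hypothetical weight shape (the type and function names here are illustrative, not Kepler's actual API):

```go
package main

import "fmt"

// LinearPowerModel holds trained weights for a linear regression power
// model, as might be loaded from an offline model weight file
// (hypothetical shape for illustration).
type LinearPowerModel struct {
	Intercept float64
	Weights   map[string]float64 // feature name -> coefficient
}

// Estimate returns predicted power as intercept + sum(w_i * x_i).
// Features the model has no weight for contribute nothing.
func (m LinearPowerModel) Estimate(features map[string]float64) float64 {
	p := m.Intercept
	for name, value := range features {
		p += m.Weights[name] * value
	}
	return p
}

func main() {
	model := LinearPowerModel{
		Intercept: 2.0,
		Weights:   map[string]float64{"cpu_cycles": 0.001, "cache_miss": 0.01},
	}
	// 2.0 + 0.001*1000 + 0.01*100 = 4.0
	fmt.Println(model.Estimate(map[string]float64{"cpu_cycles": 1000, "cache_miss": 100}))
}
```

Note that if the collected metric names don't match the model's feature names, every term in the sum is multiplied by a zero weight, which is one way to end up with implausible (or zero) estimates.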

@sunya-ch
Collaborator

sunya-ch commented Nov 9, 2022

@jichenjc As some feature names may not match the current version of Kepler, the initial model weights may not be applicable. You may want to further investigate the JSON file and the metrics collected in the call below.

containerComponentPowers, containerOtherPowers := model.GetContainerPower(
	containerMetricValuesOnly, collector_metric.NodeMetadataValues,
	nodeTotalPower, nodeTotalGPUPower, nodeTotalPowerPerComponents,
)
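One way to check whether the initial weights apply is to compare the feature names in the weight JSON against the metric names Kepler actually collects. A rough sketch, assuming a simplified JSON shape (not the exact format of the real weight file):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelWeights is an assumed, simplified shape for a weight file:
// a list of feature names with their coefficients.
type modelWeights struct {
	Features []string  `json:"features"`
	Weights  []float64 `json:"weights"`
}

// missingFeatures reports which model features are absent from the
// collected metric names, i.e. where the model would only ever see zeros.
func missingFeatures(w modelWeights, collected []string) []string {
	have := make(map[string]bool, len(collected))
	for _, name := range collected {
		have[name] = true
	}
	var missing []string
	for _, f := range w.Features {
		if !have[f] {
			missing = append(missing, f)
		}
	}
	return missing
}

func main() {
	raw := []byte(`{"features":["cpu_cycles","cache_miss"],"weights":[0.001,0.01]}`)
	var w modelWeights
	if err := json.Unmarshal(raw, &w); err != nil {
		panic(err)
	}
	fmt.Println(missingFeatures(w, []string{"cpu_cycles"}))
}
```

A non-empty result here would explain all-zero (or badly skewed) pod power estimates.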

@rootfs
Contributor

rootfs commented Nov 9, 2022

@jichenjc
Collaborator Author

I put the general explanation about power estimation and deployment here.

this is great sharing , thanks ! @sunya-ch

about the original issue, I think that by default no model server param is provided in the manifest,
which means the following never gets called:

pkg/collector/metric/utils.go

if config.ModelServerEndpoint != "" {
	model.InitEstimateFunctions(names, NodeMetadataNames, NodeMetadataValues)
}

and in turn all the logic related to PodComponentPowerModelConfig is never called at all

@jichenjc
Collaborator Author

jichenjc commented Nov 10, 2022

@marceloamaral we have https://github.com/sustainable-computing-io/kepler/blob/main/pkg/model/model.go#L66
hardcoded, then

https://github.com/sustainable-computing-io/kepler/blob/main/cmd/exporter.go#L84 sets the endpoint,
and the only usage is https://github.com/sustainable-computing-io/kepler/blob/main/pkg/collector/metric/utils.go#L81

so my proposal is to change this check and keep it a string:
https://github.com/sustainable-computing-io/kepler/blob/main/pkg/collector/metric/utils.go#L81

so we have

  • the case with no model server
  • the case with a model server at a different URL
  • the case that wants to use the model server but gives no URL, falling back to the default URL (defined above)

but: how about putting the hard-coded model server URL in the manifest or docs,
so anyone who wants to use the model server can supply that hard-coded string or any URL they want?
If no model server URL is given, take no action on the model server part, but we won't block the sidecar-related code even when the model server endpoint is ""?
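The three cases above could be resolved with one small helper: keep the endpoint a plain string, fall back to a default URL only when the model server is explicitly wanted but no URL is given, and stay disabled otherwise. A sketch under those assumptions (the function and constant names are hypothetical, not the actual Kepler config API):

```go
package main

import "fmt"

// defaultModelServerURL stands in for the currently hard-coded URL
// (hypothetical value for illustration).
const defaultModelServerURL = "http://kepler-model-server.kepler-system.svc:8100"

// resolveEndpoint returns the model server endpoint to use, or "" when the
// model server should not be contacted at all:
//   - enabled=false: no model server (sidecar code paths stay available)
//   - enabled=true with a URL: use that URL
//   - enabled=true without a URL: fall back to the default
func resolveEndpoint(enabled bool, url string) string {
	if !enabled {
		return ""
	}
	if url != "" {
		return url
	}
	return defaultModelServerURL
}

func main() {
	fmt.Printf("%q\n", resolveEndpoint(false, ""))                     // model server off
	fmt.Printf("%q\n", resolveEndpoint(true, "http://my-server:8100")) // custom URL
	fmt.Printf("%q\n", resolveEndpoint(true, ""))                      // default URL
}
```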

@husky-parul
Collaborator

husky-parul commented Nov 10, 2022

IMO we should externalize this via a configmap here

apiVersion: v1
kind: ConfigMap
metadata:
  name: model-server-cfm
  namespace: kepler-system
data:
  ENABLED: "true"                    # or "false"
  ENDPOINT: "https://some.endpoint"  # or ""
  ...

@jichenjc
Collaborator Author

IMO we should externalize this via a configmap here

but then we depend on model-server? Should this be a resource defined in kepler instead of kepler-model-server?

@jichenjc
Collaborator Author

I can

  • create a configmap like the above
  • if there is no configmap, default to not reading the model server (same as not enabled)
  • honor the configmap when accessing the URL (the hard-coded default will be put into the configmap)

thoughts?

@husky-parul
Collaborator

husky-parul commented Nov 11, 2022

IMO we should externalize this via a configmap here

but then we depend on model-server? Should this be a resource defined in kepler instead of kepler-model-server?

Yes, that's correct. The configmap should be in Kepler, not model server deployment.

looping in @sunya-ch

@husky-parul
Collaborator

husky-parul commented Nov 11, 2022

I can

  • create a configmap like the above
  • if there is no configmap, default to not reading the model server (same as not enabled)
  • honor the configmap when accessing the URL (the hard-coded default will be put into the configmap)

thoughts?

@jichenjc Does it mean that if there is no configmap, Kepler will use the default model, and when a configmap is provided by the user, the user-provided models will be accessed via the URL?

Both Minimum Deployment and Deployment with General Estimator Sidecar use offline models and thus would need the model for the local/general estimator

@jichenjc
Collaborator Author

@husky-parul based on my test

Does it mean that if there is no configmap, Kepler will use the default model, and when a configmap is provided by the user, the user-provided models will be accessed via the URL?

with no endpoint,
https://github.com/sustainable-computing-io/kepler/blob/main/pkg/collector/metric/utils.go#L81
blocks all further operations;

if it does have some value,
https://github.com/sustainable-computing-io/kepler/blob/main/pkg/model/model.go#L30
will be used as a hard-coded default ..

so I am curious about the logic here

@jichenjc
Collaborator Author

I proposed
#384

@wangchen615 wangchen615 added the kind/feature New feature or request label Nov 14, 2022
@rootfs
Contributor

rootfs commented Nov 15, 2022

@sunya-ch will have a discussion on the GH project; we are also going to have this supported in the operator

@wangchen615 wangchen615 added the urgent-priority Issues that are part of the next release label Nov 15, 2022
@stale

stale bot commented May 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label May 17, 2023
@stale stale bot closed this as completed May 24, 2023