
difference between kepler-model-server and kepler-estimator #375

Closed
jichenjc opened this issue Nov 9, 2022 · 15 comments
Labels: kind/feature (New feature or request), urgent-priority (Issues that are part of the next release), wontfix (This will not be worked on)
@jichenjc
Collaborator

jichenjc commented Nov 9, 2022

Is your feature request related to a problem? Please describe.

I am seeing all pod energy values as 0. Checking the code, it seems I need to enable at least one of kepler-model-server and kepler-estimator.
What's the difference between them, and which one should I use? The sidecar (https://github.com/sustainable-computing-io/kepler-estimator)?


@rootfs
Contributor

rootfs commented Nov 9, 2022

@sunya-ch @KaiyiLiu1234 can you provide a high level usage description? Preferably here

@sunya-ch
Collaborator

sunya-ch commented Nov 9, 2022

I put the general explanation about power estimation and deployment here.

With the minimum deployment (no estimator and no model server), Kepler uses offline model weights with an embedded linear regression model, defined by the PodComponentPowerModelConfig variable (which is currently hard-coded to point to this file):

PodComponentPowerModelConfig types.ModelConfig = types.ModelConfig{UseEstimatorSidecar: false, InitModelURL: dynCompURL}

The high-level usage description in the document will be coming soon. I will remove the hard-coded variable and do a local test with full integration against the current version of Kepler first.
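For illustration, the offline-weight approach boils down to a linear regression: the predicted power is the dot product of the collected metric values and the trained weights, plus an intercept. A minimal sketch, assuming a hypothetical weight shape (the type and function names here are illustrative, not Kepler's actual API):

```go
package main

import "fmt"

// LinearPowerModel holds trained weights for a linear regression power
// model, as might be loaded from an offline model weight file
// (hypothetical shape for illustration).
type LinearPowerModel struct {
	Intercept float64
	Weights   map[string]float64 // feature name -> coefficient
}

// Estimate returns predicted power as intercept + sum(w_i * x_i).
// Features the model has no weight for contribute nothing.
func (m LinearPowerModel) Estimate(features map[string]float64) float64 {
	p := m.Intercept
	for name, value := range features {
		p += m.Weights[name] * value
	}
	return p
}

func main() {
	model := LinearPowerModel{
		Intercept: 2.0,
		Weights:   map[string]float64{"cpu_cycles": 0.001, "cache_miss": 0.01},
	}
	// 2.0 + 0.001*1000 + 0.01*100 = 4.0
	fmt.Println(model.Estimate(map[string]float64{"cpu_cycles": 1000, "cache_miss": 100}))
}
```

Note that if the collected metric names don't match the model's feature names, every term in the sum is multiplied by a zero weight, which is one way to end up with implausible (or zero) estimates.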

@sunya-ch
Collaborator

sunya-ch commented Nov 9, 2022

@jichenjc As some feature names may not match the current version of Kepler, the initial model weights may not be applicable. You may want to further investigate the JSON file and the metrics collected in the call below.

containerComponentPowers, containerOtherPowers := model.GetContainerPower(
	containerMetricValuesOnly, collector_metric.NodeMetadataValues,
	nodeTotalPower, nodeTotalGPUPower, nodeTotalPowerPerComponents,
)
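One way to check whether the initial weights apply is to compare the feature names in the weight JSON against the metric names Kepler actually collects. A rough sketch, assuming a simplified JSON shape (not the exact format of the real weight file):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelWeights is an assumed, simplified shape for a weight file:
// a list of feature names with their coefficients.
type modelWeights struct {
	Features []string  `json:"features"`
	Weights  []float64 `json:"weights"`
}

// missingFeatures reports which model features are absent from the
// collected metric names, i.e. where the model would only ever see zeros.
func missingFeatures(w modelWeights, collected []string) []string {
	have := make(map[string]bool, len(collected))
	for _, name := range collected {
		have[name] = true
	}
	var missing []string
	for _, f := range w.Features {
		if !have[f] {
			missing = append(missing, f)
		}
	}
	return missing
}

func main() {
	raw := []byte(`{"features":["cpu_cycles","cache_miss"],"weights":[0.001,0.01]}`)
	var w modelWeights
	if err := json.Unmarshal(raw, &w); err != nil {
		panic(err)
	}
	fmt.Println(missingFeatures(w, []string{"cpu_cycles"}))
}
```

A non-empty result here would explain all-zero (or badly skewed) pod power estimates.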

@rootfs
Contributor

rootfs commented Nov 9, 2022

@jichenjc
Collaborator Author

I put the general explanation about power estimation and deployment here.

this is great sharing , thanks ! @sunya-ch

about the original issue, I think that by default no model server param is provided in the manifest,
which means the following never gets called:

pkg/collector/metric/utils.go

if config.ModelServerEndpoint != "" {
	model.InitEstimateFunctions(names, NodeMetadataNames, NodeMetadataValues)
}

and in turn all the logic related to PodComponentPowerModelConfig is never called at all

@jichenjc
Collaborator Author

jichenjc commented Nov 10, 2022

@marceloamaral we have https://github.com/sustainable-computing-io/kepler/blob/main/pkg/model/model.go#L66
hardcoded, then

https://github.com/sustainable-computing-io/kepler/blob/main/cmd/exporter.go#L84 sets the endpoint,
and the only usage is https://github.com/sustainable-computing-io/kepler/blob/main/pkg/collector/metric/utils.go#L81

so my proposal is to change this check and keep it a string:
https://github.com/sustainable-computing-io/kepler/blob/main/pkg/collector/metric/utils.go#L81

so we have

  • the case with no model server
  • the case with a model server at a different URL
  • the case that wants to use the model server but gives no URL, falling back to the default URL (defined above)

but: how about putting the hard-coded model server URL in the manifest or docs,
so anyone who wants to use the model server can supply that hard-coded string or any URL they want?
If no model server URL is given, take no action on the model server part, but we won't block the sidecar-related code even when the model server endpoint is ""?
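The three cases above could be resolved with one small helper: keep the endpoint a plain string, fall back to a default URL only when the model server is explicitly wanted but no URL is given, and stay disabled otherwise. A sketch under those assumptions (the function and constant names are hypothetical, not the actual Kepler config API):

```go
package main

import "fmt"

// defaultModelServerURL stands in for the currently hard-coded URL
// (hypothetical value for illustration).
const defaultModelServerURL = "http://kepler-model-server.kepler-system.svc:8100"

// resolveEndpoint returns the model server endpoint to use, or "" when the
// model server should not be contacted at all:
//   - enabled=false: no model server (sidecar code paths stay available)
//   - enabled=true with a URL: use that URL
//   - enabled=true without a URL: fall back to the default
func resolveEndpoint(enabled bool, url string) string {
	if !enabled {
		return ""
	}
	if url != "" {
		return url
	}
	return defaultModelServerURL
}

func main() {
	fmt.Printf("%q\n", resolveEndpoint(false, ""))                     // model server off
	fmt.Printf("%q\n", resolveEndpoint(true, "http://my-server:8100")) // custom URL
	fmt.Printf("%q\n", resolveEndpoint(true, ""))                      // default URL
}
```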

@husky-parul
Collaborator

husky-parul commented Nov 10, 2022

IMO we should externalize this via a configmap here

apiVersion: v1
kind: ConfigMap
metadata:
  name: model-server-cfm
  namespace: kepler-system
data:
  ENABLED: "true"                    # or "false"
  ENDPOINT: "https://some.endpoint"  # or ""
  ...

@jichenjc
Collaborator Author

IMO we should externalize this via a configmap here

but then we depend on model-server? Should this be a resource defined in kepler instead of kepler-model-server?

@jichenjc
Collaborator Author

I can

  • create a configmap like the above
  • if there is no configmap, default to not reading the model server (same as not enabled)
  • honor the configmap when accessing the URL (the hard-coded default will be put into the configmap)

thoughts?

@husky-parul
Collaborator

husky-parul commented Nov 11, 2022

IMO we should externalize this via a configmap here

but then we depend on model-server? Should this be a resource defined in kepler instead of kepler-model-server?

Yes, that's correct. The configmap should be in Kepler, not model server deployment.

looping in @sunya-ch

@husky-parul
Collaborator

husky-parul commented Nov 11, 2022

I can

  • create a configmap like the above
  • if there is no configmap, default to not reading the model server (same as not enabled)
  • honor the configmap when accessing the URL (the hard-coded default will be put into the configmap)

thoughts?

@jichenjc Does it mean that if there is no configmap, Kepler will use the default model, and when a configmap is provided by the user, the user-provided models will be accessed via the URL?

Both Minimum Deployment and Deployment with General Estimator Sidecar use offline models and thus would need the model for the local/general estimator

@jichenjc
Collaborator Author

@husky-parul based on my test

Does it mean that if there is no configmap, Kepler will use the default model, and when a configmap is provided by the user, the user-provided models will be accessed via the URL?

with no endpoint,
https://github.com/sustainable-computing-io/kepler/blob/main/pkg/collector/metric/utils.go#L81
blocks all further operations;

if it does have some value,
https://github.com/sustainable-computing-io/kepler/blob/main/pkg/model/model.go#L30
will be used as a hard-coded default ..

so I am curious about the logic here

@jichenjc
Collaborator Author

I proposed
#384

@wangchen615 wangchen615 added the kind/feature New feature or request label Nov 14, 2022
@rootfs
Contributor

rootfs commented Nov 15, 2022

@sunya-ch will have a discussion on the GH project; we are also going to have this supported in the operator

@wangchen615 wangchen615 added the urgent-priority Issues that are part of the next release label Nov 15, 2022
@stale

stale bot commented May 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label May 17, 2023
@stale stale bot closed this as completed May 24, 2023