Make an exporter #1
@mrchrisadams I've added a Helm chart and integration test for Kubernetes support in #4. After this I'd like to add a Nomad task and integration test, but I need to figure out how to test that, so I did K8s first. @ofpiyush I haven't added you as a reviewer as there isn't actually any Go code, but if you would like to review it just let me know. 🙏
@rossf7 I'm thinking through how you might replicate the approach you took with k8s, and apply it to nomad. As I understand it with the k8s case the CI workflow is something along the lines of:
right? I think the equivalent, minimal setup for nomad would be
Is that what you had in mind? Based on the nomad docs here, I think it would be a case of:
@mrchrisadams Thanks, yes that's what I was thinking, and running the nomad agent in dev mode sounds ideal. For the job, the metrics need to be available on port 8000, as the integration test connects to http://localhost:8000/metrics. But that looks doable, so I think this will work great.
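For reference, a rough sketch of what that jobfile might look like - the image name is a placeholder and the real jobfile in the PR may well differ, this just shows the port 8000 mapping the integration test needs:

job "grid-intensity-exporter" {
  datacenters = ["dc1"]
  type        = "service"

  group "exporter" {
    network {
      # bind the container's metrics port to port 8000 on the host,
      # so the integration test can reach http://localhost:8000/metrics
      port "metrics" {
        static = 8000
        to     = 8000
      }
    }

    task "exporter" {
      driver = "docker"

      config {
        # placeholder image name - use whatever the repo actually publishes
        image = "thegreenwebfoundation/grid-intensity-exporter:latest"
        ports = ["metrics"]
      }
    }
  }
}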
Sup @rossf7 - there's some more notes here that might help: https://discuss.hashicorp.com/t/local-development-workflow-with-nomad-consul-and-docker/3641/5 Ah… that points to a vagrantfile in the nomad repo too, showing how they set it up. That nomad setup is pretty extensive, and we might not need it all. In our case, if we follow the example of the integration tests you put together for kubernetes, I suspect the key thing we'd need would be somewhere to fetch the generated metrics from. https://github.com/hashicorp/nomad/blob/master/Vagrantfile
* Add first pass at a nomad job for the grid-exporter
* Update the nomad jobfile, to demonstrate using the exporter
* Add semi-nonsensical, pseudo-code job
* Add a simple bash script to wait til we have a running value. Also update github workflow with my guesses at the necessary steps
* Apply changes from #6

Co-authored-by: Ross Fairbanks <[email protected]>
hey @rossf7 I think with #5 and #4 merged in, the main outstanding bit before we can merge might be some docs for this, and maybe some sketch of how it works. For a sketch, I could put together something with PlantUML, to demonstrate the configuration for the three setups running on some cluster (i.e. docker, nomad, k8s). Anything else?
Hi @mrchrisadams, Then I agree, I think we can close this.
Hi @rossf7! (this is a bit of a brain dump, and probably ought to be a separate issue, or even a separate project - apologies in advance for it going all over the place) As I mentioned in thegreenwebfoundation/grid-intensity-go/issues/4, I think there's a way to consume these metrics so that the Nomad scheduler can take them into account when making scheduling decisions. As you mentioned before, the Nomad autoscaler can consume this exporter's data. You can see it referred to in the Nomad autoscaler docs:

# check for this dynamic value and use it as a criterion when making an auto scaling decision
check {
  source = "prometheus"
  query  = "avg((haproxy_server_current_sessions{backend=\"http_back\"}))"
}

However, I'm not sure if the APM plugin would be the ideal place for us to experiment, as that would be used for continuously updating jobs to auto scale to a set target. You might use this at a nomad server level, to run a query every N minutes to see if a value is in a threshold, and then decide to trigger an auto scaling event. So a policy applied to a job might look like this:

job "important-but-not-urgent-job" {
  # we want to run it to completion then stop
  # we might be okay with it being preempted and delayed
  type = "batch"

  # we need a full list of datacentres as candidates for placement, and these
  # could be in different regions with different grid intensities
  datacenters = ["dc1", "dc2", "dc3", "dc4"]

  group "machine_learning" {
    task "compute" {
      driver = "docker"

      config {
        image = "greenweb/computationally-expensive"
      }
    }

    scaling {
      min     = 0
      max     = 10
      enabled = true

      # low carbon compute policy - actively look for client nodes with a
      # carbon intensity close to this level
      policy {
        # check every 30 mins
        evaluation_interval = "30m"

        # after a reshuffle, don't reshuffle again for at least an hour
        cooldown = "1h"

        check "target_carbon_intensity" {
          source = "prometheus"
          query  = "scalar(local_carbon_intensity)"

          # when carbon intensity of compute goes above 85 on the index (I think)
          # trigger an autoscale event and reschedule.
          strategy "threshold" {
            upper_bound = 90
            lower_bound = 0
            delta       = 5
          }
        }
      }
    }
  }
}

However, I think the APM stuff is designed to see when to trigger a re-evaluation, but it still wouldn't know how to choose the right nodes to bin pack onto, because any carbon intensity metrics would need to be visible to the scheduler, and I don't think this would result in switching nodes off. For that, I think we'd need a way to influence the ranking phase of the scheduling, and be able to actively filter nodes out of the ranked list.

This monster function seems to be the part that ranks nodes, when choosing which nodes to run jobs on: https://github.com/hashicorp/nomad/blob/main/scheduler/rank.go#L193-L527

You'd probably need a way for that function to query a node's stats during the ranking phase, to query for local carbon intensity for the node, and use that as a criterion. This here looks like a sample test you might use to see if ranking candidate nodes returns them in the order you'd expect. I think we might be able to make a test demonstrating querying for a node property there, and ask the folks in the nomad community forum how you might satisfy that test for carbon awareness (there's a rough stop-gap sketch using node affinities at the end of this comment).

Other related links

This post by Bill Johnson largely explains how they tried something related with k8s, to run jobs in geographically distinct places. I hadn't realised before that they use the same Watttime API that I had been checking out this week. https://devblogs.microsoft.com/sustainable-software/carbon-aware-kubernetes/ That uses the paid API. If you are only thinking about moving jobs through time, and not across geographic regions, I think the Watttime marginal intensity index API would be sufficient for building a prototype. See this notebook for more.

See also this new paper - they reckon you can get 20% carbon savings through thoughtful scheduling of work that doesn't need to be done right away.
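Not the ranking-phase change described above, but one stop-gap that might work today without patching rank.go would be to (ab)use Nomad's affinity stanza against an operator-set node meta attribute. It only works for coarse, slowly-changing labels, because client meta is static config rather than a live metric, and the attribute name here is made up. A rough sketch, assuming each client's config sets something like meta { carbon_intensity_band = "low" }:

job "important-but-not-urgent-job" {
  type        = "batch"
  datacenters = ["dc1", "dc2", "dc3", "dc4"]

  # prefer (but don't require) client nodes whose operator-set meta value
  # marks them as being in a low carbon intensity band; the scheduler weights
  # eligible nodes matching this affinity higher during ranking
  affinity {
    attribute = "${meta.carbon_intensity_band}"
    value     = "low"
    weight    = 100
  }

  group "machine_learning" {
    task "compute" {
      driver = "docker"

      config {
        image = "greenweb/computationally-expensive"
      }
    }
  }
}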
Hey @mrchrisadams
The APM plugin is used to store the metrics. Using Prometheus for that makes sense to me. We already have the carbon intensity exporter, and adding new metrics or running more exporters is easy to do. When carbon intensity is high we will want to scale down or to zero.

The target plugin can be used for horizontal cluster autoscaling, which is the term used for nomad adding or removing nodes. For cloud this is straightforward, e.g. on AWS you can use the auto-scaling-group target plugin. For onprem the nomad target plugin might be useful, as it would allow scaling the number of containers. If there is an API for the physical nodes then a target plugin could be written and used to decide which nodes to shut down. It's also cloud, but as an example, this is a target plugin for Digital Ocean. The downside is that a target plugin is needed per infra provider, but that seems to be the architecture they are going for.

This is pretty much identical in kubernetes. The cluster-autoscaler is used for scaling nodes and there are plugins for multiple providers. I just saw that this includes Hetzner https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider

We can see, but I don't think they would want this logic in the nomad scheduler. It's up to the cluster operator to decide how many nodes the cluster should have. The scheduler then does the bin packing of containers to nodes.
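To make that a bit more concrete, here's a rough, untested sketch of what a cluster scaling policy could look like, using the Prometheus APM plugin, the threshold strategy, and the aws-asg target plugin. The metric name, ASG name, node class and thresholds are all placeholders:

scaling "carbon_aware_cluster_policy" {
  enabled = true
  min     = 1
  max     = 10

  policy {
    cooldown            = "10m"
    evaluation_interval = "5m"

    # when the (placeholder) carbon intensity index sits between 85 and 100,
    # shed one client node per evaluation
    check "carbon_intensity_high" {
      source = "prometheus"
      query  = "scalar(local_carbon_intensity)"

      strategy "threshold" {
        upper_bound = 100
        lower_bound = 85
        delta       = -1
      }
    }

    # placeholder AWS auto scaling group details
    target "aws-asg" {
      aws_asg_name        = "nomad-client-asg"
      node_class          = "hashistack"
      node_drain_deadline = "5m"
    }
  }
}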
Ah, so if I understand correctly: the key difference is that the scheduler would never make any decisions about how big or small the pool of compute might be - it would just take care of distributing the load onto the available nodes with the lowest carbon intensity, right? That would leave any scaling or scaling plugin to be responsible for changing the size of some combo of the pool of nodes (be these cloud VMs or physical machines), and the resources allocated to each job (i.e. the size of the tasks inside a job, as controlled by the task driver - be they containers, regular isolated fork/exec processes, and so on). If that's the case, I can see how this might have a measurable impact even if you were just looking at the scheduler in isolation:
I think this would also leave room for the operator to make decisions or have strategies that either:
In both cases, I think this would support using something like an internal carbon price, or internal carbon budget over a set period - you'd track cumulative CO2 emissions for a given service against it, and you could have useful discussions about the strategies you might employ, like the two above, to get the work done whilst staying inside the budget (cumulative CO2 as part of an SLO, for example).

Thinking through how you'd split this work

Based on what you just shared, I now think there are two possible parts to this idea.
My guess would be that of these two, the first is the more interesting one to do technologically, and would require less domain knowledge around carbon emissions - we figure out how to add another scalar value to be used in the scheduler, and then we rely on manual decisions to add or remove resources from a pool used for bin packing. The second one would involve adding or removing from a pool of resources and, if there is freedom to choose between different regions with a single provider, doing so. I figure we'd do this by looking at the carbon intensity of each region and picking the lowest one, assuming it still fits the other criteria for the job. As a starting point, you could use data like this for Google Cloud: https://github.com/GoogleCloudPlatform/region-carbon-info/blob/main/data/yearly/2020.csv Or this for Amazon: Which one would you be more interested in?
Hey @rossf7 I'm gonna close this as things have moved on a bit now :) Let's look at the first thing:
Nomad now has a carbon aware scheduler in an experimental branch: https://github.com/hashicorp/nomad/blob/h-carbon-meta/CARBON.md It currently consumes data from a couple of providers, but I can't remember if it's using any of the libraries we've worked on.
I think the idea of extending an autoscaler to grow and shrink the pool largely relies on having access to data that changes frequently enough to make autoscaling decisions worthwhile. While there is access on an individual basis to specific countries, I'm not aware of a handy feed you could use for this, so to make these calls you'd likely need to use the Electricity Map provider (I think we have API keys for experimenting here), or for us to implement a Watttime provider (we have keys for this too, for experimenting).
Actually @rossf7 - would you mind closing this issue once you've had a chance to re-read this thread and, if you see the need, to create any new issues in the respective libraries? We covered quite a lot in this issue, and I didn't want to close it until we've both had a chance to revisit some of the ideas here in 2022...
Hi @mrchrisadams, back to this. I've done another pass through the issue.
Yes, I agree, without frequent fresh data an autoscaler can't make effective decisions. The issue thegreenwebfoundation/grid-intensity-go#25 you created for the Watttime marginal intensity API looks like the best option for this right now. For autoscaling, a first step would be having Prometheus metrics that include the more frequent data. We could then look at things like KEDA scalers or cluster-autoscaler support if needed, but without frequent data they are not that useful.

The carbon branch for Nomad isn't using grid-intensity-go directly, but the data sources are the same. It's using ElectricityMap and gridintensity.org.uk, plus climateiq.io https://github.com/hashicorp/nomad/blob/h-carbon-meta/CARBON.md Once the Watttime API is integrated we could add it there too, or even create a fork of that branch to experiment with. But again, thegreenwebfoundation/grid-intensity-go#25 is needed for that.

So I'm going to close this, but I'll post in the Watttime issue about possible next steps so we don't lose this.
What this issue says, basically.
thegreenwebfoundation/grid-intensity-go#4