[Work_Item] Facilitate shared cost allocation as calculated by the provider #72
Comments
@cnharris10 Could you please chase this down and see if this should become a work item or a discussion topic? |
@cnharris10 The group asked for further information during the TF-1 call on May 28. |
I'm interested in this one for v1.1 |
Some context for the AWS and GCP implementations of this idea: https://docs.aws.amazon.com/cur/latest/userguide/split-cost-allocation-data.html Would the scope of this be only for containers, or would we be looking to expand it beyond that to include any service with usage-based breakdowns? For example, here's a bunch of hoops one can jump through to allocate shared Fabric costs. Painful, but super important for allocation purposes. https://pbi-guy.com/2024/03/30/how-to-extract-data-from-the-fabric-metrics-app-part-1/ |
AWS supports tasks (ECS), pods (EKS), and jobs (Batch). I'd hope we could solve generally across various orchestration systems: K8s, EMR, Dataproc/Dataflow, Spark, Flink, etc. |
Discussed in Oct 22 TF1 call. Need to talk about this one a little bit more to align on scope: is it just orchestration and/or container services, or is it more holistic than that? This is a big conceptual topic, and I believe Chris' proposal is sticking to a narrow scope for 1.2 -- but let's discuss more in calls to ensure that everyone understands the overall concept. |
K8s accounts for a growing share of cloud costs (passing the 50% mark). A better breakdown of those costs is becoming essential. |
In the Oct 29 TF1 call today, we discussed revising the scope of this FYI that we also discussed a net-new |
The intention of this work item is to classify an approach for similar shared allocation models. If "compute" is too narrow and can be expanded to other examples that closely relate, then I'm in support. |
Action Items from TF-1 call on Oct 29:
|
@cnharris10 I have now modified this |
What about direct cost allocation (not shared)? Allocation vision and strategy in general with FOCUS needs to be discussed |
Maintainers' notes from Nov 4 call: Context: This task involves developing a model for shared cost allocation within compute clusters. Initial discussions focused on the broader concept of shared cost allocation but were narrowed down to provider-generated data to simplify the scope. This distinction helps streamline the process and make implementation feasible within a single release. |
Action Items from the TF-1 call on November 5:
|
Comments from the Members' call on November 7: #72: TF-1 is working on cost allocation strategies for multi-provider models, addressing cases where multiple resources feed into a single service element, such as clustered resources. The current focus is on allowing providers to share their allocation metadata within the specification. |
1. Problem Statement *
With the emergence of container-based computing and orchestration systems, logical resource configuration commonly occurs below the virtual and/or physical machine layer. This has created an enormous cost-allocation problem: CSPs commonly report computing costs at the virtual/physical machine layer, while applications are now partitioned at sub-machine levels such as the container, pod (Kubernetes), task (ECS), executor (Spark), or a similar entity.
With resources configured and fluctuating at this granularity, practitioners must be able to see cost and usage metrics allocated at this level to report accurate showback and chargeback totals.
There are two use cases to consider when solving this problem:
1. Consume allocated costs as calculated by the provider
2. Consume usage metrics that facilitate cost allocations as calculated by the practitioner
This Work Item attempts to address use case 1; use case 2 is out of scope as it requires access to datasets that will likely never be a part of FOCUS; thus, it shall be addressed via supporting content controlled by a separate Work Item TBD.
There are some services where providers solve for this. Examples include:
There are plenty of other services for which the providers do not yet solve for this, but perhaps they will in the future. Examples include:
While provider-calculated shared cost reporting is still in its infancy, creating a common standard for reporting shared costs across systems now would give major CSPs something to adopt from the outset, rather than retrofitting existing models later.
2. Objective *
Practitioners will be able to consume the allocations of shared cost and usage metrics more easily and accurately across multiple providers.
3. Supporting Documentation *
As of 2023, 2 of the top 3 CSPs have recently released bespoke solutions for allocating costs below the virtual machine layer within cost and usage datasets. These include AWS's split cost allocation data for EKS, ECS, and Batch, and GCP's GKE cost allocation at the cluster or namespace level, with customers opting into this additional data.
AWS:
GCP:
4. Proposed Solution / Approach
At these layers, customers commonly want to understand how cost and usage metrics for CPU/vCPU/core, RAM, GPU, networking, etc. are allocated to these sub-machine levels, including unused resources.
For example, a VM with 4 cores and 8 GB of RAM running 3 containers with the following configurations:
will likely see fluctuations in resources used over time, and may see some resources remain unused or wasted. These allocations are vital to producing accurate cost allocations that power chargeback/showback methodologies for practitioners.
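To make the idea concrete, here is a minimal Python sketch of one way a provider might split a VM's cost across containers while keeping unused capacity visible as its own bucket. Everything in it is an illustrative assumption rather than any provider's actual method: the `Container` type, the `allocate_vm_cost` helper, the container reservations, the VM cost, and the 50/50 weighting between CPU and memory are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Container:
    name: str
    cpu_cores: float  # reserved cores (hypothetical value)
    ram_gb: float     # reserved memory in GB (hypothetical value)


def allocate_vm_cost(vm_cost, vm_cores, vm_ram_gb, containers, cpu_weight=0.5):
    """Split vm_cost across containers by their share of reserved CPU and RAM.

    cpu_weight is the assumed fraction of the VM cost attributed to CPU;
    the rest is attributed to memory. Capacity not reserved by any
    container is reported under the "unused" key.
    """
    cpu_pool = vm_cost * cpu_weight
    ram_pool = vm_cost * (1 - cpu_weight)
    allocations = {}
    for c in containers:
        cpu_share = c.cpu_cores / vm_cores
        ram_share = c.ram_gb / vm_ram_gb
        allocations[c.name] = cpu_pool * cpu_share + ram_pool * ram_share
    # Whatever is left over is the cost of unreserved (unused) capacity.
    allocations["unused"] = vm_cost - sum(allocations.values())
    return allocations


# A VM with 4 cores and 8 GB RAM, costing a hypothetical 8.00 per period.
containers = [
    Container("container-a", cpu_cores=1.0, ram_gb=2.0),
    Container("container-b", cpu_cores=2.0, ram_gb=2.0),
    Container("container-c", cpu_cores=0.5, ram_gb=2.0),
]
result = allocate_vm_cost(8.0, vm_cores=4, vm_ram_gb=8, containers=containers)
for name, cost in result.items():
    print(f"{name}: {cost:.2f}")
```

The sketch allocates on reserved amounts only. A fuller model would also need to account for actual usage fluctuating over time, which is the harder part of the problem described above.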
A successful solution will allow practitioners to see cost and usage totals for these cost buckets accurately and allow for additional expansions of other shared resources across computing clusters.
The solution we craft will ideally apply generically to any service that allocates shared costs from a higher-level grain (e.g. cluster, capacity) down to a lower-level grain (e.g. node, pod, container, core).
5. Epic or Theme Association
TBD
6. Stakeholders *
Companies expressing desire for this feature