[Work_Item] Facilitate shared cost allocation as calculated by the provider #72

cnharris10 · 2023-05-19T12:13:36Z

1. Problem Statement *

What is the problem?: Explain the context and why it needs resolution.
Impact: Describe how the problem affects users, systems, or the project.

With the emergence of container-based computing and orchestration systems, logical resource configuration commonly occurs below the virtual and/or physical machine layer. This has created an enormous cost-allocation problem: CSPs commonly report computing costs at the virtual/physical machine layer, while applications are now partitioned at sub-machine levels such as the container, pod (Kubernetes), task (ECS), executor (Spark), or similar entity.

With resources configured and fluctuating at this granularity, practitioners must be able to see cost and usage metrics allocated at this level to report accurate showback and chargeback totals.

There are two use cases to consider when solving this problem:

Use case 1: Consume allocated costs as calculated by the provider
Use case 2: Consume usage metrics that facilitate cost allocations as calculated by the practitioner

This Work Item attempts to address use case 1. Use case 2 is out of scope because it requires access to datasets that will likely never be a part of FOCUS; it will instead be addressed via supporting content controlled by a separate Work Item (TBD).

There are some services where providers solve for this. Examples include:

There are plenty of other services for which providers do not yet solve this, though they may in the future. Examples include:

Microsoft Fabric (example of practitioner chargeback here)

Although shared cost reporting is still in its infancy, creating a common standard across systems would give major CSPs a model to adopt from the outset, rather than retrofitting existing models.

2. Objective *

State the objective of this work item. What outcome is expected?
Success Criteria: Define how success will be measured (e.g. metrics and KPIs).

Practitioners will be able to consume the allocations of shared cost and usage metrics more easily and accurately across multiple providers.

3. Supporting Documentation *

Include links to supporting documents such as:

Data Examples: [Link to data or relevant files; DO NOT share proprietary information]

Related Use Cases or Discussion Documents: [Link to discussion]

PRs or Other References: [Link to relevant references]

As of 2023, two of the top three CSPs have released bespoke solutions for allocating costs below the virtual machine layer within cost and usage datasets: AWS's split costs for EKS, ECS, and Batch, and GCP's GKE cost allocation at the cluster or namespace level, with customers opting into this additional data.

AWS:

ECS / Batch: https://aws.amazon.com/about-aws/whats-new/2023/04/aws-split-cost-allocation-data-amazon-ecs-batch/ (2023)
EKS Split Cost: https://aws.amazon.com/about-aws/whats-new/2024/04/aws-split-cost-allocation-data-amazon-eks/ (2024)

GCP:

GKE Allocation: https://groups.google.com/g/gcp-release-notes/c/pr_S3DS6tYI?pli=1 (2022)

4. Proposed Solution / Approach

Outline any proposed solutions, approaches, or potential paths forward. Do not submit detailed solutions; please keep suggestions high-level.

Initial Ideas: Describe potential solution paths, tools, or technologies.
Considerations: Include any constraints, dependencies, or risks.
Feasibility: Include any information that helps quantify feasibility, such as perceived level of effort to augment the spec, or existing fields in current data generator exports.
Benchmarks: Are there established best practices for solving this problem available to practitioners today (e.g. mappings from existing CSP exports that are widely used)?

At these layers, customers commonly want to understand how cost and usage metrics for CPU/vCPU/core, RAM, GPU, networking, etc. are allocated to these sub-machine levels, including unused resources.

For example, a VM with 4 cores and 8GB RAM running 3 containers with the following configurations:

Container 1: 2 cores (max: 3), 2GB RAM (max: 3GB)
Container 2: 3 cores (max: 3), 3GB RAM (max: 3GB)
Container 3: 1 core (max: 2), 1GB RAM (max: 2GB)

will likely see resource usage fluctuate over time, and some resources may remain unused or wasted. These allocations are vital to producing accurate cost allocation to power chargeback/showback methodologies for practitioners.
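To make the example above concrete, here is a minimal sketch of one possible proportional split, not a proposed FOCUS method or any provider's actual algorithm. The hourly VM cost of 1.00, the equal CPU/RAM weighting, the `allocate_vm_cost` function, and the rule of normalizing by the configured total when it exceeds capacity are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    name: str
    cores: float   # cores configured for the container
    ram_gb: float  # RAM (GB) configured for the container


def allocate_vm_cost(vm_cost, vm_cores, vm_ram_gb, containers, cpu_weight=0.5):
    """Split one VM's cost across containers plus an 'unused' bucket.

    cpu_weight is a hypothetical share of the VM cost attributed to CPU;
    the remainder is attributed to RAM.
    """
    cpu_pool = vm_cost * cpu_weight
    ram_pool = vm_cost - cpu_pool

    total_cores = sum(c.cores for c in containers)
    total_ram = sum(c.ram_gb for c in containers)

    # Assumption: if configured amounts exceed the VM's capacity,
    # normalize by the configured total so shares still sum to the pool.
    core_base = max(vm_cores, total_cores)
    ram_base = max(vm_ram_gb, total_ram)

    allocation = {
        c.name: cpu_pool * c.cores / core_base + ram_pool * c.ram_gb / ram_base
        for c in containers
    }
    # Whatever capacity no container claims is reported as unused.
    allocation["unused"] = (
        cpu_pool * (core_base - total_cores) / core_base
        + ram_pool * (ram_base - total_ram) / ram_base
    )
    return allocation


containers = [
    Container("Container 1", cores=2, ram_gb=2),
    Container("Container 2", cores=3, ram_gb=3),
    Container("Container 3", cores=1, ram_gb=1),
]
result = allocate_vm_cost(1.00, vm_cores=4, vm_ram_gb=8, containers=containers)
```

With these assumed inputs, the configured cores (6) exceed the 4 available, so no CPU cost is left unused, while 2GB of the 8GB RAM is unclaimed and lands in the `unused` bucket. The four buckets always sum to the VM cost, which is the property a showback or chargeback report would rely on.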

A successful solution will allow practitioners to see cost and usage totals for these cost buckets accurately and allow for additional expansions of other shared resources across computing clusters.

Ideally, the solution will generically handle any service that allocates shared costs from a higher level (e.g. cluster, capacity) down to a lower level (e.g. node, pod, container, core).
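As a rough illustration of allocating from a higher level down to a lower level, the sketch below pushes a parent cost down a hierarchy in proportion to weights. The `cascade` function, the nested-dictionary layout, and the node and pod names and weights are hypothetical; real providers determine their own allocation ratios.

```python
def cascade(cost, tree):
    """Push `cost` down a nested {name: (weight, children)} mapping.

    Returns a flat {path: cost} mapping for the leaves only, where each
    level splits its share among its children in proportion to weight.
    """
    total_weight = sum(weight for weight, _ in tree.values())
    leaves = {}
    for name, (weight, children) in tree.items():
        share = cost * weight / total_weight
        if children:
            # Recurse: this level's share becomes the cost for its children.
            for leaf, leaf_cost in cascade(share, children).items():
                leaves[f"{name}/{leaf}"] = leaf_cost
        else:
            leaves[name] = share
    return leaves


# Hypothetical cluster: two nodes of equal weight, pods weighted within each.
cluster = {
    "node-a": (2, {"pod-1": (1, {}), "pod-2": (3, {})}),
    "node-b": (2, {"pod-3": (1, {})}),
}
leaf_costs = cascade(100.0, cluster)
```

Because every level splits exactly what it received, the leaf costs sum back to the original cluster cost regardless of how many levels the hierarchy has.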

5. Epic or Theme Association

This section will be completed by the Maintainers.

Epic: [Epic Name]
Theme: [Theme Name, if applicable]

TBD

6. Stakeholders *

List the main stakeholders for this issue.

Primary Stakeholders: [Name/Role]
Other Involved Parties: [Names/Roles]

Richard Wang @richwang99
- K8S is taking more and more weightage for the Cloud costs (passing the 50% mark). Better breaking down of those costs is becoming essential.
Shreya Ambast (Atlassian) @Shreya-Ambast
- We also need this for proper cost allocation for Kube clusters. Currently, we are doing this ourselves, but would love to have this as a readymade solution.
Abhishek Mane (DigitalEx)
- We have a lot of customers asking about shared cost allocation, as their customers share clusters and they would like to see cost by namespace.

Companies expressing desire for this feature

Atlassian
Australian Retirement Trust


udam-f2 · 2024-01-22T23:38:37Z

@cnharris10 Could you please chase this down and see if this should become a work item or a discussion topic?

jpradocueva · 2024-05-28T16:19:14Z

@cnharris10 The group asked for further information during the TF-1 call on May 28.

AWS-ZachErdman · 2024-05-30T03:15:21Z

I'm interested in this one for v1.1

shawnalpay · 2024-09-09T23:35:37Z

Some context for the AWS and GCP implementations of this idea:

https://docs.aws.amazon.com/cur/latest/userguide/split-cost-allocation-data.html
https://cloud.google.com/kubernetes-engine/docs/how-to/cost-allocations

Would the scope of this be only for containers, or would we be looking to expand it beyond that to include any service with usage-based breakdowns? For example, here's a bunch of hoops one can jump through to allocate shared Fabric costs. Painful, but super important for allocation purposes.

https://pbi-guy.com/2024/03/30/how-to-extract-data-from-the-fabric-metrics-app-part-1/

cnharris10 · 2024-09-10T22:56:56Z

AWS supports tasks (ECS), pods (EKS), and jobs (Batch). I'd hope we could solve generally across various orchestration systems: K8s, EMR, Dataproc/Dataflow, Spark, Flink, etc.

shawnalpay · 2024-10-22T16:38:08Z

Discussed in Oct 22 TF1 call. Need to talk about this one a little bit more to align on scope: is it just orchestration and/or container services, or is it more holistic than that? This is a big conceptual topic, and I believe Chris' proposal is sticking to a narrow scope for 1.2 -- but let's discuss more in calls to ensure that everyone understands the overall concept.

richwang99 · 2024-10-23T03:39:56Z

K8S is taking more and more weightage for the Cloud costs (passing the 50% mark). Better breaking down of those costs is becoming essential.

shawnalpay · 2024-10-29T18:24:29Z

In the Oct 29 TF1 call today, we discussed revising the scope of this Work Item to be able to generically handle not only for compute clusters (e.g. AWS ECS, GCP GKE), but also other types of services that can be allocated (e.g. OCI pluggable databases), as well as any other services that may be attributed down the road. This would holistically handle for the use case of Consume allocated costs as calculated by the provider. @cnharris10, do you agree with this approach, and if so, are you amenable to revising this Work Item to reflect that? Happy to huddle and discuss if you like.

FYI that we also discussed a net-new Work Item to handle a separate but related use case of Consume usage metrics that facilitate cost allocations as calculated by the practitioner, which would likely result in supporting content rather than a spec change, and which @ahullah and @tobrien will craft.

cnharris10 · 2024-10-29T18:59:48Z

The intention of this work item is to classify an approach for similar shared allocation models. If "compute" is too narrow and can be expanded to other examples that closely relate, then I'm in support.

jpradocueva · 2024-10-30T01:23:19Z

Action Items from TF-1 call on Oct 29:

[#72] Alex @ahullah & Tim @tobrien : Draft a work item detailing concepts for future holistic allocation patterns beyond computing clusters.
[#72] Chris @cnharris10 & Shawn @shawnalpay : Expand the current work item to outline patterns for generic cost allocation across cloud services, ensuring the scalability of new services as they emerge.

shawnalpay · 2024-10-30T19:51:00Z

@cnharris10 I have now modified this Work Item to more holistically include all provider-generated shared cost allocations, not just compute clusters. Give it a look and let me know if it looks alright to you.

jpradocueva · 2024-11-02T04:32:12Z

Action Items from Members' call on Oct 31:

[#72] Alex @ahullah and Tim @tobrien : Revise the work item to specify use cases and align it with provider-focused data.
[#72] All TF1 members: Review the draft for shared cost allocation and provide feedback on proposed divisions

ljadvey · 2024-11-04T21:31:16Z

What about direct cost allocation (not shared)? Allocation vision and strategy in general with FOCUS needs to be discussed

jpradocueva · 2024-11-05T04:00:35Z

Maintainers notes from Nov 4 call:

Context: This task involves developing a model for shared cost allocation within compute clusters. Initial discussions focused on the broader concept of shared cost allocation but were narrowed down to provider-generated data to simplify the scope. This distinction helps streamline the process and make implementation feasible within a single release.
Level of Effort Required: Very High — Handling shared costs for compute clusters, especially in containerized environments, involves complex many-to-many relationships and provider-specific solutions, necessitating decomposition of the task.
Level of Impact: Very High – This work item has a significant impact on practitioners, as shared cost allocation is essential for accurate cost distribution, particularly in complex, containerized environments. Effective cost allocation is a key metric for resource optimization in FinOps.

jpradocueva · 2024-11-06T00:21:39Z

Action Items from the TF-1 call on November 5:

[#638] Alex @ahullah : Augment [Work_Item] Practitioner defined cost allocation #638 to clarify artifact expectations, focusing on FOCUS-specific data and outputs for practitioner allocations.
[#72] Team: Continue discovering existing provider data structures to understand viable FOCUS configurations for split costs.

jpradocueva · 2024-11-08T03:32:40Z

Comments from the Members' call on November 7:

#72: TF-1 is working on cost allocation strategies for multi-provider models, addressing cases where multiple resources feed into a single service element, such as clustered resources. The current focus is on allowing providers to share their allocation metadata within the specification.

cnharris10 added the proposal label May 19, 2023

udam-f2 assigned cnharris10 Jan 22, 2024

jpradocueva added this to FOCUS WG Feb 13, 2024

github-project-automation bot moved this to Triage in FOCUS WG Feb 13, 2024

jpradocueva moved this from Triage to Parking Lot in FOCUS WG Feb 13, 2024

jpradocueva added this to the v1.x milestone Feb 29, 2024

flanakin changed the title ~~[Proposal] Create a generalized shared cost billing model~~ Create a generalized shared cost billing model Mar 3, 2024

jpradocueva removed the proposal label May 17, 2024

jpradocueva assigned MK88NTAP May 17, 2024

shawnalpay added the needs stakeholder input, discussion topic, and shared costs labels Oct 2, 2024

cnharris10 added the 1.2 consideration label Oct 11, 2024

shawnalpay added the needs work item label Oct 16, 2024

cnharris10 changed the title ~~Create a generalized shared cost billing model~~ [Work_Item] Create a standardized model for orchestration-based compute clusters (i.e Kubernetes, ECS, Dataproc, Spark, etc.) Oct 20, 2024

cnharris10 unassigned MK88NTAP Oct 20, 2024

cnharris10 added the work item label and removed the needs work item and discussion topic labels Oct 20, 2024

shawnalpay mentioned this issue Oct 21, 2024

Tracking issue for 1.2 Work Items #611

Open

shawnalpay changed the title ~~[Work_Item] Create a standardized model for orchestration-based compute clusters (i.e Kubernetes, ECS, Dataproc, Spark, etc.)~~ [Work_Item] Facilitate shared cost allocation for orchestration-based compute clusters (e.g. Kubernetes, ECS, Dataproc, Spark) Oct 24, 2024

shawnalpay added the csp label Oct 29, 2024

shawnalpay changed the title ~~[Work_Item] Facilitate shared cost allocation for orchestration-based compute clusters (e.g. Kubernetes, ECS, Dataproc, Spark)~~ [Work_Item] Facilitate shared cost allocation as calculated by the provider Oct 30, 2024

shawnalpay removed the needs stakeholder input label Nov 4, 2024

jpradocueva mentioned this issue Nov 5, 2024

[Work_Item] Practitioner defined cost allocation #638

Open

shawnalpay removed this from the v1.2 milestone Nov 25, 2024

shawnalpay removed the 1.2 consideration label Nov 27, 2024
