[Work_Item] Facilitate shared cost allocation as calculated by the provider #72

Open
cnharris10 opened this issue May 19, 2023 · 16 comments
Assignees
Labels
csp Cloud service providers shared costs Related to shared costs work item Issues to be considered for spec development

Comments

@cnharris10
Contributor

cnharris10 commented May 19, 2023

1. Problem Statement *

What is the problem?: Explain the context and why it needs resolution.
Impact: Describe how the problem affects users, systems, or the project.

With the emergence of container-based computing and orchestration systems, logical resource configuration commonly occurs below the virtual and/or physical machine layer. This has created an enormous cost-allocation problem: CSPs commonly report computing costs at the virtual/physical machine layer, while applications are now partitioned at sub-machine levels such as the container, pod (Kubernetes), task (ECS), executor (Spark), or similar entity.

With resources configured and fluctuating at this granularity, practitioners must be able to see cost and usage metrics allocated at this level to report accurate showback and chargeback totals.

There are two use cases to consider when solving this problem:

  1. Consume allocated costs as calculated by the provider
  2. Consume usage metrics that facilitate cost allocations as calculated by the practitioner

This Work Item attempts to address use case 1. Use case 2 is out of scope because it requires access to datasets that will likely never be part of FOCUS; it will instead be addressed via supporting content controlled by a separate Work Item (TBD).

There are some services where providers solve for this. Examples include:

There are plenty of other services for which the providers do not yet solve for this, but perhaps they will in the future. Examples include:

  • Microsoft Fabric (example of practitioner chargeback here)

While provider-calculated shared cost reporting is still in its infancy, creating a common standard for it across systems would allow major CSPs to adopt the FOCUS model from the outset, rather than backing into existing models.

2. Objective *

State the objective of this work item. What outcome is expected?
Success Criteria: Define how success will be measured (e.g. metrics and KPIs).

Practitioners will be able to consume the allocations of shared cost and usage metrics more easily and accurately across multiple providers.

3. Supporting Documentation *

Include links to supporting documents such as:

  • Data Examples: [Link to data or relevant files; DO NOT share proprietary information]
  • Related Use Cases or Discussion Documents: [Link to discussion]
  • PRs or Other References: [Link to relevant references]

As of 2023, 2 of the top 3 CSPs have recently released bespoke solutions for allocating costs below the virtual machine layer within cost and usage datasets: AWS' split-costs for EKS, ECS, and Batch, and GCP's GKE cost allocation at the cluster or namespace level, with customers opting into this additional data.

AWS:

GCP:

4. Proposed Solution / Approach

Outline any proposed solutions, approaches, or potential paths forward. Do not submit detailed solutions; please keep suggestions high-level.

Initial Ideas: Describe potential solution paths, tools, or technologies.
Considerations: Include any constraints, dependencies, or risks.
Feasibility: Include any information that helps quantify feasibility, such as perceived level of effort to augment the spec, or existing fields in current data generator exports.
Benchmarks: Are there established best practices for solving this problem available to practitioners today (e.g. mappings from existing CSP exports that are widely used)?

At these layers, customers commonly want to understand how cost and usage metrics for CPU/vCPU/core, RAM, GPU, networking, etc. are allocated to these sub-machine levels, including unused resources.

For example, a VM with 4 cores and 8GB RAM hosting 3 containers with the following configurations:

  • Container 1: 2 cores (max: 3), 2GB RAM (max: 3GB)
  • Container 2: 3 cores (max: 3), 3GB RAM (max: 3GB)
  • Container 3: 1 core (max: 2), 1GB RAM (max: 2GB)

will likely see fluctuations in resources used over time, and may also see some resources remain unused or wasted. These allocations are vital to producing the accurate cost allocation that powers chargeback/showback methodologies for practitioners.
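To make the example concrete, here is a minimal sketch of one possible split, not any provider's actual method. It assumes a hypothetical VM price of $1.20/hour, a hypothetical 50/50 weighting between CPU and memory cost, and allocation in proportion to each container's configured amount (the "max" values are ignored). Where the configured total exceeds VM capacity, as it does for cores here, the sketch simply normalizes by the configured total; that choice is also an assumption.

```python
# Hypothetical inputs: the hourly price and CPU/memory weighting are
# illustrative assumptions, not provider figures.
VM_HOURLY_COST = 1.20
CPU_WEIGHT = 0.5
VM_CAPACITY = {"cores": 4, "ram_gb": 8}

# Configured amounts from the example above (maximums ignored for simplicity).
CONTAINERS = {
    "container-1": {"cores": 2, "ram_gb": 2},
    "container-2": {"cores": 3, "ram_gb": 3},
    "container-3": {"cores": 1, "ram_gb": 1},
}


def split_cost(vm_cost, cpu_weight, capacity, containers):
    """Split a VM's cost across containers, reporting unused cost separately."""
    # Divide the VM cost into one pool per resource type.
    pools = {"cores": vm_cost * cpu_weight, "ram_gb": vm_cost * (1 - cpu_weight)}
    allocated = {name: 0.0 for name in containers}
    unused = 0.0
    for resource, pool in pools.items():
        configured = sum(c[resource] for c in containers.values())
        # If containers are configured beyond capacity, normalize by the
        # configured total so the pool is never over-allocated.
        denominator = max(capacity[resource], configured)
        for name, c in containers.items():
            allocated[name] += pool * c[resource] / denominator
        # Capacity nobody configured is reported as unused cost.
        unused += pool * (denominator - configured) / denominator
    return allocated, unused


allocated, unused = split_cost(VM_HOURLY_COST, CPU_WEIGHT, VM_CAPACITY, CONTAINERS)
for name, cost in allocated.items():
    print(f"{name}: ${cost:.3f}/hour")
print(f"unused: ${unused:.3f}/hour")
```

Under these assumptions, 2 of the 8GB of RAM are not configured for any container, so a quarter of the memory pool is reported as unused, and the container totals plus the unused amount add back up to the VM cost.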

A successful solution will allow practitioners to see cost and usage totals for these cost buckets accurately and allow for additional expansions of other shared resources across computing clusters.

The solution we craft will ideally apply generically to any service that allocates shared costs from a higher-level grain (e.g. cluster, capacity) down to a lower-level grain (e.g. node, pod, container, core).
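As a rough illustration of what "generic" could mean here, the sketch below pushes a cost down an arbitrary hierarchy by relative weight. The `Node` type, the `weight` field, and the cluster/node/pod names are all hypothetical; how a provider derives the weights (usage, reservation, or something else) is deliberately left open.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a hypothetical allocation hierarchy."""
    name: str
    weight: float = 1.0  # relative share among siblings (assumed metric)
    children: list["Node"] = field(default_factory=list)


def allocate(node, cost, path=""):
    """Return {leaf path: allocated cost}, splitting cost by sibling weight."""
    full_path = f"{path}/{node.name}" if path else node.name
    if not node.children:
        return {full_path: cost}
    total_weight = sum(child.weight for child in node.children)
    result = {}
    for child in node.children:
        result.update(allocate(child, cost * child.weight / total_weight, full_path))
    return result


# Hypothetical cluster: the same function works for any depth of hierarchy.
cluster = Node("cluster", children=[
    Node("node-a", weight=3, children=[Node("pod-1", 2), Node("pod-2", 1)]),
    Node("node-b", weight=1, children=[Node("pod-3", 1)]),
])
leaf_costs = allocate(cluster, 100.0)
for leaf, cost in leaf_costs.items():
    print(f"{leaf}: {cost:.2f}")
```

Because each level only splits what it received from its parent, the leaf totals always reconcile to the original cost, which is the property practitioners need for chargeback and showback.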

5. Epic or Theme Association

This section will be completed by the Maintainers.

Epic: [Epic Name]
Theme: [Theme Name, if applicable]

TBD

6. Stakeholders *

List the main stakeholders for this issue.

Primary Stakeholders: [Name/Role]
Other Involved Parties: [Names/Roles]

  • Richard Wang @richwang99
    • K8S is taking more and more weightage for the Cloud costs (passing the 50% mark). Better breaking down of those costs is becoming essential.
  • Shreya Ambast (Atlassian) @Shreya-Ambast
    • We also need this for proper cost allocation for Kube clusters. Currently, we are doing this ourselves, but would love to have this as a readymade solution.
  • Abhishek Mane (DigitalEx)
    • We have a lot of customers asking about shared cost allocation, as their customers share clusters and they would like to see cost by namespace.

Companies expressing desire for this feature

  • Atlassian
  • Australian Retirement Trust
@udam-f2
Contributor

udam-f2 commented Jan 22, 2024

@cnharris10 Could you please chase this down and see if this should become a work item or a discussion topic?

@github-project-automation github-project-automation bot moved this to Triage in FOCUS WG Feb 13, 2024
@jpradocueva jpradocueva moved this from Triage to Parking Lot in FOCUS WG Feb 13, 2024
@jpradocueva jpradocueva added this to the v1.x milestone Feb 29, 2024
@flanakin flanakin changed the title [Proposal] Create a generalized shared cost billing model Create a generalized shared cost billing model Mar 3, 2024
@jpradocueva
Contributor

@cnharris10 The group asked for further information during the TF-1 call on May 28.

@AWS-ZachErdman
Contributor

I'm interested in this one for v1.1

@shawnalpay
Contributor

Some context for the AWS and GCP implementations of this idea:

https://docs.aws.amazon.com/cur/latest/userguide/split-cost-allocation-data.html
https://cloud.google.com/kubernetes-engine/docs/how-to/cost-allocations

Would the scope of this be only for containers, or would we be looking to expand it beyond that to include any service with usage-based breakdowns? For example, here's a bunch of hoops one can jump through to allocate shared Fabric costs. Painful, but super important for allocation purposes.

https://pbi-guy.com/2024/03/30/how-to-extract-data-from-the-fabric-metrics-app-part-1/

@cnharris10
Contributor Author

cnharris10 commented Sep 10, 2024

Some context for the AWS and GCP implementations of this idea:

https://docs.aws.amazon.com/cur/latest/userguide/split-cost-allocation-data.html
https://cloud.google.com/kubernetes-engine/docs/how-to/cost-allocations

Would the scope of this be only for containers, or would we be looking to expand it beyond that to include any service with usage-based breakdowns? For example, here's a bunch of hoops one can jump through to allocate shared Fabric costs. Painful, but super important for allocation purposes.

https://pbi-guy.com/2024/03/30/how-to-extract-data-from-the-fabric-metrics-app-part-1/

AWS supports tasks (ECS), pods (EKS), and jobs (Batch). I'd hope we could solve generally across various orchestration systems: K8s, EMR, Dataproc/Dataflow, Spark, Flink, etc.

@shawnalpay shawnalpay added needs stakeholder input Items to review with stakeholders to quantify importance and further details discussion topic Item or question to be discussed by the community shared costs Related to shared costs labels Oct 2, 2024
@cnharris10 cnharris10 added the 1.2 consideration To be considered for release 1.2 label Oct 11, 2024
@shawnalpay shawnalpay added the needs work item Needs an issue that adheres to the Work Item issue template, prior to consideration by stakeholders label Oct 16, 2024
@cnharris10 cnharris10 changed the title Create a generalized shared cost billing model [Work_Item] Create a standardized model for orchestration-based compute clusters (i.e Kubernetes, ECS, Dataproc, Spark, etc.) Oct 20, 2024
@cnharris10 cnharris10 added work item Issues to be considered for spec development and removed needs work item Needs an issue that adheres to the Work Item issue template, prior to consideration by stakeholders discussion topic Item or question to be discussed by the community labels Oct 20, 2024
@shawnalpay
Contributor

Discussed in Oct 22 TF1 call. Need to talk about this one a little bit more to align on scope: is it just orchestration and/or container services, or is it more holistic than that? This is a big conceptual topic, and I believe Chris' proposal is sticking to a narrow scope for 1.2 -- but let's discuss more in calls to ensure that everyone understands the overall concept.

@richwang99

K8S is taking more and more weightage for the Cloud costs (passing the 50% mark). Better breaking down of those costs is becoming essential.

@shawnalpay shawnalpay changed the title [Work_Item] Create a standardized model for orchestration-based compute clusters (i.e Kubernetes, ECS, Dataproc, Spark, etc.) [Work_Item] Facilitate shared cost allocation for orchestration-based compute clusters (e.g. Kubernetes, ECS, Dataproc, Spark) Oct 24, 2024
@shawnalpay
Contributor

In today's Oct 29 TF1 call, we discussed revising the scope of this Work Item to generically handle not only compute clusters (e.g. AWS ECS, GCP GKE) but also other types of services that can be allocated (e.g. OCI pluggable databases), as well as any other services that may be attributed down the road. This would holistically handle the use case "Consume allocated costs as calculated by the provider". @cnharris10, do you agree with this approach, and if so, are you amenable to revising this Work Item to reflect that? Happy to huddle and discuss if you like.

FYI, we also discussed a net-new Work Item to handle a separate but related use case, "Consume usage metrics that facilitate cost allocations as calculated by the practitioner", which would likely result in supporting content rather than a spec change, and which @ahullah and @tobrien will craft.

@shawnalpay shawnalpay added the csp Cloud service providers label Oct 29, 2024
@cnharris10
Contributor Author

cnharris10 commented Oct 29, 2024

The intention of this work item is to classify an approach for similar shared allocation models. If "compute" is too narrow and can be expanded to other examples that closely relate, then I'm in support.

@jpradocueva
Contributor

Action Items from TF-1 call on Oct 29:

  • [#72] Alex @ahullah & Tim @tobrien : Draft a work item detailing concepts for future holistic allocation patterns beyond computing clusters.
  • [#72] Chris @cnharris10 & Shawn @shawnalpay : Expand the current work item to outline patterns for generic cost allocation across cloud services, ensuring the scalability of new services as they emerge.

@shawnalpay shawnalpay changed the title [Work_Item] Facilitate shared cost allocation for orchestration-based compute clusters (e.g. Kubernetes, ECS, Dataproc, Spark) [Work_Item] Facilitate shared cost allocation as calculated by the provider Oct 30, 2024
@shawnalpay
Contributor

@cnharris10 I have now modified this Work Item to more holistically include all provider-generated shared cost allocations, not just compute clusters. Give it a look and let me know if it looks alright to you.

@jpradocueva
Contributor

Action Items from Members' call on Oct 31:

  • [#72] Alex @ahullah and Tim @tobrien : Revise the work item to specify use cases and align it with provider-focused data.
  • [#72] All TF1 members: Review the draft for shared cost allocation and provide feedback on proposed divisions

@shawnalpay shawnalpay removed the needs stakeholder input Items to review with stakeholders to quantify importance and further details label Nov 4, 2024
@ljadvey
Contributor

ljadvey commented Nov 4, 2024

What about direct cost allocation (not shared)? Allocation vision and strategy in general with FOCUS needs to be discussed

@jpradocueva
Contributor

Maintainers notes from Nov 4 call:

Context: This task involves developing a model for shared cost allocation within compute clusters. Initial discussions focused on the broader concept of shared cost allocation but were narrowed down to provider-generated data to simplify the scope. This distinction helps streamline the process and make implementation feasible within a single release.
Level of Effort Required: Very High. Handling shared costs for compute clusters, especially in containerized environments, involves complex many-to-many relationships and provider-specific solutions, necessitating decomposition of the task.
Level of Impact: Very High. This work item has a significant impact on practitioners, as shared cost allocation is essential for accurate cost distribution, particularly in complex, containerized environments. Effective cost allocation is a key metric for resource optimization in FinOps.

@jpradocueva
Contributor

Action Items from the TF-1 call on November 5:

@jpradocueva
Contributor

Comments from the Members' call on November 7:

#72: TF-1 is working on cost allocation strategies for multi-provider models, addressing cases where multiple resources feed into a single service element, such as clustered resources. The current focus is on allowing providers to share their allocation metadata within the specification.

@shawnalpay shawnalpay removed this from the v1.2 milestone Nov 25, 2024
@shawnalpay shawnalpay removed the 1.2 consideration To be considered for release 1.2 label Nov 27, 2024
Projects
Status: W.I.P
Development

No branches or pull requests

8 participants