[COST-4745] OCPGCP Network data processing SQL #5058

Merged
merged 17 commits into main from COST-4745-ocpgcp-network on Jul 4, 2024

Conversation

cgoodfred
Contributor

@cgoodfred cgoodfred commented Apr 22, 2024

Jira Ticket

COST-4745

Description

This change adds OCP on GCP network processing. This change does a few things:

- Identifies network records from the GCP bill that are associated with a specific Compute Engine instance that can be tied to an OCP node
- Separates the usage and cost for these records into a distinct row per day (one for inbound traffic, one for outbound traffic) when we aggregate the gcp_openshift_daily records
- Filters out the networking records when grouping by namespace, because these values cannot be attributed to a specific namespace/project (hence the Network unattributed project!)
- Performs a new insert into the project daily summary table for the networking records, grouped by OCP node
- Back-populates these records into the OCPUsage table, adding a data transfer direction to the group by, which has three options: IN, OUT, and NULL

NOTE: GCP renamed Ingress to Data Transfer In, and renamed Egress to Data Transfer, which sometimes carries an "out" qualifier and sometimes does not. Based on my understanding of this GCP article, Ingress was simply renamed to Data Transfer In, and any other data transfer is Egress/Outbound.
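
For illustration, here is a minimal sketch of how the direction split could be expressed in Trino when rolling up the gcp_openshift_daily records. This is a hedged approximation rather than the PR's exact SQL; the table and column names come from the verification queries in the Testing section, and the LIKE patterns assume the Data Transfer naming described above:

-- Sketch only: derive the transfer direction from the SKU description.
-- Anything that is not a data transfer SKU falls through to NULL.
SELECT
    CASE
        WHEN lower(sku_description) LIKE '%data transfer in%' THEN 'IN'
        WHEN lower(sku_description) LIKE '%data transfer%' THEN 'OUT'
        ELSE NULL
    END AS data_transfer_direction,
    sum(cost) AS cost
FROM gcp_openshift_daily
WHERE month = '05'
GROUP BY 1;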

Nise has been updated and the test customer yamls now include network in and out records.

Testing

  1. Using nise > 4.5.3, create GCP compute data that has networking SKUs defined for the same resource ID as an OpenShift node, something like:
---
generators:
  - ComputeEngineGenerator:
      start_date: {{start_date}}
      end_date: {{end_date}}
      price: 2
      sku_id: CF4E-A0C7-E3BF
      usage.amount_in_pricing_units: 1
      usage.pricing_unit: hour
      currency: USD
      instance_type: m2-megamem-416
      location.region: australia-southeast1-a
      resource.name: projects/nise-populator/instances/gcp_compute1
      resource.global_name: //compute.googleapis.com/projects/nise-populator/zones/australia-southeast1-a/instances/3447398860992947181
      labels: [{"environment": "clyde", "app":"winter", "version":"green", "kubernetes-io-cluster-c32se93c-73z3-3s3d-cs23-d3245sj45349": "owned"}]
  - ComputeEngineGenerator:
      start_date: {{start_date}}
      end_date: {{end_date}}
      price: 2
      sku_id: BBF8-C07D-1DF4
      usage.amount_in_pricing_units: 50
      usage.pricing_unit: hour
      currency: USD
      instance_type: m2-megamem-416
      location.region: australia-southeast1-a
      resource.name: projects/nise-populator/instances/gcp_compute1
      resource.global_name: //compute.googleapis.com/projects/nise-populator/zones/australia-southeast1-a/instances/3447398860992947181
      labels: [{"environment": "clyde", "app":"winter", "version":"green", "kubernetes-io-cluster-c32se93c-73z3-3s3d-cs23-d3245sj45349": "owned"}]
  - ComputeEngineGenerator:
      start_date: 2024-05-01
      end_date: 2024-05-31
      price: 30
      sku_id: 9DE9-9092-B3BC
      usage.amount_in_pricing_units: 10
      usage.pricing_unit: hour
      currency: USD
      instance_type: m2-megamem-416
      location.region: australia-southeast1-a
      resource.name: projects/nise-populator/instances/gcp_compute1
      resource.global_name: //compute.googleapis.com/projects/nise-populator/zones/australia-southeast1-a/instances/3447398860992947181
      labels: [{"environment": "clyde", "app":"winter", "version":"green", "kubernetes-io-cluster-c32se93c-73z3-3s3d-cs23-d3245sj45349": "owned"}] 
  2. Create a source and load the OCP data
  3. Create a source and load the GCP data you just created
  4. Let summary run, then check the OCP and OCP on GCP database records and verify the network records are visible and distinct, with infrastructure_data_in_gigabytes or infrastructure_data_out_gigabytes filled in for each day and each Network unattributed project.
  5. Run a few SQL queries to verify the costs before and after OCPGCP summary line up.
    docker exec -it trino trino --server localhost:8080 --catalog hive --schema org1234567 --user admin --debug
trino:org1234567> SELECT sum(cost) as cost FROM gcp_openshift_daily WHERE month='05';
   cost   
----------
 306528.0 
(1 row)

trino:org1234567> select sum(unblended_cost) from reporting_ocpgcpcostlineitem_project_daily_summary WHERE month = '5';
  _col0   
----------
 306528.0 
(1 row)

trino:org1234567> SELECT sum(cost) as cost FROM gcp_openshift_daily WHERE lower(sku_description) LIKE '%data transfer%' AND month='05';
   cost   
----------
 297600.0 
(1 row)

trino:org1234567> SELECT sum(unblended_cost) as cost FROM reporting_ocpgcpcostlineitem_project_daily_summary WHERE data_transfer_direction IS NOT NULL AND month='5';
   cost   
----------
 297600.0 
(1 row)

trino:org1234567> select sum(unblended_cost) as cost, data_transfer_direction from reporting_ocpgcpcostlineitem_project_daily_summary WHERE data_transfer_direction IS NOT NULL AND month='5' GROUP BY data_transfer_direction;
   cost   | data_transfer_direction 
----------+-------------------------
 223200.0 | OUT                     
  74400.0 | IN                      
(2 rows)

trino:org1234567> select usage_start, unblended_cost, infrastructure_data_in_gigabytes, infrastructure_data_out_gigabytes, usage_amount from postgres.org1234567.reporting_ocpgcpcostlineitem_project_daily_summary_p_2024_05 WHERE namespace = 'Network unattributed' ORDER BY usage_start;
 usage_start |    unblended_cost    | infrastructure_data_in_gigabytes | infrastructure_data_out_gigabytes |     usage_amount     
-------------+----------------------+----------------------------------+-----------------------------------+----------------------
 2024-05-01  | 7200.000000000000000 |                0.000000000000000 |               257.697599999999970 |  240.000000000000000 
 2024-05-01  | 2400.000000000000000 |             1288.487999999999800 |                 0.000000000000000 | 1200.000000000000000 
 2024-05-02  | 7200.000000000000000 |                0.000000000000000 |               257.697599999999970 |  240.000000000000000 
 2024-05-02  | 2400.000000000000000 |             1288.487999999999800 |                 0.000000000000000 | 1200.000000000000000 

Inbound math:
Cost: 2400 = 50 (usage) * 2 (rate) * 24 hours
Quantity: 1288.488 = 50 (usage) * 24 hours * 1.07374 (gibibyte to gigabyte conversion)
Outbound math:
Cost: 7200 = 10 (usage) * 30 (rate) * 24 hours
Quantity: 257.6976 = 10 (usage) * 24 hours * 1.07374 (gibibyte to gigabyte conversion)
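
To sanity-check those expectations directly in Trino, the same arithmetic can be run as a plain SELECT, which should return 2400, 1288.488, 7200, and 257.6976 respectively:

trino:org1234567> SELECT 50 * 2 * 24 AS in_cost, 50 * 24 * 1.07374 AS in_gb, 10 * 30 * 24 AS out_cost, 10 * 24 * 1.07374 AS out_gb;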

Release Notes

  • proposed release note
* [COST-4745](https://issues.redhat.com/browse/COST-4745) This PR will **result in a numbers change when looking at OpenShift or GCP filtered by OpenShift endpoints when grouped by project**, as long as OpenShift costs are coming from a GCP cloud source.
* Previously the networking cost of the node was distributed amongst the projects on the node; now those networking costs are moved into a separate NEW project called `Network unattributed`.
* Example with numbers: 

- I have a node called `compute_1`, and this node has 2 projects, `projectA` and `projectB`, leaving 0 unallocated costs.
- When I look at the costs for this node grouped by project today, `projectA` costs $15 and `projectB` costs $5, for a total of $20.
- Of that $20, I know that $5 is networking cost, which was previously split evenly between the two projects ($2.50 each).
- After this change there will be 3 projects with costs for this node: `projectA`, `projectB`, and `Network unattributed`.
- The cost for `projectA` would now be $12.50, `projectB` would now be $2.50, and `Network unattributed` would be $5.
- The new `Network unattributed` project holds the networking costs that can be specifically tied to this node but not broken down at the project level.

codecov bot commented Apr 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.1%. Comparing base (07ae2b8) to head (9415f96).

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #5058     +/-   ##
=======================================
- Coverage   94.1%   94.1%   -0.0%     
=======================================
  Files        376     376             
  Lines      31261   31261             
  Branches    4602    4602             
=======================================
- Hits       29429   29427      -2     
- Misses      1167    1168      +1     
- Partials     665     666      +1     

@cgoodfred cgoodfred added the gcp-smoke-tests label (pr_check will build the image and run gcp + ocp on gcp smoke tests) Jun 3, 2024
@cgoodfred cgoodfred self-assigned this Jun 3, 2024
@cgoodfred cgoodfred marked this pull request as ready for review June 3, 2024 15:56
@cgoodfred cgoodfred requested review from a team as code owners June 3, 2024 15:56
maskarb previously approved these changes Jun 4, 2024
@cgoodfred cgoodfred enabled auto-merge (squash) July 4, 2024 03:41
@lcouzens
Contributor

lcouzens commented Jul 4, 2024

/retest

@lcouzens
Contributor

lcouzens commented Jul 4, 2024

/retest

@cgoodfred cgoodfred merged commit bdd992d into main Jul 4, 2024
10 of 11 checks passed
@cgoodfred cgoodfred deleted the COST-4745-ocpgcp-network branch July 4, 2024 10:21
djnakabaale pushed a commit that referenced this pull request Jul 9, 2024
* [COST-4745] OCPGCP Network data processing SQL

---------

Co-authored-by: Sam Doran <[email protected]>
djnakabaale added a commit that referenced this pull request Jul 19, 2024
…5117)

* feat: cleaning up old changes. making first changes to create the API pieces.

* feat: cleaning up old changes. making first changes to create the API pieces.

* feat: insert new filters.

* feat: insert new filters and order by params.

* feat: customizing provider map and serializer.

* fix: changing TIME_CHOICES options.

* feat: first unit tests.

* feat: wip.

* feat: fixing provider map.

* feat: fixing provider map.

* update ec2 annotations to get all required fields

* use AWSEC2ComputeQueryParamSerializer

* feat: creating orderby and groupby serializers for ec2

* feat: unit tests for filters

* feat: removing group_by - not needed on ec2

* feat: fixing orderby serializer and starting units tests

* fix: typo

* feat: changing usage_hours to usage_amount

* feat: fixing unit test.

* wip: blocking some filters and unit test.

* feat: unit tests for group by filter

* flake8 fix

* feat: inserting more filters on validate function.

* feat: updating validate method to use similar logic and add filters.

* fix: changing unit tests for some filters.

* feat: testing filter combinations and flake8 checks.

* fix: test

* feat: serializer Unit tests and view Unit test fix

* flake8 fix

* fix: new approach to satisfy CodeCov

* fix: getting rid of validate custom method.

* fix: commenting tags.

* fix: validate functions, tests.

* handle filter params for specific report type

* transform tags to desired ui format

* default to monthly resolution on the EC2 endpoint

* add special pagination for EC2

* use default report type time period settings if exists

* fix typo

* fix: fixing parameters validations.

* [COST-5141] Fix management command to use continue instead of return. (#5173)

* [COST-5128] Process new subs tagging strategy to identify non-converted instances (#5162)

* [COST-4745] Add data_transfer_direction to OCP on GCP Trino tables (#5130)

* [COST-4741] Add data_transfer_direction for AWS network costs to Trino tables (#5129)

* [COST-5168] - Adding new penalty pipeline (#5176)

* [COST-5168] - Adding new penalty pipeline

* Improve our logging readability (#5178)

* add prometheus metrics for new queues (#5179)

* add v3.3.0 operator commits (#5143)

* [COST-5124] Improve Trino migration management command (#5163)

* Add exponential backoff and logging to retries
* Change log level to reflect severity
* Explicit SQL alias for clarity
* Catch and log exception instead of exiting
* Add return type hints
* Return if unsuccessful
  No point in verifying if the SQL did not run correctly
* Fine tune exponential backoff
* Create action class for adding/verifying columns were added
* Assign default list using default_factory
  Instead of doing it in the post_init, which gets a little weird.
* Add drop column action
* Quote items in logs for better legibility
* Consolidate action classes
  We lose some of the action-specific logging messages, but there is less
  code overall. I'm not sure how this scales to the action related to dropping.
* Change local variable name
  No need to add a prefix to differentiate it from the parameter name.
* Use a set to prevent running on the same schema multiple times

Co-authored-by: Cody Myers <[email protected]>

* Filter accounts by matching criteria during subs processing to prevent unnecessary SQL from running (#5184)

* Update tasks.py (#5185)

* clean up grafana dashboard (#5183)

* Skip OCPCloud tag SQL if key is present in cache but value is None (#5186)

* [COST-5196] - Send OCP tasks to correct queues (#5187)

* [COST-5196] - Send OCP tasks to correct queues

* [COST-5176] correctly pass context dictionary within log_json function call (#5182)

* [COST-5176] correctly pass context dictionary within log_json function call

* add unittests for exceptions in generate_report

* batch delete S3 files (#5180)

* Bump urllib3 from 1.26.18 to 1.26.19 in the pip group across 1 directory (#5172)

Bumps the pip group with 1 update in the / directory: [urllib3](https://github.com/urllib3/urllib3).


Updates `urllib3` from 1.26.18 to 1.26.19
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/1.26.19/CHANGES.rst)
- [Commits](urllib3/urllib3@1.26.18...1.26.19)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add flower as a dev dependency (#5189)

* Add docs

* [COST-4844] Serializer update for ordering by storageclass (#5174)

* Switch to using podman in build_deploy (#5193)

The VM used in CI is now RHEL 8

* skip polling providers still processing (#5181)

* skip polling providers that are still processing

* [COST-5214] Move TARGETARCH declaration to the top of the Dockerfile (#5195)

There is a bug in podman where this is only used correctly for the
multi-stage build if it is defined as the first line.

Update Jenkinsfile to use RHEL 8

Unfortunately this breaks the image build for Docker. I'll fix that in a followup PR.

* [COST-5213] - fix S3 prepare (#5194)

* Switch default parquet flag to prevent iterating on all files in each worker when there is nothing to delete

* [COST-5214] pass build-arg to docker build command (#5196)

* [COST-5216] Delete filtering optimization (#5197)

* Revert "[COST-5216] Delete filtering optimization (#5197)" (#5200)

This reverts commit 97ba98e.

* [COST-5226] - Skip S3 delete (daily flow) if we have marked deletion complete. (#5198)

* dont attempt more S3 deletes if we have marked deletion complete

* [COST-5076] upgrade to python 3.11 (#4444)

* upgrade to python 3.11

* pipfile update

* add gcc-c++ compiler

Co-authored-by: Sam Doran <[email protected]>

* update test

* replace gcc with gcc-c++

---------

Co-authored-by: Sam Doran <[email protected]>

* [COST-5228] log outside for loop (#5202)

* [COST-5228] log outside for loop

* additional log clean up

* add context to logs in _remove_expired_data func

* log s3 batch deletes (#5204)

* log s3 batch deletes

* [COST-5219] Correctly report VM usage for metering when billing record is split (#5201)

* [COST-5219] Handle Azure instance record being split

* [COST-4745] OCPGCP Network data processing SQL (#5058)

* [COST-4745] OCPGCP Network data processing SQL

---------

Co-authored-by: Sam Doran <[email protected]>

* [COST-5198] - split read traffic to read replica db using nginx proxy (#5188)

* update nginx with HTTP method routing
* switch koku-api to koku-api-writes
* duplicate koku-api to koku-api-reads and add an optional mounted secret for the read replica
* update clowder configurator to read from read replica secret if mounted and enabled via ENV var

* remove unused methods (#5208)

* Bump certifi in the pip group across 1 directory (#5207)

* chore(image): update and rebuild image (#5203)

Co-authored-by: Update-a-Bot <[email protected]>

* Handle case when resource ID cannot be obtained (#5209)

* Catch exception case.

* [COST-5148] filter out empty resource ids and SavingsPlanCoveredUsage entries (#5206)

* [COST-5148] update insert sql
filter out empty resource ids
offset savings from SavingsPlanCoveredUsage

* closing CASE statement

* clean up comment

* remove case stmts in favor of filtering out SavingsPlanCoveredUsage

* clean up

* Unpause the csi volume handle sql (#5175)

* update linting

* squash commits

* clean up query params and time period settings

* do not use filter keyword

* more code clean up
update unit tests

* address feedback
- move report_specific filters to main filter map
- use report_type instead of kwargs in get_paginator
- do not use deepcopy - just overwrite query_data
- resolution and time_scope_units are always monthly and month respectively
- overide start and end date params in base ParamSerializer
- overide limit and offset in base FilterSerializer

* more unit tests

* update openapi spec

* clean up and add unit tests

* move changes to openapi spec to a separate pr

* use serializer choice field and not customer validate method

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: David N <[email protected]>
Co-authored-by: Luke Couzens <[email protected]>
Co-authored-by: Cody Myers <[email protected]>
Co-authored-by: Corey Goodfred <[email protected]>
Co-authored-by: Sam Doran <[email protected]>
Co-authored-by: Michael Skarbek <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Chris Hambridge <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Update-a-Bot <[email protected]>
Labels
gcp-smoke-tests (pr_check will build the image and run gcp + ocp on gcp smoke tests), smokes-required