Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[COST-4741] - Separate AWS node network costs into a Network unattributed namespace #5104

Merged
merged 12 commits into from
Jul 25, 2024
Merged
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,8 @@ INSERT INTO {{schema | sqlsafe}}.reporting_ocpusagelineitem_daily_summary (
infrastructure_project_raw_cost,
infrastructure_usage_cost,
supplementary_usage_cost,
infrastructure_data_in_gigabytes,
infrastructure_data_out_gigabytes,
pod_usage_cpu_core_hours,
pod_request_cpu_core_hours,
pod_limit_cpu_core_hours,
Expand Down Expand Up @@ -67,6 +69,14 @@ INSERT INTO {{schema | sqlsafe}}.reporting_ocpusagelineitem_daily_summary (
sum(coalesce(nullif(ocp_aws.savingsplan_effective_cost, 0), ocp_aws.unblended_cost) + coalesce(nullif(ocp_aws.markup_cost_savingsplan, 0), ocp_aws.markup_cost)) AS infrastructure_project_raw_cost,
'{"cpu": 0.000000000, "memory": 0.000000000, "storage": 0.000000000}'::jsonb as infrastructure_usage_cost,
'{"cpu": 0.000000000, "memory": 0.000000000, "storage": 0.000000000}'::jsonb as supplementary_usage_cost,
CASE
WHEN upper(data_transfer_direction) = 'IN' THEN sum(infrastructure_data_in_gigabytes)
ELSE NULL
END as infrastructure_data_in_gigabytes,
CASE
WHEN upper(data_transfer_direction) = 'OUT' THEN sum(infrastructure_data_out_gigabytes)
ELSE NULL
END as infrastructure_data_out_gigabytes,
0 as pod_usage_cpu_core_hours,
0 as pod_request_cpu_core_hours,
0 as pod_limit_cpu_core_hours,
Expand Down Expand Up @@ -102,5 +112,6 @@ INSERT INTO {{schema | sqlsafe}}.reporting_ocpusagelineitem_daily_summary (
ocp_aws.persistentvolumeclaim,
ocp_aws.resource_id,
ocp_aws.pod_labels,
ocp_aws.data_transfer_direction,
rp.provider_id
;
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,7 @@ INSERT INTO hive.{{schema | sqlsafe}}.aws_openshift_daily_resource_matched_temp
region,
unit,
usage_amount,
data_transfer_direction,
currency_code,
unblended_cost,
blended_cost,
Expand All @@ -200,6 +201,19 @@ SELECT cast(uuid() as varchar) as uuid,
nullif(aws.product_region, '') as region,
max(nullif(aws.pricing_unit, '')) as unit,
sum(aws.lineitem_usageamount) as usage_amount,
-- Determine network direction
CASE
-- Is this a network record?
WHEN max(aws.lineitem_productcode) = 'AmazonEC2' AND max(aws.product_productfamily) = 'Data Transfer' THEN
-- Yes, it's a network record. What's the direction?
CASE
WHEN strpos(lower(max(aws.lineitem_usagetype)), 'in-bytes') > 0 THEN 'IN'
WHEN strpos(lower(max(aws.lineitem_usagetype)), 'out-bytes') > 0 THEN 'OUT'
WHEN (strpos(lower(max(aws.lineitem_usagetype)), 'regional-bytes') > 0 AND strpos(lower(max(lineitem_operation)), '-in') > 0) THEN 'IN'
WHEN (strpos(lower(max(aws.lineitem_usagetype)), 'regional-bytes') > 0 AND strpos(lower(max(lineitem_operation)), '-out') > 0) THEN 'OUT'
ELSE NULL
END
END AS data_transfer_direction,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This SQL is grouped by these items below, are these enough to ensure a unique data_transfer_direction or should we be grouping by the data_transfer_direction result as well?

GROUP BY aws.lineitem_usagestartdate,
    aws.lineitem_resourceid,
    4, -- CASE satement
    aws.product_productfamily,
    aws.product_instancetype,
    aws.lineitem_availabilityzone,
    aws.product_region,
    aws.resourcetags,
    aws.costcategory

Copy link
Contributor Author

@samdoran samdoran Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do probably need to group by the direction, but we can't use an aggregation in a group by. Grouping by aws.lineitem_usagetype may work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also use numbers to group by items in the SELECT statement, for example the 4 in the group by there is based on the CASE statement in the SQL

max(nullif(aws.lineitem_currencycode, '')) as currency_code,
sum(aws.lineitem_unblendedcost) as unblended_cost,
sum(aws.lineitem_blendedcost) as blended_cost,
Expand Down Expand Up @@ -228,13 +242,15 @@ WHERE aws.source = {{aws_source_uuid}}
AND aws.resource_id_matched = TRUE
GROUP BY aws.lineitem_usagestartdate,
aws.lineitem_resourceid,
4, -- CASE satement
4, -- product_code
aws.product_productfamily,
aws.product_instancetype,
aws.lineitem_availabilityzone,
aws.product_region,
aws.lineitem_usagetype,
aws.resourcetags,
aws.costcategory
aws.costcategory,
lineitem_operation
;

INSERT INTO hive.{{schema | sqlsafe}}.aws_openshift_daily_tag_matched_temp (
Expand Down Expand Up @@ -321,7 +337,7 @@ WHERE aws.source = {{aws_source_uuid}}
AND (aws.resource_id_matched = FALSE OR aws.resource_id_matched IS NULL)
GROUP BY aws.lineitem_usagestartdate,
aws.lineitem_resourceid,
4, -- CASE satement
4, -- product_code
aws.product_productfamily,
aws.product_instancetype,
aws.lineitem_availabilityzone,
Expand Down Expand Up @@ -448,6 +464,8 @@ SELECT aws.uuid as aws_uuid,
AND aws.ocp_source = {{ocp_source_uuid}}
AND aws.year = {{year}}
AND aws.month = {{month}}
-- Filter out Node Network Costs since they cannot be attributed to a namespace and are accounted for later
AND aws.data_transfer_direction IS NULL
GROUP BY aws.uuid, ocp.namespace, ocp.pod_labels
;

Expand Down Expand Up @@ -604,6 +622,7 @@ INSERT INTO hive.{{schema | sqlsafe}}.reporting_ocpawscostlineitem_project_daily
region,
unit,
usage_amount,
data_transfer_direction,
currency_code,
unblended_cost,
markup_cost,
Expand Down Expand Up @@ -652,6 +671,7 @@ SELECT pds.aws_uuid,
region,
unit,
usage_amount / aws_uuid_count as usage_amount,
NULL AS data_transfer_direction,
currency_code,
CASE WHEN resource_id_matched = TRUE AND data_source = 'Pod'
THEN ({{pod_column | sqlsafe}} / nullif({{node_column | sqlsafe}}, 0)) * unblended_cost
Expand Down Expand Up @@ -721,6 +741,118 @@ LEFT JOIN postgres.{{schema | sqlsafe}}.reporting_awsaccountalias AS aa
WHERE pds.ocp_source = {{ocp_source_uuid}} AND year = {{year}} AND month = {{month}}
;

-- Put Node Network Costs into the Network unattributed namespace
INSERT INTO hive.{{schema | sqlsafe}}.reporting_ocpawscostlineitem_project_daily_summary (
aws_uuid,
cluster_id,
cluster_alias,
data_source,
namespace,
node,
persistentvolumeclaim,
persistentvolume,
storageclass,
resource_id,
usage_start,
usage_end,
product_code,
product_family,
instance_type,
usage_account_id,
account_alias_id,
availability_zone,
region,
unit,
usage_amount,
data_transfer_direction,
currency_code,
unblended_cost,
markup_cost,
blended_cost,
markup_cost_blended,
savingsplan_effective_cost,
markup_cost_savingsplan,
calculated_amortized_cost,
markup_cost_amortized,
pod_cost,
project_markup_cost,
pod_labels,
tags,
aws_cost_category,
cost_category_id,
aws_source,
ocp_source,
year,
month,
day
)
SELECT
aws.uuid AS aws_uuid,
max(cluster_id),
max(cluster_alias),
max(data_source),
'Network unattributed' AS namespace,
ocp.node AS node,
max(persistentvolumeclaim),
max(persistentvolume),
max(storageclass),
max(aws.resource_id),
max(aws.usage_start),
max(usage_end),
max(product_code),
max(product_family),
max(instance_type),
max(usage_account_id),
max(aa.id) AS account_alias_id,
max(availability_zone),
max(region),
max(unit),
max(usage_amount),
data_transfer_direction,
max(currency_code),
max(unblended_cost),
max(unblended_cost) * cast({{markup}} AS decimal(24,9)),
max(blended_cost),
max(blended_cost) * cast({{markup}} AS decimal(24,9)),
max(savingsplan_effective_cost),
max(savingsplan_effective_cost) * cast({{markup}} AS decimal(24,9)),
max(calculated_amortized_cost),
max(calculated_amortized_cost) * cast({{markup}} AS decimal(33,9)),
max(unblended_cost) AS pod_cost,
max(unblended_cost) * cast({{markup}} AS decimal(24,9)) AS project_markup_cost,
max(ocp.pod_labels),
cast(NULL AS varchar) AS tags,
cast(NULL AS varchar) AS aws_cost_category,
max(cost_category_id),
max({{aws_source_uuid}}) AS aws_source,
max({{ocp_source_uuid}}) AS ocp_source,
max(cast(year(aws.usage_start) AS varchar)) AS year,
max(cast(month(aws.usage_start) AS varchar)) AS month,
max(cast(day(aws.usage_start) AS varchar)) AS day
FROM hive.{{schema | sqlsafe}}.reporting_ocpusagelineitem_daily_summary AS ocp
JOIN hive.{{schema | sqlsafe}}.aws_openshift_daily_resource_matched_temp AS aws
ON aws.usage_start = ocp.usage_start
AND strpos(aws.resource_id, ocp.resource_id) != 0
LEFT JOIN postgres.{{schema | sqlsafe}}.reporting_awsaccountalias AS aa
ON aws.usage_account_id = aa.account_id
WHERE ocp.source = {{ocp_source_uuid}}
AND ocp.year = {{year}}
AND lpad(ocp.month, 2, '0') = {{month}} -- Zero pad the month when fewer than 2 characters
AND ocp.day IN {{days | inclause}}
AND (ocp.resource_id IS NOT NULL AND ocp.resource_id != '')
AND aws.ocp_source = {{ocp_source_uuid}}
AND aws.year = {{year}}
AND aws.month = {{month}}
-- Network related costs
AND aws.data_transfer_direction IS NOT NULL
-- Storage and Pod can have the same resource_id and we want the Pod
AND ocp.data_source = 'Pod'
GROUP BY
aws.uuid,
ocp.node,
aws.data_transfer_direction
;

INSERT INTO postgres.{{schema | sqlsafe}}.reporting_ocpawscostlineitem_project_daily_summary_p (
uuid,
report_period_id,
Expand All @@ -745,6 +877,9 @@ INSERT INTO postgres.{{schema | sqlsafe}}.reporting_ocpawscostlineitem_project_d
region,
unit,
usage_amount,
infrastructure_data_in_gigabytes,
infrastructure_data_out_gigabytes,
data_transfer_direction,
currency_code,
unblended_cost,
markup_cost,
Expand Down Expand Up @@ -785,6 +920,15 @@ SELECT uuid(),
region,
unit,
usage_amount,
CASE
WHEN upper(data_transfer_direction) = 'IN' THEN usage_amount
ELSE 0
END AS infrastructure_data_in_gigabytes,
CASE
WHEN upper(data_transfer_direction) = 'OUT' THEN usage_amount
ELSE 0
END AS infrastructure_data_out_gigabytes,
data_transfer_direction,
samdoran marked this conversation as resolved.
Show resolved Hide resolved
currency_code,
unblended_cost,
markup_cost,
Expand Down
Loading