At a high-level the process will be:
- Daily export your cost data using Azure Cost Management to blob storage
- When its created, load the blob automatically into ADX
- Ensure you don't introduce duplicate records with your ingestion
Eliminating duplicates is the most important part of this process. To understand why deduplication is important, you need to understand how Azure Cost Management export works.
- A subscription with the resource providers
Microsoft.Storage
,Microsoft.ContainerInstance
, andMicrosoft.EventGrid
already registered. Register the providers. - A resource group is required to deploy in to. Create a resource group. You must have 'Owner' rights on the resource group.
With Azure Cost Management Exports, you can export your data daily. The export places a new csv file on azure blob storage daily with all the data upto that day, month-to-date, or weekly depending on your selection. For example, on the 1st of the month you'll have a csv file with one day's worth of data, and on the 20th day of the month you'll have another csv file with 20 days-worth of data. However, these costs aren't finalized until the end of the month, so your 1st day of the month may not match if you compare the exports of the 20th day for the 1st day expenditures. Therefore, if we do not handle duplicate records correctly the data will NOT make sense.
- We can end up with a lot of duplicates for each day.
- We end up keeping the wrong record instead of the latest.
- An update to a record made later during the month doesn't appear (ie. missing data).
For this implementation the following are needed:
- Azure Data Explorer (required for performance needs)
- Azure Storage Account
- Azure Data Factory (allows to easily tag extents)
- Cost Management Export
After implementing this example, the cost information for your Azure subscription(s) will be refreshed daily into your ADX database - as of the previous day.
Having this data in ADX can unlock other scenarios such as:
- Join this information with other metadata to enrich and provide new insights.
- Running ad-hoc queries in seconds to look/dashboard over the historical spend.
- Build powerful visuals in your favorite tool such as ADX Dashboards, Power BI, Grafana, Tableau, Jupyter Notebooks, etc...
In this walkthrough we utilized ADF to manage the ingestion to a Kusto Database and orchestrate elimination of duplicates. But that's not the only way to handle this workflow.
- An Azure Function could be used instead of ADF with ADX Bindings for Azure Functions. However, that would be a code-first aproach, instead of using the low-code UI in ADF.
- A KQL materialize view could be used instead of .drop extents command to eliminate duplicates. As long as you understand which columns make the records unique, add the tag as values in an additional column or build a surrogate-key to uniquely identify or filter by, then a KQL materialize view would work. You wouldn't need to delete data either. Consider that records may be updated throughout the month (ie. prices change, etc) the materialized view would need to utilize an arg_max() instead of a take_any() for the deduplication.