The trace contains a representative subset of the first-party Azure VM workload in one geographical region.
This Jupyter notebook directly compares the main characteristics of this trace with those of the trace described in "Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms" (SOSP’17), showing that the two are qualitatively very similar.
The main trace characteristics and schema are:
- Dataset size: 117GB
- Compressed dataset size: 78.5GB
- Number of files: 128 files
- Duration: 30 consecutive days
- Total number of VMs: 2,013,767
- Total number of Azure subscriptions: 5,958
- Timeseries data: 5-minute VM CPU utilization readings, VM information table and subscription table (with main fields encrypted)
- Total VM hours: 104,371,713
- Total number of VM CPU utilization readings: 1,246,539,221
- Total virtual core hours: 237,815,104

Schema fields:

- Encrypted subscription id
- Encrypted deployment id
- Timestamp in seconds (starting from 0) when first VM created
- Count VMs created
- Deployment size (we define a “deployment” differently than Azure in our paper)
- Encrypted VM id
- Timestamp VM created
- Timestamp VM deleted
- Max CPU utilization
- Avg CPU utilization
- P95 of Max CPU utilization
- VM category
- VM virtual core count
- VM memory (GBs)
- Timestamp in seconds (every 5 minutes)
- Min CPU utilization during the 5 minutes
- Max CPU utilization during the 5 minutes
- Avg CPU utilization during the 5 minutes
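
As a rough illustration of how the VM fields above relate to the aggregate figures (total VM hours and total virtual core hours), here is a minimal Python sketch. It assumes a headerless CSV whose columns follow the order of the VM fields listed above, and that the created and deleted timestamps are in seconds; the column names, the `summarize_vm_table` helper, and the sample rows are hypothetical, so check the layout against the files you actually download.

```python
import csv
import io
from dataclasses import dataclass

# Assumed column order for a headerless VM-table CSV (hypothetical layout;
# verify it against the downloaded files before relying on it).
VM_COLUMNS = [
    "vm_id", "created_s", "deleted_s", "max_cpu", "avg_cpu",
    "p95_max_cpu", "category", "core_count", "memory_gb",
]


@dataclass
class VmTotals:
    vm_count: int
    vm_hours: float
    core_hours: float


def summarize_vm_table(rows) -> VmTotals:
    """Accumulate VM count, VM hours, and virtual core hours over CSV rows."""
    count = 0
    vm_hours = 0.0
    core_hours = 0.0
    for row in rows:
        rec = dict(zip(VM_COLUMNS, row))
        # Lifetime in hours, assuming both timestamps are in seconds.
        lifetime_h = (int(rec["deleted_s"]) - int(rec["created_s"])) / 3600
        count += 1
        vm_hours += lifetime_h
        core_hours += lifetime_h * int(rec["core_count"])
    return VmTotals(count, vm_hours, core_hours)


# Tiny synthetic sample (made-up values, not taken from the trace).
SAMPLE = """\
vmA,0,7200,90.0,35.5,80.1,categoryX,2,3.5
vmB,3600,5400,60.0,12.0,55.0,categoryY,4,7
"""

totals = summarize_vm_table(csv.reader(io.StringIO(SAMPLE)))
print(totals)
```

The same arithmetic gives a quick sanity check on the published totals: a VM that runs for a full hour contributes 12 five-minute readings, so 104,371,713 VM hours corresponds to about 1.25 billion readings, in line with the 1,246,539,221 reported, and dividing total virtual core hours by total VM hours gives roughly 2.28 cores per VM hour.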
You can download the dataset from Azure Blob Storage using the links available here.