AzurePublicDatasetV1

VM Trace

The trace contains a representative subset of the first-party Azure VM workload in one geographical region.
This Jupyter notebook directly compares the main characteristics of this trace with those of the trace described in "Resource Central: Understanding and Predicting Workloads for Improved Resource Management in Large Cloud Platforms" (SOSP’17), showing that the two are qualitatively very similar.

The main trace characteristics and schema are:

Dataset size: 117GB
Compressed dataset size: 78.5GB
Number of files: 128
Duration: 30 consecutive days
Total number of VMs: 2,013,767
Total number of Azure subscriptions: 5,958
Timeseries data: 5-minute VM CPU utilization readings, a VM information table, and a subscription table (with main fields encrypted)
Total VM hours: 104,371,713
Total number of VM CPU utilization readings: 1,246,539,221
Total virtual core hours: 237,815,104
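
As a quick sanity check, the totals above can be combined into a few derived figures. The sketch below is only illustrative arithmetic on the published numbers: the dictionary keys are hypothetical names chosen here, and the derived averages are not stated anywhere in the dataset description. It assumes one CPU reading per VM every 5 minutes, i.e. 12 readings per VM hour.

```python
# Figures copied from the characteristics list above.
TOTAL_VMS = 2_013_767
TOTAL_VM_HOURS = 104_371_713
TOTAL_CPU_READINGS = 1_246_539_221
TOTAL_CORE_HOURS = 237_815_104
DATASET_GB = 117
COMPRESSED_GB = 78.5
READING_INTERVAL_MINUTES = 5  # one CPU utilization reading every 5 minutes


def derived_stats():
    """Return derived ratios; key names are illustrative, not part of the dataset."""
    readings_per_vm_hour = 60 / READING_INTERVAL_MINUTES  # 12 readings per hour
    expected_readings = TOTAL_VM_HOURS * readings_per_vm_hour
    return {
        # Average number of hours a VM appears in the 30-day trace.
        "avg_vm_hours_per_vm": TOTAL_VM_HOURS / TOTAL_VMS,
        # Average number of virtual cores per VM hour.
        "avg_cores_per_vm_hour": TOTAL_CORE_HOURS / TOTAL_VM_HOURS,
        # Fraction of the expected 5-minute readings that are present.
        "reading_coverage": TOTAL_CPU_READINGS / expected_readings,
        # Compressed size as a fraction of the uncompressed size.
        "compression_ratio": COMPRESSED_GB / DATASET_GB,
    }


stats = derived_stats()
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

A reading coverage close to 1 suggests that the reading count and the VM-hour total are consistent with a 5-minute sampling interval.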

You can download the dataset from Azure Blob Storage using the links available here.