Listed below are some monitoring and profiling tools supported on AWS Gravtion. Also listed are some differences when compared to the tools available on x86 processor architectures.
Some of the most commonly used tools such as top, htop, iostat, lstopo, hwloc, dmidecode, lmbench, Linux perf are supported on AWS Graviton processors. There are some tools such as Intel MLC, Intel VTune Profiler, PCM that are supported only on Intel processors and some tools such as turbostat supported on x86 processors.
The lscpu utility shows details of the processor features such as architecture, number of cores, active CPUs, caches, NUMA, instruction support and optionally CPU frequency.
$ lscpu
Attribute | Value |
---|---|
Architecture | aarch64 |
CPU(s) | 64 |
On-line CPU(s) list | 0-63 |
L1d | 4 MiB (64 instances) |
... | ... |
NUMA node(s) | 1 |
... | ... |
The dmidecode utility is a tool for listing details of the system's hardware components.
$ sudo dmidecode | less
Attribute | Value |
---|---|
Socket Designation | CPU00 |
Type | Central Processor |
Family | ARMv8 |
Manufacturer | AWS |
... | ... |
Max Speed | 2600 MHz |
Current Speed | 2600 MHz |
... | ... |
Shown below is sample output for these utilities on a c6g.2xlarge instance.
$ hwloc-info
depth 0: 1 Machine (type #0)
depth 1: 1 Package (type #1)
depth 2: 1 L3Cache (type #6)
depth 3: 8 L2Cache (type #5)
depth 4: 8 L1dCache (type #4)
depth 5: 8 L1iCache (type #9)
...
$ lstopo
Machine (15GB total)
Package L#0
NUMANode L#0 (P#0 15GB)
L3 L#0 (32MB)
L2 L#0 (1024KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (1024KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
L2 L#2 (1024KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
...
On AWS Graviton processors, the Linux perf tool comes handy to collect an application execution profile, hardware perf counters. Much of the functionality provided by tools such as Intel VTune Profiler and PCM are supported in Linux perf. Below are some details of its usage/ syntax.
$ perf stat command
Shown below are Linux perf stats collected at system wide on a c6g.2xlarge instance.
$ perf stat
Performance counter stats for 'system wide':
87692.26 msec cpu-clock # 8.000 CPUs utilized
441 context-switches # 5.029 /sec
13 cpu-migrations # 0.148 /sec
2 page-faults # 0.023 /sec
25115021 cycles # 0.000 GHz
28853592 instructions # 1.15 insn per cycle
68126 branch-misses
10.961122204 seconds time elapsed
Collect basic/ specific CPU hardware counters, for a specific command or system wide, for 10 seconds
One can collect hardware events/ counters for an application, on a specific CPU, for a PID or system wide as follows:
$ perf stat -e cycles,instructions,cache-references,cache-misses,bus-cycles -a sleep 10
Performance counter stats for 'system wide':
161469308 cycles (80.01%)
120685678 instructions # 0.75 insn per cycle (79.97%)
42132948 cache-references (80.01%)
2001520 cache-misses # 4.750 % of all cache refs (80.02%)
160016796 bus-cycles (60.00%)
10.002896494 seconds time elapsed
$ perf report
Overhead | Command | Shared Object | Symbol |
---|---|---|---|
72.44% | functionA | functionA | classA::functionA |
7.66% | functionB | libB.so | classB::functionB |
... | |||
0.81% | functioA | libc-2.31.so | memcmp |
More details on how to use Linux perf utility on AWS Graviton processors is available here.
Processor | x86 | Graviton2,3, and 4 |
---|---|---|
CPU frequency listing | lscpu, /proc/cpuinfo, dmidecode | dmidecode |
turbostat support | Yes | No |
hwloc support | Yes | Yes |
lstopo support | Yes | Yes |
i7z Works | Yes | No |
lmbench | Yes | Yes |
Intel MLC | Yes | No |
Performance monitoring tools | VTune Profiler, PCM, Linux perf, APerf | Linux perf, Linaro Forge, Arm Streamline CLI Tools, APerf |
Utilities such as lmbench are available here and can be built for AWS Graviton processors to obtain latency and bandwidth stats.
Notes:
1. The ARM Linux kernel community has decided not to put CPU frequency in /proc/cpuinfo which can be read by tools such as lscpu or directly.
2. On AWS Graviton processors, Turbo isn’t supported. So, utilities such as ‘turbostat’ aren’t supported/ relevant. Also, tools such as i7z for discovering CPU frequency, turbo, sockets and other information are only supported on Intel architecture/ processors. Intel MLC is a memory latency checker utility that is only supported on Intel processors.