Summary
The VZEROUPPER instruction can be used to zero the upper 128 bits of the YMM registers. The architecture documentation recommends using it to eliminate any performance penalties caused by false dependencies when transitioning between AVX and SSE modes.
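As a rough illustration of the architectural effect described above, the following Python sketch models each YMM register as a 256-bit integer and clears the upper 128 bits of every register. The function name vzeroupper_model and the integer representation are assumptions made for this example; this is a software model of the documented behaviour, not an emulation of the hardware.

```python
# Software model only: each YMM register is represented as a 256-bit integer.
# The name vzeroupper_model is hypothetical and used for illustration.

MASK_128 = (1 << 128) - 1  # keeps the low 128 bits (the XMM portion)


def vzeroupper_model(ymm_regs):
    """Return a copy of the register list with the upper 128 bits cleared."""
    return [value & MASK_128 for value in ymm_regs]


if __name__ == "__main__":
    # Upper half holds 0xAAAA, lower half holds 0x1234.
    regs = [(0xAAAA << 128) | 0x1234]
    cleared = vzeroupper_model(regs)
    print(hex(cleared[0]))
```

The lower 128 bits (the overlapping XMM register) are left untouched; only the upper half is zeroed.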
We have discovered cases where VZEROUPPER appears to remove those dependencies speculatively, then fails to correctly undo that operation on a branch misprediction.
This issue has severe security consequences and is easily exploitable. To illustrate this, we have developed a reliable method by which an attacker (including a fully sandboxed workload) can leak register contents across concurrent processes, hyperthreads and virtualized guests. Exploitation does not require any syscalls or privileges, so even a fully untrusted workload can carry it out.
Because glibc uses AVX registers in its string and memory management routines, the data handled by any code that calls memcpy, strlen, and similar functions can be stolen by a local attacker. Note that REP MOV and similar instructions also use AVX registers, so their contents are also leaked through this vulnerability.
We have confirmed this bug is reproducible on at least the following SKUs:
AMD Ryzen Threadripper PRO 3945WX 12-Cores
AMD Ryzen 7 PRO 4750GE with Radeon Graphics
AMD Ryzen 7 5700U
AMD EPYC 7B12
In general, we believe all Zen 2 processors are affected, including "Rome" server-class processors with the latest microcode patchlevel at the time of writing.
This flaw does not depend on any particular operating system; all operating systems are affected.
Severity
We consider this issue high risk. The practical result is that you can read the registers of other processes.
Note that this is not a timing attack or a side channel; the full values can simply be read as fast as you can access them.
Proof of Concept
We have found that the following short sequence will create a dependency between overlapping xmm and ymm registers:
vpxor ymm, ymm
vptest xmm, xmm
vcvtsi2s{s,d} xmm, xmm, r64
vmovupd ymm, ymm
This instruction should now clear that dependency:
vzeroupper
However, if we gate execution with two conflicting conditional branches, that appears to remove the dependency but leave the register in an undefined state.
jcc overzero
jncc overzero
vzeroupper
overzero:
nop
Additional POC details can be found here.
Impact
The undefined portion of our ymm register will contain stale data from the register file. The register file is a resource shared by all processes, threads (i.e. hyperthreads) and virtualized guests on the same physical core.
The practical result is that you can read the registers of other processes.
Note that this is not a timing attack or a side channel; the full values can simply be read as fast as you can access them.
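To make the idea of "stale data from a shared register file" more concrete, here is a deliberately simplified toy model in Python. Everything in it is an assumption made for illustration: the classes ToyCore and ToyThread, the per-register "zero flag", and the free list are hypothetical and are not taken from this advisory or from AMD documentation. It only sketches how a speculative release that is not fully undone could let one thread read a value written by another.

```python
# Toy model only. The zero flag, the free list, and all names here are
# illustrative assumptions; real hardware is far more complex.

class ToyCore:
    def __init__(self, n_phys=4):
        self.phys = [0] * n_phys         # shared storage for upper halves
        self.free = list(range(n_phys))  # slots not currently owned

    def alloc(self):
        return self.free.pop(0)

    def release(self, idx):
        # Releasing does not scrub the slot; its old contents remain.
        self.free.insert(0, idx)


class ToyThread:
    def __init__(self, core):
        self.core = core
        self.upper = core.alloc()  # slot backing the upper 128 bits
        self.zero_flag = False     # True means "upper half reads as zero"

    def write_upper(self, value):
        self.zero_flag = False
        self.core.phys[self.upper] = value

    def read_upper(self):
        return 0 if self.zero_flag else self.core.phys[self.upper]

    def speculative_vzeroupper(self):
        # Mark the upper half as zero and hand the slot back to the core.
        self.zero_flag = True
        self.core.release(self.upper)

    def buggy_rollback(self):
        # The flag is restored, but the released slot is never reclaimed.
        self.zero_flag = False


def demo(secret=0xDEADBEEF):
    core = ToyCore()
    attacker = ToyThread(core)
    attacker.write_upper(0x1111)
    attacker.speculative_vzeroupper()  # executed on a mispredicted path
    victim = ToyThread(core)           # reuses the slot just released
    victim.write_upper(secret)
    attacker.buggy_rollback()          # misprediction detected
    return attacker.read_upper()       # attacker now sees the victim's value


if __name__ == "__main__":
    print(hex(demo()))
```

In this sketch the attacker never performs a timed measurement; it simply reads its own register and finds another thread's data there, which mirrors the point that this is not a side channel.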
Timeline
2023-05-09
A component of our CPU validation pipeline generates an anomalous result.
2023-05-12
We successfully isolate and reproduce the issue. Investigation continues.
2023-05-14
We are now aware of the scope and severity of the issue.
2023-05-15
We draft a brief status report and share our findings with AMD PSIRT.
2023-05-17
AMD acknowledge our report and confirm they can reproduce the issue.
2023-05-17
We complete development of a reliable PoC and share it with AMD.
2023-07-19
AMD posts the microcode fix.