XLA compilation is deterministic if persisted autotuning is used to perform autotuning once and skip it in subsequent compilations. Otherwise, because timing measurements fluctuate, different kernels can be picked as the fastest in different compilation runs.
A compiler flag can be used to ensure that no autotuning happens on
repeated compilations: they either reuse compatible results of previous
runs or fail.
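
The effect of persisting autotuning results can be illustrated with a small, self-contained Python sketch. It is not XLA's implementation: `compile_op`, `noisy_measure`, the `require_cached` parameter, the kernel names, and the JSON cache file are all hypothetical names chosen for illustration. The sketch assumes two candidate kernels whose true costs are nearly equal, so measurement noise decides the winner unless a previously stored choice is reused.

```python
import json
import os
import random
import tempfile

CANDIDATES = ["kernel_a", "kernel_b"]  # hypothetical kernel names


def noisy_measure(kernel):
    """Simulated timing: nearly equal true costs plus random noise."""
    base = {"kernel_a": 1.00, "kernel_b": 1.01}[kernel]
    return base + random.uniform(-0.05, 0.05)


def compile_op(op_key, candidates, measure, cache_path, require_cached=False):
    """Return the kernel for op_key, reusing a persisted choice if present.

    With require_cached=True no measurement is ever performed: the call
    either reuses a stored result or fails.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if op_key in cache:
        return cache[op_key]
    if require_cached:
        raise RuntimeError(f"no persisted autotuning result for {op_key!r}")
    # Autotune once: pick the candidate with the smallest measured time.
    choice = min(candidates, key=measure)
    cache[op_key] = choice
    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return choice


cache_path = os.path.join(tempfile.mkdtemp(), "autotune_cache.json")

# First compilation measures and persists the choice.
first = compile_op("gemm_128x128", CANDIDATES, noisy_measure, cache_path)

# Repeated compilations reuse the persisted choice and never re-measure.
repeats = {
    compile_op("gemm_128x128", CANDIDATES, noisy_measure, cache_path,
               require_cached=True)
    for _ in range(20)
}
```

In this sketch every repeated compilation returns the same kernel as the first one, while a request for an operation that was never tuned raises an error instead of silently measuring again.
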
Programs compiled by XLA can be non-deterministic on operations like scatter,
select-and-scatter, GEMMs, convolutions, and multi-headed attention. A separate
flag switches these operations to deterministic, potentially slower
implementations and makes compilation fail on select-and-scatter, which does
not have a deterministic implementation.
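
One common source of such run-to-run differences is that floating-point addition is not associative, so a scatter that accumulates several updates into the same output element can give different results depending on the order in which the updates are applied. The following plain-Python sketch illustrates this under simplified assumptions; `scatter_add` and its `order` parameter are hypothetical helpers, not an XLA API, and the explicit ordering stands in for the unspecified scheduling of parallel hardware.

```python
def scatter_add(size, indices, updates, order):
    """Accumulate updates[i] into out[indices[i]], visiting i in `order`."""
    out = [0.0] * size
    for i in order:
        out[indices[i]] += updates[i]
    return out


# Three updates that all target output element 0.
indices = [0, 0, 0]
updates = [1e16, 1.0, -1e16]

# Same inputs, two different accumulation orders.
forward = scatter_add(1, indices, updates, order=[0, 1, 2])
reordered = scatter_add(1, indices, updates, order=[0, 2, 1])

print(forward, reordered)  # [0.0] [1.0]
```

In the first order the small update is absorbed by rounding when it is added to `1e16`; in the second the large values cancel first, so the small update survives. A deterministic implementation has to fix the accumulation order, which is one reason it can be slower.
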