PR openxla#1519: copy edited XLA architecture doc
Imported from GitHub PR openxla#1519

Copybara import of the project:

-- 576709a by David Huntsperger <[email protected]>:

copy edited XLA architecture doc

Merging this change closes openxla#1519

COPYBARA_INTEGRATE_REVIEW=openxla#1519 from pcoet:doc-architecture 576709a
PiperOrigin-RevId: 512713359
1 parent 4f460f9 · commit 90503ec

Showing 1 changed file with 51 additions and 51 deletions.
# XLA architecture

XLA (Accelerated Linear Algebra) is a machine learning (ML) compiler that
optimizes linear algebra, providing improvements in execution speed and memory
usage. This page provides a brief overview of the objectives and architecture of
the XLA compiler.
## Objectives

Today, XLA supports several ML framework frontends (including PyTorch,
TensorFlow, and JAX) and is part of the OpenXLA project – an ecosystem of
open-source compiler technologies for ML that's developed collaboratively by
leading ML hardware and software organizations. Before the OpenXLA project was
created, XLA was developed inside the TensorFlow project, but the fundamental
objectives remain the same:
* **Improve execution speed.** Compile subgraphs to reduce the execution time
  of short-lived ops and eliminate overhead from the runtime, fuse pipelined
  operations to reduce memory overhead, and specialize known tensor shapes to
  allow for more aggressive constant propagation (see the sketch after this
  list).
* **Improve memory usage.** Analyze and schedule memory usage, eliminating
  many intermediate storage buffers.

* **Reduce reliance on custom ops.** Remove the need for many custom ops by
  improving the performance of automatically fused low-level ops to match the
  performance of custom ops that were originally fused by hand.

* **Improve portability.** Make it relatively easy to write a new backend for
  novel hardware, so that a large fraction of ML models can run unmodified on
  that hardware. This is in contrast with the approach of specializing
  individual monolithic ops for new hardware, which requires models to be
  rewritten to make use of those ops.
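As a minimal sketch of the first two objectives, the example below uses JAX
(one of the frontends listed above) to compile a small function with
`jax.jit`; the function and inputs here are illustrative placeholders, and a
working JAX install is assumed.

```python
import jax
import jax.numpy as jnp

def scaled_softplus(x):
    # Eagerly, each op below could materialize its own temporary array.
    # Under jax.jit, XLA compiles the whole subgraph and can fuse these
    # element-wise ops, reducing runtime overhead and intermediate buffers.
    return 0.5 * jnp.log1p(jnp.exp(x))

fast_fn = jax.jit(scaled_softplus)   # subgraph compiled (and fused) by XLA
x = jnp.arange(8.0)
print(fast_fn(x))
```

Because the whole subgraph is handed to XLA, the exponential, `log1p`, and
scaling can be fused rather than executed as separate ops, each with its own
temporary buffer.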
## How it works

The XLA compiler takes model graphs from ML frameworks defined in
[StableHLO](https://github.com/openxla/stablehlo) and compiles them into machine
instructions for various architectures. StableHLO defines a versioned operation
set (HLO = high level operations) that provides a portability layer between ML
frameworks and the compiler.
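One way to see this portability layer in practice is to ask a frontend for the
StableHLO it hands to XLA. The sketch below uses JAX's `jax.jit(...).lower(...)`
API; the exact method names and the emitted text can differ across JAX
versions, and `predict` is just a placeholder function.

```python
import jax
import jax.numpy as jnp

def predict(w, x):
    return jnp.tanh(jnp.dot(w, x))

w = jnp.ones((4, 4))
x = jnp.ones((4,))

# Lower the Python function to the StableHLO (MLIR) module that XLA consumes.
lowered = jax.jit(predict).lower(w, x)
print(lowered.as_text())
```

The printed text is the StableHLO module that serves as the portability layer
described above.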
In general, the compilation process that converts the model graph into a
target-optimized executable includes these steps:
1. XLA performs several built-in optimization and analysis passes on the
   StableHLO graph that are target-independent, such as
   [CSE](https://en.wikipedia.org/wiki/Common_subexpression_elimination),
   target-independent operation fusion, and buffer analysis for allocating
   runtime memory for the computation. During this optimization stage, XLA also
   converts the StableHLO dialect into an internal HLO dialect. (The sketch
   after these steps shows how to inspect the graph before and after
   optimization.)
2. XLA sends the HLO computation to a backend for further HLO-level
   optimizations, this time with target-specific information and needs in mind.
   For example, the GPU backend may perform operation fusions that are
   beneficial specifically for the GPU programming model and determine how to
   partition the computation into streams. At this stage, backends may also
   pattern-match certain operations or combinations thereof to optimized
   library calls.

3. The backend then performs target-specific code generation. The CPU and GPU
   backends included with XLA use [LLVM](http://llvm.org) for low-level IR,
   optimization, and code generation. These backends emit the LLVM IR necessary
   to represent the HLO computation in an efficient manner, and then invoke
   LLVM to emit native code from this LLVM IR.
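The following sketch (again using JAX, with an arbitrary example function)
shows where these stages can be observed from a frontend: the lowered StableHLO
that enters step 1, and the backend-optimized HLO produced by steps 1 and 2.
The exact API and textual output vary across JAX and XLA versions, and the
LLVM-generated machine code from step 3 is not normally inspected from Python.

```python
import jax
import jax.numpy as jnp

def f(a, b):
    # The repeated (a * b) subexpression is a candidate for CSE (step 1),
    # and the element-wise ops are candidates for fusion (steps 1 and 2).
    return jnp.sum((a * b) + jnp.tanh(a * b))

a = jnp.ones((128, 128))
b = jnp.ones((128, 128))

lowered = jax.jit(f).lower(a, b)
print(lowered.as_text())        # StableHLO entering the XLA pipeline
compiled = lowered.compile()
print(compiled.as_text())       # HLO after XLA's optimization passes
print(jax.jit(f)(a, b))         # runs the natively compiled executable
```

Comparing the two dumps makes the target-independent passes (step 1) and the
backend-specific choices (step 2) visible; the native code from step 3 stays
inside the compiled executable.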
Within this process, the XLA compiler is modular in the sense that it is easy to
slot in an alternative backend to
[target some novel HW architecture](./developing_new_backend.md). The GPU
backend currently supports NVIDIA GPUs via the LLVM NVPTX backend. The CPU
backend supports multiple CPU ISAs.
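For example, a JAX user can check which of these backends their program will
run on; this is a minimal sketch, and the reported device names depend on the
local install.

```python
import jax

# Lists the devices visible to XLA; `platform` is typically "cpu", "gpu",
# or "tpu", indicating which XLA backend will execute compiled programs.
for device in jax.devices():
    print(device, device.platform)
```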