Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chapter 3 - ISA: X86 duality #44

Merged
merged 5 commits into from
Apr 30, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion chapters/3-CPU-Microarchitecture/3-1 ISA.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,6 @@

The instruction set is the vocabulary used by software to communicate with the hardware. The instruction set architecture (ISA) defines the contract between the software and the hardware. Intel x86, ARM v8 and RISC-V are examples of current-day ISAs that are widely deployed. All of these are 64-bit architectures, i.e., all address computations use 64 bits. ISA developers and CPU architects typically ensure that software or firmware conforming to the specification will execute on any processor built using the specification. Widely deployed ISA franchises also typically ensure backward compatibility such that code written for the GenX version of a processor will continue to execute on GenX+i.

Most modern architectures can be classified as general purpose register-based, load-store architectures where the operands are explicitly specified, and memory is accessed only using load and store instructions. In addition to providing the basic functions in the ISA such as load, store, control and scalar arithmetic operations using integers and floating-point, the widely deployed architectures continue to enhance their ISA to support new computing paradigms. These include enhanced vector processing instructions (e.g., Intel AVX2, AVX512, ARM SVE) and matrix/tensor instructions (Intel AMX). Software mapped to use these advanced instructions typically provide orders of magnitude improvement in performance.
Most modern architectures can be classified as general purpose register-based, load-store architectures, such as RISC-V and ARM where the operands are explicitly specified, and memory is accessed only using load and store instructions. The X86 ISA is a register-memory architecture, where operations can be performed on registers, as well as memory operands. In addition to providing the basic functions in an ISA such as load, store, control and scalar arithmetic operations using integers and floating-point, the widely deployed architectures continue to enhance their ISA to support new computing paradigms. These include enhanced vector processing instructions (e.g., Intel AVX2, AVX512, ARM SVE, RISC-V "V" vector extension) and matrix/tensor instructions (Intel AMX, ARM SME). Software mapped to use these advanced instructions typically provide orders of magnitude improvement in performance.

Modern CPUs support 32-bit and 64-bit precision for arithmetic operations. With the fast-evolving field of machine learning and AI, the industry has a renewed interest in alternative numeric formats for variables to drive significant performance improvements. Research has shown that machine learning models perform just as good, using fewer bits to represent the variables, saving on both compute and memory bandwidth. As a result, several CPU franchises have recently added support for lower precision data types such as 8-bit integers (int8, e.g., Intel VNNI), 16-bit floating-point (fp16, bf16) in the ISA, in addition to the traditional 32-bit and 64-bit formats for arithmetic operations.
2 changes: 1 addition & 1 deletion chapters/4-Terminology-And-Metrics/4-4 UOP.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ typora-root-url: ..\..\img

## Micro-ops {#sec:sec_UOP}

Microprocessors with the x86 architecture translate complex CISC-like instructions into simple RISC-like microoperations, abbreviated as $\mu$ops or $\mu$ops. A simple addition instruction such as `ADD rax, rbx` generates only one $\mu$op, while a more complex instruction like `ADD rax, [mem]` may generate two: one for reading from the `mem` memory location into a temporary (un-named) register, and one for adding it to the `rax` register. The instruction `ADD [mem], rax` generates three $\mu$ops: one for reading from memory, one for adding, and one for writing the result back to memory.
dendibakh marked this conversation as resolved.
Show resolved Hide resolved
Microprocessors with the x86 architecture translate complex CISC-like instructions into simple RISC-like microoperations, abbreviated as $\mu$ops. A simple addition instruction such as `ADD rax, rbx` generates only one $\mu$op, while a more complex instruction like `ADD rax, [mem]` may generate two: one for loading from the `mem` memory location into a temporary (un-named) register, and one for adding it to the `rax` register. The instruction `ADD [mem], rax` generates three $\mu$ops: one for loading from memory, one for adding, and one for storing the result back to memory. Even though x86 ISA is a register-memory architecture, after $\mu$ops conversion, it becomes a load-store architecture since memory is only accessed via load/store $\mu$ops.

The main advantage of splitting instructions into micro operations is that $\mu$ops can be executed:

Expand Down
Loading