[rtl] instruction prefetch buffer (IPB) improvements #455

Merged: 8 commits, Dec 14, 2022

1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -32,6 +32,7 @@ mimpid = 0x01040312 => Version 01.04.03.12 => v1.4.3.12

| Date (*dd.mm.yyyy*) | Version | Comment |
|:-------------------:|:-------:|:--------|
| 13.12.2022 | 1.7.8.5 | code cleanup of FIFO module; improved **instruction prefetch buffer (IPB)** - IPB depth can be as small as "1" and will be adjusted automatically when enabling the `C` ISA extension; update hardware implementation results; [#455](https://github.com/stnolting/neorv32/pull/455) |
| 09.12.2022 | 1.7.8.4 | :sparkles: new option to add custom **R5-type** (4 source registers, 1 destination register) instructions to **Custom Functions Unit (CFU)**; [#452](https://github.com/stnolting/neorv32/pull/452) |
| 08.12.2022 | 1.7.8.3 | :bug: fix interrupt behavior when in user-mode; minor core rtl fixes; do not check register specifiers in CFU instructions (i.e. using registers above `x15` when `E` ISA extension is enabled); [#450](https://github.com/stnolting/neorv32/pull/450) |
| 03.12.2022 | 1.7.8.2 | :sparkles: new option to add custom **R4-type** RISC-V instructions to **Custom Functions Unit (CFU)**; rework CFU hardware module, intrinsic library and example program; [#449](https://github.com/stnolting/neorv32/pull/449) |
6 changes: 3 additions & 3 deletions README.md
@@ -200,10 +200,10 @@ for custom tightly-coupled co-processors, accelerators or interfaces
Implementation results for **exemplary CPU configurations** generated for an Intel Cyclone IV `EP4CE22F17C6` FPGA
using Intel Quartus Prime Lite 21.1 (no timing constraints, _balanced optimization_, f_max from _Slow 1200mV 0C Model_).

| CPU Configuration (version [1.7.7.8](https://github.com/stnolting/neorv32/blob/main/CHANGELOG.md)) | LEs | FFs | Memory bits | DSPs | f_max |
| CPU Configuration (version [1.7.8.5](https://github.com/stnolting/neorv32/blob/main/CHANGELOG.md)) | LEs | FFs | Memory bits | DSPs | f_max |
|:-----------------------|:----:|:----:|:----:|:-:|:-------:|
| `rv32i_Zicsr` | 1328 | 678 | 1024 | 0 | 130 MHz |
| `rv32i_Zicsr_Zicntr` | 1614 | 808 | 1024 | 0 | 130 MHz |
| `rv32i_Zicsr` | 1223 | 607 | 1024 | 0 | 130 MHz |
| `rv32i_Zicsr_Zicntr` | 1578 | 773 | 1024 | 0 | 130 MHz |
| `rv32imc_Zicsr_Zicntr` | 2338 | 992 | 1024 | 0 | 130 MHz |

Implementation results for **exemplary SoC/Processor configurations** generated for a Xilinx Artix-7 `xc7a35ticsg324-1L` FPGA
10 changes: 5 additions & 5 deletions docs/datasheet/overview.adoc
@@ -250,7 +250,7 @@ just _exemplary_. If not otherwise mentioned all implementations use the default
[cols="<2,<8"]
[grid="topbot"]
|=======================
| HW version: | `1.7.7.8`
| HW version: | `1.7.8.5`
| Top entity: | `rtl/core/neorv32_cpu.vhd`
| FPGA: | Intel Cyclone IV E `EP4CE22F17C6`
| Toolchain: | Quartus Prime Lite 21.1
@@ -261,10 +261,10 @@ just _exemplary_. If not otherwise mentioned all implementations use the default
[options="header",grid="rows"]
|=======================
| CPU ISA Configuration | LEs | FFs | MEM bits | DSPs | _f~max~_
| `rv32e` | 830 | 400 | 512 | 0 | 130 MHz
| `rv32i` | 834 | 400 | 1024 | 0 | 130 MHz
| `rv32i_Zicsr` | 1328 | 678 | 1024 | 0 | 130 MHz
| `rv32i_Zicsr_Zicntr` | 1614 | 808 | 1024 | 0 | 130 MHz
| `rv32e` | 720 | 360 | 512 | 0 | 130 MHz
| `rv32i` | 724 | 364 | 1024 | 0 | 130 MHz
| `rv32i_Zicsr` | 1223 | 607 | 1024 | 0 | 130 MHz
| `rv32i_Zicsr_Zicntr` | 1578 | 773 | 1024 | 0 | 130 MHz
| `rv32im_Zicsr_Zicntr` | 2087 | 983 | 1024 | 0 | 130 MHz
| `rv32imc_Zicsr_Zicntr` | 2338 | 992 | 1024 | 0 | 130 MHz
| `rv32imcb_Zicsr_Zicntr` | 3175 | 1247 | 1024 | 0 | 130 MHz
12 changes: 9 additions & 3 deletions docs/datasheet/soc.adoc
@@ -454,12 +454,18 @@ The state of this generic can be retrieved by software via the <<_mxisa>> CSR.
[cols="4,4,2"]
[frame="all",grid="none"]
|======
| **CPU_IPB_ENTRIES** | _natural_ | 2
| **CPU_IPB_ENTRIES** | _natural_ | 1
3+| This generic configures the number of entries in the CPU's instruction prefetch buffer.
The value has to be a power of two and has to be greater than or equal to two (>= 2).
Long linear sequences of code can benefit from an increased IPB size.
The value has to be a power of two and has to be greater than or equal to one (>= 1). The
IPB can help improve memory access latency. Furthermore, long linear code sequences will
benefit from an increased IPB size.
|======

[WARNING]
If the compressed ISA extension `_CPU_EXTENSION_RISCV_C_` (<<_cpu_extension_riscv_c>>) is enabled and the IPB depth
is set to 1, this configuration is internally overridden and the IPB will be implemented with **2** entries. This is required
for handling unaligned 32-bit instructions.
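
The override described in this warning can be sketched in VHDL as follows. This is a minimal, self-contained illustration rather than the actual NEORV32 source: the entity name `ipb_depth_demo` and the helper `sel_natural_f` are hypothetical (the helper stands in for the project's `cond_sel_natural_f`, whose definition is not shown here), while the two constant names mirror those this pull request adds to `rtl/core/neorv32_cpu.vhd`.

```vhdl
-- Hypothetical demo entity; only the two constants mirror the real CPU code.
entity ipb_depth_demo is
  generic (
    CPU_EXTENSION_RISCV_C : boolean := true; -- compressed ISA extension enabled?
    CPU_IPB_ENTRIES       : natural := 1     -- requested IPB depth (power of 2, min 1)
  );
end entity ipb_depth_demo;

architecture sim of ipb_depth_demo is

  -- Assumed stand-in for cond_sel_natural_f: returns val_t if cond is true, else val_f.
  function sel_natural_f(cond : boolean; val_t : natural; val_f : natural) return natural is
  begin
    if cond then
      return val_t;
    else
      return val_f;
    end if;
  end function sel_natural_f;

  -- Override is needed only if C is enabled AND the requested depth is below 2.
  constant ipb_override_c : boolean := CPU_EXTENSION_RISCV_C and (CPU_IPB_ENTRIES < 2);

  -- Effective depth that would be passed on to the prefetch buffer.
  constant ipb_depth_c : natural := sel_natural_f(ipb_override_c, 2, CPU_IPB_ENTRIES);

begin

  -- Report the effective depth once at the start of simulation.
  assert false
    report "effective IPB depth = " & natural'image(ipb_depth_c)
    severity note;

end architecture sim;
```

With the default generics above (C enabled, one requested entry) the sketch should report an effective depth of 2. Setting `CPU_EXTENSION_RISCV_C` to `false`, or requesting two or more entries, leaves the requested value unchanged.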


// ####################################################################################################################
:sectnums:
5 changes: 2 additions & 3 deletions docs/userguide/application_specific_configuration.adoc
@@ -24,8 +24,7 @@ multiplications, `FAST_SHIFT_EN => true` use a fast barrel shifter for shift ope
* Use as much _internal_ memory as possible to reduce memory access latency: `MEM_INT_IMEM_EN => true` and
`MEM_INT_DMEM_EN => true`, maximize `MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE`
* Increase the CPU's instruction prefetch buffer size: if **no** instruction cache is implemented `CPU_IPB_ENTRIES` should be
quite large (recommended value is >= 8); if the instruction cache is implemented `CPU_IPB_ENTRIES` values above 4 are
rather inefficient
quite large
* _To be continued..._


@@ -55,7 +54,7 @@ also reduces program code size by approximately 30%.
* If not explicitly used/required, exclude the CPU standard counters `[m]instret[h]`
(number of instructions) and `[m]cycle[h]` (number of cycles) from synthesis by disabling the `Zicntr` ISA extension
(note, this is not RISC-V compliant).
* Reduce the CPU's prefetch buffer size (`CPU_IPB_ENTRIES`).
* Reduce the CPU's prefetch buffer size (`CPU_IPB_ENTRIES`) to its minimum (=1).
* Map CPU shift operations to a small and iterative shifter unit (`FAST_SHIFT_EN => false`).
* If you have unused DSP blocks available, you can map multiplication operations to those slices instead of
using LUTs to implement the multiplier (`FAST_MUL_EN => true`).
14 changes: 9 additions & 5 deletions rtl/core/neorv32_cpu.vhd
@@ -67,7 +67,7 @@ entity neorv32_cpu is
-- Extension Options --
FAST_MUL_EN : boolean; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
@@ -120,10 +120,14 @@ architecture neorv32_cpu_rtl of neorv32_cpu is
constant XLEN : natural := 32; -- data path width
-- ----------------------------------------------------------------------------------------------

-- local constants --
-- local constants: additional register file read ports --
constant regfile_rs3_en_c : boolean := CPU_EXTENSION_RISCV_Zxcfu or CPU_EXTENSION_RISCV_Zfinx; -- 3rd register file read port (rs3)
constant regfile_rs4_en_c : boolean := CPU_EXTENSION_RISCV_Zxcfu; -- 4th register file read port (rs4)

-- local constant: instruction prefetch buffer depth --
constant ipb_override_c : boolean := (CPU_EXTENSION_RISCV_C = true) and (CPU_IPB_ENTRIES < 2); -- override IPB size: set to 2?
constant ipb_depth_c : natural := cond_sel_natural_f(ipb_override_c, 2, CPU_IPB_ENTRIES);

-- local signals --
signal ctrl : std_ulogic_vector(ctrl_width_c-1 downto 0); -- main control bus
signal imm : std_ulogic_vector(XLEN-1 downto 0); -- immediate
@@ -206,8 +210,8 @@ begin
-- Instruction prefetch buffer --
assert not (is_power_of_two_f(CPU_IPB_ENTRIES) = false) report
"NEORV32 CPU CONFIG ERROR! Number of entries in instruction prefetch buffer <CPU_IPB_ENTRIES> has to be a power of two." severity error;
assert not (CPU_IPB_ENTRIES < 2) report
"NEORV32 CPU CONFIG ERROR! Number of entries in instruction prefetch buffer <CPU_IPB_ENTRIES> has to be >= 2." severity error;
assert not (ipb_override_c = true) report
"NEORV32 CPU CONFIG WARNING! Overriding <CPU_IPB_ENTRIES> configuration (setting =2) because C ISA extension is enabled." severity warning;

-- PMP --
assert not (PMP_NUM_REGIONS > 0) report
@@ -276,7 +280,7 @@ begin
-- Tuning Options --
FAST_MUL_EN => FAST_MUL_EN, -- use DSPs for M extension's multiplier
FAST_SHIFT_EN => FAST_SHIFT_EN, -- use barrel shifter for shift operations
CPU_IPB_ENTRIES => CPU_IPB_ENTRIES, -- entries in instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES => ipb_depth_c, -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical memory protection (PMP) --
PMP_NUM_REGIONS => PMP_NUM_REGIONS, -- number of regions (0..16)
PMP_MIN_GRANULARITY => PMP_MIN_GRANULARITY, -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
2 changes: 1 addition & 1 deletion rtl/core/neorv32_cpu_control.vhd
@@ -73,7 +73,7 @@ entity neorv32_cpu_control is
-- Tuning Options --
FAST_MUL_EN : boolean; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical memory protection (PMP) --
PMP_NUM_REGIONS : natural; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
46 changes: 32 additions & 14 deletions rtl/core/neorv32_fifo.vhd
@@ -140,30 +140,48 @@ begin
fifo_half_level_simple:
if (FIFO_DEPTH = 1) generate
half_o <= fifo.full;
end generate;
end generate; -- /fifo_half_level_simple

fifo_half_level_complex:
if (FIFO_DEPTH > 1) generate
level_diff <= std_ulogic_vector(unsigned(fifo.w_pnt) - unsigned(fifo.r_pnt));
half_o <= level_diff(level_diff'left-1) or fifo.full;
end generate;
end generate; -- /fifo_half_level_complex


-- FIFO Memory ----------------------------------------------------------------------------
-- FIFO Memory - Write --------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
fifo_write: process(clk_i)
begin
if rising_edge(clk_i) then
if (fifo.we = '1') then
if (FIFO_DEPTH = 1) then
fifo.buf <= wdata_i;
else
-- "real" FIFO memory (several entries) --
fifo_memory:
if (FIFO_DEPTH > 1) generate
fifo_write: process(clk_i)
begin
if rising_edge(clk_i) then
if (fifo.we = '1') then
fifo.data(to_integer(unsigned(fifo.w_pnt(fifo.w_pnt'left-1 downto 0)))) <= wdata_i;
end if;
end if;
end if;
end process fifo_write;
end process fifo_write;
fifo.buf <= (others => '0'); -- unused
end generate; -- /fifo_memory

-- simple register/buffer (single entry) --
fifo_buffer:
if (FIFO_DEPTH = 1) generate
fifo_write: process(clk_i)
begin
if rising_edge(clk_i) then
if (fifo.we = '1') then
fifo.buf <= wdata_i;
end if;
end if;
end process fifo_write;
fifo.data <= (others => (others => '0')); -- unused
end generate; -- /fifo_buffer


-- FIFO Memory - Read ---------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
-- "asynchronous" read --
fifo_read_async:
if (FIFO_RSYNC = false) generate
@@ -175,7 +193,7 @@ begin
rdata <= fifo.data(to_integer(unsigned(fifo.r_pnt(fifo.r_pnt'left-1 downto 0))));
end if;
end process fifo_read;
end generate;
end generate; -- /fifo_read_async

-- synchronous read --
fifo_read_sync:
@@ -190,7 +208,7 @@ begin
end if;
end if;
end process fifo_read;
end generate;
end generate; -- /fifo_read_sync


-- Output Gate ----------------------------------------------------------------------------
8 changes: 4 additions & 4 deletions rtl/core/neorv32_package.vhd
@@ -62,7 +62,7 @@ package neorv32_package is

-- Architecture Constants (do not modify!) ------------------------------------------------
-- -------------------------------------------------------------------------------------------
constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01070804"; -- NEORV32 version - no touchy!
constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01070805"; -- NEORV32 version - no touchy!
constant archid_c : natural := 19; -- official RISC-V architecture ID - hands off!

-- Check if we're inside the Matrix -------------------------------------------------------
@@ -1007,7 +1007,7 @@ package neorv32_package is
-- Tuning Options --
FAST_MUL_EN : boolean := false; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean := false; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural := 2; -- entries in instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES : natural := 1; -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural := 0; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural := 4; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
@@ -1173,7 +1173,7 @@ package neorv32_package is
-- Tuning Options --
FAST_MUL_EN : boolean; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
@@ -1245,7 +1245,7 @@ package neorv32_package is
-- Extension Options --
FAST_MUL_EN : boolean; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES : natural; -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical memory protection (PMP) --
PMP_NUM_REGIONS : natural; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
4 changes: 2 additions & 2 deletions rtl/core/neorv32_top.vhd
@@ -72,7 +72,7 @@ entity neorv32_top is
-- Tuning Options --
FAST_MUL_EN : boolean := false; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean := false; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural := 2; -- entries in instruction prefetch buffer, has to be a power of 2, min 2
CPU_IPB_ENTRIES : natural := 1; -- entries in instruction prefetch buffer, has to be a power of 2, min 1

-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural := 0; -- number of regions (0..16)
@@ -564,7 +564,7 @@ begin
-- Extension Options --
FAST_MUL_EN => FAST_MUL_EN, -- use DSPs for M extension's multiplier
FAST_SHIFT_EN => FAST_SHIFT_EN, -- use barrel shifter for shift operations
CPU_IPB_ENTRIES => CPU_IPB_ENTRIES, -- entries in instruction prefetch buffer, has to be a power of 2
CPU_IPB_ENTRIES => CPU_IPB_ENTRIES, -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS => PMP_NUM_REGIONS, -- number of regions (0..16)
PMP_MIN_GRANULARITY => PMP_MIN_GRANULARITY, -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
2 changes: 2 additions & 0 deletions rtl/system_integration/neorv32_ProcessorTop_stdlogic.vhd
@@ -64,6 +64,7 @@ entity neorv32_ProcessorTop_stdlogic is
-- Extension Options --
FAST_MUL_EN : boolean := false; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean := false; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural := 1; -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural := 0; -- number of regions (0..16)
PMP_MIN_GRANULARITY : natural := 4; -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
@@ -311,6 +312,7 @@ begin
-- Extension Options --
FAST_MUL_EN => FAST_MUL_EN, -- use DSPs for M extension's multiplier
FAST_SHIFT_EN => FAST_SHIFT_EN, -- use barrel shifter for shift operations
CPU_IPB_ENTRIES => CPU_IPB_ENTRIES, -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS => PMP_NUM_REGIONS, -- number of regions (0..16)
PMP_MIN_GRANULARITY => PMP_MIN_GRANULARITY, -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes
2 changes: 1 addition & 1 deletion rtl/system_integration/neorv32_SystemTop_AvalonMM.vhd
@@ -70,7 +70,7 @@ entity neorv32_top_avalonmm is
-- Extension Options --
FAST_MUL_EN : boolean := false; -- use DSPs for M extension's multiplier
FAST_SHIFT_EN : boolean := false; -- use barrel shifter for shift operations
CPU_IPB_ENTRIES : natural := 2; -- entries in instruction prefetch buffer, has to be a power of 2
CPU_IPB_ENTRIES : natural := 1; -- entries in instruction prefetch buffer, has to be a power of 2, min 1

-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS : natural := 0; -- number of regions (0..16)
1 change: 1 addition & 0 deletions rtl/system_integration/neorv32_SystemTop_axi4lite.vhd
@@ -309,6 +309,7 @@ begin
-- Extension Options --
FAST_MUL_EN => FAST_MUL_EN, -- use DSPs for M extension's multiplier
FAST_SHIFT_EN => FAST_SHIFT_EN, -- use barrel shifter for shift operations
CPU_IPB_ENTRIES => 2, -- entries in instruction prefetch buffer, has to be a power of 2, min 1
-- Physical Memory Protection (PMP) --
PMP_NUM_REGIONS => PMP_NUM_REGIONS, -- number of regions (0..16)
PMP_MIN_GRANULARITY => PMP_MIN_GRANULARITY, -- minimal region granularity in bytes, has to be a power of 2, min 4 bytes