⚠️ simplify XBUS gateway #876

Merged
merged 12 commits into from
Apr 16, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Link |
|:----:|:-------:|:--------|:----:|
| 15.04.2024 | 1.9.8.3 | :warning: simplify XBUS gateway logic and configuration generics; only "pipelined Wishbone" protocol is supported now | [#876](https://github.com/stnolting/neorv32/pull/876) |
| 14.04.2024 | 1.9.8.2 | :warning: rename SLINK data interface registers; minor CPU control logic/area optimizations | [#874](https://github.com/stnolting/neorv32/pull/874) |
| 13.04.2024 | 1.9.8.1 | minor rtl code cleanups and optimizations | [#872](https://github.com/stnolting/neorv32/pull/872) |
| 04.04.2024 | [**:rocket:1.9.8**](https://github.com/stnolting/neorv32/releases/tag/v1.9.8) | **New release** | |
Expand Down
4 changes: 1 addition & 3 deletions docs/datasheet/soc.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -251,9 +251,7 @@ The generic type "`suv(x:y)`" is an abbreviation for "`std_ulogic_vector(x downt
4+^| **<<_processor_external_bus_interface_xbus>> (Wishbone b4 protocol)**
| `XBUS_EN` | boolean | false | Implement the external bus interface.
| `XBUS_TIMEOUT` | natural | 255 | Clock cycles after which a pending external bus access will auto-terminate and raise a bus fault exception.
| `XBUS_PIPE_MODE` | boolean | false | Use _standard_ ("classic") Wishbone protocol when false. Use _pipelined_ Wishbone protocol when true.
| `XBUS_ASYNC_RX` | boolean | false | Disable input registers when true.
| `XBUS_ASYNC_TX` | boolean | false | Disable output registers when true.
| `XBUS_REGSTAGE_EN` | boolean | false | Implement XBUS register stages to ease timing closure.
| `XBUS_CACHE_EN` | boolean | false | Implement the external bus cache.
| `XBUS_CACHE_NUM_BLOCKS` | natural | 64 | Number of blocks ("lines"). Has to be a power of two.
| `XBUS_CACHE_BLOCK_SIZE` | natural | 32 | Size in bytes of each block. Has to be a power of two.
Expand Down
166 changes: 68 additions & 98 deletions docs/datasheet/soc_xbus.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -5,63 +5,57 @@
[cols="<3,<3,<4"]
[frame="topbot",grid="none"]
|=======================
| Hardware source file(s): | neorv32_xbus.vhd | External bus gateway
| | neorv32_cache.vhd | Generic cache module
| Software driver file(s): | none | _implicitly used_
| Top entity port(s): | `xbus_adr_o` | address output (32-bit)
| | `xbus_dat_i` | data input (32-bit)
| | `xbus_dat_o` | data output (32-bit)
| | `xbus_we_o` | write enable (1-bit)
| | `xbus_sel_o` | byte enable (4-bit)
| | `xbus_stb_o` | strobe (1-bit)
| | `xbus_cyc_o` | valid cycle (1-bit)
| | `xbus_ack_i` | acknowledge (1-bit)
| | `xbus_err_i` | bus error (1-bit)
| Configuration generics: | `XBUS_EN` | enable external bus interface when `true`
| | `XBUS_TIMEOUT` | number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled)
| | `XBUS_PIPE_MODE` | when `false` (default): classic/standard Wishbone protocol; when `true`: pipelined Wishbone protocol
| | `XBUS_ASYNC_RX` | use registered RX path when `false` (default); use async/direct RX path when `true`
| | `XBUS_ASYNC_TX` | use registered TX path when `false` (default); use async/direct TX path when `true`
| | `XBUS_CACHE_EN` | implement the external bus cache
| | `XBUS_CACHE_NUM_BLOCKS` | number of blocks ("lines"), has to be a power of two.
| | `XBUS_CACHE_BLOCK_SIZE` | size in bytes of each block, has to be a power of two.
| CPU interrupts: | none |
| Hardware source files: | neorv32_xbus.vhd | External bus gateway
| | neorv32_cache.vhd | Generic cache module
| Software driver files: | none | _implicitly used_
| Top entity ports: | `xbus_adr_o` | address output (32-bit)
| | `xbus_dat_i` | data input (32-bit)
| | `xbus_dat_o` | data output (32-bit)
| | `xbus_we_o` | write enable (1-bit)
| | `xbus_sel_o` | byte enable (4-bit)
| | `xbus_stb_o` | bus strobe (1-bit)
| | `xbus_cyc_o` | valid cycle (1-bit)
| | `xbus_ack_i` | acknowledge (1-bit)
| | `xbus_err_i` | bus error (1-bit)
| Configuration generics: | `XBUS_EN` | enable external bus interface when `true`
| | `XBUS_TIMEOUT` | number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled)
| | `XBUS_REGSTAGE_EN` | implement XBUS register stages
| | `XBUS_CACHE_EN` | implement the external bus cache
| | `XBUS_CACHE_NUM_BLOCKS` | number of blocks ("lines"), has to be a power of two.
| | `XBUS_CACHE_BLOCK_SIZE` | size in bytes of each block, has to be a power of two.
| CPU interrupts: | none |
|=======================


The external bus interface provides a Wishbone b4-compatible on-chip bus interface that is
implemented if the `XBUS_EN` generic is `true`. This bus interface can be used to attach external memories,
custom hardware accelerators, additional peripheral devices or all other kinds of IP blocks.
**Overview**

The external bus interface provides a **Wishbone b4**-compatible on-chip bus interface that is
implemented if the `XBUS_EN` generic is `true`. This bus interface can be used to attach processor-external
modules like memories, custom hardware accelerators or additional peripheral devices.
An optional cache module ("XCACHE") can be enabled to improve memory access latency.

.Address Mapping
[IMPORTANT]
The external interface is **not** mapped to a specific address space. Instead, all CPU memory accesses that
do not target a specific (and actually implemented) processor-internal address region (hence, accessing the "void";
see section <<_address_space>>) are redirected to the external bus interface.
see section <<_address_space>>) are **redirected** to the external bus interface.


**Wishbone Bus Protocol**

The external bus interface either uses the **standard** (also called "classic") Wishbone protocol (default) or
**pipelined** Wishbone protocol. The protocol to be used is configured via the `XBUS_PIPE_MODE` generic:
The external bus interface complies with the **pipelined Wishbone b4** protocol. Even though this protocol
was explicitly designed to support pipelined transfers, only a single transfer will be "in flight" at any time.
Hence, just two types of bus transactions are generated by the XBUS controller (see images below).

* If `XBUS_PIPE_MODE` is `false`, all bus control signals including `xbus_stb_o` are active and remain stable until the
transfer is acknowledged/terminated.
* If `XBUS_PIPE_MODE` is `true`, all bus control signals except `xbus_stb_o` are active and remain stable until the
transfer is acknowledged/terminated. In this case, `xbus_stb_o` is asserted only during the very first bus clock cycle.
.XBUS/Wishbone Write Transaction
image::xbus_write.png[700]

.Exemplary Wishbone bus accesses using "classic" and "pipelined" protocol
[cols="^2,^2"]
[grid="none"]
|=======================
a| image::wishbone_classic_read.png[700,300]
a| image::wishbone_pipelined_write.png[700,300]
| **Classic** Wishbone read access | **Pipelined** Wishbone write access
|=======================
.XBUS/Wishbone Read Transaction
image::xbus_read.png[700]
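
To make the two transaction types more concrete, the following is a minimal sketch of a device that could sit on the XBUS. It assumes that the strobe is high for a single cycle while the cycle signal stays high until the access is acknowledged, and that byte lane 0 maps to data bits 7:0 (little-endian). The entity name `xbus_reg_sketch`, its port names and the low-active asynchronous reset are hypothetical and not taken from the NEORV32 sources.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical single-register device; NOT part of the NEORV32 sources.
-- The address input is omitted because there is only one register.
entity xbus_reg_sketch is
  port (
    clk_i  : in  std_ulogic;
    rstn_i : in  std_ulogic; -- asynchronous reset, low-active (assumption)
    dat_i  : in  std_ulogic_vector(31 downto 0); -- write data, from xbus_dat_o
    dat_o  : out std_ulogic_vector(31 downto 0); -- read data, to xbus_dat_i
    we_i   : in  std_ulogic;                     -- from xbus_we_o
    sel_i  : in  std_ulogic_vector(3 downto 0);  -- from xbus_sel_o
    stb_i  : in  std_ulogic;                     -- from xbus_stb_o
    cyc_i  : in  std_ulogic;                     -- from xbus_cyc_o
    ack_o  : out std_ulogic;                     -- to xbus_ack_i
    err_o  : out std_ulogic                      -- to xbus_err_i
  );
end entity;

architecture rtl of xbus_reg_sketch is
  signal reg : std_ulogic_vector(31 downto 0);
begin

  process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      reg   <= (others => '0');
      dat_o <= (others => '0');
      ack_o <= '0';
    elsif rising_edge(clk_i) then
      -- the strobe is only high for one cycle, so the request is captured here;
      -- the acknowledge follows one cycle later and is also one cycle long
      ack_o <= cyc_i and stb_i;
      dat_o <= reg; -- read data is valid together with ack_o
      if (cyc_i = '1') and (stb_i = '1') and (we_i = '1') then
        for i in 0 to 3 loop -- byte-wise write using the byte enables
          if (sel_i(i) = '1') then
            reg(8*i+7 downto 8*i) <= dat_i(8*i+7 downto 8*i);
          end if;
        end loop;
      end if;
    end if;
  end process;

  err_o <= '0'; -- this sketch never signals a bus error

end architecture;
```

A device with a longer response time would simply delay `ack_o`; the request itself still has to be latched in the cycle in which `stb_i` is high.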

[WARNING]
If the Wishbone interface is configured to operate in classic/standard mode (`XBUS_PIPE_MODE` = false) a
**sync** RX path (`XBUS_ASYNC_RX` = false) is required for the inter-cycle pause. If `XBUS_ASYNC_RX` is
enabled while `XBUS_PIPE_MODE` is disabled the module will automatically disable the asynchronous RX option.
.Endianness
[NOTE]
Just like the processor itself, the XBUS interface uses **little-endian** byte order.

.Wishbone Specs.
[TIP]
Expand All @@ -70,73 +64,49 @@ can be found in the data sheet "Wishbone B4 - WISHBONE System-on-Chip (SoC) Inte
Architecture for Portable IP Cores". A copy of this document can be found in the `docs` folder of this
project.

.Endianness
[NOTE]
Just like the processor itself the XBUS interface uses **little-endian** byte order.


**Bus Access**

The NEORV32 XBUS interface does not support burst transfers yet, so there is always just a single transfer "in flight".
Hence, the Wishbone `STALL` signal is not implemented. An accessed Wishbone device does not have to respond immediately to a bus
request by sending an ACK. Instead, there is a _time window_ where the device has to acknowledge the transfer. This time window
is configured by the `XBUS_TIMEOUT` generic that defines the maximum time (in clock cycles) a bus access can be pending
before it is automatically terminated with an error condition. If `XBUS_TIMEOUT` is set to zero, the timeout is disabled
and a bus access can take an arbitrary number of cycles to complete (this is not recommended!).
An accessed XBUS/Wishbone device does not have to respond immediately to a bus request by sending an `ACK`.
Instead, there is a **time window** in which the device has to acknowledge the transfer. This time window
is configured by the `XBUS_TIMEOUT` generic, which defines the maximum time (in clock cycles) a bus access can
be pending before it is automatically terminated, raising a bus fault exception. If `XBUS_TIMEOUT` is set to zero,
the timeout is disabled and a bus access can take an arbitrary number of cycles to complete. Note that this is not
recommended, as a missing `ACK` will permanently stall the entire processor!
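
The time window can be pictured as a small watchdog counter. The sketch below is only an illustration of that idea under stated assumptions: the entity `xbus_timeout_sketch`, the `pending_i` input and the `timeout_o` pulse are hypothetical, and the exact cycle at which the real gateway in `neorv32_xbus.vhd` gives up may differ.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical watchdog; NOT the actual neorv32_xbus implementation.
entity xbus_timeout_sketch is
  generic (
    TIMEOUT_VAL : natural := 255 -- cycles until a pending access is terminated (0 = disabled)
  );
  port (
    clk_i     : in  std_ulogic;
    rstn_i    : in  std_ulogic; -- asynchronous reset, low-active (assumption)
    pending_i : in  std_ulogic; -- high while a bus access is outstanding (assumption)
    ack_i     : in  std_ulogic; -- device acknowledge
    err_i     : in  std_ulogic; -- device error
    timeout_o : out std_ulogic  -- one-cycle pulse: terminate access and raise bus fault
  );
end entity;

architecture rtl of xbus_timeout_sketch is
  signal cnt : natural range 0 to TIMEOUT_VAL;
begin

  process(rstn_i, clk_i)
  begin
    if (rstn_i = '0') then
      cnt       <= 0;
      timeout_o <= '0';
    elsif rising_edge(clk_i) then
      timeout_o <= '0';
      if (pending_i = '0') or (ack_i = '1') or (err_i = '1') then
        cnt <= 0; -- idle, or the device has responded in time
      elsif (TIMEOUT_VAL /= 0) then -- a value of zero never times out
        if (cnt = TIMEOUT_VAL-1) then
          timeout_o <= '1'; -- window expired without ACK or ERR
          cnt       <= 0;
        else
          cnt <= cnt + 1;
        end if;
      end if;
    end if;
  end process;

end architecture;
```

With `TIMEOUT_VAL` set to zero the counter never fires, which mirrors the warning above: a device that never responds would keep the access pending forever.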

When `XBUS_TIMEOUT` is greater than zero, the Wishbone gateway starts an internal countdown whenever the CPU
accesses an address via the external memory interface. If the accessed device does not acknowledge (via `xbus_ack_i`)
or terminate (via `xbus_err_i`) the transfer within `XBUS_TIMEOUT` clock cycles, the bus access is automatically canceled
setting `xbus_cyc_o` low again and a CPU load/store/instruction fetch bus access fault exception is raised.
Furthermore, an accessed XBUS/Wishbone device can signal an error condition at any time by setting the `ERR` signal
high for one cycle. This will also terminate the current bus transaction before raising a CPU bus fault exception.


**Access Latency**

By default, the XBUS gateway introduces two additional latency cycles since processor-outgoing (`*_o`) and
processor-incoming (`*_i`) signals are fully registered. Thus, any access from the CPU to a processor-external device
via the XBUS interface requires 2 additional clock cycles. This can ease timing closure when using large (combinatorial)
processor-external interconnection networks.

Optionally, the latency of the XBUS gateway can be reduced by removing the input and/or output register stages.
Enabling the `XBUS_ASYNC_RX` option will remove the input register stage; enabling `XBUS_ASYNC_TX` option will
remove the output register stages. Note that using those "async" options might impact timing closure.

.Output Gating
[NOTE]
All outgoing Wishbone signals use a "gating mechanism" so they only change if there is an actual XBUS transaction in
progress. This can reduce dynamic switching activity in the external bus system and also simplifies simulation-based
inspection of the Wishbone transactions. Note that this output gating is only available if the output register buffer is not
disabled (`XBUS_ASYNC_TX` = `false`).
.Register Stage
[TIP]
An optional register stage can be added to the XBUS gateway to break up the critical path and ease timing closure.
When `XBUS_REGSTAGE_EN` is `true`, all outgoing and incoming XBUS signals are registered, increasing access latency
by two cycles. Furthermore, all outgoing signals (like the address) will be kept stable if no bus access is
being initiated.
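
For designs that previously set `XBUS_PIPE_MODE`, `XBUS_ASYNC_RX` or `XBUS_ASYNC_TX`, the instantiation has to drop those generics. The wrapper below is a sketch of what a top-level instantiation might look like afterwards. The entity name `xbus_wrapper_sketch` is hypothetical, the `CLOCK_FREQUENCY` value is a placeholder, and the sketch assumes that every generic and input port not listed here has a usable default. Setting `XBUS_REGSTAGE_EN` to `true` roughly corresponds to the former fully registered default in terms of latency; the bus protocol is now always pipelined Wishbone.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

library neorv32;

-- Hypothetical wrapper; only the XBUS-related configuration is shown.
entity xbus_wrapper_sketch is
  port (
    clk_i      : in  std_ulogic;
    rstn_i     : in  std_ulogic;
    xbus_adr_o : out std_ulogic_vector(31 downto 0);
    xbus_dat_i : in  std_ulogic_vector(31 downto 0);
    xbus_dat_o : out std_ulogic_vector(31 downto 0);
    xbus_we_o  : out std_ulogic;
    xbus_sel_o : out std_ulogic_vector(3 downto 0);
    xbus_stb_o : out std_ulogic;
    xbus_cyc_o : out std_ulogic;
    xbus_ack_i : in  std_ulogic;
    xbus_err_i : in  std_ulogic
  );
end entity;

architecture rtl of xbus_wrapper_sketch is
begin

  neorv32_top_inst: entity neorv32.neorv32_top
    generic map (
      CLOCK_FREQUENCY  => 100_000_000, -- placeholder value (assumption)
      XBUS_EN          => true,        -- implement the external bus interface
      XBUS_TIMEOUT     => 255,         -- auto-terminate after 255 cycles
      XBUS_REGSTAGE_EN => true         -- replaces the removed ASYNC_RX/ASYNC_TX options
    )
    port map (
      clk_i      => clk_i,
      rstn_i     => rstn_i,
      xbus_adr_o => xbus_adr_o,
      xbus_dat_i => xbus_dat_i,
      xbus_dat_o => xbus_dat_o,
      xbus_we_o  => xbus_we_o,
      xbus_sel_o => xbus_sel_o,
      xbus_stb_o => xbus_stb_o,
      xbus_cyc_o => xbus_cyc_o,
      xbus_ack_i => xbus_ack_i,
      xbus_err_i => xbus_err_i
    );

end architecture;
```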


**External Bus Cache (X-CACHE)**

The XBUS interface provides an optional internal cache that can be used to buffer processor-external accesses.
The x-cache is enabled via the `XBUS_CACHE_EN` generic. The total size of the cache is split into the number of
cache lines or cache blocks (`XBUS_CACHE_NUM_BLOCKS` generic) and the line or block size in bytes
(`XBUS_CACHE_BLOCK_SIZE` generic).

.Simplified X-Cache Architecture
[source,asciiart]
---------------------------------------
Simplified cache architecture ("->" = direction of access requests):

Direct Access +----------+
/|-------------------------->| Register |------------------------->|\
| | +----------+ | |
Core ---->| | | |----> XBUS
| | +--------------+ +--------------+ +-------------+ | |
\|--->| Host Arbiter |---->| Cache Memory |<----| Bus Arbiter |--->|/
+--------------+ +--------------+ +-------------+
Direct Access +----------+
/|------------------------->| Register |------------------------>|\
| | +----------+ | |
Core --->| | | |---> XBUS
| | +--------------+ +--------------+ +-------------+ | |
\|--->| Host Arbiter |--->| Cache Memory |<---| Bus Arbiter |--->|/
+--------------+ +--------------+ +-------------+
---------------------------------------

The XBUS interface provides an optional cache module that can be used to buffer and improve processor-external accesses.
The cache uses a direct-mapped architecture that implements "write-allocate" and "write-back" strategies.

The **write-allocate** strategy will fetch the entire referenced block from main memory when encountering
a cache write-miss. The **write-back** strategy will gather all writes locally inside the cache until the according
cache block is about to be replaced. In this case, the entire modified cache block is written back to main memory.

The x-cache is enabled via the `XBUS_CACHE_EN` generic. The total size of the cache is split into the number of cache lines
or cache blocks (`XBUS_CACHE_NUM_BLOCKS` generic) and the line or block size in bytes (`XBUS_CACHE_BLOCK_SIZE` generic).
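
With the default generics the cache holds 64 blocks of 32 bytes each, i.e. 2048 bytes in total. The package below sketches how a textbook direct-mapped cache would derive the block index from an address for such a geometry. The package and function names are hypothetical, and the actual address split used inside `neorv32_cache.vhd` may differ.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical helper package; NOT part of the NEORV32 sources.
package xcache_geometry_sketch is
  -- floor(log2(n)); exact for powers of two
  function log2_f(n : positive) return natural;
  -- cache block selected by an address in a direct-mapped cache
  function block_index(addr       : std_ulogic_vector(31 downto 0);
                       num_blocks : positive;  -- XBUS_CACHE_NUM_BLOCKS, power of two
                       block_size : positive)  -- XBUS_CACHE_BLOCK_SIZE, power of two
                       return natural;
end package;

package body xcache_geometry_sketch is

  function log2_f(n : positive) return natural is
    variable v : natural := n;
    variable r : natural := 0;
  begin
    while (v > 1) loop
      v := v / 2;
      r := r + 1;
    end loop;
    return r;
  end function;

  function block_index(addr       : std_ulogic_vector(31 downto 0);
                       num_blocks : positive;
                       block_size : positive)
                       return natural is
    constant ofs_bits : natural := log2_f(block_size); -- byte offset within a block
    constant idx_bits : natural := log2_f(num_blocks); -- block select bits
  begin
    if (idx_bits = 0) then
      return 0; -- a single block: every address maps to it
    end if;
    -- the index sits directly above the byte offset; the remaining upper bits form the tag
    return to_integer(unsigned(addr(ofs_bits+idx_bits-1 downto ofs_bits)));
  end function;

end package body;
```

For the defaults this gives 5 offset bits and 6 index bits, so under this assumed split two addresses that are 2048 bytes apart compete for the same block.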

The x-cache also provides "direct accesses" that bypass the cache. For example, this can be used to access processor-external
memory-mapped IO. All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF` will always bypass the cache
(see section <<_address_space>>). Furthermore, load-reserved and store-conditional <<_atomic_accesses>> will also always bypass the
cache **regardless of the accessed address**.


The x-cache also provides "direct accesses" that bypass the cache. For example, this can be used to access
processor-external memory-mapped IO. All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
will always bypass the cache (see section <<_address_space>>). Furthermore, load-reserved and store-conditional
<<_atomic_accesses>> will also always bypass the cache **regardless of the accessed address**.
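
The bypass rule can be summarized as a simple predicate on the access. The following sketch only restates the two conditions from the paragraph above; the package name, the function name and the `is_atomic` flag are hypothetical and do not reflect how the real cache module encodes the access type.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package; NOT part of the NEORV32 sources.
package xcache_bypass_sketch is
  function bypasses_cache(addr      : std_ulogic_vector(31 downto 0);
                          is_atomic : boolean) -- load-reserved/store-conditional access
                          return boolean;
end package;

package body xcache_bypass_sketch is

  function bypasses_cache(addr      : std_ulogic_vector(31 downto 0);
                          is_atomic : boolean)
                          return boolean is
  begin
    -- 0xF0000000..0xFFFFFFFF: the upper four address bits are all set;
    -- atomic accesses bypass the cache regardless of the address
    return is_atomic or (addr(31 downto 28) = "1111");
  end function;

end package body;
```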
Binary file removed docs/figures/wishbone_classic_read.png
Binary file not shown.
Binary file removed docs/figures/wishbone_pipelined_write.png
Binary file not shown.
Binary file added docs/figures/xbus_read.png
Binary file added docs/figures/xbus_write.png
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,8 @@
{name: 'xbus_dat_o', wave: 'x....|.x.'},
{name: 'xbus_we_o', wave: 'x0...|.x.'},
{name: 'xbus_sel_o', wave: 'x....|.x.'},
{name: 'xbus_stb_o', wave: '01...|.0.'},
{name: 'xbus_stb_o', wave: '010..|...'},
{name: 'xbus_cyc_o', wave: '01...|.0.'},
{name: 'xbus_ack_i', wave: '0....|10.'},
{name: 'xbus_err_i', wave: '0....|...'},
{name: 'xbus_ack_i', wave: 'x0...|1x.'},
{name: 'xbus_err_i', wave: 'x0...|.x.'},
]}
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,6 @@
{name: 'xbus_sel_o', wave: 'x3...|.x.', data: ['byte_enable']},
{name: 'xbus_stb_o', wave: '010..|...'},
{name: 'xbus_cyc_o', wave: '01...|.0.'},
{name: 'xbus_ack_i', wave: '0....|10.'},
{name: 'xbus_err_i', wave: '0....|...'},
]}
{name: 'xbus_ack_i', wave: 'x0...|1x.'},
{name: 'xbus_err_i', wave: 'x0...|.x.'},
]}
6 changes: 2 additions & 4 deletions rtl/core/neorv32_package.vhd
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ package neorv32_package is

-- Architecture Constants -----------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01090802"; -- hardware version
constant hw_version_c : std_ulogic_vector(31 downto 0) := x"01090803"; -- hardware version
constant archid_c : natural := 19; -- official RISC-V architecture ID
constant XLEN : natural := 32; -- native data path width

Expand Down Expand Up @@ -758,9 +758,7 @@ package neorv32_package is
-- External bus interface (XBUS) --
XBUS_EN : boolean := false;
XBUS_TIMEOUT : natural := 255;
XBUS_PIPE_MODE : boolean := false;
XBUS_ASYNC_RX : boolean := false;
XBUS_ASYNC_TX : boolean := false;
XBUS_REGSTAGE_EN : boolean := false;
XBUS_CACHE_EN : boolean := false;
XBUS_CACHE_NUM_BLOCKS : natural := 64;
XBUS_CACHE_BLOCK_SIZE : natural := 32;
Expand Down
12 changes: 4 additions & 8 deletions rtl/core/neorv32_top.vhd
Original file line number Diff line number Diff line change
Expand Up @@ -86,9 +86,7 @@ entity neorv32_top is
-- External bus interface (XBUS) --
XBUS_EN : boolean := false; -- implement external memory bus interface?
XBUS_TIMEOUT : natural := 255; -- cycles after a pending bus access auto-terminates (0 = disabled)
XBUS_PIPE_MODE : boolean := false; -- protocol: false=classic/standard wishbone mode, true=pipelined wishbone mode
XBUS_ASYNC_RX : boolean := false; -- use register buffer for RX data when false
XBUS_ASYNC_TX : boolean := false; -- use register buffer for TX data when false
XBUS_REGSTAGE_EN : boolean := false; -- add XBUS register stage
XBUS_CACHE_EN : boolean := false; -- enable external bus cache (x-cache)
XBUS_CACHE_NUM_BLOCKS : natural := 64; -- x-cache: number of blocks (min 1), has to be a power of 2
XBUS_CACHE_BLOCK_SIZE : natural := 32; -- x-cache: block size in bytes (min 4), has to be a power of 2
Expand Down Expand Up @@ -933,13 +931,11 @@ begin
neorv32_xbus_inst_true:
if XBUS_EN generate

-- bus gateway (Wishbone) --
-- external bus gateway (XBUS) --
neorv32_xbus_inst: entity neorv32.neorv32_xbus
generic map (
BUS_TIMEOUT => XBUS_TIMEOUT,
PIPE_MODE => XBUS_PIPE_MODE,
ASYNC_RX => XBUS_ASYNC_RX,
ASYNC_TX => XBUS_ASYNC_TX
TIMEOUT_VAL => XBUS_TIMEOUT,
REGSTAGE_EN => XBUS_REGSTAGE_EN
)
port map (
clk_i => clk_i,
Expand Down