Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Working Calyx Implementation of AXI Read channels #1820

Merged
merged 29 commits into from
Jan 6, 2024
Merged

Conversation

nathanielnrn
Copy link
Contributor

This PR adds a working implementation of the read portion of an AXI interface according to the latest spec and RTL kernel specifications for XRT.

Currently the tests are a bit of a pain to run, requiring some replacement of the verilog code generated. To this end here is a zipped .fst showing the waveforms of the three tests, in case anyone is interested.

The calyx is heavily commented, so hopefully not too bad to go through. However I know this is a lot so happy to hop on a call sometime to go over things.

Some thoughts about this so far:

  1. There are definitely some inefficiencies in our interface. The immediate one that comes
    to mind is that handshakes require 2 cycles instead of one. Servicing reads (writing them to the seq_mem_d1 in the calyx program takes ~9 cycles. And we also utilize a data_bus only as wide as our memory (32 bits for an eventual integer dot-product test case).
  2. Some of the grouping we were initially trying to have, a blocking group, a servicing group, an assert and deassert group, had to be combined together due to the time it takes to transfer between groups. @sampsyo and I spoke very broadly about there possibly being future work here to enable faster transitions between groups (1 cycle as opposed to 2 right now IIRC).

These will be added to #1733 and tracked there but for now some TODOs:

  1. Make cocotb tests interface with runt
  2. Create tests that perform more than 1 transaction
  3. Work on write channels.

Hopefully this last one will be a bit more straightforward now that there is some infrastructure in place and working channels to pattern match off of.

cc @andrewb1999 @rachitnigam @sampsyo

@nathanielnrn nathanielnrn added the C: FPGA Changes for the FPGA backend label Dec 21, 2023
@sampsyo
Copy link
Contributor

sampsyo commented Dec 21, 2023

Awesome!!!!! Really cool that this is minimally working for the read side.

To slightly expand on this:

Some of the grouping we were initially trying to have, a blocking group, a servicing group, an assert and deassert group, had to be combined together due to the time it takes to transfer between groups.

I actually think this is a super interesting problem that has implications for static/dynamic Calyx assignments. Artistically, we would like to be able to manage an AXI transaction with a control program like this:

while <more transactions to do> {
  seq {
    assert_ready;  // tell the remote side we are ready for their data
    wait_for_valid;  // wait for them to send us stuff
    par {
      do_transfer;  // actually do things the data on the bus
      deassert_ready;  // tell them to wait while we process the data
    }
  }
}

The problem is that AXI, of course, is super cycle-timing-sensitive. Remember that one of the core invariants is:

On any cycle when both ready=1 and valid=1, a transaction occurs.

So the remote side can schedule these transaction cycles back-to-back, i.e., 3 transactions can happen in exactly 3 cycles. Most saliently for us, say the remote side just keeps its valid signal high all the time and relies on us to assert/deassert ready to do transfers. Then, if we need to take some time to process a transfer, we have to be absolutely sure we only assert ready for one cycle at a time.

The problem with dynamic Calyx is that it offers no guarantees about how long it takes to go from wait_for_valid above to deassert_ready. And in practice, it actually does waste at least one cycle, meaning that the remote side sees us do two transfers when we only wanted one.

The big upshot of static Calyx is that it does offer these guarantees, so we would like to be able to just sprinkle the static everywhere get the timing we want. There is a critical problem, however: the wait_for_valid group is fundamentally dynamic. Our "dynamic islands" approach does not support sequencing a dynamic group with a static one with a zero-cycle transition latency.

So the big design question that I think this opens up is: is there some rule we can make up that would allow this kind of sequencing? That is, to allow some version of seq { a ; b } where a is dynamic and b is static where we can guarantee that b runs in the same cycle where a says it is done. It seems hard to do in general but maybe there is something we can do for restricted cases?

Copy link
Contributor

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks great. For others reading this, the current thing is a hand-written, standalone Calyx program; getting this tested and working is a stepping stone toward "wiring it up" to core Calyx logic.

What do you think about renaming the top-level directory from fpgas to yxi to reflect the broader scope of work here?

nathanielnrn and others added 23 commits December 22, 2023 11:46
TBD if this actually implements AXI correctly.

There are currently some hacks in place (marked with todos)
to get this to compile, namely some splicing
that doesn't consider what we actually want to splice
(it just takes [31:0]) as opposed to dynamically considering actual
bits we want.

A few other things that should be cleaned up eventually

Need to create a cocotb testbench to test correctness
Maybe this shouldn't be here, but for now (having deleted my
working directory earlier) putting it here
Simply run make from the cocotb directory and axi-read-tests
will be executed
We tie ARID low in our manager
Prefixes should not contain trailing "_"
Got rid of "assert_val" and "block_transfer" groups
and instead perform these things inside "do_ar_transfer", this is
required because we cant assert valid before we drive the data
correctly, so needs to happen in parallel.

Currently: This seems to write 16 times to same place, this is due to
hardcoding of 16 in ar transfer, not sure why address doesn't
increment this is tbd (and next TODO)
This is part of read channel control sequence
Also reduces data bus width to 32
@rachitnigam
Copy link
Contributor

@sampsyo Interesting point! Do you want to move this into an issue? Seems like there is some level of deeper semantics tweaking needed to make this work because we'd need to think about providing timing guarantees in a dynamic context.

@sampsyo
Copy link
Contributor

sampsyo commented Jan 4, 2024

FYI, I think this is good to merge whenever you're ready, @nathanielnrn!

@nathanielnrn nathanielnrn merged commit 662cb88 into main Jan 6, 2024
5 checks passed
@nathanielnrn nathanielnrn deleted the axi-calyx-gen branch January 6, 2024 15:45
rachitnigam added a commit that referenced this pull request Feb 16, 2024
* init commit of hardcoded axi wrapper for a 'main' kernel

* add axi-reads-calyx

* hook up inputs to channels in the wrapper. tbd if this works

* Working calyx verison of AR and R

TBD if this actually implements AXI correctly.

There are currently some hacks in place (marked with todos)
to get this to compile, namely some splicing
that doesn't consider what we actually want to splice
(it just takes [31:0]) as opposed to dynamically considering actual
bits we want.

A few other things that should be cleaned up eventually

Need to create a cocotb testbench to test correctness

* Track output of compiled calyx read channel

Maybe this shouldn't be here, but for now (having deleted my
working directory earlier) putting it here

* update gitignore to get rid of sim_build and other cocotb artifacts

* Working make files  for running cocotb tests

Simply run make from the cocotb directory and axi-read-tests
will be executed

* Add xID signals for cocotb compatability

We tie ARID low in our manager

* Fix prefix issue on cocotb axi test bench

Prefixes should not contain trailing "_"

* commit to repro 'make WAVES=1' cocotb error from axi-reads-calyx.futil

* axi-reads patch

* sync debug

* Add txn_len initialization to 16 in calyx program

* AXI Read fixed to get to read channel start

Got rid of "assert_val" and "block_transfer" groups
and instead perform these things inside "do_ar_transfer", this is
required because we cant assert valid before we drive the data
correctly, so needs to happen in parallel.

Currently: This seems to write 16 times to same place, this is due to
hardcoding of 16 in ar transfer, not sure why address doesn't
increment this is tbd (and next TODO)

* Add integer byte conversion for tests on Calyx AXI testharness

* WIP get reads to work. Add incr_curr_addr group

This is part of read channel control sequence

* remove .fst from tracking

* Add more data to testbench to make waveform viewing easier

* Reads seem to be terminating correctly at RLAST

* AR transfers seem to work, valid is high for 1 cycle

* Unreduced axi-reads-calyx.futil

Also reduces data bus width to 32

* Cocotb testbench now passes

* Formatted and passing axi-read-tests

* Reduce and comment axi-reads-calyx.futil

* remove axi-reads.v from being tracked

* add a todo

* add required ARPROT signal. This is hardcoded to be priviliged

* rename directories to yxi/axi-calyx

* remove a guard in favor of 1'b1 to simplify reading of source code

---------

Co-authored-by: Rachit Nigam <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: FPGA Changes for the FPGA backend
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants