Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question about the Asyncfifo in the storage class #26

Closed
cklarhorst opened this issue Aug 6, 2020 · 8 comments
Closed

Question about the Asyncfifo in the storage class #26

cklarhorst opened this issue Aug 6, 2020 · 8 comments

Comments

@cklarhorst
Copy link
Contributor

There is an AsyncFifo in the storage class (core.py:176):
cdc = stream.AsyncFIFO([("data", data_width)], 4)
which I think is responsible to hand over the data from the scope domain into the sys clock domain.

As expected the code includes a ClockDomainRenamer (core:177):
cdc = ClockDomainsRenamer({"write": "scope", "read": "sys"})(cdc)
but to me it looks like the output of the fifo is still wrongly clocked (in the final design).

Why?

  • I think the cdc_asyncfifo_dout often appears in the critical path (for a very fast scope clk)
  • The synthesis log shows:
  ---------------------------------------------------------------------------------------------------
    | ram_type           | Distributed                                                   |          |
    -------------------------------------------------------------------------------------------------
    | Port A                                                                             |          |
    |     aspect ratio   | 32-word x 22-bit                                              |          |
    |     clkA           | connected to signal <[scope clk]>                             | rise     |
    |     weA            | connected to signal <storage_cdc_graycounter0_ce>             | high     |
    |     addrA          | connected to signal <storage_cdc_graycounter0_q_binary<4:0>>  |          |
    |     diA            | connected to signal <storage_cdc_asyncfifo_din>               |          |
    -------------------------------------------------------------------------------------------------
    | Port B                                                                             |          |
    |     aspect ratio   | 32-word x 22-bit                                              |          |
    |     addrB          | connected to signal <memadr_6>                                |          |
    |     doB            | connected to signal <storage_cdc_asyncfifo_dout>              |          |
    -------------------------------------------------------------------------------------------------

==> I would expect that Port B should have clkB = [sys clk]

  • The generated verilog code shows:
reg [21:0] storage_8[0:3];
reg [1:0] memadr_5;
reg [1:0] memadr_6;
always @(posedge scope_clk) begin
	if (storage_cdc_wrport_we)
		storage_8[storage_cdc_wrport_adr] <= storage_cdc_wrport_dat_w;
	memadr_5 <= storage_cdc_wrport_adr;
end

always @(posedge sys_clk) begin
	memadr_6 <= storage_cdc_rdport_adr;
end

assign storage_cdc_wrport_dat_r = storage_8[memadr_5];
assign storage_cdc_rdport_dat_r = storage_8[memadr_6];
[cut]
assign storage_cdc_asyncfifo_dout = storage_cdc_rdport_dat_r;
[cut]
storage_mem_data_status <= storage_cdc_source_payload_data;

==> I would expect to see a "<=" in the sys_clk block.

I then tried to change core.py:220 from

self.comb += [
	self.mem_valid.status.eq(cdc.source.valid),
	cdc.source.ready.eq(self.mem_data.we | ~self.enable.storage),
	self.mem_data.status.eq(cdc.source.data)
]

into

self.sync += [
	self.mem_valid.status.eq(cdc.source.valid),
	cdc.source.ready.eq(self.mem_data.we | ~self.enable.storage),
	self.mem_data.status.eq(cdc.source.data)
]

in hope to generate such a sys_clk block around the "<=" but it didn't change anything.

So my questions:

  1. Shouldn't self.sync always generate always @(posedge sys_clk) begin blocks around the statements?
  2. How do I get the output of the fifo clocked from the sys domain?

I hope I provided enough and helpful information.
Thanks in advance for your helpful comments.

@enjoy-digital
Copy link
Owner

@cklarhorst: would you mind sharing a minimal design to have a look at it? Just to be sure to analyze things on a similar design.

@cklarhorst
Copy link
Contributor Author

Sure, I will try to create a minimal example

@cklarhorst
Copy link
Contributor Author

cklarhorst commented Aug 7, 2020

Modified minispartan6 target: minispartan6.py.txt

I changed:

  • Remove RAM
  • The CRG create a sys2x_ps clk
  • Create a simple counter in sys2x_ps domain
  • Used LiteScope to analyze the counter

Results:

trce -e 5 minispartan6.ncd:

Slack:                  -2.842ns (requirement - (data path - clock path skew + uncertainty))
  Source:               Mram_storage_313/DP (RAM)
  Destination:          trigger_mem_graycounter1_q_3 (FF)
  Requirement:          1.562ns
  Data Path Delay:      3.833ns (Levels of Logic = 4)(Component delays alone exceeds constraint)
  Clock Path Skew:      -0.339ns (1.489 - 1.828)
  Source Clock:         clkout_buf0 rising at 0.000ns
  Destination Clock:    clkout_buf2 rising at 1.562ns
  Clock Uncertainty:    0.232ns

  Clock Uncertainty:          0.232ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter (TSJ):  0.070ns
    Discrete Jitter (DJ):       0.211ns
    Phase Error (PE):           0.120ns

  Maximum Data Path at Slow Process Corner: Mram_storage_313/DP to trigger_mem_graycounter1_q_3
    Location             Delay type         Delay(ns)  Physical Resource
                                                       Logical Resource(s)
    -------------------------------------------------  -------------------
    SLICE_X34Y59.B       Tshcko                0.885   trigger_mem_asyncfifo_dout<12>
                                                       Mram_storage_313/DP
    SLICE_X39Y61.B6      net (fanout=4)        0.548   trigger_mem_asyncfifo_dout<12>
    SLICE_X39Y61.B       Tilo                  0.259   xilinxmultiregimpl4_regs0<3>
                                                       trigger_mem_graycounter1_ce1131
    SLICE_X38Y60.A4      net (fanout=1)        0.436   trigger_mem_graycounter1_ce113
    SLICE_X38Y60.A       Tilo                  0.203   trigger_mem_graycounter1_ce111
                                                       trigger_mem_graycounter1_ce7
    SLICE_X38Y59.B4      net (fanout=1)        0.430   trigger_mem_graycounter1_ce7
    SLICE_X38Y59.B       Tilo                  0.203   trigger_mem_graycounter1_q<4>
                                                       trigger_mem_graycounter1_ce8
    SLICE_X37Y60.C4      net (fanout=9)        0.547   trigger_mem_graycounter1_ce8
    SLICE_X37Y60.CLK     Tas                   0.322   trigger_mem_graycounter1_q<3>
                                                       Mxor_trigger_mem_graycounter1_q_next_3_xo<0>1
                                                       trigger_mem_graycounter1_q_3
    -------------------------------------------------  ---------------------------
    Total                                      3.833ns (1.872ns logic, 1.961ns route)
                                                       (48.8% logic, 51.2% route)

So this time it is the other AsyncFifo (the one in the trigger). For me it seems that both AsyncFifos have that problem. I would expect that the dout of the fifo should already be clocked by clkout_buf2 in this case.

Synthesis log:

INFO:Xst:3231 - The small RAM <Mram_storage_3> will be implemented on LUTs in order to maximize performance and save block RAM resources. If you want to force its implementation on block, use option/constraint ram_style.
    -----------------------------------------------------------------------
    | ram_type           | Distributed                         |          |
    -----------------------------------------------------------------------
    | Port A                                                              |
    |     aspect ratio   | 16-word x 22-bit                    |          |
    |     clkA           | connected to signal <clkout_buf0>   | rise     |
    |     weA            | connected to signal <trigger_mem_graycounter0_ce> | high     |
    |     addrA          | connected to signal <trigger_mem_graycounter0_q_binary<3:0>> |          |
    |     diA            | connected to signal <("00",trigger_mem_value_storage,trigger_mem_mask_storage)> |          |
    -----------------------------------------------------------------------
    | Port B                                                              |
    |     aspect ratio   | 16-word x 22-bit                    |          |
    |     addrB          | connected to signal <memadr_3>      |          |
    |     doB            | connected to signal <trigger_mem_asyncfifo_dout> |          |

So no clk on Port B.

The relevant verilog part for the asyncfifo looks like:

reg [21:0] storage_3[0:15];
reg [3:0] memadr_2;
reg [3:0] memadr_3;
always @(posedge sys_clk) begin
	if (trigger_mem_wrport_we)
		storage_3[trigger_mem_wrport_adr] <= trigger_mem_wrport_dat_w;
	memadr_2 <= trigger_mem_wrport_adr;
end

always @(posedge scope_clk) begin
	memadr_3 <= trigger_mem_rdport_adr;
end

assign trigger_mem_wrport_dat_r = storage_3[memadr_2];
assign trigger_mem_rdport_dat_r = storage_3[memadr_3];

so maybe the assign trigger_mem_rdport_dat_r = storage_3[memadr_3]; made the synthesis tool think that this equals a Dual-Port RAM with Asynchronous Read?

@cklarhorst
Copy link
Contributor Author

I just found out that changing the read & write port mode to READ_FIRST yield the expected behavior (migen/genlib/fifo.py:229):
rdport = storage.get_port(clock_domain="read", mode=READ_FIRST) and
wrport = storage.get_port(write_capable=True, clock_domain="write", mode=READ_FIRST)

results in:

reg [21:0] storage_3[0:15];
reg [21:0] memdat_5;
reg [21:0] memdat_6;
always @(posedge sys_clk) begin
	if (trigger_mem_wrport_we)
		storage_3[trigger_mem_wrport_adr] <= trigger_mem_wrport_dat_w;
	memdat_5 <= storage_3[trigger_mem_wrport_adr];
end

always @(posedge scope_clk) begin
	memdat_6 <= storage_3[trigger_mem_rdport_adr];
end

assign trigger_mem_wrport_dat_r = memdat_5;
assign trigger_mem_rdport_dat_r = memdat_6;

But it would increase latency by one.

@cklarhorst
Copy link
Contributor Author

Update:
I have created a smaller example: minispartan6.py.txt

It contains a small module:

class TestModule(Module, AutoCSR):
    def __init__(self):
        self.counter = counter = Signal(10)
        self.sync.sys2x_ps += counter.eq(counter + 1)

        self.counterCSR = CSRStatus(10)

        mem = stream.AsyncFIFO([("data", 10)], 4)
        mem = ClockDomainsRenamer({"write": "sys2x_ps", "read": "sys"})(mem)
        self.submodules += mem
        self.comb += [
            mem.source.ready.eq(self.counterCSR.we),self.counterCSR.status.eq(mem.source.data),
            mem.sink.valid.eq(1),
            mem.sink.data.eq(counter)
        ]

Architecture: Fast Counter -> AsyncFIFO -> CSR
And this already creates a timing problem with ISE.

Now I think that there might be a false path in the AsyncFIFO. I found out that the vivado backend would add an attribute to one of the involved FF (mr_ff) and the ISE backend doesn't do that. In fact the ISE backend doesn't create any false path constraints and also no async_reg attributes.

I'm not sure if I'm now really on the right track here btw.

@enjoy-digital
Copy link
Owner

@cklarhorst: a false path should indeed be applied on the gray counters of the AsyncFIFO. The false paths are not applied automatically in LiteScope and you have to use Platform.add_false_path_constraint in your design (or Platform.add_platform_command).

@cklarhorst
Copy link
Contributor Author

Regarding the automatic false paths:

Vivado:
I think they are automatically created because all MultiRegs get a special annotation here and afterwards here there is a special xdc line added that just marks all those paths as false paths.

Altera:
This backend uses a more global method, it just marks all paths interacting between clocks as false paths here.

Others:
I didn't found anything about those automatic false paths for the lattice, microsemi and ise backends.

So if I understood everything correctly I would say that I like that vivado approach the most because I would expect that it breaks early if someone misses a MultiReg.
Sadly, I think I'm not able to backport that solution for ISE because it uses get_nets to find all nets but in ISE the UCF file doesn't allow filtering nets by attributes.

For testing, I just added false path constraints to the minispartan6 by hand:

platform.add_platform_command("TIMESPEC TS_A1=TO FFS(xilinxmultiregimpl0_regs0) TIG;")
platform.add_platform_command("TIMESPEC TS_A2=TO FFS(xilinxmultiregimpl1_regs0*) TIG;")
platform.add_platform_command("TIMESPEC TS_A3=TO FFS(xilinxmultiregimpl2_regs0*) TIG;")

and it was working.

@enjoy-digital
Isn't it possible to somehow scan through the design (at the time when the ucf file is generated here) and find all MultiRegs? Then I would only need to figure out how they get named in the verilog file and I would be able to call add_platform_command and create a false path for each MultiReg.

Thanks for your help and sorry for the long post.

@cklarhorst
Copy link
Contributor Author

Hi again and sorry for the long delay.

I wrote a short function for the ISE backend that scans through the generated verilog code in order to call add_platform_command (to create a false path constraint) for each found mr_ff annotation.
Additionaly, I found that post which says

ISE and Diamond do not consider all clocks related by default. So, there is no need to automatically define false path constraints in these toolchains; although one might need to manually add false path constraints for clocks that go through PLLs, etc.

I know think that I stumbled upon that special case because I used LiteScope (and the AsyncFifo) class to analyze signals in different clocking regions (possible created through PLLs).

Function:

def gen_constraints_from_verilog(self, verilog, platform):
  import re
  mr_ff_pattern = re.compile('.*mr_ff.*reg (?:\[.*\] )?(\S+)')
  unique_id = 0
  for line in str(verilog).splitlines():
    try:
      reg_name = mr_ff_pattern.search(line).group(1)
      platform.add_platform_command("TIMESPEC TS_FALSE%d=TO FFS(%s*) TIG;" % (unique_id, reg_name))
      unique_id += 1
    except AttributeError:
      pass

@enjoy-digital Do you want to have a PR for this? (I think scanning through the generated source might not be the most elegant solution. Additionally, there doesn't seem to be many people affected by this?) Otherwise, I would like to close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants