Support independent operator start and output collection #1274

Closed
andrewb1999 opened this issue Nov 16, 2022 · 4 comments
Labels
AMC Needed for Andrew's memory compiler C: Calyx Extension or change to the Calyx IL S: Discussion needed Issues blocked on discussion

Comments

@andrewb1999
Collaborator

In the current version of Calyx, it is generally assumed that each group issues a start signal to some component, waits for its done signal, and registers the outputs. However, when using pipelined operators there are many cases where we want to perform other computation while waiting for that operator's output to become available.
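
To make the existing pattern concrete, here is a rough sketch of what such a group looks like today, using the pipelined multiplier from the standard library (the primitive and port names below follow my understanding of std_mult_pipe; the exact latency doesn't matter for this discussion):

import "primitives/core.futil";
import "primitives/binary_operators.futil";

component main() -> () {
  cells {
    x = std_reg(32);
    y = std_reg(32);
    prod = std_reg(32);
    mult = std_mult_pipe(32);
  }
  wires {
    group do_mult {
      // Start the multiplier and hold its inputs until it signals done ...
      mult.left = x.out;
      mult.right = y.out;
      mult.go = 1'd1;
      // ... then register the output; the group only finishes once the write completes.
      prod.in = mult.out;
      prod.write_en = mult.done;
      do_mult[done] = prod.done;
    }
  }
  control {
    do_mult;
  }
}

The group is not done until the multiplier's result has been registered, which is exactly the coupling of start and output collection described above.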

One example of this is shown in PR #1270. We want to be able to feed new input data into a pipelined multiplier on every cycle to achieve an initiation interval (II) of 1. This is the most obvious case where independent operator start and output collection is required: if we have to wait for the entire pipelined multiplier to finish before the group is done, we can never achieve a true II of 1.

This issue isn't unique to pipelined loops, however. A very common practice in unpipelined scheduling is sharing pipelined resources, which is especially important for reducing DSP usage in designs with a large number of multipliers.

As an example, let's consider the following scheduled, unpipelined design:
[Figure: scheduled, unpipelined design (Unpipelined-schedule.drawio)]

By unpipelined here, we really mean that the II is equal to the latency of the loop body. As a result, there are major sharing opportunities between the different operators: with this schedule we only need 1 adder and 1 multiplier to implement these operations, instead of the 2 adders and 2 multipliers a naive implementation would use.

In particular, to share the multiplier we need to set its input values on cycles 0 and 1 and read its outputs on cycles 2 and 3.
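
Concretely, assuming a two-cycle pipelined multiplier and a combinational adder (the exact latencies in the figure may differ), the shared schedule looks something like:

cycle 0: multiplier <= first pair of operands;   adder computes the first sum
cycle 1: multiplier <= second pair of operands
cycle 2: first product available;                adder reused on that product
cycle 3: second product available

This start-on-cycles-0/1, collect-on-cycles-2/3 pattern is what the current group semantics make awkward to express.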

@andrewb1999 andrewb1999 added the C: Calyx Extension or change to the Calyx IL label Nov 16, 2022
@rachitnigam rachitnigam added AMC Needed for Andrew's memory compiler S: Discussion needed Issues blocked on discussion labels Nov 17, 2022
@sampsyo
Contributor

sampsyo commented Nov 19, 2022

Nice writeup of the underlying issue, @andrewb1999. In practice, the (only) way we've achieved pipelining in the past is with carefully-constructed par/seq nests, which turns out to be fairly limiting. From a very high level, this is related to the sharing-safety property Filament addresses… but how to wed that with normal Calyx programming is still an open question IMO.
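
For reference, the "carefully-constructed par/seq nests" approach looks roughly like the following control-only sketch, where each sK_iN is a hypothetical group that runs stage K of loop iteration N and the rotated schedule is written out by hand:

control {
  seq {
    s0_i0;
    par { s1_i0; s0_i1; }
    par { s2_i0; s1_i1; s0_i2; }
    par { s2_i1; s1_i2; }
    s2_i2;
  }
}

Every additional iteration or stage has to be spelled out explicitly in this style.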

@rachitnigam
Contributor

For reference, here is what the corresponding Filament program looks like:

import "./primitives/core.fil";

extern "dummy.sv" {
    component FastMult<G>(
        @interface[G, G+1] go: 1,
        @[G, G+1] l: 32,
        @[G, G+1] r: 32,
    ) -> (
        @[G+2, G+3] out: 32,
    );
}

component main<G>(
    @interface[G, G+3] go: 1,
    @[G, G+1] a: 32,
    @[G, G+1] b: 32,
) -> (
    @[G+3, G+4] out0: 32,
    @[G+3, G+4] out1: 32
) {
    A := new Add[32];
    M := new FastMult;
    a0 := A<G>(a, a);
    r0 := new Delay[32]<G>(a0.out);
    m0 := M<G>(b, b);
    a1 := A<G+2>(m0.out, m0.out);
    m1 := M<G+1>(r0.out, r0.out);
    r1 := new Delay[32]<G+2>(a1.out); // Only necessary to make outputs available at the same time 
    out0 = r1.out;
    out1 = m1.out;
}
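
Note that the multiplier M is started at G and at G+1 while its results are only consumed at G+2 and G+3, and the adder A is reused at G and at G+2: the starts and the output collections are scheduled independently, which is exactly the shared-operator schedule from the issue description.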

@rachitnigam
Contributor

@andrewb1999 can we consider this "solved" for now since the #1338 pass showed you can represent these kinds of computations with precise timing guarantees?

@andrewb1999
Collaborator Author

@andrewb1999 can we consider this "solved" for now since the #1338 pass showed you can represent these kinds of computations with precise timing guarantees?

Sure, that makes sense to me.
