Add more memory primitives #1038
Comments
Thanks for starting this @andrewb1999! I think short term it makes sense to add these things as individual primitives. I'm going to loop in @sampsyo and @EclecticGriffin on the discussion here too but I think in general, I'm expecting the following flow:
Unfortunately, this still runs into the problem of defining "one primitive for every combination of address count, port semantics, etc." However, at least this way, the process does not commit a billion primitives to the repo and instead generates them only on demand. There is also @sampsyo's proposal for modules in Calyx, which could be a possible solution but will require more work: #419
This is really exciting! I would love to chat more about this; it seems potentially really useful. The vision exactly as @andrewb1999 & @rachitnigam lay it out here sounds perfect to me. To summarize, the idea is that we will have a trillion Calyx-exposed Verilog primitives, one for every combination of parameters. This would be intensely painful to code up by hand, so instead this will be a generator. You specify the number of ports, the memory geometry, latencies, etc., and it generates a Calyx declaration and Verilog implementation for you. We'll come up with a way of specifying an entire range of such parameters to generate a whole swath of memory primitives at once, corresponding, for example, to all the memory configurations available on a given family of FPGAs. @rachitnigam also makes a really good point about the connection to #419. This generator will be useful without that "modules" concept, but it would be a perfect use case for it in some hypothetical future after the basic version works. I also want to bring up another farther-future idea that could build on this foundation: interfaces to off-chip memories. We could conceivably use a framework similar to this one to expose DRAM, HBM, etc. within Calyx. But obviously no need to worry about that for now; just covering on-chip memories (BRAMs, LUTRAMs, and UltraRAMs) is plenty for this stage.
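To make the generator idea concrete, here is a minimal Python sketch, not the actual tool. Everything in it is an assumption for illustration: the `MemConfig` fields, the `sdp_...` naming scheme, the port names, and the Calyx-style signature text (which has not been checked against the compiler). It covers only a simple dual-port memory with a one-cycle registered read, and it uses Vivado's `ram_style` attribute to hint at the target memory type.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class MemConfig:
    """Hypothetical parameter set for one simple dual-port memory."""
    width: int  # bits per word
    depth: int  # number of words
    style: str  # Vivado ram_style value: "block", "distributed", or "ultra"

    @property
    def addr_bits(self) -> int:
        # Bits needed to index `depth` words (at least one bit).
        return max(1, (self.depth - 1).bit_length())

    @property
    def name(self) -> str:
        # Hypothetical naming scheme: one primitive name per configuration.
        return f"sdp_{self.style}_w{self.width}_d{self.depth}"


def calyx_decl(cfg: MemConfig) -> str:
    """Calyx-style primitive signature (illustrative, not compiler-checked)."""
    a, w = cfg.addr_bits, cfg.width
    return (
        f"primitive {cfg.name}(\n"
        f"  @clk clk: 1, read_addr: {a}, read_en: 1,\n"
        f"  write_addr: {a}, write_data: {w}, write_en: 1\n"
        f") -> (read_data: {w});"
    )


def verilog_impl(cfg: MemConfig) -> str:
    """SystemVerilog module with a one-cycle registered read."""
    a, w, d = cfg.addr_bits, cfg.width, cfg.depth
    return "\n".join([
        f"module {cfg.name} (",
        "  input  logic clk,",
        f"  input  logic [{a - 1}:0] read_addr,",
        "  input  logic read_en,",
        f"  input  logic [{a - 1}:0] write_addr,",
        f"  input  logic [{w - 1}:0] write_data,",
        "  input  logic write_en,",
        f"  output logic [{w - 1}:0] read_data",
        ");",
        f'  (* ram_style = "{cfg.style}" *) logic [{w - 1}:0] mem [{d - 1}:0];',
        "  always_ff @(posedge clk) begin",
        "    if (write_en) mem[write_addr] <= write_data;",
        "    if (read_en) read_data <= mem[read_addr];",
        "  end",
        "endmodule",
    ])


def sweep(widths, depths, styles):
    """Generate one (declaration, implementation) pair per combination."""
    configs = (MemConfig(w, d, s) for w, d, s in product(widths, depths, styles))
    return {cfg.name: (calyx_decl(cfg), verilog_impl(cfg)) for cfg in configs}
```

Calling `sweep([8, 32], [512, 1024], ["block", "ultra"])` would produce eight declaration/implementation pairs from a single range specification, which is the "whole swath at once" behavior described above.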
Sounds great. A few more questions I have:
I think the "JIT" technique would work best, especially because One other thing that occurs to me is that each program will have a particular set of memories it defines, each with a particular WIDTH, number of ports, etc. I wonder if it would be possible to represent each such memory as a plain Calyx component. Put differently, it would be worth figuring out the minimal set of primitives needed for building such memories. In the case where each program truly needs a different, parameterizable memory, we will need to use
Indeed, good questions. Building off of @rachitnigam's answer, here's one plausible way to draw a roadmap:
Anyway, up to you of course! But that somewhat decoupled generator tool thing could be a useful way to keep the problem simple at first… As far as where the code lives, anything is 100% fine with me, but we'd be happy to put it in this monorepo!
Thanks @sampsyo! I was actually in the middle of typing out some more questions but that answers a lot of them. That plan sounds like a reasonable way to move forward for now. I also thought some about @rachitnigam's point. I think latencies can be added by a component that wraps some generated primitive and adds std_reg where necessary. I think Vivado will infer when these registers can be moved into the BRAM hardware itself. If we know exactly how the synthesis tool converts multidimensional addresses, we could also implement this conversion to single-dimension memories in Calyx. I think it makes sense to want to implement as much as possible in Calyx, but I wonder if this will increase the complexity unnecessarily (we would need to generate a primitive and then a component that wraps it). I also worry that these components that wrap primitives will be too fragile and depend on the specific synthesis tool being used. We already have to worry about fragile inference semantics within the Verilog, and adding another layer seems like another thing that could break the inference.
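For reference, the multidimensional-to-single-dimension conversion mentioned above could look like the following Python sketch. It assumes row-major order (last index varies fastest); whether a given synthesis tool actually uses this layout is exactly the open question raised in the comment, so treat the ordering as an assumption. The function name `flatten_addr` is made up for this example.

```python
def flatten_addr(indices, dims):
    """Map a multidimensional index to a flat address, row-major.

    Assumes the last index varies fastest; a synthesis tool may use a
    different layout, so this ordering is an assumption.
    """
    if len(indices) != len(dims):
        raise ValueError("need exactly one index per dimension")
    flat = 0
    for idx, size in zip(indices, dims):
        if not 0 <= idx < size:
            raise IndexError(f"index {idx} out of range for size {size}")
        # Shift what we have so far by this dimension's size, then add.
        flat = flat * size + idx
    return flat
```

For a 4x8 memory, index (2, 3) maps to 2 * 8 + 3 = 19. When every dimension is a power of two, the same mapping is just a concatenation of the index bits, so in hardware it would need no multipliers.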
Very much in agreement with this:
That is, I'd err on the side of putting stuff in Calyx-land, except when that seems onerous and annoying, in which case practicality prevails.
@andrewb1999 Anything we need to do in the repository for this issue?
Here is a prototype of the memory primitive generation stuff: https://github.com/andrewb1999/calyx-memgen-prototype. I haven't implemented everything yet, specifically RAMs with a latency greater than 1 and multidimensional RAMs, but this should be enough to test simple dual port semantics. I think the primary thing left here before dual port memories work fully is supporting multiple static paths through a component/primitive. It's unrealistic to have combinational reads from a memory, so we need a way to describe the read_latency such that Calyx can optimize these into a static FSM (extremely important for performance reasons). For now I'll try to write some tests for these memories and ensure everything works properly.
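As a way to pin down what "simple dual port with a fixed read latency" means when writing tests, here is a small behavioral model in Python. It is a sketch under assumptions, not the prototype's semantics: the class and method names are hypothetical, a read that collides with a write to the same address returns the old value (read-first), and the output holds its last value while no read completes.

```python
from collections import deque


class SimpleDualPortRam:
    """Behavioral model: one read port, one write port, fixed read latency."""

    def __init__(self, depth, read_latency=1):
        if read_latency < 1:
            raise ValueError("combinational reads are not modeled")
        self.mem = [0] * depth
        # One slot per pipeline stage; None marks a cycle with no read.
        self.pipe = deque([None] * read_latency, maxlen=read_latency)
        self.read_data = None  # nothing has been read yet

    def tick(self, read_addr=None, write_addr=None, write_data=None):
        """Advance one clock edge with the given port inputs."""
        # Sample the read before applying the write (read-first assumption).
        sampled = self.mem[read_addr] if read_addr is not None else None
        if write_addr is not None:
            self.mem[write_addr] = write_data
        # Appending to a full deque drops the oldest stage.
        self.pipe.append(sampled)
        finished = self.pipe[0]
        if finished is not None:
            self.read_data = finished  # otherwise hold the previous output
```

With `read_latency=2`, a read issued on one `tick` becomes visible in `read_data` after the following `tick`. Because that cycle count is fixed and known up front, it is the kind of information a static schedule could rely on.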
Also, is there some way for fud to include external Verilog (i.e., the Verilog definition of the primitive)? If not, that's something we should look at adding.
Awesome!!
So the We can revisit this choice if that makes certain things easier for you.
@andrewb1999 if it's okay with you, I'm going to close this issue for now. We've added a new primitive for sequential reads (#1145) and your compiler can generate new memories using the external stuff.
Just wanted to start some discussion about adding more memory primitives to Calyx. I will be putting development effort in here, but want some feedback before I really get started. The main additions I want to make are memories with multiple ports (simple dual port, true dual port, etc.), memories with variable latency, and memories that map to different primitives (BRAM, LUTRAM, UltraRAM).
Obviously dual port memories need separate primitives as they have different port configurations, but it would be good to limit the number of new primitives somehow. It seems like bad form to have a separate primitive for every combination of address count, port semantics, latency, and underlying memory type.
Looking forward to any ideas on how to best implement this.
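To give a rough sense of how quickly the combinations add up, here is a short Python sketch. The option lists are illustrative assumptions, not a proposed set of primitives; data width and depth are left out on the assumption that they stay ordinary parameters.

```python
from itertools import product

# Illustrative option lists (assumptions, not a proposal).
ADDRESS_COUNTS = [1, 2, 3, 4]
PORT_SEMANTICS = ["single", "simple_dual", "true_dual"]
READ_LATENCIES = [1, 2, 3]
MEMORY_TYPES = ["bram", "lutram", "ultraram"]


def primitive_variants():
    """Enumerate one tuple per hand-written primitive that would be needed."""
    return list(product(ADDRESS_COUNTS, PORT_SEMANTICS, READ_LATENCIES, MEMORY_TYPES))
```

Even with these small lists, `len(primitive_variants())` is 4 * 3 * 3 * 3 = 108 separate primitives.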