Compile sync without sync reg #1415
Seems like this PR still has conflicts with
@rachitnigam fixed the merge conflict. Please take a look.
Couple of thoughts on reorganizing some code. High-level suggestions:
- Make `BarrierMap` a newtype and define methods on it instead of doing the trait trick
- One of the big pitches for this was to support multiple threads (more than 2), so let's make sure that this PR can actually do that
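To make the newtype suggestion concrete, here is a minimal sketch of what it could look like. Everything in it is an assumption for illustration: the contents of the map (barrier index to the names of the per-thread `bar` registers), the method names, and the guard being rendered as plain text are all hypothetical, and the real `BarrierMap` in this PR may store different data.

```rust
use std::collections::BTreeMap;

/// Hypothetical newtype: barrier index -> names of the per-thread `bar`
/// registers. The actual map in the PR may hold different values.
#[derive(Default)]
pub struct BarrierMap(BTreeMap<u64, Vec<String>>);

impl BarrierMap {
    /// Create an empty map.
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that the thread owning register `bar` waits on barrier `idx`.
    pub fn add_thread(&mut self, idx: u64, bar: &str) {
        self.0.entry(idx).or_default().push(bar.to_string());
    }

    /// Number of threads registered under barrier `idx` (0 if unknown).
    pub fn thread_count(&self, idx: u64) -> usize {
        self.0.get(&idx).map_or(0, |bars| bars.len())
    }

    /// Text of a guard that holds only when every `bar` register under
    /// barrier `idx` is high, or `None` if the barrier is unknown.
    pub fn all_arrived_guard(&self, idx: u64) -> Option<String> {
        let bars = self.0.get(&idx)?;
        let terms: Vec<String> = bars.iter().map(|b| format!("{}.out", b)).collect();
        Some(terms.join(" & "))
    }
}
```

Because the methods live on the type itself, callers do not need to import an extension trait, and nothing in the structure caps the number of threads per barrier.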
I think this will make a cool case study of the paper but we need to make sure that the baseline implementation is good enough for a reasonable comparison.
Changes look good! Looks like two tests are failing because #1422 improved the cycle count. Merge at your discretion! Great job! We should make this the default.
Fixed the pass `compile-sync-without-sync-reg`. The original problem was simply that the `clear` group did not have a `done` condition; if I had run `validate` sooner, this would have been discovered immediately.

Compiles `@sync` without use of `std_sync_reg`
Upon encountering `@sync`, it first instantiates a `std_reg(1)` for each thread (`bar`) and a `std_reg(1)` for each barrier (`s`).

It then continuously assigns `1'd1` to `s.in`, guarded by the expression that the `bar` values of all threads under the barrier are set to `1'd1`.

Then it replaces the `@sync` control operator with two groups, `barrier` and `clear`:
- `barrier` simply sets the value of `bar` to `1'd1` and then waits for `s.out` to be up
- `clear` resets the value of `bar` to `1'd0` for reuse of the barrier

Using this method, each thread only incurs 3 cycles of latency overhead for the barrier, and we theoretically won't have a limit on the number of threads under one barrier.
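The release behaviour described above can be checked with a small cycle-level toy model. This is only a sketch under stated assumptions, not the pass itself: the function name `simulate_barrier` and the `arrivals` input are hypothetical, register writes are assumed to become visible one cycle after they are driven, and the `clear` step and any reset of `s` are left out. Its cycle counts are therefore not meant to reproduce the latency figure quoted above.

```rust
/// Toy cycle-level model of one barrier shared by `arrivals.len()` threads.
/// `arrivals[i]` is the cycle in which thread `i` starts its `barrier` group.
/// Returns, per thread, the first cycle in which it observes `s.out` high.
pub fn simulate_barrier(arrivals: &[u32]) -> Vec<u32> {
    let n = arrivals.len();
    let mut bar = vec![false; n]; // one 1-bit register per thread
    let mut s = false; // one 1-bit register per barrier
    let mut released: Vec<Option<u32>> = vec![None; n];
    let mut cycle = 0u32;

    while released.iter().any(|r| r.is_none()) {
        // A waiting thread is released once it sees `s.out` high.
        for i in 0..n {
            if released[i].is_none() && s && bar[i] {
                released[i] = Some(cycle);
            }
        }
        // Continuous assignment: s.in = 1'd1, guarded by every bar.out being high.
        let s_next = s || bar.iter().all(|&b| b);
        // `barrier` group: a thread that has arrived drives bar.in = 1'd1.
        let bar_next: Vec<bool> = (0..n).map(|i| bar[i] || arrivals[i] <= cycle).collect();
        // Clock edge: both register kinds latch their inputs.
        s = s_next;
        bar = bar_next;
        cycle += 1;
    }
    released.into_iter().map(|r| r.unwrap()).collect()
}
```

Calling `simulate_barrier(&[0, 2, 5])` releases all three threads in the same cycle, and only after the last thread has arrived. The same holds for any number of threads, since the guard is just a conjunction over every `bar` register.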