
frep-loop: PHI problem with reductions #12

Open
huettern opened this issue Jun 21, 2021 · 0 comments
Labels
bug Something isn't working

Comments

@huettern

For parallel reductions where the accumulators are all initialized to the same value, the PHI nodes are merged into the instructions and the reduction is performed on the register holding the initial value instead of on "new" registers:

```c
  for(unsigned i=0; i < MATSIZE; ++i) {
    #pragma unroll 1
    for(unsigned j=0; j < MATSIZE; j+=UNROLL) {
      register double acc[UNROLL];

      #pragma unroll
      for (int u = 0; u < UNROLL; ++u) acc[u] = 0.0; // <- All is fine if this is e.g. c[i*MATSIZE+j+u];

      #pragma frep infer
      for(unsigned k=0; k < MATSIZE; ++k) {
        #pragma unroll
        for (int u = 0; u < UNROLL; ++u)
        {
          acc[u] += __builtin_ssr_pop(0)*__builtin_ssr_pop(1);
        }
      }

      #pragma unroll
      for (int u = 0; u < UNROLL; ++u) c[i*MATSIZE+j+u] = acc[u];
    }
  }
```

With `0.0` as the initial value, this results in the wrong assembly:

```asm
fmadd.d	ft5, ft1, ft0, ft3
fmadd.d	ft6, ft1, ft0, ft3
fmadd.d	ft7, ft1, ft0, ft3
# ...
```

For individual initial values (`c[i*MATSIZE+j+u]`), the problem disappears:

```asm
fmadd.d	ft3, ft1, ft0, ft3
fmadd.d	ft4, ft1, ft0, ft4
fmadd.d	ft5, ft1, ft0, ft5
fmadd.d	ft6, ft1, ft0, ft6
```
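A possible workaround (an untested sketch, not something from this report) is to make each accumulator's initial value opaque to the optimizer, so the PHI nodes cannot all be folded onto the single shared zero register; an empty inline-asm constraint on each `acc[u]` mimics the effect of initializing from `c[i*MATSIZE+j+u]`:

```c
      /* Hypothetical workaround sketch, not verified against the frep pass:
       * the empty asm with a "+f" constraint is an opaque use/def of acc[u],
       * so each accumulator is materialized in its own FP register even
       * though all of them start from the same constant 0.0. */
      #pragma unroll
      for (int u = 0; u < UNROLL; ++u) {
        acc[u] = 0.0;
        asm volatile("" : "+f"(acc[u]));
      }
```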
huettern added the bug label on Jun 21, 2021
SamuelRiedel pushed a commit that referenced this issue on Nov 1, 2022:
We experienced some deadlocks when we used the `scan-builds` intercept-build tool for logging and the build itself used multiple threads, e.g. when logging `make -j16`:

```
(gdb) bt
#0  0x00007f2bb3aff110 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f2bb3af70a3 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f2bb3d152e4 in ?? ()
#3  0x00007ffcc5f0cc80 in ?? ()
#4  0x00007f2bb3d2bf5b in ?? () from /lib64/ld-linux-x86-64.so.2
#5  0x00007f2bb3b5da27 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007f2bb3b5dbe0 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007f2bb3d144ee in ?? ()
#8  0x746e692f706d742f in ?? ()
#9  0x692d747065637265 in ?? ()
#10 0x2f653631326b3034 in ?? ()
#11 0x646d632e35353532 in ?? ()
#12 0x0000000000000000 in ?? ()
```

I think gcc's exit call caused the injected `libear.so` to be unloaded by `ld`, which in turn called the `void on_unload() __attribute__((destructor))`. That tried to acquire an already locked mutex, which had been left locked by `bear_report_call()`: that call probably encountered some error and returned early without unlocking the mutex.

All of this is speculation, since from the backtrace I could not verify whether frames 2 and 3 do in fact correspond to the `libear.so` module. But I think it's a fairly safe bet.

So, I'm hereby releasing the held mutex on *all* paths, even if some failure happens.

PS: I would use lock_guards, but it's C.
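As an illustration of that pattern (a minimal sketch; `report_mutex`, `write_report`, and `report_call_sketch` are placeholder names, not the actual intercept-library code), every early return funnels through a single unlock label so the mutex is released even on failure:

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t report_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for whatever work has to happen while the lock is held. */
static int write_report(const char *msg) {
  return fputs(msg, stderr) == EOF ? -1 : 0;
}

int report_call_sketch(const char *msg) {
  int rc = -1;
  pthread_mutex_lock(&report_mutex);

  if (msg == NULL)
    goto unlock;              /* early error path: still unlocks below */
  if (write_report(msg) != 0)
    goto unlock;              /* I/O failure: still unlocks below */

  rc = 0;                     /* success */
unlock:
  pthread_mutex_unlock(&report_mutex);  /* released on *all* paths */
  return rc;
}
```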

Reviewed-by: NoQ

Differential Revision: https://reviews.llvm.org/D118439

(cherry picked from commit d919d02)