Skip to content
This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

"It is illegal to set a Read-After-Write dependency on a memory store op", on unmodified sass #10

Open
hughperkins opened this issue May 23, 2016 · 12 comments

Comments

@hughperkins
Copy link

hughperkins commented May 23, 2016

using microbench, with .cu modified to be simply:

extern "C" __global__ void  microbench(int *out, int *clocks, int *in)
{
  out[0] = 7.0f;
  out[2] = 5.0f;
  clocks[1] = 9.0f;
}

.cpp is modified slightly, since I'm on a 5.0, so I hacked microbench.cpp to accept this. It could be this is root cause for the issue in this issue?

        if (major >= 5 && minor >= 0)
        {

Then I do:

set -e

nvcc -l cuda -o microbench microbench.cpp
nvcc -arch sm_50 -cubin microbench.cu
maxas.pl -e microbench.cubin > microbench.sass

maxas.pl -i microbench.sass microbench.cubin
./microbench

... however it fails on the maxas.pl -i line with:

It is illegal to set a Read-After-Write dependency on a memory store op (store ops don't write to a register)
STG.E [R4], R6;

The .sass generated by maxas.pl -i looks like:

--:-:-:-:6      MOV R1, c[0x0][0x20];
--:-:-:-:1      MOV R0, param_1[0];
--:-:-:-:1      MOV R2, param_0[0];
--:-:-:-:4      MOV R3, param_0[1];
--:-:-:-:1      IADD32I R4.CC, R0, 0x4;
--:-:-:-:2      MOV32I R0, 0x7;
20:3:1:Y:7      IADD.X R5, RZ, param_1[1];
--:1:1:Y:4      STG.E [R4], R6;
--:-:-:Y:0      NOP;
--:-:-:Y:0      NOP;
--:-:-:Y:0      NOP;

Is this:

  • because I'm using 5.0 system, and maxas only supports 5.2?
  • a bug in maxas.pl -e (since I cant see the 9.0f constant anywhere in the sass?, I'm immediately suspicious?)
  • because I'm using not the latest versoin of maxas, which is eg in neon repo?
  • something I'm doing wrong (other than the cc5.0 thing) ?
  • something else?
@hughperkins
Copy link
Author

Update: the sass generated by cuobjdump --dump-sass does include the 0x9 constant, looks fairly different:

    code for sm_50
        Function : microbench
    .headerflags    @"EF_CUDA_SM50 EF_CUDA_PTX_SM(EF_CUDA_SM50)"
                                                                           /* 0x001fc400fe2007f6 */
        /*0008*/                   MOV R1, c[0x0][0x20];                   /* 0x4c98078000870001 */
        /*0010*/                   MOV R0, c[0x0][0x148];                  /* 0x4c98078005270000 */
        /*0018*/                   MOV R2, c[0x0][0x140];                  /* 0x4c98078005070002 */
                                                                           /* 0x001fc800fe2007f4 */
        /*0028*/                   MOV R3, c[0x0][0x144];                  /* 0x4c98078005170003 */
        /*0030*/                   IADD32I R4.CC, R0, 0x4;                 /* 0x1c10000000470004 */
        /*0038*/                   MOV32I R0, 0x7;                         /* 0x010000000077f000 */
                                                                           /* 0x001fc0001e4007f0 */
        /*0048*/         {         MOV32I R7, 0x5;                         /* 0x010000000057f007 */
        /*0050*/                   STG.E [R2], R0;        }                /* 0xeedc200000070200 */
        /*0058*/         {         MOV32I R6, 0x9;                         /* 0x010000000097f006 */
        /*0068*/                   STG.E [R2+0x8], R7;        }            /* 0x0007c400fc4000f1 */
                                                                           /* 0xeedc200000870207 */
        /*0070*/                   IADD.X R5, RZ, c[0x0][0x14c];           /* 0x4c1008000537ff05 */
        /*0078*/                   STG.E [R4], R6;                         /* 0xeedc200000070406 */
                                                                           /* 0x001f8000ffe007ff */
        /*0088*/                   EXIT;                                   /* 0xe30000000007000f */
        /*0090*/                   BRA 0x90;                               /* 0xe2400fffff87000f */
        /*0098*/                   NOP;                                    /* 0x50b0000000070f00 */
                                                                           /* 0x001f8000fc0007e0 */
        /*00a8*/                   NOP;                                    /* 0x50b0000000070f00 */
        /*00b0*/                   NOP;                                    /* 0x50b0000000070f00 */
        /*00b8*/                   NOP;                                    /* 0x50b0000000070f00 */
        ...........................

I'm tentatively convinced it's something to do with the 5.0 thing?

@hughperkins
Copy link
Author

Oh... I guess that the 5.0 code is interpreted as control codes, and being absorbed into 20:3:1:Y:7, rather than becoming:

        /*0048*/         {         MOV32I R7, 0x5;                         /* 0x010000000057f007 */
        /*0050*/                   STG.E [R2], R0;        }                /* 0xeedc200000070200 */
        /*0058*/         {         MOV32I R6, 0x9;  

Fair analysis? Surmountable? Only option is either to rewrite maxas to support 5.0 or get a 5.2 card?

@hughperkins
Copy link
Author

(by the way, in case useful, I get the following warnings when running maxas:

Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/ { <-- HERE (?<db5>5)?,?(?<db4>4)?,?(?<db3>3)?,?(?<db2>2)?,?(?<db1>1)?,?(?<db0>0)?}/ at /usr/local/share/perl/5.22.1/MaxAs/MaxAsGrammar.pm line 239.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^(?^:\@(?<predNot>\!)?P(?<predNum>[0-6]) )?DEPBAR(?^: { <-- HERE (?<db5>5)?,?(?<db4>4)?,?(?<db3>3)?,?(?<db2>2)?,?(?<db1>1)?,?(?<db0>0)?});/ at /usr/local/share/perl/5.22.1/MaxAs/MaxAsGrammar.pm line 275.

Maybe it is a perl versioning issue? I'm using:

$ perl --version

This is perl 5, version 22, subversion 1 (v5.22.1) built for x86_64-linux-gnu-thread-multi
(with 58 registered patches, see perl -V for more detail)

@hughperkins
Copy link
Author

Seems maybe an issue with the regex on line 1521 of MaxAsGrammar.pm?

Hack this file to put some prints:

    print "psl check match\n";
    if ($line =~ m"^\s+/\*(?<num>[0-9a-f]+)\*/\s+$InstRe\s+/\* (?<code>0x[0-9a-f]+)"o)
    {
        print "   ... matched!\n";

Output:

psl check match
   ... matched!
   ctrl inst:
$VAR1 = {
          'code' => '5519169589765144579',
          'ins' => 'MOV R3, c[0x0][0x144];',
          'inst' => 'MOV R3, c[0x0][0x144];',
          'num' => 40,
          'op' => 'MOV',
          'pred' => undef
        };
ctrl 2033
  ctrl line         /*0030*/                   IADD32I R4.CC, R0, 0x4;                 /* 0x1c10000000470004 */

line before processSassLine         /*0030*/                   IADD32I R4.CC, R0, 0x4;                 /* 0x1c10000000470004 */

psl check match
   ... matched!
   ctrl inst:
$VAR1 = {
          'code' => '2022116232694005764',
          'ins' => 'IADD32I R4.CC, R0, 0x4;',
          'inst' => 'IADD32I R4.CC, R0, 0x4;',
          'num' => 48,
          'op' => 'IADD32I',
          'pred' => undef
        };
ctrl 2034
  ctrl line         /*0038*/                   MOV32I R0, 0x7;                         /* 0x010000000077f000 */

line before processSassLine         /*0038*/                   MOV32I R0, 0x7;                         /* 0x010000000077f000 */

psl check match
   ... matched!
   ctrl inst:
$VAR1 = {
          'code' => '72057594045788160',
          'ins' => 'MOV32I R0, 0x7;',
          'inst' => 'MOV32I R0, 0x7;',
          'num' => 56,
          'op' => 'MOV32I',
          'pred' => undef
        };
# line                                                                            /* 0x001fc0001e4007f0 */

ctrl 2032
  ctrl line         /*0048*/         {         MOV32I R7, 0x5;                         /* 0x010000000057f007 */

line before processSassLine         /*0048*/         {         MOV32I R7, 0x5;                         /* 0x010000000057f007 */

psl check match
   no match
ctrl 242
  ctrl line         /*0050*/                   STG.E [R2], R0;        }                /* 0xeedc200000070200 */

line before processSassLine         /*0050*/                   STG.E [R2], R0;        }                /* 0xeedc200000070200 */

psl check match
   no match
ctrl 2032
  ctrl line         /*0058*/         {         MOV32I R6, 0x9;                         /* 0x010000000097f006 */

Repeated displays of no match suggest the regex is failing for some reason?

@hughperkins
Copy link
Author

Oh... I reckon it doesnt like the { and } signs in the cuobjdump output:

        /*0048*/         {         MOV32I R7, 0x5;                         /* 0x010000000057f007 */
        /*0050*/                   STG.E [R2], R0;        }                /* 0xeedc200000070200 */
        /*0058*/         {         MOV32I R6, 0x9;                         /* 0x010000000097f006 */
        /*0068*/                   STG.E [R2+0x8], R7;        }            /* 0x0007c400fc4000f1 */

@hughperkins
Copy link
Author

Adding

            $line =~ s/{//g;
            $line =~ s/}//g;

to MaxAs.pl, just before my $inst = processSassLine($line) or next CTRL; makes the generated .sass now contain the missing instructions:

--:-:-:-:6      MOV R1, c[0x0][0x20];
--:-:-:-:1      MOV R0, param_1[0];
--:-:-:-:1      MOV R2, param_0[0];
--:-:-:-:4      MOV R3, param_0[1];
--:-:-:-:1      IADD32I R4.CC, R0, 0x4;
--:-:-:-:2      MOV32I R0, 0x7;
--:-:-:-:0      MOV32I R7, 0x5;
--:1:-:-:2      STG.E [R2], R0;
--:-:-:-:0      MOV32I R6, 0x9;
20:3:1:Y:7      IADD.X R5, RZ, param_1[1];
--:1:1:Y:4      STG.E [R4], R6;
--:-:-:Y:0      NOP;
--:-:-:Y:0      NOP;
--:-:-:Y:0      NOP;

... but same error about STG.E as before:

It is illegal to set a Read-After-Write dependency on a memory store op (store ops don't write to a register)
STG.E [R4], R6;

I'll submit a PR for the braces though

@hughperkins
Copy link
Author

Mmmmm, this is interesting. There should be one control code followed by three operations right? But for the short example, with the parentheses in hte output, this rule seems to be broken temporarily:

cuobjdump --dump-sass output:

                                                                           /* 0x001fc0001e4007f0 */
        /*0048*/         {         MOV32I R7, 0x5;                         /* 0x010000000057f007 */
        /*0050*/                   STG.E [R2], R0;        }                /* 0xeedc200000070200 */
        /*0058*/         {         MOV32I R6, 0x9;                         /* 0x010000000097f006 */
        /*0068*/                   STG.E [R2+0x8], R7;        }            /* 0x0007c400fc4000f1 */
                                                                           /* 0xeedc200000870207 */
        /*0070*/                   IADD.X R5, RZ, c[0x0][0x14c];           /* 0x4c1008000537ff05 */
        /*0078*/                   STG.E [R4], R6;                         /* 0xeedc200000070406 */
                                                                           /* 0x001f8000ffe007ff */
        /*0088*/                   EXIT;                                   /* 0xe30000000007000f */
        /*0090*/                   BRA 0x90;                               /* 0xe2400fffff87000f */
        /*0098*/                   NOP;                                    /* 0x50b0000000070f00 */

4 contiguous non-control lines, hten a control, then only 2 control-lines. Maybe the parenthesis sort of moves around the contorl codes, so that the control code after the block of 4 actually applies retroactively to the previous line???

@hughperkins
Copy link
Author

Hmmm... except that... 0xeedc200000870207 looks not like a control code:

$ python ../printcode.py 0xeedc200000870207
stall 7 thisyield 0 write 0 read 2 watdb 32
32:2:0:0:7
stall 4 thisyield 0 write 0 read 0 watdb 0
0:0:0:0:4
stall 8 thisyield 0 write 0 read 7 watdb 54
54:7:0:0:8

... but if we use the code from the previous line instead, ie 0x0007c400fc4000f1, that looks remarkably control-code like, and stays consistent with the 1:3 rule:

$ python ../printcode.py 0x0007c400fc4000f1
stall 1 thisyield 1 write 7 read 0 watdb 0
0:0:7:1:1
stall 2 thisyield 0 write 7 read 7 watdb 0
0:7:7:0:2
stall 1 thisyield 1 write 7 read 1 watdb 0
0:1:7:1:1

Hypothesis: in the presence of parantheses one operation and a control code can be swapped in the human-readable assembler output, in the center column, but continue in the same 1:3 order as normal, in the code output in the right-hand column.

@scott-gray
Copy link
Collaborator

To fix this, I was planning on using the output from nvdisasm -raw and pulling the control codes directly from the binary cubin. But in the meantime I'm just using the nvdisasm from cuda 6.5 (cuobjdump calls nvdisasm internally). You can still use cuda 7.5 you just need to overwrite the new version of that file with the old.

@hughperkins
Copy link
Author

Fixed :-)

ubuntu@peach:~/git/maxas/microbench$ ./run-sass.sh 
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/ { <-- HERE (?<db5>5)?,?(?<db4>4)?,?(?<db3>3)?,?(?<db2>2)?,?(?<db1>1)?,?(?<db0>0)?}/ at /usr/local/share/perl/5.22.1/MaxAs/MaxAsGrammar.pm line 239.
Unescaped left brace in regex is deprecated, passed through in regex; marked by <-- HERE in m/^(?^:\@(?<predNot>\!)?P(?<predNum>[0-6]) )?DEPBAR(?^: { <-- HERE (?<db5>5)?,?(?<db4>4)?,?(?<db3>3)?,?(?<db2>2)?,?(?<db1>1)?,?(?<db0>0)?});/ at /usr/local/share/perl/5.22.1/MaxAs/MaxAsGrammar.pm line 275.
Kernel: microbench, Instructions: 0, Register Count: 8, Bank Conflicts: 0, Reuse: 0.0% (0/1)
Using: Id:0 GeForce 940M (5.0)

b:00 w:000 t:0000 l:00 clocks:00000000 out:00000007
b:00 w:001 t:0032 l:00 clocks:00000000 out:00000000
b:00 w:002 t:0064 l:00 clocks:00000000 out:00000000
b:00 w:003 t:0096 l:00 clocks:00000000 out:00000000
average: 0.000, min 0, max: 0

@hughperkins
Copy link
Author

(Oh, hmmm, might be a slight buggette in the loop definition )

@hughperkins
Copy link
Author

(should be ok now though perhaps?)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants