Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small discriminants are causing bad code generation #10613

Closed
alexcrichton opened this issue Nov 22, 2013 · 11 comments · Fixed by #12207
Closed

Small discriminants are causing bad code generation #10613

alexcrichton opened this issue Nov 22, 2013 · 11 comments · Fixed by #12207
Labels
A-codegen Area: Code generation

Comments

@alexcrichton
Copy link
Member

This code

enum Test<T> {                    
    A(T), B                       
}                                 
struct Node {                     
    value: Test<int>,             
}                                 

#[inline(never)]                  
pub fn foo(n: &mut Node, v: int) {
  n.value = A(v);                 
}                                 

Will generate this IR under -O

define void @_ZN3foo19hbd6078a6918ab2b9as4v0.0E({ i64, %tydesc*, i8*, i8*, i8 }* nocapture readnone, %struct.Node* nocapture, i64) unnamed_addr #1 {
"function top level":                                                                                                                               
  %.sroa.2 = alloca [7 x i8], align 1                                                                                                               
  %.sroa.0.0.idx = getelementptr inbounds %struct.Node* %1, i64 0, i32 0, i32 0                                                                     
  store i8 0, i8* %.sroa.0.0.idx, align 8                                                                                                           
  %.sroa.2.0.idx = getelementptr inbounds %struct.Node* %1, i64 0, i32 0, i32 1, i64 0                                                              
  %.sroa.2.0.idx2 = getelementptr inbounds [7 x i8]* %.sroa.2, i64 0, i64 0                                                                         
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %.sroa.2.0.idx, i8* %.sroa.2.0.idx2, i64 7, i32 1, i1 false)                                             
  %.sroa.23.0.idx = getelementptr inbounds %struct.Node* %1, i64 0, i32 0, i32 2, i64 0                                                             
  store i64 %2, i64* %.sroa.23.0.idx, align 8                                                                                                       
  ret void                                                                                                                                          
}                                                                                                                                                   

Yes, that is a memcpy of length 7. I have no idea why this is getting generated, I'm trying to narrow down which pass is generating this, but that is absolutely awful code to generate for such a simple operation. I would expect this function to generate two stores, one of the discriminant and one of the value.

Now change the program to have #[repr(uint)] in front of the enum, and we get this code:

define void @_ZN3foo19hbd6078a6918ab2b9as4v0.0E({ i64, %tydesc*, i8*, i8*, i8 }* nocapture readnone, %struct.Node* nocapture, i64) unnamed_addr #0 {
"function top level":                                                                                                                               
  %.sroa.0.0.idx = getelementptr inbounds %struct.Node* %1, i64 0, i32 0, i32 0                                                                     
  store i64 0, i64* %.sroa.0.0.idx, align 8                                                                                                         
  %.sroa.2.0.idx2 = getelementptr inbounds %struct.Node* %1, i64 0, i32 0, i32 2, i64 0                                                             
  store i64 %2, i64* %.sroa.2.0.idx2, align 8                                                                                                       
  ret void                                                                                                                                          
}                                                                                                                                                   
@alexcrichton
Copy link
Member Author

cc @jld

@alexcrichton
Copy link
Member Author

I would expect similar code generation with #[repr(uint)] as well as with no #[repr] annotation. Is this because we're attempting to zero-out the discriminant padding?

@pcwalton
Copy link
Contributor

Look at the asm. This may actually not be so bad unless the memcpy is getting lowered to a true libc memcpy.

@pcwalton
Copy link
Contributor

Here is the asm. It is not actually generating a call to memcpy:

                                           ; Basic Block Input Regs: rsp -  Killed Regs: <nothing>
                                           ; 
                                           ; Section __text
                                           ; 
                                           ; Range 0x100000db0 - 0x100000ea1 (241 bytes)
                                           ; File offset 3504 (241 bytes)
                                           ; Flags : 0x80000400
                                           ; 
                                                __ZN3foo18h0e119d36d38e5ceat4v0.0E:        // foo::h0e119d36d38e5ceat::v0.0
    0000000100000db0 65483B242530030000              cmp        rsp, qword [gs:0x330]         ; XREF=0x100000e45
    0000000100000db9 771A                            jnbe       0x100000dd5
                                           ; Basic Block Input Regs: rsp -  Killed Regs: r10 r11
    0000000100000dbb 49BA0F00000000000000            mov        r10, 0xf
    0000000100000dc5 49BB0000000000000000            mov        r11, 0x0
    0000000100000dcf E8C0000000                      call       ___morestack
    0000000100000dd4 C3                              ret        
                                           ; Basic Block Input Regs: rbp -  Killed Regs: rax rsp rbp rdi
    0000000100000dd5 55                              push       rbp                           ; XREF=0x100000db9
    0000000100000dd6 4889E5                          mov        rbp, rsp
    0000000100000dd9 4883EC07                        sub        rsp, 0x7
    0000000100000ddd C60700                          mov        byte [ds:rdi], 0x0
    0000000100000de0 8A45FF                          mov        al, byte [ss:rbp-0x0+var_m1]
    0000000100000de3 884707                          mov        byte [ds:rdi+0x7], al
    0000000100000de6 668B45FD                        mov        ax, word [ss:rbp-0x0+var_m3]
    0000000100000dea 66894705                        mov        word [ds:rdi+0x5], ax
    0000000100000dee 8B45F9                          mov        eax, dword [ss:rbp-0x0+var_m7]
    0000000100000df1 894701                          mov        dword [ds:rdi+0x1], eax
    0000000100000df4 48C7470801000000                mov        qword [ds:rdi+0x8], 0x1
    0000000100000dfc 4883C407                        add        rsp, 0x7
    0000000100000e00 5D                              pop        rbp
    0000000100000e01 C3                              ret        

@huonw
Copy link
Member

huonw commented Nov 23, 2013

@alexcrichton the equivalent to no #[repr] in this case would be #[repr(u8)], fwiw.

@alexcrichton
Copy link
Member Author

I agree that the assembly doesn't look that bad, but this is still completely unnecessary, no? I was looking at assembly with huge waterfalls of moving al/ax/eax/rax over and over again.

I don't think this is a closed bug because this is a real problem. This may not be a high priority bug, but our assembly is certainly doing things that it doesn't need to do.

@alexcrichton alexcrichton reopened this Nov 23, 2013
@huonw
Copy link
Member

huonw commented Nov 23, 2013

Is this #6736 ?

@pcwalton
Copy link
Contributor

Aha, I see the problem:

%"enum.Test<int>" = type { i8, [7 x i8], [1 x i64] }

SROA happened, split up our struct into 3 elements, one of which is the padding, and stored the padding. The solution is to add tbaa.struct, like #6736. Closing again as a dupe.

@alexcrichton
Copy link
Member Author

It may be, but I don't know enough about LLVM's optimizations to know whether that would help or not.

@jld
Copy link
Contributor

jld commented Nov 24, 2013

It would be possible to make the type { [8 x i8], [1 x i64] }, and that might make the code seen here less bad, but I don't know if LLVM would be less able to optimize code that accesses the discriminant in that case.

It would also be possible to make type_of more complex and have it avoid filling in alignment padding for bytes where none of the variants have fields (so Option<i64>{ i8, i64 }, Either<i8, i64>{ i8, i8, i64 }). We might need this information anyway for tbaa.struct.

@pcwalton pcwalton reopened this Feb 4, 2014
@pcwalton
Copy link
Contributor

pcwalton commented Feb 5, 2014

Patch has been submitted to LLVM upstream.

@huonw huonw mentioned this issue Feb 12, 2014
@bors bors closed this as completed in 92c5738 Feb 14, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants