Makefile consolidation and LLVM bug fix #267
Merged
The Makefile changes are just to make it easier (more obvious) to handle cases where OpenBLAS doesn't detect the processor type.
The change to alloc.c works around an LLVM optimization bug in Apple's default compiler. By reordering the layout of the byte cache so that it is indexed with unsigned indices, we can safely ignore the sign bit later.
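A minimal sketch of the unsigned-index idea, not the actual alloc.c code: the names `cache_sketch`, `init_cache_sketch`, `lookup_int8_sketch` and `check_cache_sketch` are hypothetical, and plain `int` slots stand in for the boxed values. It assumes the cache is laid out so that slot i holds the value whose 8-bit pattern is i, which lets the lookup use only the low byte of its argument.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the byte cache: 256 slots, one per 8-bit pattern. */
static int cache_sketch[256];

/* Fill the cache so that slot i holds the signed value with bit pattern i:
   slots 0..127 hold 0..127, slots 128..255 hold -128..-1. */
static void init_cache_sketch(void)
{
    for (int i = 0; i < 256; i++) {
        cache_sketch[i] = (i < 128) ? i : i - 256;
    }
}

/* Look up a signed 8-bit value through an unsigned index.
   The conversion to uint8_t keeps only the low 8 bits, so the index is
   always in 0..255 and no sign extension is involved. */
static int lookup_int8_sketch(int8_t x)
{
    return cache_sketch[(uint8_t)x];
}

/* Verify that every 8-bit value maps back to itself. */
static void check_cache_sketch(void)
{
    for (int v = -128; v <= 127; v++) {
        assert(lookup_int8_sketch((int8_t)v) == v);
    }
}
```

Calling `init_cache_sketch()` once and then `lookup_int8_sketch(-1)` reads slot 255, so the result no longer depends on how wide a sign extension the compiler chooses.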
With optimizations enabled on my 64-bit machine, the originally generated code is shown below. Note the movslq instruction in the LLVM output, which is a 32-to-64-bit sign-extending move, whereas gcc produces the correct 8-to-64-bit sign-extending move, movsbq.
with gcc:
0x0000000100046ac0 <+0000> push %rbp
0x0000000100046ac1 <+0001> mov %rsp,%rbp
0x0000000100046ac4 <+0004> movsbq %dil,%rdi
0x0000000100046ac8 <+0008> lea 0x7055d1(%rip),%rax # 0x10074c0a0 <boxed_int8_cache+1024>
0x0000000100046acf <+0015> mov (%rax,%rdi,8),%rax
0x0000000100046ad3 <+0019> leaveq
0x0000000100046ad4 <+0020> retq
0x0000000100046ad5 <+0021> nopl 0x0(%rax,%rax,1)
0x0000000100046ada <+0026> nopw 0x0(%rax,%rax,1)
with llvm:
0x00000001000367c0 <+0000> push %rbp
0x00000001000367c1 <+0001> mov %rsp,%rbp
0x00000001000367c4 <+0004> movslq %edi,%rax
0x00000001000367c7 <+0007> lea 0x6e83f2(%rip),%rcx # 0x10071ebc0 <boxed_int8_cache>
0x00000001000367ce <+0014> mov 0x400(%rcx,%rax,8),%rax
0x00000001000367d6 <+0022> pop %rbp
0x00000001000367d7 <+0023> retq
0x00000001000367d8 <+0024> nopl 0x0(%rax,%rax,1)
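To make the difference between the two listings concrete, here is a small C emulation of what each extension computes from the same 32-bit register value. This is only an illustration: the function names are hypothetical, and the register contents used in `demo_extend` are assumed values chosen to show the effect, not values captured from the failing program.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates movsbq %dil,%rdi: sign-extend only the low 8 bits to 64 bits. */
static int64_t extend_from_8(uint32_t reg)
{
    uint8_t low = (uint8_t)reg;
    return (low < 128) ? (int64_t)low : (int64_t)low - 256;
}

/* Emulates movslq %edi,%rax: sign-extend all 32 bits to 64 bits. */
static int64_t extend_from_32(uint32_t reg)
{
    return (reg < 0x80000000u) ? (int64_t)reg : (int64_t)reg - 4294967296LL;
}

static void demo_extend(void)
{
    /* Assumed case: the low byte is 0xFF (-1 as int8) but bits 8..31 are zero. */
    assert(extend_from_8(0x000000FFu) == -1);
    assert(extend_from_32(0x000000FFu) == 255);

    /* When the upper bits already carry the sign, both agree. */
    assert(extend_from_8(0xFFFFFFFFu) == -1);
    assert(extend_from_32(0xFFFFFFFFu) == -1);
}
```

Both listings add a 1024-byte offset (128 entries of 8 bytes) to the cache base before indexing, so an index of 255 instead of -1 would address entry 383 rather than entry 127.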