Float128 calling convention and alignment #4

Closed
simonbyrne opened this issue Feb 18, 2019 · 14 comments

Comments

@simonbyrne
Member

simonbyrne commented Feb 18, 2019

At the moment we fake the calling convention on Mac and Linux by reinterpreting as NTuple{2,VecElement{Float64}}. Unfortunately this doesn't seem to work on Windows.
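For readers unfamiliar with the trick, here is a minimal sketch of the bit-level reinterpretation. The helper names `to_vec` and `from_vec` are hypothetical, and this `Cfloat128` alias is an assumption for illustration rather than the package's actual definition. It also assumes a little-endian layout, with the low 64 bits in the first lane.

```julia
# Hypothetical alias: two Float64 lanes, which Julia passes to C like a
# 128-bit SIMD vector (an xmm register on x86-64 Linux and macOS).
const Cfloat128 = NTuple{2,VecElement{Float64}}

# Split a 128-bit pattern into two 64-bit lanes stored as Float64 bit patterns.
# Assumes little-endian lane order: low half first.
function to_vec(bits::UInt128)::Cfloat128
    lo = reinterpret(Float64, bits % UInt64)
    hi = reinterpret(Float64, (bits >> 64) % UInt64)
    return (VecElement(lo), VecElement(hi))
end

# Reassemble the 128-bit pattern from the two lanes.
function from_vec(v::Cfloat128)::UInt128
    lo = reinterpret(UInt64, v[1].value)
    hi = reinterpret(UInt64, v[2].value)
    return (UInt128(hi) << 64) | UInt128(lo)
end
```

The lanes are never used as floating-point numbers; they only carry the bits so that the value is passed in a vector register.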

Upstream:

Also mentioned in

@simonbyrne simonbyrne changed the title Float128 alignment Float128 calling convention and alignment Feb 26, 2019
@simonbyrne
Member Author

cc: @vchuravy @vtjnash @yuyichao

@yuyichao

Just a thought: have you tried adding x86_vectorcallcc to the ccall? That seems to produce the correct calling convention, assuming you only have fp128.

@simonbyrne
Member Author

How do I do that? Is that an undocumented calling convention?

@simonbyrne
Member Author

Ah, we don't seem to support it. Should I open an issue?

@yuyichao

I don't think our ccall supports it and I'm not sure where it's documented in LLVM but you can use this trick

@yuyichao

yuyichao commented Feb 26, 2019

Note that I think you won't be able to use it in ccall directly even if we support it, since supporting it in ccall would probably need to come with automatic name mangling support.

@simonbyrne
Member Author

I see, thanks.

@RalphAS
Collaborator

RalphAS commented Mar 4, 2019

I tried some experiments following @yuyichao's suggestion, for example:

function baz(x::Float128, y::Float128)
    r = Base.llvmcall("""%f = inttoptr i64 %2 to <2 x double> (<2 x double>, <2 x double>)*
                    %vv = call x86_vectorcallcc <2 x double> %f(<2 x double> %0, <2 x double> %1)
                    ret <2 x double> %vv""",
                 Cfloat128, Tuple{Cfloat128,Cfloat128,Ptr{Cvoid}},
                 x.data, y.data, cglobal((:__addtf3,quadoplib)))
    return Float128(r)
end

Replacing methods in Quadmath with patterns like this works fine for Linux and OSX. It runs on Windows (no segfaults) but produces incorrect answers, apparently because the Windows version of libgcc_s doesn't return Float128 results in xmm0 as expected from the ABI. (AFAICT the value of xmm0 is preserved across the call.)

@yuyichao

yuyichao commented Mar 4, 2019

Come to think of it, depending on whether LLVM and GCC agree on the calling convention, if you are already using llvmcall, you can probably use fp128 in it directly ;-p

Other than that, I have no idea what calling convention GCC uses. It is worth noting that GCC does NOT use the same vector calling convention as clang or MSVC. (I didn't realize you are calling a gcc-compiled library.) I just finished dealing with a similar issue with vector calls on Windows, so I'm pretty positive about that. You might have to check the assembly code to figure out what calling convention GCC is using. =(

@simonbyrne
Member Author

> (I didn't realize you are calling a gcc-compiled library.)

We're calling into libquadmath, which is bundled as part of the GCC runtime (which we apparently ship with Julia).

@simonbyrne
Member Author

> Replacing methods in Quadmath with patterns like this works fine for Linux and OSX. It runs on Windows (no segfaults) but produces incorrect answers, apparently because the Windows version of libgcc_s doesn't return Float128 results in xmm0 as expected from the ABI. (AFAICT the value of xmm0 is preserved across the call.)

Not sure if it makes a difference, but did you change how Cfloat128 is defined on Windows to match the Linux and Mac definition?

@RalphAS
Collaborator

RalphAS commented Mar 4, 2019

Yes, I changed the Float128/Cfloat128 structure for Windows to resemble the others.

Note that the Quadmath conversions, comparisons, and arithmetic are passed to libgcc_s (not libquadmath) on Windows and Linux.

I was surprised to learn that LLVM fp128 instructions are also compiled into libgcc_s calls (according to code_native). My original thought was that "direct" fp128 IR might help the Windows issue, but that turned out to be pointless (other than helping me to learn LLVM).

FWIW, I found that IR using fp128 is rather fragile, in that it can cause LLVM to crash Julia or go into a coma (presumably a recurrent loop). That's why the above example just uses <2 x double>.

@RalphAS
Collaborator

RalphAS commented Mar 5, 2019

It seems that we have overthought this. The Windows libraries treat Float128 as a struct, so pointers should be used. I've started on this in PR #16. (Perhaps someone should notify the LLVM developers.)
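As a rough illustration of the pass-by-pointer pattern (not the code in PR #16), the sketch below passes 16-byte values through `ccall` by reference. `add_by_ptr` is a hypothetical stand-in written in Julia and exposed via `@cfunction`, and `UInt128` stands in for the 128-bit payload. The real libgcc_s symbols and their exact Windows signatures are not shown here and should be checked separately.

```julia
# Hypothetical stand-in for a C routine that takes its 16-byte operands
# (and the result slot) by pointer, the way a struct would be passed.
function add_by_ptr(out::Ptr{UInt128}, a::Ptr{UInt128}, b::Ptr{UInt128})::Cvoid
    unsafe_store!(out, unsafe_load(a) + unsafe_load(b))
    return nothing
end

# Raw C-callable function pointer to the stand-in.
const add_ptr = @cfunction(add_by_ptr, Cvoid,
                           (Ptr{UInt128}, Ptr{UInt128}, Ptr{UInt128}))

# Declaring the arguments as Ref{T} makes ccall box each value and pass its
# address, so the callee receives pointers instead of register-passed values.
function add_via_pointers(x::UInt128, y::UInt128)
    out = Ref{UInt128}(0)
    ccall(add_ptr, Cvoid, (Ref{UInt128}, Ref{UInt128}, Ref{UInt128}), out, x, y)
    return out[]
end
```

The point is only the `Ref{T}` argument types. Whether the result comes back through a pointer or a register is a property of the actual library being called.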

@simonbyrne
Member Author

Fixed by #16.
