Ensure there's a 16-byte aligned 128-bit int in the heap #50

dra27 · 2020-10-27T17:44:16Z

This sprang from the unexpected discovery in a library that Stdint.Uint32.max_int was -1! This PR consists of 3 sort of related pieces of work:

Long_val does not return long - it returns intnat. OCaml uses long to mean 64-bits (defined in its headers) but C does not. Uint32.max_int now returns 0xFFFFFFFF (and other fixes)
Windows with GCC 9.2 reveals alignment problems with 128-bit numbers which manifests itself by segfaulting on load as the top-level values in Stdint.Uint128 and Stdint.Int128 initialise (see Segfaults with 128-bit integers on Windows #45). I believe the problem is architecture-specific, just being triggered on Windows. It's impossible to store 128 bit values in the OCaml heap and expect them to be 16-byte aligned. Having tried various experiments and concluded that the fallbacks functions (for when compiler support isn't available) are simply too incomplete/buggy, I propose here a halfway house. Values are manipulated as __int128_t and __uint128_t, but are stored as a struct with two 64-bit members. There's obviously a performance impact - some functions may well be faster manipulating the struct directly, others (division, for example) will still be faster this way. This version at least works!
The custom_operations weren't setting up the compare_ext field and also weren't taking advantage of 4.08.0's fixed_length field. I'm not sure if changing fixed_length now means that the library cannot deserialize older values (@stedolan?)

Deals with 128-bit alignment. The value may move both if it is promoted to the major heap and subsequently by the compactor and 16-byte alignment is not guaranteed. This approach allocated a 5 word custom value and stores the 128 bit number in words 1+2 and 4+5. It will always be the case that one is aligned and one is not.

rgrinberg · 2020-10-27T23:03:51Z

Looks good to me. Let's get a second opinion from @rixed

rixed · 2020-10-28T09:02:24Z

I do not understand what is meant by "Stdint.Uint32.max_int is -1". On a ARCH_64 I have Uint32.(max_int |> to_int) = 4294967295. On a 32bits machine I would indeed expect this to be -1.

Regarding the patch, and without spending too much thought on it, since the problem lies in the allocator and since there are only 2 calls to caml_alloc_custom for int128s, I would favor a fix in the allocator, for instance asking for 24 bytes instead of 16 and returning the address that's 16 bytes aligned. I think it would also lead to faster code, as allocation happens probably less often than the getter, and address arithmetic is faster than 128bits arithmetic.

I'd be happy to propose a patch implementing that so we can compare ((un?)fortunately I won't be able to test against the bug as I cannot repro it).

Other than that, don't you think the high and low members of int128_ocaml struct should be swapped in the hope the compiler might optimize the shift away on those architectures where it's safe to do so?

rixed · 2020-10-28T09:33:07Z

a fix in the allocator

Problem is, the allocator initializes the value header itself so we cannot move it to the next aligned address. So that offsetting would have to be done in Int128_val, which I believe would be quite more expensive.

rixed · 2020-10-28T09:51:40Z

This is what I was thinking about: #52
Honestly, I don't think it's any better, but I'm curious if there is any obvious drawback/advantage or any noticeable difference.

dra27 · 2020-10-28T10:17:04Z

I do not understand what is meant by "Stdint.Uint32.max_int is -1". On a ARCH_64 I have Uint32.(max_int |> to_int) = 4294967295. On a 32bits machine I would indeed expect this to be -1.

On a 64-bit Windows system, 0xffffffff gets passed to int_of_uint32 which contains (long)Uint32_val(v). long on Windows is 32-bit (even on a 64-bit system) and so 0xffffffff gets turned into -1 (i.e. 0x7fffffffffffffff).

dra27 · 2020-10-28T10:39:33Z

This is what I was thinking about: #52
Honestly, I don't think it's any better, but I'm curious if there is any obvious drawback/advantage or any noticeable difference.

This doesn't work. The alignment may change both when the value is promoted to the major heap and if/when the compactor runs.

dra27 · 2020-10-28T10:39:36Z

It is not possible to store a 16-byte aligned value in the heap but it would be possible to store 2 16-byte values where one of them is aligned. That may work, given that they're immutable - at the cost of more expensive creation and of course using 40 bytes per int128 instead of 16!

dra27 · 2020-10-28T12:21:37Z

I've split the other two parts off, since they should be entirely "non-controversial"! Pushed now is a revised implementation of the suggestion above. The original struct-based version is in #55. There may be a good Hacker's Delight way of getting 0 from 0 and 3 from 8 for calculating the offset!

I can confirm that both approaches work on Windows with the same 11 tests failing as fail elsewhere. The approach here is certainly slower to create 128 bit numbers but should be faster for reading them than the struct-based approach. Both need benchmarking. My personal intuition is that the overhead of storing the int128 twice and the complicated pointer manipulation to read the aligned version will outweigh the saving over the struct version, but I have no evidence!

At the start of this, I was faced with a library whose initialisations were segfaulting - both of these fix it and allow me to use the 32-bit arithmetic I was actually after! I actually don't have any interest in the 128-bit stuff at the moment, so I'm nearing the end of my time on this, if that's OK 🙂 I am very interested in having a released version of this library which works on Windows (slow is better than not-at-all).

@msprotz - do you have anything which can provide any kind of benchmark for these two? If you're trying it on Windows, I'd pull #53 in as well!

rixed · 2020-10-28T12:54:01Z

This doesn't work. The alignment may change both when the value is promoted to the major heap and if/when the compactor runs.

Very true! My erroneous mental model told me the custom data was allocated in the heap outside of the custom block but it is actually embedded within the custom block so it is indeed subject to relocation with compaction.
This makes this alternative approach rather awful.
Thank you for fixing my bad assumption.

Unless we do run some bench and there is a large (and unexpected) difference in perf, my favorite implementation is therefore #55.

msprotz · 2020-10-28T14:19:33Z

@dra27 unfortunately I don't have anything to benchmark, I'm at the same stage as you: as long as we get something that compiles and doesn't require our setup script to fixup the opam-installed package, I'm a happier man

msprotz · 2020-10-28T14:24:37Z

Concerning two versions vs. struct: I think your struct-based patch is much more readable and easier to maintain in the long run, so I would advocate for merging #53, #55 as soon as possible, and issuing a new release. That would solve everyone's issues quite nicely I think. Thanks for digging into this @dra27! Much appreciated.

rgrinberg · 2020-10-28T18:46:58Z

Do we still need this PR now that #53 and #55 are merged?

dra27 · 2020-10-28T19:04:05Z

It’s an alternative which someone might like to pick up to benchmark... up to you if it wants keeping open or not!

This was referenced Oct 28, 2020

Long_val returns intnat not long (Windows fix) #53

Merged

Update custom_operations initialization #54

Merged

Always store OCaml representation as a struct #55

Merged

dra27 force-pushed the aligned-128 branch from e28ee25 to d3e5fb2 Compare October 28, 2020 12:10

dra27 changed the title ~~128-bit alignment, Windows and custom operations~~ Ensure there's a 16-byte aligned 128-bit int in the heap Oct 28, 2020

dra27 closed this Mar 2, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure there's a 16-byte aligned 128-bit int in the heap #50

Ensure there's a 16-byte aligned 128-bit int in the heap #50

dra27 commented Oct 27, 2020

rgrinberg commented Oct 27, 2020

rixed commented Oct 28, 2020

rixed commented Oct 28, 2020

rixed commented Oct 28, 2020

dra27 commented Oct 28, 2020 •

edited

Loading

dra27 commented Oct 28, 2020

dra27 commented Oct 28, 2020 •

edited

Loading

dra27 commented Oct 28, 2020

rixed commented Oct 28, 2020

msprotz commented Oct 28, 2020

msprotz commented Oct 28, 2020

rgrinberg commented Oct 28, 2020

dra27 commented Oct 28, 2020

Ensure there's a 16-byte aligned 128-bit int in the heap #50

Ensure there's a 16-byte aligned 128-bit int in the heap #50

Conversation

dra27 commented Oct 27, 2020

rgrinberg commented Oct 27, 2020

rixed commented Oct 28, 2020

rixed commented Oct 28, 2020

rixed commented Oct 28, 2020

dra27 commented Oct 28, 2020 • edited Loading

dra27 commented Oct 28, 2020

dra27 commented Oct 28, 2020 • edited Loading

dra27 commented Oct 28, 2020

rixed commented Oct 28, 2020

msprotz commented Oct 28, 2020

msprotz commented Oct 28, 2020

rgrinberg commented Oct 28, 2020

dra27 commented Oct 28, 2020

dra27 commented Oct 28, 2020 •

edited

Loading

dra27 commented Oct 28, 2020 •

edited

Loading