-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure there's a 16-byte aligned 128-bit int in the heap #50
Conversation
Deals with 128-bit alignment. The value may move both if it is promoted to the major heap and subsequently by the compactor and 16-byte alignment is not guaranteed. This approach allocated a 5 word custom value and stores the 128 bit number in words 1+2 and 4+5. It will always be the case that one is aligned and one is not.
Looks good to me. Let's get a second opinion from @rixed |
I do not understand what is meant by "Stdint.Uint32.max_int is -1". On a ARCH_64 I have Regarding the patch, and without spending too much thought on it, since the problem lies in the allocator and since there are only 2 calls to caml_alloc_custom for int128s, I would favor a fix in the allocator, for instance asking for 24 bytes instead of 16 and returning the address that's 16 bytes aligned. I think it would also lead to faster code, as allocation happens probably less often than the getter, and address arithmetic is faster than 128bits arithmetic. I'd be happy to propose a patch implementing that so we can compare ((un?)fortunately I won't be able to test against the bug as I cannot repro it). Other than that, don't you think the high and low members of int128_ocaml struct should be swapped in the hope the compiler might optimize the shift away on those architectures where it's safe to do so? |
Problem is, the allocator initializes the value header itself so we cannot move it to the next aligned address. So that offsetting would have to be done in Int128_val, which I believe would be quite more expensive. |
This is what I was thinking about: #52 |
On a 64-bit Windows system, 0xffffffff gets passed to |
This doesn't work. The alignment may change both when the value is promoted to the major heap and if/when the compactor runs. |
It is not possible to store a 16-byte aligned value in the heap but it would be possible to store 2 16-byte values where one of them is aligned. That may work, given that they're immutable - at the cost of more expensive creation and of course using 40 bytes per int128 instead of 16! |
I've split the other two parts off, since they should be entirely "non-controversial"! Pushed now is a revised implementation of the suggestion above. The original I can confirm that both approaches work on Windows with the same 11 tests failing as fail elsewhere. The approach here is certainly slower to create 128 bit numbers but should be faster for reading them than the At the start of this, I was faced with a library whose initialisations were segfaulting - both of these fix it and allow me to use the 32-bit arithmetic I was actually after! I actually don't have any interest in the 128-bit stuff at the moment, so I'm nearing the end of my time on this, if that's OK 🙂 I am very interested in having a released version of this library which works on Windows (slow is better than not-at-all). @msprotz - do you have anything which can provide any kind of benchmark for these two? If you're trying it on Windows, I'd pull #53 in as well! |
Very true! My erroneous mental model told me the custom data was allocated in the heap outside of the custom block but it is actually embedded within the custom block so it is indeed subject to relocation with compaction. Unless we do run some bench and there is a large (and unexpected) difference in perf, my favorite implementation is therefore #55. |
@dra27 unfortunately I don't have anything to benchmark, I'm at the same stage as you: as long as we get something that compiles and doesn't require our setup script to fixup the opam-installed package, I'm a happier man |
Concerning two versions vs. struct: I think your struct-based patch is much more readable and easier to maintain in the long run, so I would advocate for merging #53, #55 as soon as possible, and issuing a new release. That would solve everyone's issues quite nicely I think. Thanks for digging into this @dra27! Much appreciated. |
It’s an alternative which someone might like to pick up to benchmark... up to you if it wants keeping open or not! |
This sprang from the unexpected discovery in a library that
Stdint.Uint32.max_int
was-1
! This PR consists of 3 sort of related pieces of work:Long_val
does not returnlong
- it returnsintnat
. OCaml useslong
to mean 64-bits (defined in its headers) but C does not.Uint32.max_int
now returns0xFFFFFFFF
(and other fixes)Stdint.Uint128
andStdint.Int128
initialise (see Segfaults with 128-bit integers on Windows #45). I believe the problem is architecture-specific, just being triggered on Windows. It's impossible to store 128 bit values in the OCaml heap and expect them to be 16-byte aligned. Having tried various experiments and concluded that the fallbacks functions (for when compiler support isn't available) are simply too incomplete/buggy, I propose here a halfway house. Values are manipulated as__int128_t
and__uint128_t
, but are stored as a struct with two 64-bit members. There's obviously a performance impact - some functions may well be faster manipulating the struct directly, others (division, for example) will still be faster this way. This version at least works!custom_operations
weren't setting up thecompare_ext
field and also weren't taking advantage of 4.08.0'sfixed_length
field. I'm not sure if changingfixed_length
now means that the library cannot deserialize older values (@stedolan?)