General discussion about XXH3 #175
Canonical representation: for the 64-bit variant, the canonical representation is the same as XXH64's. Nothing is wrong with big-endian output. I say 2x big endian. That would be easiest to parse (in the case of being in a text file):

fscanf(hashfile, "%16llx%16llx-3", &hashLo, &hashHi);

and the easiest to output:

printf("%016llx%016llx-3\n", hashLo, hashHi);

Similar to how my XXH32a's canonical representation ended with a suffix.
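Sketching that suggestion end to end; the struct layout, field names, and word order below are assumptions for illustration, not the library's actual definitions:

```c
#include <stdio.h>
#include <stdint.h>

typedef struct { uint64_t low64; uint64_t high64; } XXH128_hash_t; /* assumed layout */

/* Write the high word first, so the text reads as one big-endian 128-bit number. */
static void write_canonical(FILE *f, XXH128_hash_t h)
{
    fprintf(f, "%016llx%016llx-3\n",
            (unsigned long long)h.high64, (unsigned long long)h.low64);
}

/* Field widths (%16llx) limit each conversion to exactly 16 hex digits. */
static int read_canonical(FILE *f, XXH128_hash_t *h)
{
    unsigned long long hi, lo;
    if (fscanf(f, "%16llx%16llx-3", &hi, &lo) != 2) return -1;
    h->high64 = hi;
    h->low64  = lo;
    return 0;
}
```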
/* pointer or value? also, should we perhaps give a saturated subtraction for sorting
* functions that check values? */
XXH_PUBLIC_API int
XXH128_hash_cmp(const XXH128_hash_t *a, const XXH128_hash_t *b)
{
if (a->ll2 == b->ll2) {
if (a->ll1 == b->ll1) {
return 0;
} else if (a->ll1 < b->ll1) {
return -1;
} else {
return 1;
}
}
if (a->ll2 < b->ll2) {
return -1;
} else {
return 1;
}
}
/* xxhash.h */
#ifdef __cplusplus
static inline bool operator==(const XXH128_hash_t &a, const XXH128_hash_t &b) {
return XXH128_hash_cmp(&a, &b) == 0;
}
static inline bool operator!=(const XXH128_hash_t &a, const XXH128_hash_t &b) {
return XXH128_hash_cmp(&a, &b) != 0;
}
static inline bool operator>(const XXH128_hash_t &a, const XXH128_hash_t &b) {
return XXH128_hash_cmp(&a, &b) > 0;
}
static inline bool operator<(const XXH128_hash_t &a, const XXH128_hash_t &b) {
return XXH128_hash_cmp(&a, &b) < 0;
}
static inline bool operator<=(const XXH128_hash_t &a, const XXH128_hash_t &b) {
return XXH128_hash_cmp(&a, &b) <= 0;
}
static inline bool operator>=(const XXH128_hash_t &a, const XXH128_hash_t &b) {
return XXH128_hash_cmp(&a, &b) >= 0;
}
#endif /* __cplusplus */
That's a great idea!
I prefer passing by value when the structure is of limited size.
Regarding the length == 0 return: I'd go for returning the seed (even if it is 0) instead of always returning 0. If one wants to differentiate the hashes of the same input data by providing a seed, you still get different hashes even when length == 0.
Pointer or value aside, BTW: why "ll1" and "ll2"? IMHO the fields should be suffixed with "l" and "h".
Regarding the 128-bit multiply: it doesn't return the lower 64 bits, it adds the lower bits to the higher bits. This is pretty much the best way of doing it. You can see it in the GCC extension code:

__uint128_t lll = (__uint128_t)ll1 * ll2;
return (U64)lll + (U64)(lll >> 64);

or, if we break it up:

__uint128_t ll1_u128 = (__uint128_t) ll1;            /* cast to __uint128_t */
__uint128_t ll2_u128 = (__uint128_t) ll2;            /* promoted */
__uint128_t lll = ll1_u128 * ll2_u128;               /* long multiply */
U64 lll_hi = (U64) (lll >> 64);                      /* high bits */
U64 lll_lo = (U64) (lll & 0xFFFFFFFFFFFFFFFFULL);    /* low bits */
return lll_hi + lll_lo;                              /* 64-bit add together */

In x86_64 assembly, we get this:

_mul128:
    mov rax, rsi
    mul rdi        # note: rdx holds the high bits now, rax the low bits
    add rax, rdx
    ret

The reason I use inline assembly on 32-bit ARM is that neither GCC nor Clang will emit the right code; they prefer to spew out nonsense. The inline-assembly version is quite efficient:

umull r12, lr, r0, r2
mov   r5, #0
mov   r4, #0
umaal r5, lr, r1, r2
umaal r5, r4, r0, r3
umaal lr, r4, r1, r3
adds  r0, lr, r12
adc   r1, r4, r5
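For reference, a self-contained sketch of the add-folding multiply described above, assuming a compiler with __uint128_t support (GCC/Clang on 64-bit targets); the function name is illustrative, not xxHash's actual API:

```c
#include <stdint.h>

/* Fold a 64x64->128 multiply down to 64 bits by adding the two halves. */
static uint64_t mul128_fold64_add(uint64_t lhs, uint64_t rhs)
{
    __uint128_t product = (__uint128_t)lhs * rhs;
    return (uint64_t)product + (uint64_t)(product >> 64); /* low + high */
}
```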
I also kinda agree about the pointer thing. On 32-bit, those two structs will take up 8 registers. ARM, for example, can only pass 4 registers' worth of arguments in a function call before it has to push to the stack, and x86 only has 7 general-purpose registers.
Actually, I benchmarked it via qsort, and it seems like passing by value is in fact better on both 32-bit and 64-bit (for x86; haven't tested ARM yet):

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <stdint.h>
#include <string.h>
typedef struct {
uint64_t ll1;
uint64_t ll2;
} XXH128_hash_t;
__attribute__((__noinline__)) int
XXH128_hash_cmp_ptr(const XXH128_hash_t *a, const XXH128_hash_t *b)
{
if (a->ll2 == b->ll2) {
if (a->ll1 == b->ll1) {
return 0;
} else if (a->ll1 < b->ll1) {
return -1;
} else {
return 1;
}
}
if (a->ll2 < b->ll2) {
return -1;
} else {
return 1;
}
}
__attribute__((__noinline__)) int
XXH128_hash_cmp(const XXH128_hash_t a, const XXH128_hash_t b)
{
if (a.ll2 == b.ll2) {
if (a.ll1 == b.ll1) {
return 0;
} else if (a.ll1 < b.ll1) {
return -1;
} else {
return 1;
}
}
if (a.ll2 < b.ll2) {
return -1;
} else {
return 1;
}
}
int XXH128_hash_cmp_ptr_qsort(const void *start, const void *end)
{
return XXH128_hash_cmp_ptr((const XXH128_hash_t *)start, (const XXH128_hash_t *)end);
}
int XXH128_hash_cmp_qsort(const void *start, const void *end)
{
return XXH128_hash_cmp(*(const XXH128_hash_t *)start, *(const XXH128_hash_t *)end);
}
#define ROUNDS 1000000
static XXH128_hash_t arr[ROUNDS];
static XXH128_hash_t *arr_p[ROUNDS];
int main()
{
srand(time(NULL));
printf("arch: %zu-bit\n", sizeof(void *) * 8);
for (int i = 0; i < ROUNDS; i++) {
arr[i].ll1 = rand() | (uint64_t)rand() << 32;
arr[i].ll2 = rand() | (uint64_t)rand() << 32;
arr_p[i] = malloc(sizeof(XXH128_hash_t));
memcpy(arr_p[i], &arr[i], sizeof(XXH128_hash_t));
}
double start, end;
start = (double)clock();
qsort(arr, ROUNDS, sizeof(XXH128_hash_t), XXH128_hash_cmp_qsort);
end = (double)clock();
printf("pass by value: %lf\n", (end - start) / CLOCKS_PER_SEC);
start = (double)clock();
qsort(arr_p, ROUNDS, sizeof(XXH128_hash_t *), XXH128_hash_cmp_ptr_qsort);
end = (double)clock();
printf("pass by pointer: %lf\n", (end - start) / CLOCKS_PER_SEC);
}
(note: compiled with clang 7.0.1 and GCC 8.3 on a 2.0 GHz 2nd Gen Core i7)
On ARM64, the ABI has enough argument registers (x0..x7) to pass it by value, but for 32-bit ARM the pointer version is better, as there are only 4 argument registers.
Nevermind that, this is the way to do it:

#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <stdint.h>
#include <string.h>
typedef struct {
uint64_t ll1;
uint64_t ll2;
} XXH128_hash_t;
__attribute__((__noinline__)) int
XXH128_hash_cmp_ptr(const XXH128_hash_t *a, const XXH128_hash_t *b)
{
if (a->ll2 == b->ll2) {
if (a->ll1 == b->ll1) {
return 0;
} else if (a->ll1 < b->ll1) {
return -1;
} else {
return 1;
}
}
if (a->ll2 < b->ll2) {
return -1;
} else {
return 1;
}
}
__attribute__((__noinline__)) int
XXH128_hash_cmp(const XXH128_hash_t a, const XXH128_hash_t b)
{
if (a.ll2 == b.ll2) {
if (a.ll1 == b.ll1) {
return 0;
} else if (a.ll1 < b.ll1) {
return -1;
} else {
return 1;
}
}
if (a.ll2 < b.ll2) {
return -1;
} else {
return 1;
}
}
int XXH128_hash_cmp_ptr_qsort(const void *start, const void *end)
{
return XXH128_hash_cmp_ptr(*(const XXH128_hash_t **)start, *(const XXH128_hash_t **)end);
}
int XXH128_hash_cmp_qsort(const void *start, const void *end)
{
return XXH128_hash_cmp(*(const XXH128_hash_t *)start, *(const XXH128_hash_t *)end);
}
#define ROUNDS 10000000
static XXH128_hash_t arr[ROUNDS];
static XXH128_hash_t *arr_p[ROUNDS];
static XXH128_hash_t arr_p_alt[ROUNDS];
int main()
{
srand(time(NULL));
printf("arch: %zu-bit\n", sizeof(void *) * 8);
for (int i = 0; i < ROUNDS; i++) {
arr[i].ll1 = rand() | (uint64_t)rand() << 32;
arr[i].ll2 = rand() | (uint64_t)rand() << 32;
arr_p[i] = malloc(sizeof(XXH128_hash_t));
memcpy(arr_p[i], &arr[i], sizeof(XXH128_hash_t));
memcpy(&arr_p_alt[i], &arr[i], sizeof(XXH128_hash_t));
}
double start, end;
start = (double)clock();
qsort(arr, ROUNDS, sizeof(XXH128_hash_t), XXH128_hash_cmp_qsort);
end = (double)clock();
printf("pass by value: %lf\n", (end - start) / CLOCKS_PER_SEC);
start = (double)clock();
qsort(arr_p, ROUNDS, sizeof(XXH128_hash_t *), XXH128_hash_cmp_ptr_qsort);
end = (double)clock();
printf("pass by pointer: %lf\n", (end - start) / CLOCKS_PER_SEC);
start = (double)clock();
qsort(arr_p_alt, ROUNDS, sizeof(XXH128_hash_t), (int (*)(const void *, const void *)) XXH128_hash_cmp_ptr);
end = (double)clock();
printf("direct pass to qsort: %lf\n", (end - start) / CLOCKS_PER_SEC);
}

(I increased the count to make it more noticeable.)
Use the pointer version. To use it for sorting:

/* To use it as a comparator for qsort, bsearch, or the like, the recommended way is to
 * cast to int (*)(const void*, const void*). This has the best performance.
 *   qsort(array, size, sizeof(XXH128_hash_t), (int (*)(const void*, const void*)) XXH128_hash_cmp)
 * This assumes an array of XXH128_hash_t values. */

Because a compare function for pointers is going to be the most important usage of this, as I presume it will mostly be used in things like qsort and bsearch.
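As an aside, calling through a cast between incompatible function-pointer types is technically undefined behavior in ISO C, so a tiny wrapper is the conservative route; a sketch, with a made-up wrapper name:

```c
#include <stdlib.h>

/* Strictly conforming alternative to casting the function pointer. */
static int XXH128_cmp_qsort(const void *a, const void *b)
{
    return XXH128_hash_cmp_ptr((const XXH128_hash_t *)a,
                               (const XXH128_hash_t *)b);
}

/* usage: qsort(array, count, sizeof(XXH128_hash_t), XXH128_cmp_qsort); */
```

In practice, compilers generate the same code either way; the wrapper just keeps the types honest.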
As for the C++ version, we are going to want to inline the comparisons, as it makes std::sort much faster:

static inline bool operator ==(const XXH128_hash_t &a, const XXH128_hash_t &b)
{
return (a.ll1 == b.ll1) && (a.ll2 == b.ll2);
}
static inline bool operator <(const XXH128_hash_t &a, const XXH128_hash_t &b)
{
if (a.ll2 == b.ll2) {
return a.ll1 < b.ll1;
} else {
return a.ll2 < b.ll2;
}
}
static inline bool operator !=(const XXH128_hash_t &a, const XXH128_hash_t &b)
{
return !(a == b);
}
static inline bool operator >(const XXH128_hash_t& a, const XXH128_hash_t& b)
{
return b < a;
}
static inline bool operator <=(const XXH128_hash_t& a, const XXH128_hash_t& b)
{
return !(a > b);
}
static inline bool operator >=(const XXH128_hash_t& a, const XXH128_hash_t& b)
{
return !(a < b);
}

And by passing by reference, the comparisons can still be fully inlined.

Edit: the first std::sort uses this:

static inline bool operator<(const XXH128_hash_t &a, const XXH128_hash_t &b) {
    return XXH128_hash_cmp_ptr(&a, &b) < 0;
}
Even better, and this is all C++11-compatible: constexpr, which, while it isn't that important, seems to help optimization a bit.

int
XXH128_hash_cmp_ptr(const XXH128_hash_t *a, const XXH128_hash_t *b)
{
if (a->ll2 == b->ll2) {
return (a->ll1 < b->ll1) ? -1 : (a->ll1 > b->ll1);
}
if (a->ll2 < b->ll2) {
return -1;
} else {
return 1;
}
}
static inline constexpr bool
operator ==(const XXH128_hash_t &a, const XXH128_hash_t &b)
{
return (a.ll1 == b.ll1) && (a.ll2 == b.ll2);
}
static inline constexpr bool
operator<(const XXH128_hash_t &a, const XXH128_hash_t &b)
{
#if (defined(_WIN32) || defined(__LITTLE_ENDIAN__)) && (defined(__SIZEOF_INT128__) || (defined(_INTEGRAL_MAX_BITS) && _INTEGRAL_MAX_BITS >= 128))
/* convert to uint128_t and compare */
return (a.ll1 | (static_cast<__uint128_t>(a.ll2) << 64))
< (b.ll1 | (static_cast<__uint128_t>(b.ll2) << 64));
#else
return (a.ll2 == b.ll2) ? (a.ll1 < b.ll1) : (a.ll2 < b.ll2);
#endif
}
static inline constexpr bool
operator !=(const XXH128_hash_t &a, const XXH128_hash_t &b)
{
return !(a == b);
}
static inline constexpr bool
operator >(const XXH128_hash_t& a, const XXH128_hash_t& b)
{
return b < a;
}
static inline constexpr bool
operator <=(const XXH128_hash_t& a, const XXH128_hash_t& b)
{
return !(a > b);
}
static inline constexpr bool
operator >=(const XXH128_hash_t& a, const XXH128_hash_t& b)
{
return !(a < b);
}

It is incredibly fast. Unfortunately, the uint128_t compare for the C-style sorting isn't as good as the other method.
On x86_64 with clang, it compiles to this:

__ZltRK13XXH128_hash_tS1_: ## @operator<(XXH128_hash_t const&, XXH128_hash_t const&)
mov rax, qword ptr [rdi]
mov rcx, qword ptr [rdi + 8]
cmp rax, qword ptr [rsi]
sbb rcx, qword ptr [rsi + 8]
setb al
    ret

and on i386 it compiles to this:

__ZltRK13XXH128_hash_tS1_: ## @operator<(XXH128_hash_t const&, XXH128_hash_t const&)
push ebp
push ebx
push edi
push esi
mov esi, dword ptr [esp + 24]
mov ecx, dword ptr [esp + 20]
mov eax, dword ptr [ecx + 8]
mov edx, dword ptr [ecx + 12]
mov ebx, dword ptr [esi + 8]
mov edi, dword ptr [esi + 12]
mov ebp, edx
xor ebp, edi
mov esi, eax
xor esi, ebx
or esi, ebp
jne LBB5_2
mov eax, dword ptr [ecx]
mov ecx, dword ptr [ecx + 4]
mov edx, dword ptr [esp + 24]
cmp eax, dword ptr [edx]
sbb ecx, dword ptr [edx + 4]
jmp LBB5_3
LBB5_2:
cmp eax, ebx
sbb edx, edi
LBB5_3:
setb al
pop esi
pop edi
pop ebx
pop ebp
    ret

Edit: GCC 8.3's output is similar.
As mentioned by @easyaspi314, it's a bit more than just returning the lower bits. I'm fine with a rewrite of the function if it can lead to better performance.
Good point.
Consequences of implementing a comparator: in saying that one hash value is "bigger" than another, it makes the ordering part of the contract. Possible answer: define the ordering as if the hash were a single unsigned 128-bit integer.
Yes, I agree.
I'd go for this. This would be compatible with a natural uint128_t, if it exists.
Also, if you consider the struct as a replacement for uint128_t:
// 0x00112233445566778899AABBCCDDEEFF
const __uint128_t x = ((__uint128_t)0x0011223344556677ULL<< 64) | 0x8899AABBCCDDEEFFULL;
For uint128_t compatibility, doing it this way would theoretically allow someone to do this on a 64-bit little-endian machine:

XXH128_hash_t hash = XXH3_128b(...);
__uint128_t val;
memcpy(&val, &hash, sizeof(__uint128_t));

It is supposed to be like one, and if the C standard had 128-bit integers, we would definitely use them. But it doesn't, and the only platforms that have them are 64-bit.
Yes, that's indeed the problem at the binary level.

At the canonical level, I believe it makes sense to preserve the "natural" byte order, as if a single large number were written out. But binary and canonical levels do not have to match: later on, a pair of conversion functions can translate between them. So let's concentrate on the binary level for now. It would seem a matter of selecting which platform is more important, little or big endian, and these days little endian is pretty much the winner. That being said, even that does not matter much: consistency is also a topic, but at the end of the day both solutions work. As long as one convention is picked consistently, I'm fine with both.
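For illustration, a sketch of what a big-endian canonical conversion could look like; the names and layout are assumptions here, not the library's settled API:

```c
#include <stdint.h>

typedef struct { uint64_t low64; uint64_t high64; } XXH128_hash_t;   /* assumed layout */
typedef struct { unsigned char digest[16]; } XXH128_canonical_t;     /* assumed name */

/* Store the most significant byte first, so the bytes read like one
 * big 128-bit number regardless of host endianness. */
static void canonical_from_hash(XXH128_canonical_t *dst, XXH128_hash_t h)
{
    for (int i = 0; i < 8; i++) {
        dst->digest[i]     = (unsigned char)(h.high64 >> (56 - 8 * i));
        dst->digest[i + 8] = (unsigned char)(h.low64  >> (56 - 8 * i));
    }
}
```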
I went ahead and proceeded with the renaming. I used the little-endian binary convention, hence the low 64 bits come first in the structure. I also believe it's not opposed to using big-endian for the canonical representation.
"by value" vs "by reference" : I would really not feel too much concerned by a Moreover, whenever the comparator function can be inlined, it all becomes moot. One can expect the comparison to be performed directly on source data. Btw, maybe it's important to offer an inlinable comparator. In contrast, I would feel more concerned by branches, which can be costly when they mispredict. If we do not constrain the comparator to return |
In this godbolt example, there is a comparison between a classical "branchy" comparator and a branchless one. On 64-bit, the branchless variant looks good: it uses fewer instructions than the "branchy" one and avoids a conditional branch. There is also a fully branchless comparator proposed earlier in this thread. However, as mentioned before, it's unlikely that the high 64 bits of two distinct hashes will be equal, so the first test is highly predictable. Now it's a matter of comparing these implementations...
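A minimal sketch of such a branchless comparator, assuming the relaxed contract (any negative / zero / positive result, not strictly -1/0/1) and an assumed struct layout:

```c
#include <stdint.h>

typedef struct { uint64_t low64; uint64_t high64; } XXH128_hash_t; /* assumed layout */

/* Each (a > b) - (a < b) term is -1, 0, or 1 without branching. */
static int XXH128_cmp_branchless(const XXH128_hash_t *a, const XXH128_hash_t *b)
{
    int hi = (a->high64 > b->high64) - (a->high64 < b->high64);
    int lo = (a->low64  > b->low64)  - (a->low64  < b->low64);
    return 2 * hi + lo;   /* sign is decided by hi unless hi == 0 */
}
```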
I've used @easyaspi314's benchmark. Some conclusions: one indirection vs. two indirections makes a difference, though a moderate one (~10%). So, for qsort(), feeding the comparator directly (one indirection) is preferable to sorting an array of pointers. I still believe that passing arguments by value feels more natural, though.
typedef union {
    struct {
        uint64_t ll1;
        uint64_t ll2;
    };
    struct {
        /* don't use it directly! */
        size_t __val[(sizeof(uint64_t) * 2) / sizeof(size_t)];
    };
} XXH128_hash_t;

int
XXH128_hash_cmp_ptr(const XXH128_hash_t *a, const XXH128_hash_t *b)
{
#if defined(__LITTLE_ENDIAN__) || defined(_WIN32)
    /* Compare word-size chunks from the most significant one down.
     * This only works on little endian. */
    int cmp = 0;
    size_t i = sizeof(a->__val) / sizeof(a->__val[0]);
    while (i-- > 0) {
        cmp = (a->__val[i] > b->__val[i]) - (a->__val[i] < b->__val[i]);
        if (cmp) {
            return cmp;
        }
    }
    return cmp;
#else
    /* normal comparison (as elsewhere in this thread), don't really care */
    if (a->ll2 != b->ll2) return (a->ll2 < b->ll2) ? -1 : 1;
    if (a->ll1 != b->ll1) return (a->ll1 < b->ll1) ? -1 : 1;
    return 0;
#endif
}

That has the best performance on 32-bit.

XXH128_hash_cmp_ptr_alt:
mov rax, qword ptr [rsi+0x8]
cmp qword ptr [rdi+0x8], rax
je L7
sbb eax, eax
or eax, 0x1
L1:
ret
L7:
mov rdx, QWORD PTR [rsi]
mov eax, 0x1
cmp QWORD PTR [rdi], rdx
ja L1
sbb eax, eax
    ret

That clever bit of assembly has the best performance I've seen on 64-bit, and it is what GCC emits for the naive implementation:

__attribute__((__noinline__)) int
XXH128_hash_cmp_ptr(const XXH128_hash_t *a, const XXH128_hash_t *b)
{
if (a->high64 == b->high64) {
if (a->low64 > b->low64) {
return 1;
} else if (a->low64 < b->low64) {
return -1;
} else {
return 0;
}
}
if (a->high64 < b->high64) {
return -1;
} else {
return 1;
}
}
As for qsort: I guess the most impact comes from swapping the 128-bit values when sorting does not use an array of pointers.
Will there be createState/update/freeState APIs for XXH3?
Yes, that's basically the next step.
I wasn't aware of this hash, thanks for the link!
There is currently a huge bug in the implementation, so some parts need to be redesigned.
I have to disagree with the mention of "huge bug" here. I maintain that a non-cryptographic hash should not even try to "resist" an attack by adding layers of obfuscation that will just be defeated later on. In the meantime, there will always be a temptation to think "I don't understand what's going on in this algorithm; sure, it's labelled non-cryptographic, but (wink) it seems it's almost as good", then it gets used in places where it should not, and a bit later we've got an effective attack ready to roll on all these brittle implementations. A non-cryptographic algorithm should not hide the fact that it's non-cryptographic, not even in its implementation. The objective of a non-cryptographic hash algorithm is just to quickly generate a bunch of bits, to be used in a hash table, a bloom filter, etc. The changes I'm implementing are aimed at the space-reduction problem, which is far from severe. They will, as a side effect, make it a bit more difficult to generate intentional collisions, but that certainly does not make the algorithm any more secure, and is not the goal.
@Cyan4973, might be a good idea to put that patch in.
Just went to update my XXH3 implementation in easyaspi314/xxhash-clean:xxh3. Oh my, how XXH3 has changed…
Just broke the 6 GB/s barrier on my phone with Clang 9 by moving some code around; Clang seems to interleave it better that way.
GCC 9 is slightly slower, although it is slower in general (4.5 GB/s) due to the inline assembly used throughout the code. However, screw GCC on AArch64; everybody uses Clang anyways. 😂 Add proper builtins and then I might change my mind.
Updated xxhash-clean with the latest XXH3. https://github.com/easyaspi314/xxhash-clean Not planning to do streaming — not yet anyways. The other cool thing is that it can compile really small, especially since XXH3_64bits and XXH3_128bits are isolated. XXH_INLINE_ALL wrapping the same functions stays ABI-compatible with the single-shot functions. Edit: Happy Pi Day!
RISC-V 64-bit on qemu-riscv64 4.1.1 on my Pixel 2 XL (because my PC is in use).
Note that Clang uses byteshift loads here. It seems that RISC-V has optional but recommended unaligned access: the spec says it can be managed by the execution environment or implemented in hardware, which was intended to make unaligned access easier to implement and possible to disable on embedded targets. If I force direct unaligned loads instead, the byteshift sequences go away.
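For reference, "byteshift loads" means assembling the value one byte at a time, which never requires alignment; a minimal sketch:

```c
#include <stdint.h>

/* Read a 64-bit little-endian value byte by byte: safe at any alignment,
 * which is why compilers fall back to this on strict-alignment targets. */
static uint64_t read64_le(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}
```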
Also, here's mips64el (MIPS release 6) on qemu-mips64el 2.11.50 (because 4.4.0 isn't in the repos and I already had that version patched for Termux).
Also, damn, generating a constant is torture on RISC-V...

int64_t load_constant(void)
{
return 0x165667919E3779F9;
}

load_constant:
lui a0, 45
addiw a0, a0, -1331
slli a0, a0, 14
addi a0, a0, -883
slli a0, a0, 14
addi a0, a0, -913
slli a0, a0, 15
addi a0, a0, -1543
    ret

In C terms, that sequence is:

int64_t load_constant(void)
{
int32_t a0_32 = 45 << 12;
a0_32 = a0_32 - 1331;
int64_t a0 = (int64_t)a0_32; // sign extend
a0 = a0 << 14;
a0 = a0 - 883;
a0 = a0 << 14;
a0 = a0 - 913;
a0 = a0 << 15;
a0 = a0 - 1543;
return a0;
}
I spent the past days trying to find and exploit other weaknesses, either in the implementation or in the algorithm itself, using a few hunches about where I expected to find exploitable problems, but ultimately couldn't trigger any additional issue. So I presume the scope is now good enough to contemplate a new release.

By far, the most important remaining decision is: is it worth fixing the avalanche issue that was detected? The use case seems minor, though not necessarily "useless". And this is more or less the last chance to fix something like that, as the next planned release is meant to stabilize the output.

Also, indirectly, this could open the door to additional minor changes that were kept at bay because they were also impacting the output, such as, for example, @easyaspi314's suggestion to change the default secret to use the decimals of pi.

These changes are not expected to impact the planning much, a few days at most. So this is more about taking a decision. If you want to voice your opinion on this topic, now is a great time.

Annex

For the record, a list of other topics that can still be worked upon, but are not required for the next release:
If some tweak is done that changes the computed values anyway, then would/could #342 be rolled into the same release too?
OK, time to take a decision: I lean toward making the #395 change part of the release. Here is my line of reasoning:
Now, speaking of planning: if I merge this change, then other breaking changes can be introduced too. Btw, @easyaspi314, are you still interested in updating the default secret? Edit: created a new branch. Edit 2: pushed the branch.
All breaking changes are now merged into the staging branch.
I'm trying to find a good define that will let the C preprocessor determine whether the XXH3 routines are available via the xxhash.h include. This is so that my code can auto-enable/disable support for the newer hash algorithms no matter which xxhash-devel package the user has installed (I'm hoping to avoid a compile test via autoconf). Obviously there's no released version that has the XXH3 algorithms in the stock xxhash.h yet, so what does the future hold? Any chance for a define that is supported currently via xxh3.h and would also be present in the future via xxhash.h? Or is the best differentiator going to be some future XXH_VERSION_NUMBER value check? At the moment I settled on a combination of my own define for whether xxh3.h should be included, combined with a version check.
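For illustration, a version-gated check along those lines; the 0.8.0 cutoff and the project-local define are assumptions, since no release shipping XXH3 in xxhash.h existed yet:

```c
/* XXH_VERSION_NUMBER encodes major*100*100 + minor*100 + release. */
#include <xxhash.h>

#if defined(XXH_VERSION_NUMBER) && (XXH_VERSION_NUMBER >= 800) /* assumed v0.8.0 cutoff */
#  define MYPKG_HAVE_XXH3 1   /* hypothetical project-local define */
#else
#  define MYPKG_HAVE_XXH3 0
#endif
```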
xxHash installs xxhash.h as its public header. Also, don't include xxh3.h directly; it is likely going to disappear, and you can just do this:

#define XXH_INLINE_ALL
#include <xxhash.h>
Is there a plan for what the first version number will be that includes XXH3 & XXH128 by default? Is it going to be 1.0.0 (aka 10000)?
It will either be v0.8.0 or v1.0.0.
Time to clearly define the scope of the upgrade to "stable" status for XXH3.

Stable at v0.8.0

This is the minimal scope which, I believe, is beyond dispute and will be labelled stable for the next release:
- 64-bit one-shot functions
- State management functions, for streaming
- Streaming functions, with equal scope as the one-shot functions
- 128-bit one-shot hash function
- Helpers for XXH128_hash_t

Not part of stabilization

The following functions will not be stabilized at v0.8.0:
Current proposal for v0.8.0: stabilize all existing prototypes listed above.
Is it okay if we set XXH64_hash_t to unsigned long where appropriate? This would match stdint.h:

/* LP64 defines uint64_t as unsigned long, try to match it. */
# if defined(__LP64__) && ULONG_MAX == 0xFFFFFFFFFFFFFFFFULL
typedef unsigned long XXH64_hash_t;
# else
typedef unsigned long long XXH64_hash_t;
# endif
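A hedged illustration of why matching stdint.h can matter: on LP64 glibc, uint64_t is unsigned long and PRIx64 expands to "lx", so a mismatched typedef draws format warnings even though the sizes agree (this assumes the XXH64_hash_t typedef above is in effect):

```c
#include <inttypes.h>
#include <stdio.h>

/* If XXH64_hash_t were unsigned long long on an LP64 system, this printf
 * would warn under -Wformat, because PRIx64 is "lx" there. */
void print_hash(XXH64_hash_t h)
{
    printf("%016" PRIx64 "\n", h);
}
```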
I presume it is.
Ok, how is this? It should cover all cases.

#ifndef XXH64_HASH_T_DEFINED
# define XXH64_HASH_T_DEFINED
# if !defined (__VMS) \
&& (defined (__cplusplus) /* C++ */ \
|| (defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L) /* C99 */) \
|| defined(UINT64_MAX) /* stdint.h already included */)
# include <stdint.h>
typedef uint64_t XXH64_hash_t;
# else
# include <limits.h>
/* Fake stdint.h */
# if defined(__UINT64_TYPE__) /* usually predefined by GCC */
typedef __UINT64_TYPE__ XXH64_hash_t;
# elif defined(_MSC_VER) /* MSVC */
typedef unsigned __int64 XXH64_hash_t;
# elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFULL /* LP64 ABI */
typedef unsigned long XXH64_hash_t;
# else
typedef unsigned long long XXH64_hash_t;
# endif
# endif
#endif /* XXH64_HASH_T_DEFINED */

The guard makes a double-inclusion sequence like this safe:

#include <xxhash.h>
#include <stdint.h>
#define XXH_INLINE_ALL
#include <xxhash.h>
Okay, it looks complex, but I guess it indeed adapts to a lot of situations.
Have you checked that this is actually required?
Actually, this still isn't good enough. The check for stdint.h being included is overkill, and old MSVC versions don't have stdint.h. This should work:

#if defined(_MSC_VER) /* MSVC has always supported these types */
typedef unsigned __int32 XXH32_hash_t;
typedef unsigned __int64 XXH64_hash_t;
#elif !defined(__VMS) \
&& ( \
(defined(__cplusplus) && __cplusplus >= 201103L) /* C++11 */ \
|| (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L) /* C99 */ \
)
# include <stdint.h>
...
#else
....
#endif

I tested on VC++ 2005 and, sure enough, when compiled in C++ mode it errors out because it can't find stdint.h. It found two other issues:
If we want to be compatible with basically any compiler we throw at it, why not add a few CI checks against uncommon or really old compiler/libc combos? It is good for testing portability. I have figured out how to programmatically download and install the VC++ 2005 CLI tools from PowerShell, and I can probably set up a GCC 3.3/glibc 2.2 toolchain for Linux. We can cache these.
If the test becomes so complex, it may be better to create a new build macro for it.
There are some limits to this exercise. If an old or rare compiler requires larger code changes, with several modifications spread across the code base, then it's best to leave that out. We don't want to impact general code readability and maintenance for the sake of a very rare scenario.
Good point. It is a little complicated. I will make a PR with the fixes for those, though, as well as obeying strict aliasing as best as I can in the main loop.
xxHash v0.8.0 has been released this morning; it stabilizes the output values of XXH3 and XXH128. Side-question: is it still necessary to keep this "general discussion" thread open?
I'd say close this one; more targeted issues are better once things are stable & settled down.
Answering Cyan4973/xxHash#175 (comment): the probability of receiving an empty string is larger than random (> 1 / 2^64), making the generated hash more "common". For some algorithms, it's an issue if this "more common" value is 0. This maps it instead to an avalanche of an arbitrary start value (prime64). The start value is blended with the `seed` and the `secret`, so that the result depends on those too.
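A rough sketch of that scheme; the constants, secret offset, and names here are illustrative only, not the actual xxHash implementation:

```c
#include <stdint.h>
#include <string.h>

#define PRIME64 0x9E3779B185EBCA87ULL   /* stand-in arbitrary start value */

/* XXH64-style avalanche: spreads every input bit across the result. */
static uint64_t avalanche64(uint64_t h)
{
    h ^= h >> 33; h *= 0xC2B2AE3D27D4EB4FULL;
    h ^= h >> 29; h *= 0x165667B19E3779F9ULL;
    h ^= h >> 32;
    return h;
}

/* Empty input: avalanche a start value blended with seed and secret,
 * so the result is neither 0 nor the raw seed. */
static uint64_t hash_len0(uint64_t seed, const unsigned char *secret)
{
    uint64_t s;
    memcpy(&s, secret, sizeof s);
    return avalanche64(PRIME64 ^ (seed + s));
}
```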
This is going to be a tracker for discussion, questions, feedback, and analyses about the new XXH3 hashes, found in the xxh3 branch.

@Cyan4973's comments (from xxhash.h):

XXH3 is a new hash algorithm, featuring vastly improved speed performance for both small and large inputs. A full speed analysis will be published; it requires a lot more space than this comment can handle. In general, expect XXH3 to run about ~2x faster on large inputs, and >3x faster on small ones, though exact differences depend on the platform.

The algorithm is portable: it will generate the same hash on all platforms. It benefits greatly from vectorization units, but does not require them.

XXH3 offers 2 variants, _64bits and _128bits. The first 64-bit field of the _128bits variant is the same as the _64bits result. However, if only 64 bits are needed, prefer calling the _64bits variant: it reduces the amount of mixing, resulting in faster speed on small inputs.

The XXH3 algorithm is still considered experimental. It's possible to use it for ephemeral data, but avoid storing long-term values for later re-use. While labelled experimental, the produced result can still change between versions.

The API currently supports one-shot hashing only. The full version will include streaming capability and a canonical representation. Long-term optional features may include custom secret keys and secret key generation.

There are still a number of open questions that the community can influence during the experimental period. I'm trying to list a few of them below, though don't consider this list complete:
- Canonical representation: it could follow the same convention as XXH64() (aka big-endian), or a convention that differs from XXH32/XXH64 but may be more natural for little-endian platforms.
- Helpers for XXH128_hash_t? Are there operations on XXH128_hash_t which would be desirable?
- The first 64-bit field of the _128bits variant is the same as the result of _64bits. This constrains XXH128_hash_t, in ways which may block other possibilities.
- Seeding: should there be a 128-bit seed variant (doubleSeed)?
- Naming: should the 128-bit variant be called XXH128?
- len==0: currently, the result of hashing a zero-length input is the seed (unlike XXH32/XXH64).