Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use __builtin_ctz and __builtin_clz in dlmalloc #18186

Merged
merged 3 commits into from
Nov 12, 2022

Conversation

binji
Copy link
Contributor

@binji binji commented Nov 11, 2022

These operations map to the WebAssembly instructions i32.ctz and i32.clz. Without this, the various macros fall back to a C implementation which does not seem to be detected as ctz or clz operations by clang at -O3.

This operation maps to the WebAssembly instruction `i32.ctz`. Without this, the
`compute_bit2idx` macro falls back to a C implementation which does not seem to
be detected as a ctz operation by clang at `-O3`.
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@kripken kripken enabled auto-merge (squash) November 11, 2022 20:50
@kleisauke
Copy link
Collaborator

kleisauke commented Nov 11, 2022

Could the compute_tree_index macro use something similar? IIUC, that would ensure the use of i32.clz.

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define compute_tree_index(S, I)\
{\
unsigned int X = S >> TREEBIN_SHIFT;\
if (X == 0)\
I = 0;\
else if (X > 0xFFFF)\
I = NTREEBINS-1;\
else {\
unsigned int K = (unsigned) sizeof(X)*__CHAR_BIT__ - 1 - (unsigned) __builtin_clz(X); \
I = (bindex_t)((K << 1) + ((S >> (K + (TREEBIN_SHIFT-1)) & 1)));\
}\
}

Otherwise, it falls back on this variant:
#define compute_tree_index(S, I)\
{\
size_t X = S >> TREEBIN_SHIFT;\
if (X == 0)\
I = 0;\
else if (X > 0xFFFF)\
I = NTREEBINS-1;\
else {\
unsigned int Y = (unsigned int)X;\
unsigned int N = ((Y - 0x100) >> 16) & 8;\
unsigned int K = (((Y <<= N) - 0x1000) >> 16) & 4;\
N += K;\
N += K = (((Y <<= K) - 0x4000) >> 16) & 2;\
K = 14 - N + ((Y <<= K) >> 15);\
I = (K << 1) + ((S >> (K + (TREEBIN_SHIFT-1)) & 1));\
}\
}

(though, perhaps the compiler might already optimize this, untested)

@binji
Copy link
Contributor Author

binji commented Nov 11, 2022

@kleisauke ah, good catch! I'll add that in to this commit

auto-merge was automatically disabled November 11, 2022 21:10

Head branch was pushed to by a user without write access

@binji binji changed the title Use __builtin_ctz in dlmalloc Use __builtin_ctz and __builtin_clz in dlmalloc Nov 11, 2022
@kripken kripken enabled auto-merge (squash) November 11, 2022 21:18
auto-merge was automatically disabled November 11, 2022 22:56

Head branch was pushed to by a user without write access

@@ -1 +1 @@
4525
4224
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, nice improvements!

@kripken kripken enabled auto-merge (squash) November 11, 2022 23:30
@kripken kripken merged commit 1eb457b into emscripten-core:main Nov 12, 2022
@binji binji deleted the dmalloc-ctz branch November 12, 2022 00:22
@@ -3020,7 +3020,7 @@ I = (K << 1) + ((S >> (K + (TREEBIN_SHIFT-1)) & 1));\

/* index corresponding to given bit. Use x86 asm if possible */

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__) || defined(__EMSCRIPTEN__))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder this doesn't use #if __has_builtin(__builtin_ctz) .. I guess maybe dlmalloc was written before such a macro existed?

I also wonder if we should use __wasm__ here instead of __EMSCRIPTEN__? I guess it doesn't matter much since if there is almost no chance of upstreaming these chagnes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, __wasm__ probably is a better choice; I used __EMSCRIPTEN__ since that was used by other modifications in that file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants