x86: Enable batch clearing by default #18778
I would appreciate a review from @0xdaryl on this to make sure we have considered everything we need to before making this change.
Side note (you are probably aware of this): a non-zeroed TLH can be used only for allocating primitive arrays. And yes, for many years we could not get a performance win from pre-zeroing the TLH on x86, so it was never enabled.
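The constraint in the side note above can be sketched as a small selection rule. This is only an illustration, assuming a simplified model with two TLH kinds and three allocation kinds; the names `alloc_kind`, `tlh_kind`, and `select_tlh` are hypothetical and do not come from OpenJ9 or OMR.

```c
#include <assert.h>

/* Hypothetical allocation categories, for illustration only. */
typedef enum {
    ALLOC_OBJECT,
    ALLOC_REFERENCE_ARRAY,
    ALLOC_PRIMITIVE_ARRAY
} alloc_kind;

/* Hypothetical TLH kinds in a dual-TLH setup. */
typedef enum {
    TLH_ZEROED,
    TLH_NON_ZEROED
} tlh_kind;

/*
 * Only primitive arrays may be carved out of the non-zeroed TLH.
 * Objects and reference arrays must never expose stale memory as
 * fields or references, so they always come from the zeroed TLH.
 */
static tlh_kind select_tlh(alloc_kind kind)
{
    return (kind == ALLOC_PRIMITIVE_ARRAY) ? TLH_NON_ZEROED : TLH_ZEROED;
}
```

Whether a primitive array actually goes to the non-zeroed TLH is a further decision for the allocator and the JIT; the sketch only captures the rule that nothing else is eligible.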
Don't we also need to switch on |
Everything should work if this line in
And it seems this change does this,
I am only approving the fact that the enablement looks okay. I'll wait for Daryl's review and run tests once the dependent OMR PR is merged first.
Another side question is can memory zeroing performance be improved by customizing |
I am experimenting with that.
Please note that, de facto, OMRZeroMemory() should be called with a start address and size that are aligned to the processor word size. This is a natural limitation inherited from the implementations on other platforms. It might simplify the customization task a little.
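To illustrate why the alignment assumption simplifies a custom routine, here is a minimal sketch of a zeroing loop that handles only word-aligned ranges. The helpers `is_word_aligned` and `zero_words` are hypothetical names for this example and are not the OMRZeroMemory() implementation; a tuned version would more likely use wide vector stores or `rep stosb`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: both the start address and the size are multiples of the word size. */
static int is_word_aligned(const void *ptr, size_t size)
{
    return ((uintptr_t)ptr % sizeof(uintptr_t) == 0)
        && (size % sizeof(uintptr_t) == 0);
}

/*
 * Hypothetical word-at-a-time clear. Because the caller guarantees
 * alignment, there is no byte-wise prologue or epilogue to handle.
 */
static void zero_words(void *start, size_t size)
{
    assert(is_word_aligned(start, size));

    uintptr_t *cursor = (uintptr_t *)start;
    uintptr_t *end = cursor + (size / sizeof(uintptr_t));

    while (cursor < end) {
        *cursor++ = 0;
    }
}
```

With the precondition enforced at the entry point, any faster inner loop can be swapped in without revisiting the edge cases for unaligned heads and tails.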
I think the code paths that need to be aware of dual TLH are aware of it. Can you describe how you tested this? Presumably on systems and workloads within IBM. Since you enable this for 32-bit as well this should include tests on 32-bit Java 8.
Please report your conclusions from that. I suspect memset() performance is quite good, but would like to hear your conclusion.
We may need to consider an architecture version check (e.g. >= Skylake) so that this is enabled only on architecture versions where performance has been vetted. That check could be added afterwards (before the release code-complete window), once performance testing has been done more thoroughly; in the meantime this commit can be tested for functional correctness on as many machines as possible. Are results expected to differ based on the memset (glibc) version on a given machine? If so, that may be one more reason to prefer our own well-tuned routine for clearing.
The port library doesn't support this yet, but it's being worked on.
e7caaeb to b642a7f
Signed-off-by: Bradley Wood <[email protected]>
b642a7f to 8d366c1
@0xdaryl The previous builds were deleted on Jenkins, but they looked OK. I had run sanity and extended builds of the functional/system/openjdk test buckets. I have relaunched testing, including a 32-bit Java 8 build. I can't draw any conclusions on OMRZeroMemory() performance yet. As for a microarchitecture cutoff, I will leave it unrestricted for now, especially as we don't know where to draw the line, and neither the JIT nor the port library can detect anything newer than Skylake. I will soon open a PR in OMR to support this.
This change needs to be tested with eclipse-omr/omr#7234