x86: Enable batch clearing by default #18778

BradleyWood · 2024-01-18T15:17:16Z

This change needs to be tested with eclipse-omr/omr#7234

vijaysun-omr · 2024-01-18T17:05:44Z

I would appreciate a review from @0xdaryl on this to make sure we have considered what we need to before making this change.

dmitripivkine · 2024-01-18T18:00:44Z

Side note (you are probably aware of this): a non-zeroed TLH can be used only for allocating primitive arrays. And yes, for many years we could not get a performance win from pre-zeroing the TLH on x86, so it was not enabled.

vijaysun-omr · 2024-01-18T18:05:33Z

Don't we also need to switch on -Xgc:batchClearTLH by default to properly enable this feature?

dmitripivkine · 2024-01-18T18:17:55Z

> Don't we also need to switch on -Xgc:batchClearTLH by default to properly enable this feature?

Everything should work if this line in DLLMain.cpp is called eventually:

vm->memoryManagerFunctions->allocateZeroedTLHPages(vm, true);

And it seems this change does this.

vijaysun-omr · 2024-01-18T23:35:02Z

I am only approving the fact that the enablement looks okay. I'll wait for Daryl's review and run tests once the dependent OMR PR is merged first.

dmitripivkine · 2024-01-19T19:52:26Z

Another side question: can memory zeroing performance be improved by customizing OMRZeroMemory() for x86? The current default implementation relies on the standard memset().

BradleyWood · 2024-01-19T19:54:23Z

> Another side question: can memory zeroing performance be improved by customizing OMRZeroMemory() for x86? The current default implementation relies on the standard memset().

I am experimenting with that

dmitripivkine · 2024-01-19T20:05:27Z

> > Another side question: can memory zeroing performance be improved by customizing OMRZeroMemory() for x86? The current default implementation relies on the standard memset().

> I am experimenting with that

Please note that, de facto, OMRZeroMemory() should have its start address and size aligned to the processor word size. This is a natural limitation inherited from other platforms' implementations. It might simplify the customization task a little.
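As an illustration of how that alignment contract simplifies a custom routine, here is a minimal sketch that clears memory one machine word at a time with no head or tail handling. The function name `zeroWordAligned` is hypothetical; this is not the OMRZeroMemory() implementation, and it makes no claim about beating memset().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a word-aligned zeroing routine.
// Precondition (assumed from the comment above): both the start address
// and the byte count are multiples of the processor word size.
inline void zeroWordAligned(void *start, std::size_t bytes) {
    const std::size_t wordSize = sizeof(std::uintptr_t);
    assert(reinterpret_cast<std::uintptr_t>(start) % wordSize == 0);
    assert(bytes % wordSize == 0);

    std::uintptr_t *word = static_cast<std::uintptr_t *>(start);
    std::uintptr_t *end = word + bytes / wordSize;

    // No byte-granular prologue or epilogue is needed under the precondition.
    while (word != end) {
        *word++ = 0;
    }
}
```

An optimizing compiler may well turn this loop back into a memset() call, so a tuned x86 version would more likely rely on wider stores or string instructions; the point here is only that the contract removes the unaligned edge cases.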

0xdaryl · 2024-01-22T18:22:30Z

I think the code paths that need to be aware of dual TLH are aware of it.

Can you describe how you tested this? Presumably on systems and workloads within IBM. Since you enable this for 32-bit as well, this should include tests on 32-bit Java 8.

> I am experimenting with that

Please report your conclusions from that. I suspect memset() performance is quite good, but would like to hear your conclusion.

runtime/compiler/control/DLLMain.cpp

vijaysun-omr · 2024-01-22T21:13:27Z

We may need to consider an architecture version check (e.g. >= skylake) so that this is enabled only on architecture versions where performance has been vetted. That check could be added afterwards (before the release code-complete window), once performance testing has been done more thoroughly; in the meantime, this commit can get tested for functional correctness on as many machines as possible.

Are results expected to differ based on the memset (glibc) version on a given machine? If so, maybe this is one more reason why having our own well-tuned routine for clearing would be preferred.
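A rough sketch of what such a gate could look like follows. The enum `X86Microarch`, the struct `BatchClearPolicy`, and the function `shouldBatchClear` are all hypothetical names invented for illustration; they do not correspond to any existing JIT or port library API.

```cpp
// Hypothetical ordering of microarchitectures, oldest first.
enum class X86Microarch { Unknown, Haswell, Broadwell, Skylake };

// Hypothetical policy: either unrestricted, or gated on a minimum
// microarchitecture where performance has been vetted.
struct BatchClearPolicy {
    bool restrictByMicroarch;
    X86Microarch minimum;
};

inline bool shouldBatchClear(const BatchClearPolicy &policy, X86Microarch detected) {
    if (!policy.restrictByMicroarch) {
        return true;  // enabled everywhere
    }
    if (detected == X86Microarch::Unknown) {
        return false;  // stay conservative when detection fails
    }
    return detected >= policy.minimum;
}
```

Treating an undetected processor as ineligible is a design choice of this sketch; the opposite default would be equally easy to express.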

BradleyWood · 2024-01-23T14:35:28Z

> We may need to consider some architecture version check (e.g. >= skylake)

The port library doesn't support this yet, but it's being worked on.

Signed-off-by: Bradley Wood <[email protected]>

BradleyWood · 2024-01-25T17:38:42Z

@0xdaryl Previous builds got deleted on Jenkins but looked OK. I had run sanity and extended builds of the function/system/openjdk test buckets. I have relaunched testing, including a 32-bit Java 8 build. I can't draw any conclusions on OMRZeroMemory() performance yet.

As for a microarchitecture cutoff, I will leave it unrestricted for now, especially as we don't know where to draw the line, nor can the JIT/port library even detect anything newer than skylake. I will soon open a PR in OMR to support this.

BradleyWood added the depends:omr, comp:jit, and arch:x86 labels Jan 18, 2024

BradleyWood assigned BradleyWood and vijaysun-omr and unassigned BradleyWood Jan 18, 2024

vijaysun-omr approved these changes Jan 18, 2024


0xdaryl reviewed Jan 22, 2024


BradleyWood force-pushed the enableBatchClear branch from e7caaeb to b642a7f January 25, 2024 15:43

x86: Enable batch clearing by default

8d366c1

Signed-off-by: Bradley Wood <[email protected]>

BradleyWood force-pushed the enableBatchClear branch from b642a7f to 8d366c1 January 25, 2024 17:26

BradleyWood closed this Aug 15, 2024
