Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x86: Enable batch clearing by default #18778

Closed

Conversation

BradleyWood
Copy link
Member

This change needs to be tested with eclipse-omr/omr#7234

@BradleyWood BradleyWood added depends:omr Pull request is dependent on a corresponding change in OMR comp:jit arch:x86 labels Jan 18, 2024
@vijaysun-omr
Copy link
Contributor

I would appreciate a review from @0xdaryl on this to make sure we have considered what we need to, before making this change

@dmitripivkine
Copy link
Contributor

Side note (you aware about this probably): non-zeroed TLH can be used for allocation of primitive arrays only. And yes, for so many years we can not achieve perf win of pre-zeroing TLH in x86, so it was not enabled.

@vijaysun-omr
Copy link
Contributor

Don't we also need to switch on -Xgc:batchClearTLH by default to properly enable this feature ?

@dmitripivkine
Copy link
Contributor

Don't we also need to switch on -Xgc:batchClearTLH by default to properly enable this feature ?

Everything should work if this line in DLLMain.cpp is called eventually:

vm->memoryManagerFunctions->allocateZeroedTLHPages(vm, true);

And seems this change does this,

@vijaysun-omr
Copy link
Contributor

vijaysun-omr commented Jan 18, 2024

I am only approving the fact that the enablement looks okay. I'll wait for Daryl's review and run tests once the dependent OMR PR is merged first.

@dmitripivkine
Copy link
Contributor

Another side question is can memory zeroing performance be improved by customizing OMRZeroMemory() for x86. Current default implementation rely on standard memset().

@BradleyWood
Copy link
Member Author

Another side question is can memory zeroing performance be improved by customizing OMRZeroMemory() for x86. Current default implementation rely on standard memset().

I am experimenting with that

@dmitripivkine
Copy link
Contributor

Another side question is can memory zeroing performance be improved by customizing OMRZeroMemory() for x86. Current default implementation rely on standard memset().

I am experimenting with that

Please note that de facto OMRZeroMemory() should have start address/size alined to processor word size. This is natural limitation from other platforms implementations. It might simplify customization task a little bit.

@0xdaryl
Copy link
Contributor

0xdaryl commented Jan 22, 2024

I think the code paths that need to be aware of dual TLH are aware of it.

Can you describe how you tested this? Presumably on systems and workloads within IBM. Since you enable this for 32-bit as well this should include tests on 32-bit Java 8.

I am experimenting with that

Please report your conclusions from that. I suspect memset() performance is quite good, but would like to hear your conclusion.

@vijaysun-omr
Copy link
Contributor

We may need to consider some architecture version check (e.g. >= skylake) to enable only on architecture versions where performance was vetted. It may be that this can be added afterwards (before the release code complete window) once performance testing has been done more thoroughly and let this commit get tested for functional correctness in the meantime on as many machines as possible.

Are results expected to be different based on memset (glibc) version on a given machine ? If so, maybe this is one more reason why having our own well tuned routine for clearing would be preferred.

@BradleyWood
Copy link
Member Author

We may need to consider some architecture version check (e.g. >= skylake)

The port library doesn't support this yet, but its being worked on.

@BradleyWood
Copy link
Member Author

@0xdaryl Previous builds got deleted on jenkins but looked OK. I had ran sanity,extended builds of function/system/openjdk test buckets. I have relaunched testing, including a 32-bit Java 8 build. I can't draw any conclusions on OMRZeroMemory() performance yet.

As for a microarchitecture cutoff, I will leave it unrestricted for now, especially as we don't know where to draw the line, nor can the JIT/port library even detect anything newer than skylake. I will soon open a PR in OMR to support this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch:x86 comp:jit depends:omr Pull request is dependent on a corresponding change in OMR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants