Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8343249: [Windows] Implement SpinPause #21781

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

TheShermanTanker
Copy link
Contributor

@TheShermanTanker TheShermanTanker commented Oct 30, 2024

SpinPause is currently not implemented on any Windows platforms, due to a lack of access to assembly in the Microsoft compiler. The YieldProcessor macro can act as a stand in for this purpose, as it compiles down to a single pause instruction on x64 (Which seems to be all that one needs to implement SpinPause in HotSpot). I am less certain about the Windows/ARM64 implementation. There, YieldProcessor compiles down to dmb ishst; yield and I am unsure whether that is a correct SpinPause implementation. If need be, I can retract the ARM64 implementation and only have this for x86


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8343249: [Windows] Implement SpinPause (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21781/head:pull/21781
$ git checkout pull/21781

Update a local copy of the PR:
$ git checkout pull/21781
$ git pull https://git.openjdk.org/jdk.git pull/21781/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21781

View PR using the GUI difftool:
$ git pr show -t 21781

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21781.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 30, 2024

👋 Welcome back jwaters! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Oct 30, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot changed the title 8343249 8343249: [Windows] Implement SpinPause Oct 30, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 30, 2024
@openjdk
Copy link

openjdk bot commented Oct 30, 2024

@TheShermanTanker The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge
Copy link

mlbridge bot commented Oct 30, 2024

Webrevs

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is worthwhile. The specification for YieldProcessor states it only works for hyperthreaded processors. [1] This change doesn't obviously make anything better so I'd suggest leaving it alone in case it makes something worse. The DMB on Aarch64 seems a bit concerning and not obviously needed just to yield the processor.

[1] https://learn.microsoft.com/en-us/windows/win32/api/winnt/nf-winnt-yieldprocessor

@dholmes-ora
Copy link
Member

Also the Aarch64 folk went to a lot of trouble to provide a flexible implementation via OnSpinWaitInst, so if anything Windows-Aarch64 should use that. But again "if it ain't broke ..."

@TheShermanTanker
Copy link
Contributor Author

Hmm, in practice the macro for x86 reliably expands to either _mm_pause or __asm { rep nop } as mentioned in the docs (__yield is for ARM64 to my knowledge). Since the 32 bit Windows Port is no more (Or at least is going to die once Magnus commits the removal PR), the only relevant one is __mm_pause, which is basically just the pause instruction. I'm a little skeptical that it would break anything, since the docs do say that it safely has no effect on processors that do not support it, such as non hyperthreaded processors, after all. It seems the main concern here is that YieldProcessor might expand to something that isn't pause, in which case, it technically could be entirely replaced with __mm_pause itself. I simply chose the macro since it is technically more portable

No comment one the ARM64 one, I don't have any ARM assembly knowledge unfortunately. It's likely that I'll have to drop support for SpinPause for ARM64 entirely if this is to go forward, if it even does at all, unless someone with ARM64 knowledge steps in

@stooart-mon
Copy link
Contributor

The linux aarch64 code implements this as a call to a stub generated at runtime:

As this is using the macroassembler, etc, and not static inline assembly, I presume this could be ported to Windows.
I don't believe "dmb ishst" gives us the result we would like. While the YIELD instruction is targeted primarily for SMT CPUs, in practice it will effectively be a NOP .

@TheShermanTanker
Copy link
Contributor Author

Linux/ARM64 confuses me a little, since the implementation for the spin_wait stub seems to be completely empty:

address StubRoutines::aarch64::_spin_wait = CAST_FROM_FN_PTR(address, empty_spin_wait);

@dholmes-ora
Copy link
Member

Linux/ARM64 confuses me a little, since the implementation for the spin_wait stub seems to be completely empty:

address StubRoutines::aarch64::_spin_wait = CAST_FROM_FN_PTR(address, empty_spin_wait);

That's the initial value but it gets overwritten in ./cpu/aarch64/stubGenerator_aarch64.cpp:

    StubRoutines::aarch64::_spin_wait = generate_spin_wait();

@TheShermanTanker
Copy link
Contributor Author

Linux/ARM64 confuses me a little, since the implementation for the spin_wait stub seems to be completely empty:

address StubRoutines::aarch64::_spin_wait = CAST_FROM_FN_PTR(address, empty_spin_wait);

That's the initial value but it gets overwritten in ./cpu/aarch64/stubGenerator_aarch64.cpp:

    StubRoutines::aarch64::_spin_wait = generate_spin_wait();

Ah, thanks for the correction. I don't know how I missed that

@TheShermanTanker
Copy link
Contributor Author

I've switched ARM64 to use the existing assembler written for Linux. Let me know if this is correct or not

@TheShermanTanker
Copy link
Contributor Author

I've switched ARM64 to use the existing assembler written for Linux. Let me know if this is correct or not

Bumping

@TheShermanTanker
Copy link
Contributor Author

Bumping. @stooart-mon does ARM64 look good to you? Also paging for @shipilev since we were discussing about SpinPause in a different place some time ago

Copy link
Member

@shipilev shipilev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. I think we need to see if there is a compelling performance reason to do this. There must be SpinPause benchmarks somewhere.
  2. I don't think we implement SpinPause for Linux x86, or are we?

@stooart-mon
Copy link
Contributor

Apologies - AFAICT that looks functional.

I'm not aware of specific SpinPause benchmarks, however, it is mostly used just for object monitors, so realistically you want to run benchmarks with a reliance on them.

@TheShermanTanker
Copy link
Contributor Author

TheShermanTanker commented Dec 11, 2024

Two things:

  1. I think we need to see if there is a compelling performance reason to do this. There must be SpinPause benchmarks somewhere.
  2. I don't think we implement SpinPause for Linux x86, or are we?

Both Linux 32 bit and 64 bit x86 implement SpinPause in pure assembly

DECLARE_FUNC(SpinPause):
        rep
        nop
        movq   $1, %rax
        ret

@TheShermanTanker
Copy link
Contributor Author

Sorry for leaving this behind. I am currently not able to perform any benchmarking for SpinPause, so this probably has to wait a little (Again)

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 8, 2025

@TheShermanTanker This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 5, 2025

@TheShermanTanker This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Feb 5, 2025
@TheShermanTanker
Copy link
Contributor Author

/open

@openjdk openjdk bot reopened this Feb 7, 2025
@openjdk
Copy link

openjdk bot commented Feb 7, 2025

@TheShermanTanker This pull request is now open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime [email protected] rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

4 participants