Spmi replay pipeline #56871

Merged
66 commits merged into from
Aug 11, 2021

kunalspathak
Member

@kunalspathak kunalspathak commented Aug 4, 2021

Introduce superpmi-replay public pipeline that will be triggered after JIT changes. Here is how it works:

  1. It will perform a windows-x64 build to cross-compile the following JIT binaries:
    • clrjit_win_x64_x64.dll
    • clrjit_win_arm64_x64.dll
    • clrjit_unix_x64_x64.dll
    • clrjit_unix_arm64_x64.dll
  2. It will perform a windows-x86 build to cross-compile the following JIT binaries:
    • clrjit_win_x86_x86.dll
    • clrjit_unix_arm_x86.dll
  3. It will spawn 6 machines, one for each of the 6 configurations above, to run superpmi replay with each of the environment variables below:
    • "JitStressRegs=0" (default - no flags)
    • "JitStressRegs=1"
    • "JitStressRegs=2"
    • "JitStressRegs=3"
    • "JitStressRegs=4"
    • "JitStressRegs=8"
    • "JitStressRegs=0x10"
    • "JitStressRegs=0x80"
    • "JitStressRegs=0x1000"
  4. In each of the 6 runs, it will download all the collections corresponding to the OS/architecture, replay them for each of the flags, and upload a consolidated superpmi__arch.log as an artifact. Currently the pipeline fails because of a bug in Helix tracked by https://github.com/dotnet/core-eng/issues/13983.
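
The replay matrix described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's actual script: the partition names are the six configurations used in this PR, the `clrjit_<os>_<arch>_<host>.dll` naming is inferred from the binaries listed above, and `jit_binary_name` and `replay_matrix` are hypothetical helper names.

```python
from itertools import product

# Target partitions grouped by the host architecture of the build that
# produces their cross-compiled JIT (taken from the lists in this PR).
PARTITIONS_BY_HOST = {
    "x64": ["win-x64", "win-arm64", "unix-x64", "unix-arm64"],
    "x86": ["win-x86", "unix-arm"],
}

# JitStressRegs values replayed on every partition.
JIT_STRESS_REGS = ["0", "1", "2", "3", "4", "8", "0x10", "0x80", "0x1000"]


def jit_binary_name(partition, host_arch):
    """Build the cross-compiled JIT file name (naming inferred, an assumption)."""
    target_os, target_arch = partition.split("-")
    return f"clrjit_{target_os}_{target_arch}_{host_arch}.dll"


def replay_matrix():
    """Enumerate one replay run per (partition, JitStressRegs value) pair."""
    runs = []
    for host_arch, partitions in PARTITIONS_BY_HOST.items():
        for partition, value in product(partitions, JIT_STRESS_REGS):
            runs.append({
                "partition": partition,
                "jit": jit_binary_name(partition, host_arch),
                "env": {"JitStressRegs": value},
            })
    return runs
```

With 6 partitions and 9 JitStressRegs values, this enumerates 54 replay runs, 9 on each machine.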

Changes:

  • Added the superpmi-replay public pipeline.
  • Added a --no_progress option to "superpmi download" so that progress logs are not shown while .mch files are downloading.
  • Added a download-specific-artifacts.yml template.

Since I have introduced superpmi-replay* files, I will submit a follow-up PR to rename the existing superpmi* files that do the collection to superpmi-collect*.

Also, as follow-up work, we would trigger this pipeline after the superpmi-collect pipeline completes, so that the collections are fresh and we don't get "missing key" errors.

Fixes: #52392

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 4, 2021
@kunalspathak kunalspathak changed the title Spmi replay Spmi replay pipeline Aug 5, 2021
@kunalspathak kunalspathak marked this pull request as ready for review August 5, 2021 16:27
@kunalspathak
Member Author

@dotnet/jit-contrib

@kunalspathak
Member Author

Can someone review this? I want to make sure that this goes in before we freeze main.

displayName: Upload JIT to Azure Storage
env:
CLRJIT_AZ_KEY: $(clrjit_key1) # secret key stored as variable in pipeline
- ${{ if eq(parameters.uploadAsArtifacts, false) }}:
Contributor


Suggested change
- ${{ if eq(parameters.uploadAsArtifacts, false) }}:
- ${{ if not(parameters.uploadAsArtifacts) }}:

@ghost

ghost commented Aug 11, 2021

Hello @kunalspathak!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

Contributor

@briansull briansull left a comment


Looks Good to Me

@ghost ghost merged commit 0812322 into dotnet:main Aug 11, 2021
@BruceForstall
Member

@kunalspathak Love this work!

Are you planning to move the pipeline from internal to public, so it can be auto-triggered or manually triggered on JIT PR's? Or is there some reason why it must remain on "internal" and run post-merge?

@kunalspathak
Member Author

We could definitely make it public, but there is one scenario in which it doesn't work: if the PR changes the JITEE GUID, there is no collection for the new GUID because we haven't collected it yet. Ideally, in such cases the collection pipeline should trigger the replay pipeline. Other than that, this can be moved to public (let me know if you plan on doing it).
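
To illustrate the scenario, here is a minimal sketch that assumes collections are looked up by JIT-EE GUID together with OS and architecture. The index layout, the GUID strings, the file names, and the `find_collections` helper are all hypothetical and only model the behavior described; they are not the actual superpmi.py logic.

```python
def find_collections(index, jit_ee_guid, target_os, arch):
    """Return the .mch collections recorded for this GUID/OS/arch, or [] if none."""
    return index.get((jit_ee_guid, target_os, arch), [])


# Hypothetical index: only the GUID that was current at collection time has entries.
index = {
    ("old-guid", "windows", "x64"): ["a.mch", "b.mch"],
}

# A PR that keeps the GUID finds collections to replay.
existing = find_collections(index, "old-guid", "windows", "x64")

# A PR that changes the GUID finds nothing until a new collection is made.
after_guid_change = find_collections(index, "new-guid", "windows", "x64")
```

In this model, the replay for a PR that changes the GUID has nothing to run against, which is why the collection pipeline would need to run first.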

@BruceForstall
Member

  1. The "run" legs of the pipeline take 1.5 or 2 hours. Is that because they do all the work on one Helix machine each? Would it be hard to split the work up to run more in parallel? We'd need to do this if we want to run in a PR.
  2. Did implementing this find any asserts in the JIT? Did you test that if an assert is found, the job is marked as failing?

@kunalspathak
Member Author

  • The "run" legs of the pipeline take 1.5 or 2 hours. Is that because they do all the work on one Helix machine each? Would it be hard to split the work up to run more in parallel? We'd need to do this if we want to run in a PR.

Currently, the partitions are hardcoded in superpmi-replay.proj. Splitting further by collection would definitely reduce the run time, but the list in the file below would grow (number of collections × 6 entries). It is definitely doable.

<ItemGroup Condition="'$(Architecture)' == 'x64'">
  <SPMI_Partition Include="win-x64" Platform="windows" Architecture="x64" />
  <SPMI_Partition Include="win-arm64" Platform="windows" Architecture="arm64" />
  <SPMI_Partition Include="unix-x64" Platform="Linux" Architecture="x64" />
  <SPMI_Partition Include="unix-arm64" Platform="Linux" Architecture="arm64" />
</ItemGroup>
<ItemGroup Condition="'$(Architecture)' == 'x86'">
  <SPMI_Partition Include="win-x86" Platform="windows" Architecture="x86" />
  <SPMI_Partition Include="unix-arm" Platform="Linux" Architecture="arm" />
</ItemGroup>
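
As a rough illustration of how the list would grow, the sketch below generates one entry per (collection, partition) pair. The collection names and the `Collection` attribute are made up for the example and do not exist in superpmi-replay.proj; the six partitions mirror the entries above.

```python
# (Include, Platform, Architecture) for the six existing partitions.
PARTITIONS = [
    ("win-x64", "windows", "x64"),
    ("win-arm64", "windows", "arm64"),
    ("unix-x64", "Linux", "x64"),
    ("unix-arm64", "Linux", "arm64"),
    ("win-x86", "windows", "x86"),
    ("unix-arm", "Linux", "arm"),
]


def partitions_per_collection(collections):
    """Emit one SPMI_Partition item per collection and partition.

    The Collection attribute is a hypothetical addition for this sketch.
    """
    entries = []
    for collection in collections:
        for name, platform, arch in PARTITIONS:
            entries.append(
                f'<SPMI_Partition Include="{name}-{collection}" '
                f'Platform="{platform}" Architecture="{arch}" '
                f'Collection="{collection}" />'
            )
    return entries


# Two hypothetical collections already double the list to 12 entries.
entries = partitions_per_collection(["benchmarks", "libraries"])
```

Each additional collection adds six more entries, which is the growth the comment above refers to.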

  • Did implementing this find any asserts in the JIT? Did you test that if an assert is found, the job is marked as failing?

I haven't seen any asserts yet. The pipeline failed once and reported the failure when we didn't have .mch files to run on.

@ghost ghost locked as resolved and limited conversation to collaborators Sep 25, 2021
This pull request was closed.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Development

Successfully merging this pull request may close these issues.

Create SuperPMI replay job
4 participants