Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Windows Arm64 tests in CI #3046

Closed
niyas-sait opened this issue Oct 5, 2022 · 90 comments · Fixed by #3250
Closed

Enable Windows Arm64 tests in CI #3046

niyas-sait opened this issue Oct 5, 2022 · 90 comments · Fixed by #3250

Comments

@niyas-sait
Copy link

Node CI had a pipeline for testing Windows Arm64 platform sometimes back but it got disabled as some tests were failing. See #2540 (comment)

Linaro has been looking at reviving the CI job and now we have a new standalone job ( https://ci.nodejs.org/job/windows-arm-simple/ ) that is running with all tests passing.

As discussed during the build meet-up in #2994, Re-enabling the CI again would be crucial to avoid regressions and to officially support the Windows Arm64 platform as requested in #2540

We will need to re-enable the Windows Arm64 test stage to https://ci.nodejs.org/job/node-test-binary-windows-js-suites/

@niyas-sait
Copy link
Author

Cc @pbo-linaro, @joaocgreis, @mhdawson, @sxa

Moving the email thread to GitHub issue for visibility and tracking

@pbo-linaro
Copy link

pbo-linaro commented Oct 5, 2022

Job windows-arm-simple is now set to run daily, to evaluate stability in time.
For now, build for arm64 is already done here, but not testing.

@niyas-sait
Copy link
Author

@pbo-linaro Last nightly run was failing https://ci.nodejs.org/job/windows-arm-simple/nodes=win10-vs2019-arm64/39/. Could you have a look ?

@pbo-linaro
Copy link

pbo-linaro commented Oct 12, 2022

After investigation, I can confirm it is flaky tests, randomly failing because of wrong order in output expected.

Those tests are concerned:

  • sequential/test-watch-mode-inspect.mjs
  • sequential/test-watch-mode.mjs

@niyas-sait
Copy link
Author

After investigation, I can confirm it is flaky tests, randomly failing because of wrong order in output expected.

Those tests are concerned:

  • sequential/test-watch-mode-inspect.mjs
  • sequential/test-watch-mode.mjs

Thanks, @pbo-linaro ! Is it possible to fix the tests to not depend on a particular order?

@pbo-linaro
Copy link

I'm trying to contact author of those commits to see if that is normal, and what we can do with it.

@sxa
Copy link
Member

sxa commented Oct 12, 2022

Have you got a link to the relevant commits/PRs for reference - is it the ones that added those tests? And do they have the same problem on win-x64?

The bug theme of this year in multiple projects I've been on has been sorting/ordering problems!

@pbo-linaro
Copy link

pbo-linaro commented Oct 12, 2022

PR:

@MoLow (author). Do you know if something could create trouble with this?

On my machine, I can make some failure appear by increasing the charge of machine, so something clearly has problem with async dependencies.
Why does it appear much more on Windows on Arm machine? Honestly, I don't know, because they are slower, or the presence of fast/slow cores, that create some specific race conditions.

@pbo-linaro
Copy link

@sxa I suppose win-x64 CI did not fail with this, else those tests should already been marked as flaky. On our WoA machines, we reproduced that with x64 binaries (instead of arm64), so it's not architecture/code specific.

@richardlau
Copy link
Member

After investigation, I can confirm it is flaky tests, randomly failing because of wrong order in output expected.

Those tests are concerned:

  • sequential/test-watch-mode-inspect.mjs
  • sequential/test-watch-mode.mjs

nodejs/node#44898?

@pbo-linaro
Copy link

Thanks @richardlau, I missed that one, but that's exactly the issue.

@aduh95
Copy link
Contributor

aduh95 commented Oct 13, 2022

Not sure if it's the correct thread to report this, but I'm getting insufficient disk space to complete link on https://ci.nodejs.org/job/node-compile-windows/47458/nodes=win-vs2019-arm64/, which blocks nodejs/node#44250 from landing.

@pbo-linaro
Copy link

pbo-linaro commented Oct 13, 2022

@aduh95 It is not built on an arm64 machine, but on this agent: test-rackspace-win2012r2_vs2019-x64-1

you can probably open another issue, or simply ask on nodejs slack build channel, if you have access to it.

@joaocgreis
Copy link
Member

ARM64 Windows was never enabled by default in the main node-test-* jobs. It has been there as an option but has to be explicitly selected.

I'm not sure we can enable it by default for all runs with the current resources we have now. The machines we have might be enough, but I expect many jobs will queue up when CI is under load. Also, if the machines fail, it would be good to have someone around who knows how to disable only ARM64, without disabling the whole Windows job.

I suggest we try enabling it, but be ready to disable it quickly if it starts causing issues. We only need to edit the main fanned job and change the default value of RUN_ARM64_TESTS to be set by default.

If tests are failing on master, at some point the procedure was to mark them flaky in https://github.com/nodejs/node/blob/main/test/sequential/sequential.status until a fix lands.

I tried running CI for v18.11.0 with ARM64 enabled, but it didn't run any of the relevant test-binary configurations. For example for https://ci.nodejs.org/job/node-test-binary-windows-js-suites/17311/: I see RUN_ARM64_TESTS is true in https://ci.nodejs.org/job/node-test-binary-windows-js-suites/17311/parameters/ but it is considered false in https://ci.nodejs.org/job/node-test-binary-windows-js-suites/17311/console . I checked the safeParameters list and it was not there, so I added it. I believe Jenkins needs to be restarted to update that variable, I'll do that over the weekend if it doesn't happen before. Anyway, unless that variable was recently updated, the cause might be anything else.

@pbo-linaro
Copy link

That would be very nice to start running them.
We can try for 24 hours, and see if jobs queue gets larger and larger.

For flaky tests reported here, I created this PR nodejs/node#45049. Different platforms reported problems, so I assume it was good to mark them flaky for all of them.

@pbo-linaro
Copy link

@joaocgreis To clarify things a bit more, neither @nsait-linaro nor me have access to what you describe, so that would be nice if you could change that value to enable tests for this platform.

@niyas-sait
Copy link
Author

Sounds good. Let's start the run and we can keep an eye on the load and stability.

We can add more machines/agents if more resources are required.

@pbo-linaro I guess we have to wait for your PR nodejs/node#45049 to land before setting RUN_ARM64_TESTS. @joaocgreis or @sxa - I hope you can help with that.

@pbo-linaro
Copy link

Thanks to nodejs/node#45049 merged, the windows-arm-simple is now green again! 👍

Can we start enabling official job for this platform (despite current limitation with only two machines available)? Thanks

@niyas-sait
Copy link
Author

Builds have been working fine for the last few days, I think we should enable it and see it in action.

@joaocgreis @sxa Could you please help to enable it ?

@sxa
Copy link
Member

sxa commented Nov 8, 2022

Hi - yes I've been tied up with release work elsewhere but I should be able to look at this in the next few days, under the caveat that if there are any problems with it the first course of action will likely be to disable it again immediately to avoid any problems with contributors being able to run PR tests :-)

@pbo-linaro
Copy link

Yes sure, we don't want to block anything for contributors! The windows-arm-simple job has been green since we fixed it, so it should be fine.

@joaocgreis
Copy link
Member

Just to give an update from my side, I still don't understand why the test-binary jobs don't invoke any of the ARM configurations, the fix I tried above didn't work. Enabling the checkbox by default will not do anything until we're able to fix this.

@pbo-linaro
Copy link

@joaocgreis Is there someone responsible for NodeJS jenkins instance that could help to debug this configuration?

@sxa
Copy link
Member

sxa commented Nov 15, 2022

I'm back from vacation now :-) I had a shot at re-enabling in the CI and it's running at https://ci.nodejs.org/job/node-compile-windows/nodes=win-vs2019-arm64/ albeit a bit slow so is causing some delays (I just re-enabled the cross-compiled jobs that were already in the system to get something running) but that should be resolved when nodejs/node#45432 goes in.

We should probably also persue native builds (consistent with what we've been doing in https://ci.nodejs.org/job/windows-arm-simple/) potentially as a separate pipeline for now so should work on getting https://ci.nodejs.org/job/node-test-commit-windows-arm running and use that as our primary build+test pipeline for windows/arm instead of the cross-compilation now.

@pbo-linaro
Copy link

Thanks @sxa. Is there anything we can do to help with this? Seems like its jenkins configuration work, for which we don't have access for now.

@niyas-sait
Copy link
Author

niyas-sait commented Nov 16, 2022

We should probably also persue native builds (consistent with what we've been doing in https://ci.nodejs.org/job/windows-arm-simple/) potentially as a separate pipeline for now so should work on getting https://ci.nodejs.org/job/node-test-commit-windows-arm running and use that as our primary build+test pipeline for windows/arm instead of the cross-compilation now.

Yes, who has the privilege to set it up?

@niyas-sait
Copy link
Author

@sxa @joaocgreis @pbo-linaro @StefanStojanovic Can we have a sync call to figure out the next steps for this? This has been pending for quite some time and I think we could probably find a way forward with a short sync-up.

@sxa
Copy link
Member

sxa commented Nov 24, 2022

There was a bug which hadn't been spotted from when we upgraded the jenkins server to Java 11 that prevented the RUN_ARM64_TESTS flag from being honoured. I've got https://github.com/nodejs/build/pull/3097/files that will fix it, then we can do some runs of it using the existing "fanned" jobs - I'm running some of those just now in a private job to verify that they work

@sxa
Copy link
Member

sxa commented Nov 24, 2022

Second step is to try and get this running as a "normal" (non-fanned) combined build+test job like we do on most of the other platforms, which is what node-test-commit-window-arm will do - with the limited amount of machines here, and certainly not the multitude of windows versions that we have for x64 it doesn't make too much sense to use the fanned jobs.

It's likely to be trivial stuff to fix given that we've got this all working in the simple job (the current thing that's stopping it is the absence of JENKINS_TMP_KEY which

for /f "delims=" %%F in ('cygpath -u %JENKINS_TMP_KEY%') do set "GIT_SSH_COMMAND=ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -i %%F"
expects.

@joaocgreis @StefanStojanovic I'm guessing we don't have a natural path through the Windows build scripts that be activated for a combined build+test at the moment (e.g. excluding the need for JENKINS_TMP_KEY?

@joaocgreis
Copy link
Member

@StefanStojanovic and I are testing the Ansible scripts for ARM64 Windows, we're getting close to having VMs in CI. As part of this, Stefan is running the scripts on the nearform machines as well and disabled nssm. Please expect some instability in the two machines for a few days.

@sxa
Copy link
Member

sxa commented Feb 6, 2023

@pbo-linaro We seem to have quite a lot of failed tests in the CI just now. Is that expected? I wouldn't have through they were directly related to the work that @joaocgreis and @StefanStojanovic are doing just now

@pbo-linaro
Copy link

Yes, we have been discussing this with @joaocgreis and @StefanStojanovic. The plan is to deactivate the 5 concerned tests, reenable the CI (so no more regressions appear), and then fix those 5 tests.

Considering they are already broken in current state, it's not wrong to do it this way. On the opposite, trying to fix anything before CI is enabled would be wrong.

StefanStojanovic added a commit to JaneaSystems/build that referenced this issue Mar 2, 2023
Windows 10 and 11 machines occasionally show finish setting up device
screen thus preventing Jenkins from starting after reboot. These changes
disable that behavior.

Refs: nodejs#3046
StefanStojanovic added a commit to JaneaSystems/build that referenced this issue Mar 8, 2023
Windows 10 and 11 machines occasionally show finish setting up device
screen thus preventing Jenkins from starting after reboot. These changes
disable that behavior.

Refs: nodejs#3046
PR-URL: nodejs#3211
Reviewed-By: Richard Lau <[email protected]>
StefanStojanovic added a commit that referenced this issue Mar 8, 2023
Windows 10 and 11 machines occasionally show finish setting up device
screen thus preventing Jenkins from starting after reboot. These changes
disable that behavior.

Refs: #3046
PR-URL: #3211
Reviewed-By: Richard Lau <[email protected]>
@joaocgreis
Copy link
Member

A quick update: VS2019 is not officially supported for running on ARM64 Windows, so we've had to use VS2022. We now have 6 ARM64 Windows VMs connected to CI.

Here is a test run: https://ci.nodejs.org/job/node-test-commit-windows-fanned/54031/ . This run is for nodejs/node#47020 , and once that PR lands we can move forward here and enable tests to run by default.

@niyas-sait
Copy link
Author

Thanks, @joaocgreis. nodejs/node#47020 has been landed. Hope we can enable the CI by default now.

@joaocgreis
Copy link
Member

ARM64 Windows tests are running by default 🎉

Will monitor for issues for a few days, this can then be closed after #3250 lands.

@pbo-linaro
Copy link

Excellent! Great work @joaocgreis and @StefanStojanovic!

StefanStojanovic added a commit to JaneaSystems/node that referenced this issue Mar 23, 2023
StefanStojanovic added a commit to JaneaSystems/node that referenced this issue Mar 23, 2023
StefanStojanovic added a commit to JaneaSystems/node that referenced this issue Mar 23, 2023
nodejs-github-bot pushed a commit to nodejs/node that referenced this issue Mar 27, 2023
Refs: #47020
Refs: nodejs/build#3046
PR-URL: #47235
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Debadree Chatterjee <[email protected]>
richardlau pushed a commit to nodejs/node that referenced this issue Mar 28, 2023
Refs: #47020
Refs: nodejs/build#3046
PR-URL: #47235
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Debadree Chatterjee <[email protected]>
nodejs-github-bot pushed a commit to nodejs/node that referenced this issue Mar 30, 2023
Refs: nodejs/build#3046
Refs: nodejs/build#2540
Fixes: #25998
PR-URL: #47233
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
RafaelGSS pushed a commit to nodejs/node that referenced this issue Apr 5, 2023
Refs: nodejs/build#3046
Refs: nodejs/build#2540
Fixes: #25998
PR-URL: #47233
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
RafaelGSS pushed a commit to nodejs/node that referenced this issue Apr 6, 2023
Refs: nodejs/build#3046
Refs: nodejs/build#2540
Fixes: #25998
PR-URL: #47233
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
RafaelGSS pushed a commit to nodejs/node that referenced this issue Apr 7, 2023
Refs: #47020
Refs: nodejs/build#3046
PR-URL: #47235
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Debadree Chatterjee <[email protected]>
RafaelGSS pushed a commit to nodejs/node that referenced this issue Apr 7, 2023
Refs: nodejs/build#3046
Refs: nodejs/build#2540
Fixes: #25998
PR-URL: #47233
Reviewed-By: Matteo Collina <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Luigi Pinca <[email protected]>
danielleadams pushed a commit to nodejs/node that referenced this issue Jul 6, 2023
Refs: #47020
Refs: nodejs/build#3046
PR-URL: #47235
Reviewed-By: Luigi Pinca <[email protected]>
Reviewed-By: Debadree Chatterjee <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants