Adding new AIX machines #459
First machine added (https://ci.nodejs.org/computer/test-osuosl-aix61-ppc64_be-1/). A few tweaks to the instructions were required. PR for those here: built/ran tests, 2 new failures. Failures covered by this issue: nodejs/node#7973 |
Build/test time seems to be ~30 mins on the new machine, which is about the same as the slowest platforms, so once we get the second machine online I plan to add AIX to the regular regression runs. |
@mhdawson is that with ccache? 30min tests sounds awfully slow. |
ccache is installed and I think it should be in use, but I was planning to double-check. |
Good news: ccache was not properly in the path, so we should do better. Will fix that up. |
Ok, ccache is working now and the run is down to 19 mins. The node-test-commit-linux job seems to typically run 20 mins, as do a few of the sub-jobs I looked at, so this seems to be in the right ballpark. https://ci.nodejs.org/job/node-test-commit-aix/288/nodes=aix61-ppc64/ |
Are you using JOBS? |
Second machine almost ready, just needs to be added to firewall |
@jbergstroem the parallelism was previously set to 1 but I changed that to 5 as part of the change since it looked like that was how many cpus were available on the new machines. So right now it has -j 5 |
I think this means we have 5 procs: bash-4.3# /usr/sbin/lsdev -C -c processor |
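For reference, a minimal Python sketch of how a processor count like this could be turned into the `-j` value mentioned above. The sample text is an assumption: the real `lsdev` output was not captured in this thread, and the "name, status, location, description" column layout is assumed from typical AIX output. The helper names are hypothetical.

```python
# Hypothetical sample of `lsdev -C -c processor` output; the actual output
# from the new machines was not captured in this thread.
SAMPLE_LSDEV_OUTPUT = """\
proc0 Available 00-00 Processor
proc4 Available 00-04 Processor
proc8 Available 00-08 Processor
proc12 Available 00-12 Processor
proc16 Available 00-16 Processor
"""


def count_available_processors(lsdev_output: str) -> int:
    """Count device lines whose status column (second field) is 'Available'."""
    count = 0
    for line in lsdev_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "Available":
            count += 1
    return count


def make_parallelism_flag(processors: int) -> str:
    """Build the make flag, falling back to 1 job if nothing was detected."""
    return f"-j {max(processors, 1)}"


if __name__ == "__main__":
    procs = count_available_processors(SAMPLE_LSDEV_OUTPUT)
    print(make_parallelism_flag(procs))
```

Parsing text rather than invoking `lsdev` directly keeps the sketch runnable on a non-AIX machine; on the real host the same function could be fed the command's captured output.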
On the phone, will add shortly! |
(added) |
Second machine in and working now. Will plan to stitch AIX into the regular regression runs on Monday. There are still some failures (AIX was green, but there are issues related to malloc that bnoordhuis is working on and one new issue seen on the new machines), but I think it's still good to be able to tell if commits introduce new failures. |
Ok, 2 test machines are up and running. They have been running ok for the last few days and seem to take about 20 mins, which is consistent with the Linux jobs. Will plan to switch AIX from the nightly run to the standard job that tests PRs some time later today. |
Next step will then be to set up/add the release machine to the release CI. |
Added AIX to node-test-commit. Run here to validate: https://ci.nodejs.org/job/node-test-commit/4475/ |
node-test-commit-aix is red every run. So now all of our CI runs are red. Any chance we can remove it until that is sorted out? |
@mhdawson I disabled the job in EDIT: CI back to green: https://ci.nodejs.org/job/node-test-commit/4489/ |
Thanks @joaocgreis! If this needs to be re-added quickly for some reason, and if the same tests are failing each time, I suppose they could be marked flaky in the status file, although obviously that's not as good as fixing whatever the issue is. |
AIX was green until a little while back, when it was broken by some new changes, which then covered up other changes that caused more problems. This is bound to happen when it's not run as part of the regular job. Now that we have adequate hardware to run AIX for each run, I was hoping we could add AIX even though it was red, as it would help find regressions more quickly (even if submitters don't notice because it was already red) and would help with triage if one did sneak in. The failures are consistent and there are 2 issues open to cover the existing failures. There has been enough red in the past that I did not think it was going to be a major issue, but maybe that's changed now. I can look to see if I can mark them as flaky for just AIX, but I'm not sure it's the best thing to do. |
Definitely. We've been mostly-green since December and I'd hate to go back to a world where red CI was shrugged off and code landed. There are two ways to mark the tests as flaky, I think. I believe one way results in green and the other way results in yellow. I'd be OK with a yellow CI indicating there are tests that need addressing but not anything that should hold up code landing. /cc @orangemocha in case I'm wrong about that. I'm pretty sure he set that stuff up and I don't actually know how it works. I'm just a happy user. |
I see there are already two tests in https://github.com/nodejs/node/blob/ab3306ad/test/parallel/parallel.status for AIX; it would be great to add the missing ones as well and re-enable the job. Please test v4 as well (if AIX is to run for v4). @mhdawson if you expect the tests to be corrected soon, you can add them as |
They should be fixed relatively soon. I will add them as PASS,FLAKY so that they show up as yellow and are still visible. |
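A minimal sketch of what such status-file entries could look like, generated in Python so the shape is easy to check. The `[$system==aix]` section header and the `name : PASS,FLAKY` line format are assumed to match the existing entries in test/parallel/parallel.status; the test names are placeholders, not the tests that were actually failing, and `flaky_section` is a hypothetical helper.

```python
from typing import Sequence


def flaky_section(system: str, tests: Sequence[str]) -> str:
    """Render a status-file section marking each test PASS,FLAKY on one system."""
    # Pad names to a common width so the colons line up.
    width = max((len(name) for name in tests), default=0)
    lines = [f"[$system=={system}]"]
    for name in sorted(tests):
        lines.append(f"{name.ljust(width)} : PASS,FLAKY")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder test names, for illustration only.
    print(flaky_section("aix", ["test-example-b", "test-example-a"]))
```

Scoping the entries to a single system section keeps the tests strict on every other platform, which matches the intent of marking them flaky "for just AIX".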
ok PR here to mark as flaky |
Build for 4.x https://ci.nodejs.org/job/node-test-commit-aix/335/ |
There are 2 failures on 4.x: The second is one that we are actively investigating in master, which was only seen after we moved to the new AIX machines. The first seems to have been fixed in libuv as per nodejs/node#3676, which likely has not made it back to 4.x. Given that people can get 4.x AIX builds from the IBM developerWorks site with any required fixes for these issues, and that 6.x LTS is not that far away, I'm thinking the goal should be having AIX in the community releases for 6.x and just leaving 4.x as it is. We can revisit/confirm this once we have AIX downloads for 6.x (as stable) available on the community download page. |
One additional comment: if we think the failures will be an issue in the CI when people do runs against 4.x, I'm happy to submit a PR to mark those 2 tests as expected to fail in 4.x. @Trott, @joaocgreis what are your views on that? |
If v4 does not support AIX, then CI should not run it for v4. Currently, CI detects node versions v0.x and arm is not run on those. I'll have to extend this mechanism soon for v4 (easy but not much). @mhdawson my view is that for now you're welcome to mark those as flaky. When we have the mechanism to run only on v6, you can keep supporting it or not, your call I guess. |
It would be nice for runs against 4.x to catch any new regressions since we do ship 4.x binaries even if the community does not. I'll submit a PR to mark those 2 as flaky |
build to validate my branch before creating PR for v4.x-staging https://ci.nodejs.org/job/node-test-commit-aix/346 |
A couple of failures on tests marked as flaky for linux so marking flaky on AIX as well. New run https://ci.nodejs.org/job/node-test-commit-aix/351/nodes=aix61-ppc64/console |
PR for 4.x nodejs/node#8076 |
Ok, so they have been in the regression runs for a while now. There are still some tests marked as flaky, but I'm getting those prioritized so that people at IBM will work to burn down that list. Going to close this issue for now; I have opened a separate one for getting the release machine added. |
I now have access to one of the new AIX machines from osuosl; the other 2 are still being configured for networking.
Opening this issue for awareness and to capture any info while doing the install.