
arm: release worker centos7-arm64 uses GCC4.8 #1542

Closed
refack opened this issue Oct 20, 2018 · 33 comments

@refack
Contributor

refack commented Oct 20, 2018

Some new code was added that exposed a libstdc++ 4.8 bug that was fixed in 4.9:
https://ci-release.nodejs.org/job/iojs+release/3855/nodes=centos7-arm64/
(Node's minimum compiler has been GCC 4.9 since Node 9.)

/CC @nodejs/release @nodejs/build-infra

@refack
Contributor Author

refack commented Oct 29, 2018

As a result, ATM we are not building nightlies for ARM64,
e.g. https://ci-release.nodejs.org/job/iojs+release/3873/nodes=centos7-arm64/
11.x will probably fail too, and maybe 10.x if one of the several PRs that require 4.9 gets backported.
P.S. The test CI is set up with 4.9 everywhere, so we will have no early indication.

/CC @jasnell @targos

P.P.S. I would fix it, but I don't have access to release machines.

@refack
Contributor Author

refack commented Oct 30, 2018

/CC @mhdawson @joaocgreis @rvagg @jbergstroem

P.S. It seems like the Ansible playbook is all set up; we just need to run it on release-packetnet-centos7-arm64-1.
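
A rough sketch of the invocation (the playbook path here is an assumption, not taken from the build repo; --limit restricts the run to the one host):

  # Hypothetical: run the worker playbook against only the release host
  ansible-playbook playbooks/jenkins/worker/create.yml \
    --limit release-packetnet-centos7-arm64-1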

@mhdawson
Member

I can run it; I would just like @rvagg to confirm that he is comfortable that running the playbook is all we need.

@mhdawson
Member

mhdawson commented Oct 30, 2018

In test I see that we have a number of different types of machines:

centos7-arm64-gcc48
centos7-arm64-gcc6

Not sure which versions of the compiler are used for:
debian7-docker-armv7
debian8-docker-armv7
ubuntu1604-arm64

From the building docs, the minimum levels are:

GCC 4.9.4 for 8.X and higher
GCC 4.8.5 for 6.X

My key concern is that I think we may still need to build 6.x with 4.8.5 on the release machines, I don't see any compiler-selection logic for ARM, and we only have a single arm64 machine in the release CI.

@refack
Contributor Author

refack commented Oct 30, 2018

centos7-arm64-gcc48
centos7-arm64-gcc6

AFAIK these are "virtual" tags that are used to select different versions of devtoolset:
gcc48 uses the default devtoolset-2
gcc6 runs . /opt/rh/devtoolset-6/enable
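
Concretely, the difference on a worker looks something like this (a sketch, assuming the standard Software Collections layout under /opt/rh):

  # gcc48: stock CentOS 7 toolchain, no devtoolset enabled
  gcc --version   # reports 4.8.x on a stock CentOS 7 box

  # gcc6: source the SCL enable script, which prepends the
  # devtoolset-6 toolchain to PATH for the current shell
  . /opt/rh/devtoolset-6/enable
  gcc --version   # now reports 6.x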

@mhdawson
Member

@gdams is looking at the updated ansible scripts to get on top of what compiler level is installed for centos7-arm64.

@mhdawson
Member

@refack good to know, but the next question is whether we have the equivalent for 4.9.4. I can see test-packetnet-centos7-arm64-1 has those 2 tags, but I don't see any 4.9.4 equivalent.

@gdams
Member

gdams commented Oct 30, 2018

So we are running this to add GCC 4.9.4 on RHEL 7.x. Maybe we need the same logic on CentOS 7.x?

@mhdawson
Member

I think that might have been for IBM platforms, since we did not believe there was a version of the Red Hat Developer Toolset available. It seems like there is a version on ARM, based on the tags and this in the ARM64 job in test:

 if [[ "$nodes" =~ centos[67]-(arm)?64-gcc6 ]]; then
    exec_cmd=". /opt/rh/devtoolset-6/enable; $exec_cmd"
  fi

@gdams
Member

gdams commented Oct 30, 2018

So the following are installed on centos7:

  centos7: [
    'ccache,gcc-c++,devtoolset-6,sudo',
  ],
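
For context, the effective install step on a plain CentOS 7 box would be roughly the following (a sketch; it assumes the centos-release-scl package is what provides the devtoolset repos):

  # Enable Software Collections, then pull in the devtoolset-6 C++ toolchain
  sudo yum install -y centos-release-scl
  sudo yum install -y devtoolset-6-gcc-c++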

@mhdawson
Member

mhdawson commented Oct 30, 2018

@gdams does the ansible script for CentOS ARM64 ensure we have both devtoolset-6 and the 4.8.5 compiler?

@refack
Contributor Author

refack commented Oct 30, 2018

AFAIK devtoolset is backward and forward compatible (#809 (comment)), so I think that if we have gcc4.8 for node < 11 (status quo) and the new gcc6 for >= 11, we've solved the issue.

@mhdawson
Member

I do think that the devtoolset is more compatible, so what @refack says could be correct.

@mhdawson
Member

This is what is in the selector script:

[ /centos[67]-(arm)?(64|32)-gcc48/, anyType,     lt(10)  ],
[ /centos[67]-(arm)?(64|32)-gcc6/,  anyType,     gte(10) ],

The current tag in release is centos7-arm64, which does not match either of those.

@mhdawson
Member

Not quite sure why 10 was chosen as the cutover above, since the change is in fact between 6.x and 8.x.

@mhdawson
Member

Comments in issue #1203 indicate Rod was concerned that installing the new devtoolset might affect existing builds.

@mhdawson
Member

It also adds some insight into why the switch might have been at 9 versus 8 (problems we'd seen on 8.x).

@mhdawson
Member

So what I'd like as next steps is to:

  1. Confirm that the ansible scripts would result in having both 4.8.5 and devtoolset-6.
  2. Confirm that we are comfortable having both installed on the same machine (i.e. address/resolve any concerns from ansible: use gcc 4.9.4 or newer on CentOS 6 #1203). An alternative might be adding a new machine in release. At this point I'd need @rvagg to comment on whether that is feasible.

In terms of risk/reward, I think it's better not to have nightlies on ARM64 for a little while versus potentially breaking our 6.X and 8.X LTS releases. Based on that, I'd like to take the time to answer the questions above and get input from @rvagg.

@mhdawson
Member

My preference is adding a new machine. I'll look to see whether how to add one is documented.

@mhdawson
Member

@nodejs/build if people are ok with giving @gdams infra level access to at least the packet hosting infrastructure, we could probably get the new machines spun up more quickly.

@mhdawson
Member

mhdawson commented Oct 30, 2018

In addition to the steps above, we probably also need to validate that release binaries built with devtoolset-6 run OK on a centos7 machine with a 4.9.4 compiler.
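
One lightweight way to check that (a sketch; the binary path is illustrative) is to compare the glibc/libstdc++ symbol versions the binary requires against what the stock system ships:

  # Symbol versions the built binary depends on...
  objdump -T ./node | grep -oE 'GLIBC(XX)?_[0-9.]+' | sort -uV
  # ...versus the newest GLIBCXX version the system libstdc++ provides
  strings /usr/lib64/libstdc++.so.6 | grep '^GLIBCXX_' | sort -V | tail -n 1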

@refack
Contributor Author

refack commented Oct 30, 2018

@nodejs/build if people are ok with giving @gdams infra level access to at least the packet hosting infrastructure, we could probably get the new machines spun up more quickly.

I think this needs to be discussed in a wider context (no offence intended George).

@mhdawson
Member

@refack I'm OK with discussing in a wider context, just thought I'd mention it since it might take me a bit of time to get a new machine going.

@mhdawson
Member

I logged into the packet infra, and when I go to add a new machine, none of the options are ARM. I'm not sure we'd have the OK to add another one anyway, as they are pretty big machines with 96 cores and 32GB of RAM.

So I think it's back to figuring out whether we are comfortable having both installed on the same machine. I'd like input from @rvagg on that front.

@mhdawson
Member

Another option is using docker on the existing machine, so we can add a new machine sharing the resources without the potential for messing up the release machine. We'd probably have to prove that out in test first, though.
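
The rough shape of that (a sketch; the image and mount path are illustrative, not an actual build config):

  # Throwaway CentOS 7 container sharing the host's workspace; a second
  # toolchain can live in the image without touching the host install
  docker run --rm -it -v /home/iojs/build:/home/iojs/build centos:7 bash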

@mhdawson
Member

@rvagg want to make sure this is still on your radar.

@rvagg
Member

rvagg commented Nov 14, 2018

Yeah, sorry, this is a big one that takes some brain space, so I haven't got to it sooner.

So here's what I think our core problem is: ci-release doesn't have the gcc48/gcc6 label split; it's only got the one centos7-arm64 machine with the single label centos7-arm64, and there's no logic in place to switch devtoolset based on version.

This is in ci for node-test-commit-arm:

  if [[ "$nodes" =~ centos[67]-(arm)?64-gcc6 ]]; then
    exec_cmd=". /opt/rh/devtoolset-6/enable; $exec_cmd"
  fi

Our VersionSelectorScript.groovy has:

  [ /centos[67]-(arm)?(64|32)-gcc48/, anyType,     lt(10)  ],
  [ /centos[67]-(arm)?(64|32)-gcc6/,  anyType,     gte(10) ],

So on ci it's selecting gcc6 for >= 10, and that's working fine, so we have all green.

On ci-release, the version selector has no impact on arm64 because the label centos7-arm64 doesn't feature at all. So we're using the default devtoolset (2), i.e. gcc 4.8.
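
(A quick way to see that, using the same patterns as bash regexes; purely illustrative:)

  label="centos7-arm64"
  # neither selector pattern fires, so no devtoolset rule applies to this worker
  [[ "$label" =~ centos[67]-(arm)?(64|32)-gcc48 ]] || echo "no gcc48 match"
  [[ "$label" =~ centos[67]-(arm)?(64|32)-gcc6 ]]  || echo "no gcc6 match"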

So, given that, here's my proposed solution:

  1. Introduce the gcc48/gcc6 labelling on ci-release
  2. Introduce the devtoolset-6 invoker script (above) in the ci-release iojs+release script for linux-gnu (it's already specific enough to only invoke for arm64)

Then, optionally, in the next Node 10 and 11 release announcements, we could state that we've switched the build environment for ARM64, so upgrades should proceed with caution. However, as we've already discovered, Red Hat are doing funky stuff with devtoolset such that it doesn't appear to affect the ability of the binaries to run against the library level of the system they are built on. So systems with libc (and libstdc++, I think) versions at least as new as CentOS 7's should be fine. So we could just cross our fingers and hope nobody is impacted. The usage level of ARM64 is pretty low (one of my next jobs is to get the download numbers working again, so I don't have current numbers).

How does that sound @refack, @mhdawson, @gdams?

@refack
Contributor Author

refack commented Nov 14, 2018

  • Introduce the gcc48/gcc6 labelling on ci-release
  • Introduce the devtoolset-6 invoker script (above) in the ci-release iojs+release script for linux-gnu (it's already specific enough to only invoke for arm64)

Sounds great. It works on the public CI 💯

@mhdawson
Member

@rvagg that all sounds reasonable, but the key question was whether we feel comfortable running the ansible script to add devtoolset-6 to the release machine, as it does not currently exist there.

@mhdawson
Member

If the answer is yes, then we just need to run the ansible script and update the labels and the invoker in the release infra.

@refack
Contributor Author

refack commented Nov 14, 2018

Aside: we have an explicit guarantee from the glibc folks that devtoolset-6 is backwards and forwards ABI compatible. So the steps suggested are us being triply safe.

@rvagg
Member

rvagg commented Nov 15, 2018

Done; the last nightly rebuilt all green: https://nodejs.org/download/nightly/v12.0.0-nightly201811153212f77ac6/

Running test builds for 10, 8 and 6 now, just to make sure it's doing the right thing.

rvagg closed this as completed Nov 15, 2018
@rvagg
Member

rvagg commented Nov 15, 2018

All good, but we're now getting an error on v8-canary builds running gcc 6 on centos 7 arm64. This appears to be the same error generated by ppcle-ubuntu1404-release-64 on v8-canary builds too. The plain x64 centos 6 gcc 6 is fine with it, though.

21:07:38   g++ -o /home/iojs/build/ws/out/Release/mksnapshot -pthread -rdynamic  -Wl,--start-group /home/iojs/build/ws/out/Release/obj.target/mksnapshot/deps/v8/src/snapshot/mksnapshot.o /home/iojs/build/ws/out/Release/obj.target/deps/v8/gypfiles/libv8_base.a /home/iojs/build/ws/out/Release/obj.target/deps/v8/gypfiles/libv8_init.a /home/iojs/build/ws/out/Release/obj.target/deps/v8/gypfiles/libv8_libbase.a /home/iojs/build/ws/out/Release/obj.target/deps/v8/gypfiles/libv8_libplatform.a /home/iojs/build/ws/out/Release/obj.target/deps/v8/gypfiles/libv8_nosnapshot.a /home/iojs/build/ws/out/Release/obj.target/tools/icu/libicui18n.a /home/iojs/build/ws/out/Release/obj.target/deps/v8/gypfiles/libv8_libsampler.a /home/iojs/build/ws/out/Release/obj.target/tools/icu/libicuucx.a /home/iojs/build/ws/out/Release/obj.target/tools/icu/libicudata.a /home/iojs/build/ws/out/Release/obj.target/tools/icu/libicustubdata.a /home/iojs/build/ws/out/Release/obj.target/deps/v8/gypfiles/libv8_initializers.a -ldl -lrt -Wl,--end-group
21:07:44   touch 8c8eac620719d8e1694be14664fa13b540e76912.intermediate
21:07:44   LD_LIBRARY_PATH=/home/iojs/build/ws/out/Release/lib.host:/home/iojs/build/ws/out/Release/lib.target:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH; cd ../deps/v8/gypfiles; mkdir -p /home/iojs/build/ws/out/Release/obj.target/v8_snapshot/geni; "/home/iojs/build/ws/out/Release/mksnapshot" --turbo_instruction_scheduling --embedded_src "/home/iojs/build/ws/out/Release/obj.target/v8_snapshot/geni/embedded.cc" --embedded_variant Default --startup_src "/home/iojs/build/ws/out/Release/obj.target/v8_snapshot/geni/snapshot.cc"
21:07:44 
21:07:44 
21:07:44 #
21:07:44 # Fatal error in , line 0
21:07:44 # Check failed: InVM(address, size).
21:07:44 #
21:07:44 #
21:07:44 #
21:07:44 #FailureMessage Object: 0x3ffc2904a90
21:07:44 ==== C stack trace ===============================
21:07:44 
21:07:44     /home/iojs/build/ws/out/Release/mksnapshot(v8::base::debug::StackTrace::StackTrace()+0x18) [0x146ff90]
21:07:44     /home/iojs/build/ws/out/Release/mksnapshot() [0x105fc5c]
21:07:44     /home/iojs/build/ws/out/Release/mksnapshot(V8_Fatal(char const*, int, char const*, ...)+0x178) [0x105a308]
21:07:44     /home/iojs/build/ws/out/Release/mksnapshot(v8::internal::VirtualMemory::Release(unsigned long)+0) [0xcd5f60]
21:07:44     /home/iojs/build/ws/out/Release/mksnapshot(v8::internal::StoreBuffer::SetUp()+0x90) [0xa32ec0]
21:07:44     /home/iojs/build/ws/out/Release/mksnapshot(v8::internal::Heap::SetUp()+0x32c) [0x9e5c7c]
21:07:44     /home/iojs/build/ws/out/Release/mksnapshot(v8::internal::Isolate::Init(v8::internal::StartupDeserializer*)+0x3ec) [0xa63e14]
21:07:44     /home/iojs/build/ws/out/Release/mksnapshot(v8::SnapshotCreator::SnapshotCreator(v8::Isolate*, long const*, v8::StartupData*)+0x94) [0x85f12c]
21:07:44     /home/iojs/build/ws/out/Release/mksnapshot(main+0x184) [0x85281c]
21:07:44     /lib64/libc.so.6(__libc_start_main+0xf0) [0x3ff89d515d4]
21:07:44     /home/iojs/build/ws/out/Release/mksnapshot() [0x85acc4]
21:07:44 /bin/sh: line 1: 48297 Trace/breakpoint trap   "/home/iojs/build/ws/out/Release/mksnapshot" --turbo_instruction_scheduling --embedded_src "/home/iojs/build/ws/out/Release/obj.target/v8_snapshot/geni/embedded.cc" --embedded_variant Default --startup_src "/home/iojs/build/ws/out/Release/obj.target/v8_snapshot/geni/snapshot.cc"
21:07:44 deps/v8/gypfiles/v8_snapshot.target.mk:16: recipe for target '8c8eac620719d8e1694be14664fa13b540e76912.intermediate' failed
21:07:44 make[2]: *** [8c8eac620719d8e1694be14664fa13b540e76912.intermediate] Error 133
21:07:44 rm e0d002c729dedf142e7449366bb6c7632f3aa716.intermediate 8c8eac620719d8e1694be14664fa13b540e76912.intermediate dc0aa0889910e117de695888bc7126f9a7a14634.intermediate
21:07:44 Makefile:99: recipe for target 'node' failed
21:07:44 make[1]: *** [node] Error 2
21:07:44 Makefile:1004: recipe for target 'node-v12.0.0-v8-canary20181115bc23a47c3b-linux-arm64.tar' failed
21:07:44 make: *** [node-v12.0.0-v8-canary20181115bc23a47c3b-linux-arm64.tar] Error 2
21:07:44 Build step 'Conditional steps (multiple)' marked build as failure

@nodejs/v8 is this a known issue with V8 7.2? Should we be concerned and/or preparing build resources to make it work properly? It's the same error that's occurring on CI for v8-canary @ https://ci.nodejs.org/job/node-test-commit-arm/20000/nodes=centos7-arm64-gcc6/

On our x64 machine: gcc (GCC) 6.3.1 20170216 (Red Hat 6.3.1-3)
On the arm64 machine: gcc (GCC) 6.3.1 20170216 (Red Hat 6.3.1-3)
On the ppc machine: gcc (Ubuntu 4.9.4-2ubuntu1~14.04.1) 4.9.4

Not reopening because the original issue is solved; without hearing back, I'm just going to assume the V8 folks are onto this.
