Beats crashing with glibc 2.35 - Fatal glibc error: rseq registration failed #30576

Closed
belimawr opened this issue Feb 24, 2022 · 9 comments · Fixed by #30620
Labels: bug, libbeat, Team:Elastic-Agent-Data-Plane

Comments


belimawr commented Feb 24, 2022

I am seeing some Beats (multiple versions, different builds) crashing shortly after startup on Linux with glibc 2.35. They crash with the following error message:

Fatal glibc error: rseq registration failed

I looked at the coredump and it only lists some runtime assembly code:

0  0x0000000000f5d8a1 in runtime.raise
   at /usr/local/go/src/runtime/sys_linux_amd64.s:165
1  0x0000000000f5bfe1 in runtime.goexit
   at /usr/local/go/src/runtime/asm_amd64.s:1581

So far I am not sure whether it's a problem with Libbeat or the Go runtime. I opened an issue on Go's issue tracker because it seems to be related to the runtime: golang/go#51315

So far I've seen Filebeat and Auditbeat crashing. It does not matter whether I download one of the official releases or build them myself; the result is always the same.

Interestingly, if I build with CGO_ENABLED=0, they run without any issues.

The Elastic-Agent seems to work fine, but the beats under it are all failing:

[root@archlinux Agent]# ./elastic-agent status
Status: FAILED
Message: (no message)
Applications:
  * metricbeat             (DEGRADED)
                           Missed last check-in
  * filebeat               (FAILED)
                           Missed two check-ins
  * filebeat_monitoring    (DEGRADED)
                           Missed last check-in
  * metricbeat_monitoring  (HEALTHY)
                           Running
[root@archlinux Agent]# coredumpctl list
TIME                         PID UID GID SIG     COREFILE EXE                                                                                            SIZE
Thu 2022-02-24 15:57:30 UTC  603   0   0 SIGABRT none     /home/vagrant/auditbeat-7.15.2-linux-x86_64/auditbeat                                           n/a
Thu 2022-02-24 15:57:47 UTC  620   0   0 SIGABRT none     /home/vagrant/auditbeat-7.15.2-linux-x86_64/auditbeat                                           n/a
Thu 2022-02-24 16:04:28 UTC  738   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/filebeat-8.0.0-linux-x86_64/filebeat     11.0M
Thu 2022-02-24 16:04:29 UTC  781   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/metricbeat-8.0.0-linux-x86_64/metricbeat 13.9M
Thu 2022-02-24 16:05:09 UTC  756   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/filebeat-8.0.0-linux-x86_64/filebeat     11.7M
Thu 2022-02-24 16:05:10 UTC  835   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/filebeat-8.0.0-linux-x86_64/filebeat     10.8M
Thu 2022-02-24 16:05:31 UTC  729   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/metricbeat-8.0.0-linux-x86_64/metricbeat 14.1M
Thu 2022-02-24 16:05:33 UTC  861   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/metricbeat-8.0.0-linux-x86_64/metricbeat 13.8M
Thu 2022-02-24 16:07:31 UTC  889   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/metricbeat-8.0.0-linux-x86_64/metricbeat 13.9M
Thu 2022-02-24 16:07:33 UTC  905   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/metricbeat-8.0.0-linux-x86_64/metricbeat 13.8M
Thu 2022-02-24 16:07:34 UTC  914   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/filebeat-8.0.0-linux-x86_64/filebeat     11.2M
Thu 2022-02-24 16:07:34 UTC  921   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/filebeat-8.0.0-linux-x86_64/filebeat     11.0M
Thu 2022-02-24 16:08:05 UTC  970   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/filebeat-8.0.0-linux-x86_64/filebeat     11.1M
Thu 2022-02-24 16:08:08 UTC  986   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/filebeat-8.0.0-linux-x86_64/filebeat     11.2M
Thu 2022-02-24 16:08:08 UTC  993   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/metricbeat-8.0.0-linux-x86_64/metricbeat 13.8M
Thu 2022-02-24 16:08:11 UTC 1015   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/metricbeat-8.0.0-linux-x86_64/metricbeat 13.8M
Thu 2022-02-24 16:11:14 UTC 1205   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/metricbeat-8.0.0-linux-x86_64/metricbeat 13.9M
Thu 2022-02-24 16:11:47 UTC 1246   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/filebeat-8.0.0-linux-x86_64/filebeat     10.8M
Thu 2022-02-24 16:11:47 UTC 1268   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/filebeat-8.0.0-linux-x86_64/filebeat     10.8M
Thu 2022-02-24 16:12:18 UTC 1282   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/filebeat-8.0.0-linux-x86_64/filebeat     11.0M
Thu 2022-02-24 16:12:19 UTC 1292   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/metricbeat-8.0.0-linux-x86_64/metricbeat 13.7M
Thu 2022-02-24 16:12:19 UTC 1308   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/filebeat-8.0.0-linux-x86_64/filebeat     10.9M
Thu 2022-02-24 16:12:25 UTC 1239   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/metricbeat-8.0.0-linux-x86_64/metricbeat 14.2M
Thu 2022-02-24 16:12:26 UTC 1344   0   0 SIGABRT present  /opt/Elastic/Agent/data/elastic-agent-2ab3a7/install/metricbeat-8.0.0-linux-x86_64/metricbeat 13.8M
[root@archlinux Agent]# 

How to reproduce it

The easiest way to reproduce is to run some Beats on an Arch Linux VM. With Vagrant that's pretty easy:

vagrant init archlinux/archlinux
vagrant up
vagrant ssh
# Inside the VM:
sudo pacman -Syu --noconfirm # Update all packages, including glibc
systemctl reboot

# The ssh connection will close and you're back on the host
# Wait for the VM to finish rebooting, then:
vagrant ssh 

# Make sure you're running glibc 2.35
pacman -Ss glibc | grep installed

# have fun running beats.
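If the version check needs to be scripted, a comparison along these lines could work. It is a sketch under assumptions: `glibcAtLeast` is a hypothetical helper, and the input is assumed to look like "2.35" or "2.35-2" (version plus an optional package release suffix).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// glibcAtLeast reports whether a version string such as "2.35" or
// "2.35-2" is greater than or equal to major.minor.
// Hypothetical helper; the input format is an assumption.
func glibcAtLeast(version string, major, minor int) (bool, error) {
	// Drop any package-release suffix such as "-2".
	version, _, _ = strings.Cut(version, "-")
	parts := strings.Split(version, ".")
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected glibc version %q", version)
	}
	maj, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	mnr, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	return maj > major || (maj == major && mnr >= minor), nil
}

func main() {
	// 2.35 is the version this issue was reported against.
	affected, err := glibcAtLeast("2.35-2", 2, 35)
	fmt.Println(affected, err)
}
```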
belimawr added the bug, libbeat, and Team:Elastic-Agent-Data-Plane labels Feb 24, 2022
@elasticmachine (Collaborator)

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

belimawr self-assigned this Feb 24, 2022
@mickymiek

Hitting this as well because of a specific case.
We're using this docker image filebeat-oss:8.0.0-rc2 and install a more recent version of systemd library (https://packages.ubuntu.com/jammy/systemd) in it because of this issue: #30398. It was working fine until a new jammy update introduced glibc 2.35.
Any idea how to work around this?
Thank you


belimawr commented Mar 1, 2022

Hitting this as well because of a specific case. We're using this docker image filebeat-oss:8.0.0-rc2 and install a more recent version of systemd library (https://packages.ubuntu.com/jammy/systemd) in it because of this issue: #30398. It was working fine until a new jammy update introduced glibc 2.35. Any idea how to work around this? Thank you

@mickymiek, unfortunately, the only workaround we've found so far is to disable CGO. However, the journald input depends on CGO, so that is not an option for you.

You could try my PR with the fix. TL;DR: now that we know the cause, the fix is as simple as allowing the rseq system call (or not using it, which is what happens when CGO is disabled).

belimawr added a commit to belimawr/beats that referenced this issue Mar 1, 2022
rseq syscall is available on glibc >= 2.35, and called when CGO is
used. If we don't allow rseq, Beats will eventually crash with a
glibc error: `Fatal glibc error: rseq registration failed`.

Fixes: elastic#30576

belimawr commented Mar 1, 2022

@mickymiek, we found a way to work around this issue (thanks to @ph).

It is possible to specify the seccomp policy in the configuration file, so you just need to add rseq there.

The default policy for x86_64 is here:

defaultPolicy = &seccomp.Policy{
	DefaultAction: seccomp.ActionErrno,
	Syscalls: []seccomp.SyscallGroup{
		{
			Action: seccomp.ActionAllow,
			Names: []string{
				"accept",
				"accept4",
				"access",
				"arch_prctl",
				"bind",
				"brk",
				"chmod",
				"chown",
				"clock_gettime",
				"clone",
				"clone3",
				"close",
				"connect",
				"dup",
				"dup2",
				"epoll_create",
				"epoll_create1",
				"epoll_ctl",
				"epoll_pwait",
				"epoll_wait",
				"exit",
				"exit_group",
				"fchdir",
				"fchmod",
				"fchmodat",
				"fchown",
				"fchownat",
				"fcntl",
				"fdatasync",
				"flock",
				"fstat",
				"fstatfs",
				"fsync",
				"ftruncate",
				"futex",
				"getcwd",
				"getdents",
				"getdents64",
				"geteuid",
				"getgid",
				"getpeername",
				"getpid",
				"getppid",
				"getrandom",
				"getrlimit",
				"getrusage",
				"getsockname",
				"getsockopt",
				"gettid",
				"gettimeofday",
				"getuid",
				"inotify_add_watch",
				"inotify_init1",
				"inotify_rm_watch",
				"ioctl",
				"kill",
				"listen",
				"lseek",
				"lstat",
				"madvise",
				"mincore",
				"mkdirat",
				"mmap",
				"mprotect",
				"munmap",
				"nanosleep",
				"newfstatat",
				"open",
				"openat",
				"pipe",
				"pipe2",
				"poll",
				"ppoll",
				"pread64",
				"pselect6",
				"pwrite64",
				"read",
				"readlink",
				"readlinkat",
				"recvfrom",
				"recvmmsg",
				"recvmsg",
				"rename",
				"renameat",
				"rt_sigaction",
				"rt_sigprocmask",
				"rt_sigreturn",
				"sched_getaffinity",
				"sched_yield",
				"sendfile",
				"sendmmsg",
				"sendmsg",
				"sendto",
				"set_robust_list",
				"setitimer",
				"setsockopt",
				"shutdown",
				"sigaltstack",
				"socket",
				"splice",
				"stat",
				"statfs",
				"sysinfo",
				"tgkill",
				"time",
				"tkill",
				"uname",
				"unlink",
				"unlinkat",
				"wait4",
				"waitid",
				"write",
				"writev",

So you can add something like this to your configuration file (just make sure you trust the policy before deploying it to your environment):

seccomp:
  default_action: errno
  syscalls:
  - action: allow
    names:
      - accept
      - accept4
      - access
      - arch_prctl
      - bind
      - brk
      - chmod
      - chown
      - clock_gettime
      - clone
      - clone3
      - close
      - connect
      - dup
      - dup2
      - epoll_create
      - epoll_create1
      - epoll_ctl
      - epoll_pwait
      - epoll_wait
      - exit
      - exit_group
      - fchdir
      - fchmod
      - fchmodat
      - fchown
      - fchownat
      - fcntl
      - fdatasync
      - flock
      - fstat
      - fstatfs
      - fsync
      - ftruncate
      - futex
      - getcwd
      - getdents
      - getdents64
      - geteuid
      - getgid
      - getpeername
      - getpid
      - getppid
      - getrandom
      - getrlimit
      - getrusage
      - getsockname
      - getsockopt
      - gettid
      - gettimeofday
      - getuid
      - inotify_add_watch
      - inotify_init1
      - inotify_rm_watch
      - ioctl
      - kill
      - listen
      - lseek
      - lstat
      - madvise
      - mincore
      - mkdirat
      - mmap
      - mprotect
      - munmap
      - nanosleep
      - newfstatat
      - open
      - openat
      - pipe
      - pipe2
      - poll
      - ppoll
      - pread64
      - pselect6
      - pwrite64
      - read
      - readlink
      - readlinkat
      - recvfrom
      - recvmmsg
      - recvmsg
      - rename
      - renameat
      - rseq
      - rt_sigaction
      - rt_sigprocmask
      - rt_sigreturn
      - sched_getaffinity
      - sched_yield
      - sendfile
      - sendmmsg
      - sendmsg
      - sendto
      - set_robust_list
      - setitimer
      - setsockopt
      - shutdown
      - sigaltstack
      - socket
      - splice
      - stat
      - statfs
      - sysinfo
      - tgkill
      - time
      - tkill
      - uname
      - unlink
      - unlinkat
      - wait4
      - waitid
      - write
      - writev

@mickymiek

@belimawr thanks a lot for your answers. We'll give it a try and report back soon


amigthea commented Mar 1, 2022

This seems to work as well

seccomp:
  default_action: allow

@mickymiek

Adding seccomp configuration fixed it. Thank you guys!

belimawr added a commit that referenced this issue Mar 2, 2022
mergify bot pushed a commit that referenced this issue Mar 2, 2022
(cherry picked from commit f02fa32)
mergify bot pushed a commit that referenced this issue Mar 2, 2022
(cherry picked from commit f02fa32)
mergify bot pushed a commit that referenced this issue Mar 2, 2022
(cherry picked from commit f02fa32)

belimawr commented Mar 2, 2022

This seems to work as well

seccomp:
  default_action: allow

Yes, this works fine. The only downside is that it allows all syscalls, which reduces security.

There are no issues with using it; just keep in mind what it means.
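To make the trade-off concrete, here is a deliberately simplified model of the two configurations. The `policy` type and its `action` method are hypothetical and only mimic the default_action and allow-list semantics discussed in this thread; this is not how the real filter is implemented, and "ptrace" is just an example of a syscall that is not on the allow-list.

```go
package main

import "fmt"

// policy is a simplified, hypothetical model of the seccomp settings
// discussed in this thread. It is not the go-seccomp-bpf API.
type policy struct {
	defaultAction string          // "allow" or "errno"
	allowed       map[string]bool // syscalls explicitly allowed
}

// action returns what the modelled filter does for a syscall:
// explicitly listed syscalls are allowed, everything else gets
// the default action.
func (p policy) action(syscall string) string {
	if p.allowed[syscall] {
		return "allow"
	}
	return p.defaultAction
}

func main() {
	strict := policy{
		defaultAction: "errno",
		allowed:       map[string]bool{"rseq": true, "write": true},
	}
	open := policy{defaultAction: "allow"}

	for _, sc := range []string{"rseq", "ptrace"} {
		fmt.Printf("%-6s strict=%s open=%s\n", sc, strict.action(sc), open.action(sc))
	}
}
```

In this model both policies let rseq through, but only the strict one still rejects syscalls that were never listed.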

belimawr added a commit that referenced this issue Mar 9, 2022
(cherry picked from commit f02fa32)

Co-authored-by: Tiago Queiroz <[email protected]>
belimawr added a commit that referenced this issue Mar 9, 2022
(cherry picked from commit f02fa32)

Co-authored-by: Tiago Queiroz <[email protected]>
belimawr added a commit that referenced this issue Mar 14, 2022
(cherry picked from commit f02fa32)

Co-authored-by: Tiago Queiroz <[email protected]>

svanschalkwyk commented Jul 26, 2022

In metricbeat.yml:

seccomp:
  default_action: allow
  syscalls:
  - action: allow
    names:
      - rseq
