SONiC core dump utility #3499

Open: wants to merge 3 commits into base: master
Changes from 2 commits
4 changes: 2 additions & 2 deletions build_debian.sh
@@ -346,9 +346,7 @@ sudo cp files/image_config/monit/monitrc $FILESYSTEM_ROOT/etc/monit/
sudo chmod 600 $FILESYSTEM_ROOT/etc/monit/monitrc

## Config sysctl
sudo mkdir -p $FILESYSTEM_ROOT/var/core
sudo augtool --autosave "
set /files/etc/sysctl.conf/kernel.core_pattern '|/usr/bin/coredump-compress %e %t %p'

set /files/etc/sysctl.conf/kernel.softlockup_panic 1
set /files/etc/sysctl.conf/kernel.panic 10
@@ -429,6 +427,8 @@ sudo cp files/dhcp/dhclient.conf $FILESYSTEM_ROOT/etc/dhcp/
if [ -f files/image_config/ntp/ntp ]; then
sudo cp ./files/image_config/ntp/ntp $FILESYSTEM_ROOT/etc/init.d/
fi
## Configure application core dump handler
sudo LANG=C DEBIAN_FRONTEND=noninteractive chroot $FILESYSTEM_ROOT apt-get install -y systemd-coredump

## Version file
sudo mkdir -p $FILESYSTEM_ROOT/etc/sonic
3 changes: 3 additions & 0 deletions files/build_templates/docker_image_ctl.j2
@@ -211,6 +211,9 @@ start() {
docker create {{docker_image_run_opt}} \
{%- if install_debug_image == "y" %}
-v /src:/src:ro -v /debug:/debug:rw \
-v /var/log/journal:/var/log/journal:ro \
-v /var/lib/systemd/coredump:/var/lib/systemd/coredump:ro \
-v /etc/machine-id:/etc/machine-id:ro \
{%- endif %}
{%- if '--log-driver=json-file' in docker_image_run_opt or '--log-driver' not in docker_image_run_opt %}
--log-opt max-size=2M --log-opt max-file=5 \
18 changes: 18 additions & 0 deletions files/build_templates/sonic_debian_extension.j2
@@ -240,6 +240,24 @@ sudo cp $IMAGE_CONFIGS/caclmgrd/caclmgrd.service $FILESYSTEM_ROOT/etc/systemd/s
echo "caclmgrd.service" | sudo tee -a $GENERATED_SERVICE_FILE
sudo cp $IMAGE_CONFIGS/caclmgrd/caclmgrd $FILESYSTEM_ROOT/usr/bin/

# Allow systemd-coredump to perform cleanup of core files and not tmpfiles.d
sudo sed -i "/\/var\/lib\/systemd\/coredump/d" $FILESYSTEM_ROOT/usr/lib/tmpfiles.d/systemd.conf

# Customize systemd-coredump configuration
sudo mkdir -p $FILESYSTEM_ROOT/etc/systemd/coredump.conf.d
sudo cp $IMAGE_CONFIGS/coredump/coredump.conf.d/00-sonic-coredump.conf $FILESYSTEM_ROOT/etc/systemd/coredump.conf.d
# Setup service to configure coredump service
sudo cp $IMAGE_CONFIGS/coredump/coredump-config.service $FILESYSTEM_ROOT/etc/systemd/system/
sudo LANG=C chroot $FILESYSTEM_ROOT systemctl enable coredump-config.service
sudo cp $IMAGE_CONFIGS/coredump/coredump-config.sh $FILESYSTEM_ROOT/usr/bin/

## Enable persistent journal to store coredump history
sudo mkdir -p $FILESYSTEM_ROOT/etc/systemd/journald.conf.d/
sudo cp files/image_config/journald/journald.conf.d/00-sonic-journald.conf $FILESYSTEM_ROOT/etc/systemd/journald.conf.d/

## Shortcut to access core files
sudo ln -sf /var/lib/systemd/coredump $FILESYSTEM_ROOT/var/core

# Copy process-reboot-cause service files
sudo cp $IMAGE_CONFIGS/process-reboot-cause/process-reboot-cause.service $FILESYSTEM_ROOT/etc/systemd/system/
echo "process-reboot-cause.service" | sudo tee -a $GENERATED_SERVICE_FILE
11 changes: 11 additions & 0 deletions files/image_config/coredump/coredump-config.service
@@ -0,0 +1,11 @@
[Unit]
Description=Update coredump configuration
Requires=updategraph.service
After=updategraph.service

[Service]
Type=oneshot
ExecStart=/usr/bin/coredump-config.sh

[Install]
WantedBy=multi-user.target
14 changes: 14 additions & 0 deletions files/image_config/coredump/coredump-config.sh
@@ -0,0 +1,14 @@
#!/bin/bash

DISABLE_COREDUMP_CONF="/etc/sysctl.d/50-disable-coredump.conf"

if [ "$(redis-cli -n 4 HGET "COREDUMP|config" "enabled")" = "false" ] ; then
Contributor

If the key is present and carries the value false, we disable core dumps. In other words, the default behavior is "enabled=true", and to disable, one has to create this key explicitly.

Instead, why not require a key for disabling, which would imply that the default is enabled?

"COREDUMP|disabled" == true

Contributor Author

It is commonplace to see configuration knobs expressed with a positive intent. If we flip this around, it may confuse users who are accustomed to other true/false style configurations.
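
The default-enabled behavior discussed in this thread can be sketched as a small shell helper. This is only an illustration: `coredump_is_enabled` is a hypothetical name that does not appear in this PR, and the sketch assumes that `redis-cli` returns an empty string when the key or field is missing.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): map the raw value of the
# "enabled" field to a decision. Anything other than the literal string
# "false" counts as enabled, so a missing key (empty value) is enabled.
coredump_is_enabled() {
    local value="$1"
    [ "$value" != "false" ]
}

# Walk through the three cases. On a real system the value would come from:
#   redis-cli -n 4 HGET "COREDUMP|config" "enabled"
for value in "" "true" "false"; do
    if coredump_is_enabled "$value"; then
        echo "enabled='${value}': core dumps stay on"
    else
        echo "enabled='${value}': core dumps are turned off"
    fi
done
```

With this shape, only an explicit "false" turns core dumps off, which matches the comparison used in coredump-config.sh.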

echo "kernel.core_pattern=" > ${DISABLE_COREDUMP_CONF}
Contributor

What does this mean?

Can you please explain the impact of disable?

Contributor Author

When the coredump admin mode is disabled in config db, core files will not be generated. We create a sysctl entry that sets an empty kernel.core_pattern to disable core dumps.

Contributor

Can you give me a use case where you would not want a core file dump?

A core file dump implies that some unexpected error occurred, or that the user explicitly created one with kill for a reason. In either case, the dump is required for analysis.

If this is the only purpose of coredump-config.service, I don't see a need for it.

Contributor Author

  1. Core files can be a space hog.
  2. Multiple core files may be generated if a process is in an infinite loop.
  3. Core files may contain sensitive information, so some applications may not want them recorded.
  4. We plan to reuse coredump-config.service to enable/disable kernel core dumps as well. There may also be additional parameters that you would like to configure for core files (e.g., a limit on core file size). Currently we choose some fixed numbers, but future extensions may make them configurable. To start with, we are providing a framework to enable/disable the feature.

Contributor

There is already another initiative to do core file rotation and limit the count of core files at any time per process. Turning off is definitely not the solution.

The limitation on count/size also comes from Broadcom. I need to look for it.

Contributor

In fact, you are referring to PR #468, which has the following.

" a. Support per-process core file rotation and archiving to optimize disk space "

Contributor Author

I am referring to the various configuration options provided by systemd-coredump. Below is a link to the documentation.
https://www.freedesktop.org/software/systemd/man/coredump.conf.html
Users might want some of these options to be part of ConfigDB.

The PR I was referring to is PR#729, which enables the kernel core dump feature. For that feature, it is desirable that users have an enable/disable knob, as kdump requires a dedicated 512MB of memory.
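
As an illustration of the kind of coredump.conf options mentioned here, the sketch below writes an extra drop-in that caps the disk space used by stored core files. `MaxUse=` and `KeepFree=` are documented coredump.conf settings; the helper name `write_coredump_limits`, the drop-in file name, and the sizes are assumptions made for this example and are not part of the PR.

```shell
#!/bin/bash
# Hypothetical helper (not part of this PR): write a coredump.conf drop-in
# that limits the disk space taken by externally stored core files.
# Arguments: drop-in directory, MaxUse value, KeepFree value.
write_coredump_limits() {
    local conf_dir="$1" max_use="$2" keep_free="$3"
    mkdir -p "$conf_dir"
    # The file name below is illustrative only.
    cat > "$conf_dir/10-sonic-coredump-limits.conf" <<EOF
[Coredump]
MaxUse=${max_use}
KeepFree=${keep_free}
EOF
}

# Example (run as root on the target, sizes are placeholders):
#   write_coredump_limits /etc/systemd/coredump.conf.d 1G 512M
```

If such values were exposed through ConfigDB, a service like coredump-config.service could call a helper of this shape with the configured sizes.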

Contributor

Let me add Guohan to this thread. What we need is the requirement or use case, and I don't see one.

I find it rather risky to control this with a DB variable: it could get saved and silently persist across reboots, which is likely to block core dumps unintentionally.

I would rather have this ability as a CLI tool that disables core dumps temporarily, with everything back to the default (enabled) state upon reboot.

What we do need is the ability to limit the count of cores per process and the overall disk space taken by cores.
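
A runtime-only toggle of the kind suggested here could look roughly like the sketch below. It is not part of this PR: the function names and the `CORE_PATTERN_FILE` override are hypothetical, and it assumes, as the script in this PR does, that an empty kernel.core_pattern stops core files from being generated. Because it writes only to procfs and never to /etc/sysctl.d, the setting would not survive a reboot.

```shell
#!/bin/bash
# Hypothetical runtime-only toggle (not part of this PR).
# CORE_PATTERN_FILE can be pointed at a scratch file for a dry run.
CORE_PATTERN_FILE="${CORE_PATTERN_FILE:-/proc/sys/kernel/core_pattern}"

# Save the active pattern to the given file, then clear the pattern.
coredump_runtime_disable() {
    local saved="$1" current
    current="$(cat "$CORE_PATTERN_FILE")"
    # Do not overwrite a saved pattern if core dumps are already disabled.
    if [ -n "$current" ]; then
        echo "$current" > "$saved"
    fi
    echo "" > "$CORE_PATTERN_FILE"
}

# Restore the pattern previously saved by coredump_runtime_disable.
coredump_runtime_enable() {
    local saved="$1"
    cat "$saved" > "$CORE_PATTERN_FILE"
}

# Example (run as root, the save path is a placeholder):
#   coredump_runtime_disable /run/core_pattern.saved
#   coredump_runtime_enable /run/core_pattern.saved
```

Keeping the saved pattern under a tmpfs path such as /run means that neither the disabled state nor the saved copy outlives a reboot.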

else
rm -f ${DISABLE_COREDUMP_CONF}
fi

# Read sysctl conf files again
systemctl restart systemd-sysctl

exit 0
5 changes: 5 additions & 0 deletions files/image_config/coredump/coredump.conf.d/00-sonic-coredump.conf
@@ -0,0 +1,5 @@
[Coredump]
Storage=external
Compress=yes
ProcessSizeMax=2G
ExternalSizeMax=2G
5 changes: 5 additions & 0 deletions files/image_config/journald/journald.conf.d/00-sonic-journald.conf
@@ -0,0 +1,5 @@
[Journal]
Storage=persistent
SystemMaxUse=256M
RuntimeMaxUse=356M
MaxLevelStore=crit
4 changes: 3 additions & 1 deletion rules/docker-base-stretch.mk
@@ -11,7 +11,9 @@ VIM = vim
OPENSSH = openssh-client
SSHPASS = sshpass
STRACE = strace
$(DOCKER_BASE_STRETCH)_DBG_IMAGE_PACKAGES += $(GDB) $(GDBSERVER) $(VIM) $(OPENSSH) $(SSHPASS) $(STRACE)
SYSTEMD_COREDUMP = systemd-coredump
Contributor

Without this package installed, will core dumps still be created?

Also, this package is already installed unconditionally in build_debian.sh. Can you please explain?

Contributor Author

Without this package installed inside the container, core dumps will still be created.

Core files are always generated on the host OS and stored in the /var/lib/systemd/coredump directory.

The application that crashed may be part of a container. If you run gdb through the coredumpctl gdb command on the host OS, it will not find the application binary.

So we map the /var/lib/systemd/coredump directory into the containers (see the change in docker_image_ctl.j2) and also install the coredumpctl tool here. With that, the core file and the handy coredumpctl tool are both available for debugging the application inside the container.
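
The in-container workflow described here can be sketched as a small wrapper. The wrapper name `debug_core_in_container` and the `DOCKER` override are hypothetical, and the container and process names in the example are placeholders; only `docker exec` and `coredumpctl gdb` are existing commands.

```shell
#!/bin/bash
# Hypothetical wrapper (not part of this PR).
# DOCKER can be overridden, for example with "echo", to preview the command.
DOCKER="${DOCKER:-docker}"

# Open a core in gdb from inside a debug container, where both the mapped
# core file directory and the crashed binary are visible.
# Arguments: container name, coredumpctl match (a PID or a process name).
debug_core_in_container() {
    local container="$1" match="$2"
    "$DOCKER" exec -it "$container" coredumpctl gdb "$match"
}

# Example, with placeholder container and process names:
#   debug_core_in_container swss orchagent
```

This relies on the debug image mounts added in docker_image_ctl.j2, so it would only work when the image is built with install_debug_image set to "y".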

$(DOCKER_BASE_STRETCH)_DBG_IMAGE_PACKAGES += $(GDB) $(GDBSERVER) $(VIM) $(OPENSSH) $(SSHPASS) $(STRACE) \
$(SYSTEMD_COREDUMP)

SONIC_DOCKER_IMAGES += $(DOCKER_BASE_STRETCH)
SONIC_STRETCH_DOCKERS += $(DOCKER_BASE_STRETCH)