[coredump] Refactor info and dump collection #3811

Merged
TurboTurtle merged 1 commit into sosreport:main from TurboTurtle:coredump-refactor on Nov 6, 2024

Conversation

TurboTurtle (Member)

Refactor the plugin to adjust how `coredumpctl info` and coredump file collection are handled.

The plugin will now collect the compressed coredump file for each coredump entry for which coredumpctl reports that the file is present, up to the first X entries, where X is the new `dumps` plugin option.

A second new plugin option, `executable`, can be used to specify a regex string to match coredump entries against to determine whether they should be collected. The default behavior is to collect for all entries.

A symlink for each collected coredump file will be dropped in the plugin directory to aid in associating dump files with coredumpctl entries, as the association may not always be obvious to end users from filenames alone.
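The selection behavior described above could look roughly like the following sketch, as it might appear inside the plugin's setup method. The entry list and the `collect_dump` helper are hypothetical names for illustration, not the plugin's actual code:

    import re

    # Hypothetical sketch: collect at most `dumps` coredumps, optionally
    # filtered by the `executable` regex, and only for entries whose dump
    # file coredumpctl reports as present on disk.
    collected = 0
    exe_regex = self.get_option("executable")    # may be empty
    for entry in coredump_entries:               # parsed from `coredumpctl list`
        if collected >= self.get_option("dumps"):
            break
        if exe_regex and not re.search(exe_regex, entry.executable):
            continue
        if not entry.present:                    # no dump file on disk
            continue
        self.collect_dump(entry)                 # hypothetical helper
        collected += 1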


Please place an 'X' inside each '[]' to confirm you adhere to our Contributor Guidelines

  • Is the commit message split over multiple lines and hard-wrapped at 72 characters?
  • Is the subject and message clear and concise?
  • Does the subject start with [plugin_name] if submitting a plugin patch or a [section_name] if part of the core sosreport code?
  • Does the commit contain a Signed-off-by: First Lastname [email protected]?
  • Are any related Issues or existing PRs properly referenced via a Closes (Issue) or Resolved (PR) line?
  • Are all passwords or private data gathered by this PR obfuscated?


Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo yum install -y dnf-plugins-core on RHEL 8
  • sudo dnf install -y dnf-plugins-core on Fedora
  • dnf copr enable packit/sosreport-sos-3811
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

TurboTurtle (Member Author)

So this one turned out more interesting than I expected. The dump size reported by coredumpctl is the on-disk compressed size. Since the service uses zstd, a seemingly innocuous 20MB file can inflate to over 1GB when extracted (which is what `coredumpctl dump` would produce). So there isn't really a way to anticipate collection sizes within the plugin.

Given that, we can just collect the compressed coredumps, and then drop a symlink in the plugin dir to help users align which coredump they're looking at against the output of `coredumpctl list`, which isn't always straightforward. Not a standard practice by any means, but in this particular case I think it makes sense.
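The symlink idea could look something like this minimal sketch; `plugin_dir`, `pid`, `exe_name`, and `core_path` are assumed names here, and the actual naming scheme in the merged code may differ:

    import os

    # Illustrative sketch: give the collected dump a human-readable alias
    # (executable name + PID) so users can match it against the output of
    # `coredumpctl list`.
    link_name = os.path.join(plugin_dir, f"core.{exe_name}.{pid}")
    if not os.path.exists(link_name):
        os.symlink(core_path, link_name)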

TurboTurtle force-pushed the coredump-refactor branch 2 times, most recently from e6e9965 to 1e5a263, on October 18, 2024 at 19:57
TurboTurtle (Member Author)

Ubuntu 18.04 failures are due to the Python version. We recently agreed to move to 3.8 as the minimum version. cirrus.yaml should be updated shortly, once the GCE image for the new Ubuntu dev release is available, per IRC:

15:50:13 @turboturtle | arif-ali: re: #3811 - I'm assuming the failures in the 18.04 test are due to python version (invalid syntax, this is the first commit using a := in the project) - is 18.04 a py3.6 by default release? If so, I think it's time to drop the 18.04 test
15:56:13    arif-ali  | turboturtle: yup, walrus is 3.8, and bionic (18.04) should therefore be dropped. I have a PR ready locally to track new cirrusci.yaml changes to pick the new dev release, which has been tagged, but the GCE image is due to be ready next week.

PR on hold until the Cirrus test matrix is updated.
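For context, the `:=` assignment expression (PEP 572) that triggers the 18.04 failure is a hard SyntaxError on Python 3.6/3.7. A self-contained example (the sample string is illustrative):

    import re

    output = "Storage: /var/lib/systemd/coredump/core.myapp.zst (present)"
    # SyntaxError on Python < 3.8; fine on 3.8+:
    if (core := re.search(r"Storage: (.*)", output)):
        print(core.group(1))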

    core = re.search(r"(^\s*Storage:(.*)(\(present\)))", res, re.M)
    try:
        core_path = core.groups()[1].strip()
        self.add_copy_spec(core_path, tailit=False, sizelimit=100)
Contributor

If we skip collecting the core due to its >100M size, will cores_collected be incremented? (Ideally it shouldn't.)

TurboTurtle (Member Author)

That's a good catch of a current blind spot in `tailit` handling. We don't actually give any indication to the caller that the collection was skipped. I will dig into this.

Member

Is it worth having an option that allows it to copy a bigger size? I had a coredump that was 2GB in size recently, and we needed it. I think it would be worthwhile to have that as an option? Thoughts?

TurboTurtle (Member Author)

I'm hesitant to add plugin options around size limiting. Though... what about extending `log_size` to a per-plugin setting, similar to `timeout`?


    plugin_name = "coredump"
    profiles = ('system', 'debug')
    packages = ('systemd-udev', 'systemd-coredump')

    option_list = [
        PluginOpt("detailed", default=False,
                  desc="collect detailed information for every report"),
        PluginOpt("dumps", default=5, desc="number of dump files to collect"),
Contributor

Collecting 5 coredumps as a default? Isn't that too much?

I am a bit inclined to 3 as the default, but no strong preference (and no real use case or user experience for how many coredumps are worth collecting).

TurboTurtle (Member Author)

5 was a shot-in-the-dark number; no problem with dropping it to 3.

Refactor the plugin to adjust how `coredumpctl info` and coredump file
collection are handled.

The plugin will now collect the compressed coredump file for coredump
entries for which coredumpctl reports that the file is present, for the
first X number of coredump files, where X is the new `dumps` plugin
option.

A second new plugin option, `executable`, can be used to specify a
regex string to match coredump entries against to determine whether
they should be collected. The default behavior is to collect for all
entries.

A symlink for each collected coredump file will be dropped in the
plugin directory to aid in associating dump files with coredumpctl
entries, as it may not always be obvious for end users from filenames
alone.

Signed-off-by: Jake Hunsaker <[email protected]>
TurboTurtle (Member Author)

Dropped the number of cores from 5 to 3. Bumped the core size default to 200 since 100 can reasonably be considered small for coredumps.

There's not a great way to detect skips from add_copy_spec() today, so I inserted a stat to avoid incrementing the counter when we will skip a core path. Noted a TODO for extending log-size to a per-plugin option. I'll probably handle that by EOM, but it is a somewhat involved logical dance, if the timeout handling is any indication.
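A minimal sketch of the stat guard described above, inside the per-entry loop. The 200MB limit and the `cores_collected` counter name are assumptions based on this thread, not the merged code verbatim:

    import os

    limit_bytes = 200 * 1024 * 1024
    try:
        # Skip oversized cores up front, so the counter only advances for
        # cores we will actually collect.
        if os.stat(core_path).st_size > limit_bytes:
            continue
    except OSError:
        continue  # file vanished between listing and collection
    self.add_copy_spec(core_path, tailit=False, sizelimit=200)
    cores_collected += 1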

TurboTurtle merged commit 99d0ae2 into sosreport:main on Nov 6, 2024
33 checks passed