*.gz files are getting corrupted during collection under the 'logs' plugin when obfuscation option is enabled for masking. #3884

Open
suhastawade opened this issue Dec 16, 2024 · 7 comments

Comments

@suhastawade

suhastawade commented Dec 16, 2024

When collecting logs with the sos utility, the .gz files in the /var/log/ directory are collected as well. When masking is on, obfuscation is applied to those .gz files and corrupts them.
As a result, they can no longer be extracted and no data can be read from them.

The file type of the .gz file is reported as "data" instead of "gzip compressed data".
This is from my locally extracted directory, "sosreport-testappliance-20241215061507-periodic-nvjetqe":
[root@testappliance sosreport-testappliance-20241215061507-periodic-nvjetqe]# file ./var/log/messages-2024121000.gz
./var/log/messages-2024121000.gz: data
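
The same check can be scripted when there are many rotated logs to inspect. This is a minimal sketch, assuming only the Python standard library; the helper names `looks_like_gzip` and `is_readable_gzip` are illustrative and are not part of sos. It tests the two-byte gzip magic number (roughly what `file` relies on) and then tries a full decompression.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def is_readable_gzip(path):
    """Return True if the whole file decompresses without error."""
    try:
        with gzip.open(path, "rb") as fh:
            # Read in chunks so large rotated logs are not loaded at once.
            while fh.read(64 * 1024):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False
```

A file that fails `looks_like_gzip` would show up as plain "data", as in the output above.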

The following image shows that during cleanup it fails to parse lines from the .gz files, resulting in corrupted data.
The command I used is: sosreport -o logs --clean --keep-binary-files -vvv

[Image: Screenshot 2024-12-16 at 4 13 24 PM]

The following GitHub locations are where all files named "messages*" and "secure*" are collected. Because of these patterns, the .gz files are collected too, which results in corrupted files in the report.

https://github.com/sosreport/sos/blob/main/sos/report/plugins/logs.py#L49
https://github.com/sosreport/sos/blob/main/sos/report/plugins/logs.py#L50

sos report version -
[root@testappliance log]# sosreport -v
Please note the 'sosreport' command has been deprecated in favor of the new 'sos' command, E.G. 'sos report'.
Redirecting to 'sos report -v'

sosreport (version 4.7.2)

If required, you can reproduce this issue on your end by having multiple .gz files inside the /var/log/ directory.

@jcastill
Member

Thank you for reporting this issue, @suhastawade. I managed to reproduce this locally. I haven't looked at the code yet, but it looks like we may be trying to perform the substitutions directly on the gz file. I'll start looking into this in a bit, but if anyone else wants to work on it, just let us know here so we don't overlap.
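
To illustrate why handling a .gz file as text would break it, here is a small standalone sketch. It is not taken from the sos code; it assumes the file is read as UTF-8 text with undecodable bytes replaced and then written back, which is one plausible way a text-oriented substitution pass could damage binary data. The helper `text_round_trip` and the sample log line are made up for the demonstration.

```python
import gzip
import zlib


def text_round_trip(raw: bytes) -> bytes:
    """Mimic a text-mode read/write cycle over bytes that are not text."""
    # Bytes that are not valid UTF-8 become U+FFFD, so the data changes.
    text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


original = gzip.compress(b"Dec 10 00:00:01 testappliance sshd[1]: session opened\n")
damaged = text_round_trip(original)

# The second magic byte (0x8b) is not valid UTF-8 on its own, so it is
# replaced during decoding and the gzip signature is lost.
print("original has gzip magic:", original[:2] == b"\x1f\x8b")
print("damaged has gzip magic: ", damaged[:2] == b"\x1f\x8b")

try:
    gzip.decompress(damaged)
except (OSError, EOFError, zlib.error) as exc:
    print("decompress failed:", repr(exc))
```

Once the signature is gone, tools such as `file` can only classify the result as generic data.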

@TurboTurtle
Member

We should be removing tarballs entirely - https://github.com/sosreport/sos/blob/main/sos/cleaner/archives/__init__.py#L392 - precisely because we can't reliably obfuscate binary data without corrupting it.

It is interesting that only the IPv6 parser is reporting errors within that obfuscation attempt, though the file should have been removed before even getting to the parser step.

@jcastill
Member

jcastill commented Dec 16, 2024

I think that using the --keep-binary-files option could be causing this issue:

https://github.com/sosreport/sos/blob/main/sos/cleaner/__init__.py#L692

                if (not self.opts.keep_binary_files and
                        archive.should_remove_file(short_name)):
                    archive.remove_file(short_name)

Shall we remove this check to avoid issues like this?

jcastill added a commit to jcastill/sos that referenced this issue Dec 16, 2024
This commit tries to honour option --keep-binary-files
when using cleaner by making sure that these files
are skipped so the cleaner doesn't attempt
to apply substitutions directly on them.

Related: sosreport#3884

Signed-off-by: Jose Castillo <[email protected]>
@jcastill
Member

I've opened a PR with a possible fix that tries to honour the --keep-binary-files option, and it seems to work as expected. @suhastawade, can you try it? Just remember that we won't try to obfuscate anything in the binary files, so if you use that option you need to be sure you really want whoever receives the files to have them.
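
The intended behaviour can be sketched as a three-way decision per file. This is only an illustration of the idea and not the code in the PR: `Options`, `plan_for_file` and `looks_binary` are hypothetical names, and `looks_binary` is a simplified stand-in for whatever check the cleaner actually uses. Only the `keep_binary_files` option name comes from the snippet quoted above.

```python
from dataclasses import dataclass


@dataclass
class Options:
    # Mirrors the --keep-binary-files command line option.
    keep_binary_files: bool = False


def looks_binary(short_name):
    """Hypothetical stand-in for the cleaner's binary-file check."""
    return short_name.endswith((".gz", ".xz", ".bz2", ".tar"))


def plan_for_file(short_name, opts, is_binary=looks_binary):
    """Return 'remove', 'skip' or 'obfuscate' for a single file."""
    if is_binary(short_name):
        if opts.keep_binary_files:
            # Keep the file untouched: no substitutions, so no corruption,
            # but also no masking of its contents.
            return "skip"
        # Default behaviour: drop files that cannot be obfuscated safely.
        return "remove"
    return "obfuscate"


print(plan_for_file("var/log/messages-2024121000.gz", Options(keep_binary_files=True)))
print(plan_for_file("var/log/messages", Options(keep_binary_files=True)))
```

The point of the "skip" branch is that a kept binary file never reaches the parsers at all.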

@TurboTurtle
Member

Ah, I didn't see the option being used in the command at first.

@suhastawade
Author

@jcastill - Thanks for taking this on priority.
I have one question, though: if the .gz files are skipped when the "--keep-binary-files" option is provided, obfuscation will not sanitise those .gz files and masking will not be applied to them.
Can't we exclude the .gz files from collection instead, so that they are never collected and therefore cannot be corrupted?
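
Roughly what I have in mind, as a standalone sketch and not as a patch to the logs plugin. The patterns, the suffix list and the `should_collect` helper are my own assumptions for illustration; the sketch only shows the filtering idea with the standard `fnmatch` module.

```python
from fnmatch import fnmatch

# Assumed patterns, modelled on the "messages*" and "secure*" globs.
COLLECT_PATTERNS = ("/var/log/messages*", "/var/log/secure*")
# Assumed list of suffixes that mark compressed rotations.
COMPRESSED_SUFFIXES = (".gz", ".xz", ".bz2")


def should_collect(path):
    """Collect a matching log unless it is a compressed rotation."""
    matches = any(fnmatch(path, pattern) for pattern in COLLECT_PATTERNS)
    return matches and not path.endswith(COMPRESSED_SUFFIXES)


for candidate in ("/var/log/messages",
                  "/var/log/messages-2024121000.gz",
                  "/var/log/secure"):
    print(candidate, should_collect(candidate))
```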

Let me know if you have any further questions / comments.

@TurboTurtle
Member

If you don't want binary files that may have sensitive information in the report archive after cleaning, then don't use --keep-binary-files when cleaning the report.

The vast majority of sos use cases do not use clean/mask, and binary data such as tarballs are considered important collections. We remove binary files from the report by default during cleaning precisely because we cannot reliably obfuscate them, and attempting to do so is likely to introduce the exact problem you're seeing - corruption.

The --keep-binary-files option is expressly intended to be used to say "I need to clean as much as possible in the report, but I understand that binary files cannot be obfuscated and at the same time, I definitely need the data in those binary files, so I am accepting the risk of non-obfuscated data in those binary files that I am requesting to keep".
