Address more extraction edge cases; improve naming and consistency #733

Merged
merged 14 commits into chainguard-dev:main from fix-tar-gz-edge-case
Jan 3, 2025


Member

@egibs egibs commented Dec 20, 2024

This PR addresses another subtle bug: extracting .tar.gz archives that are actually plain tar archives, as well as other invalid .gz files. It also adds better symlink handling.

I noticed this failure:

unable to process /tmp/malcontent-821299582/included-source/felix-ebpf-gpl.tar.gz: extract to temp: failed to extract /tmp/malcontent-821299582/included-source/felix-ebpf-gpl.tar.gz: failed to create gzip reader: gzip: invalid header

And inspected the file locally:

$ file felix-ebpf-gpl.tar.gz 
felix-ebpf-gpl.tar.gz: POSIX tar archive (GNU)

To fix this, I moved the .gz case statement below the .tar.gz and .tgz case statement so that it wouldn't preempt ExtractTar. I also added a check that validates whether a given file is actually a valid gzip file, falling back to the default tar reader if it is not.

With these changes, the file extracts and scans correctly:

🔎 Scanning "~/Downloads/calico-fips/included-source/felix-ebpf-gpl.tar.gz"
├─ 🟡 ~/Downloads/calico-fips/included-source/felix-ebpf-gpl.tar.gz ∴ /felix/bpf-gpl/bin/from_nat_info.o [MEDIUM]
│     ≡ command & control [MEDIUM]
│       🟡 addr/ip — mentions an IP and port:
│          client_ip, ctx_port, dst_port, get_port, host_ip, inner_ip, intf_ip, local_port, nat_ip, nat_port, orig_ip, orig_port, o…
│     ≡ credential [MEDIUM]
│       🟡 sniffer/bpf — BPF (Berkeley Packet Filter): bpf
│     ≡ discovery [MEDIUM]
│       🟡 multiple — collects system and network information: ip_addr, ipv46_addr, ipv4_addr, ipv6_addr
│     ≡ filesystem [MEDIUM]
│       🟡 path/relative — references and possibly executes relative path:
│          ./arp, ./bpf, ./conntrack_types, ./counters, ./failsafe, ./fib, ./include, ./jump, ./metadata, ./nat_lookup, ./nat_types…
│     ≡ networking [MEDIUM]
│       🟡 ip/addr — mentions an 'IP address': ip_addr
│       🟡 ip/host_port — connects to an arbitrary host:port: host(revnat->port, host_to_ctx_port
│       🔵 ip/icmp — ICMP (Internet Control Message Protocol), aka ping
│       🟡 socket/raw — send raw and/or malformed IP packets: IPPROTO_RAW
│       🔵 socket/send — send a message to a socket
│     ≡ persistence [LOW]
│       🔵 kernel_module/symbol_lookup — bpf: BPF
│
├─ 🟡 ~/Downloads/calico-fips/included-source/felix-ebpf-gpl.tar.gz ∴ /felix/bpf-gpl/bin/to_lo_fib_info_co-re.o [MEDIUM]
│     ≡ command & control [MEDIUM]
│       🟡 addr/ip — mentions an IP and port:
│          client_ip, ctx_port, dst_port, func_ip, get_port, host_ip, inner_ip, intf_ip, local_port, nat_ip, nat_port, orig_ip, ori…
│     ≡ credential [MEDIUM]
│       🟡 sniffer/bpf — BPF (Berkeley Packet Filter): bpf
│     ≡ discovery [MEDIUM]
│       🟡 multiple — collects system and network information: ip_addr, ipv46_addr, ipv4_addr, ipv6_addr
│     ≡ filesystem [MEDIUM]
│       🟡 path/relative — references and possibly executes relative path:
│          ./arp, ./bpf, ./conntrack_types, ./counters, ./failsafe, ./fib, ./ifstate, ./include, ./jump, ./nat_lookup, ./nat_types,…
│     ≡ networking [MEDIUM]
│       🟡 ip/addr — mentions an 'IP address': ip_addr
│       🟡 ip/host_port — connects to an arbitrary host:port: host source port, host(revnat->port, host_to_ctx_port
│       🔵 ip/icmp — ICMP (Internet Control Message Protocol), aka ping
│       🟡 ip/syncookie — references SYN cookies, used to resist DoS attacks: syncookie
│       🟡 socket/raw — send raw and/or malformed IP packets: IPPROTO_RAW
│       🔵 socket/send — send a message to a socket
│     ≡ persistence [LOW]
│       🔵 kernel_module/symbol_lookup — bpf: BPF
│
├─ 🔵 ~/Downloads/calico-fips/included-source/felix-ebpf-gpl.tar.gz ∴ /felix/bpf-gpl/include/libbpf/include/uapi/linux/if_xdp.h [LOW]
│     ≡ anti-static [LOW]
│       🔵 obfuscation/bitwise — uses bitwise math: 1 << 0, 1 << 1, 1 << 2, 1 << 3
│     ≡ networking [LOW]
│       🔵 socket/send — send a message to a socket: sendto
│
...

@egibs egibs requested a review from tstromberg December 20, 2024 03:12
@egibs egibs force-pushed the fix-tar-gz-edge-case branch 4 times, most recently from ac5f66d to b815c32 Compare December 20, 2024 13:33
@egibs egibs force-pushed the fix-tar-gz-edge-case branch from 0009e80 to 0acf3ec Compare December 20, 2024 14:05
@egibs egibs changed the title Fix tar files named as tar.gz edge case Address more gzip, tar, and tar.gz edge cases Dec 20, 2024
@egibs
Member Author

egibs commented Dec 20, 2024

Uncovered a few more edge cases while working on this, namely:

  • .7.gz files packaged in Debian that are actually just ASCII files
  • symlink handling within tar archives wasn't working as expected with nonexistent files

@egibs egibs requested a review from stevebeattie December 27, 2024 13:43
@stevebeattie
Member

I noticed this failure:

unable to process /tmp/malcontent-821299582/included-source/felix-ebpf-gpl.tar.gz: extract to temp: failed to extract /tmp/malcontent-821299582/included-source/felix-ebpf-gpl.tar.gz: failed to create gzip reader: gzip: invalid header

And inspected the file locally:

$ file felix-ebpf-gpl.tar.gz 
felix-ebpf-gpl.tar.gz: POSIX tar archive (GNU)

FYI, this specific instance looks to be a bug in the Wolfi package for calico. The build at https://github.com/wolfi-dev/os/blob/4e9ab93174c17cae8e9defb012b4a731621be1af/calico-3.29.yaml#L206-L212 tries to replicate the GPL eBPF tarball build from the upstream build process at https://github.com/projectcalico/calico/blob/810091a6acf98ecc891c5bc3c07374a786d68743/node/Makefile#L286-L290 but drops the xz compression without adding gzip compression, despite the target rename.

  • .7.gz files packaged in Debian that are actually just ASCII files

Do you have specific examples? The only example I could find on a running Ubuntu system was /var/lib/dpkg/alternatives/builtins.7.gz which is where the current configuration info is stored for /etc/alternatives/builtins.7.gz (I can understand the intent of this alternatives link, but it doesn't seem to be used by anything but the bash packages).

Overall, though probably outside the scope of this PR, I'd like to see such files flagged by malcontent. They are pretty rarely legit and are probably one of:

  • bugs in the upstream or the downstream packaging (e.g. calico),
  • the extremely rare legit case (alternatives of compressed files, test cases exercising these kinds of issues),
  • or something trying to look like something it's not, which is a bit suspicious.
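A minimal sketch of what such a flag could look like, assuming a simple comparison between the file extension and the leading bytes of the content. Nothing here exists in malcontent as part of this PR: sniff and mislabeled are hypothetical names, and only the gzip magic bytes and the "ustar" marker at offset 257 of a tar header are checked.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// sniff classifies content from its leading bytes: the gzip magic number,
// or the "ustar" marker found at offset 257 of a POSIX/GNU tar header.
func sniff(head []byte) string {
	switch {
	case len(head) >= 2 && head[0] == 0x1f && head[1] == 0x8b:
		return "gzip"
	case len(head) >= 262 && bytes.Equal(head[257:262], []byte("ustar")):
		return "tar"
	default:
		return "other"
	}
}

// mislabeled (hypothetical helper) reports whether a gzip-style extension
// disagrees with the detected content, along with what was detected.
func mislabeled(name string, head []byte) (bool, string) {
	lower := strings.ToLower(name)
	claimsGzip := strings.HasSuffix(lower, ".gz") || strings.HasSuffix(lower, ".tgz")
	kind := sniff(head)
	return claimsGzip && kind != "gzip", kind
}

func main() {
	// A zeroed 512-byte block with the tar marker set stands in for a real header.
	tarHead := make([]byte, 512)
	copy(tarHead[257:], "ustar")

	cases := []struct {
		name string
		head []byte
	}{
		{"felix-ebpf-gpl.tar.gz", tarHead},
		{"builtins.7.gz", []byte("auto\n/usr/share/man\n")},
		{"real.tar.gz", []byte{0x1f, 0x8b, 0x08, 0x00}},
	}
	for _, c := range cases {
		bad, kind := mislabeled(c.name, c.head)
		fmt.Printf("%s: content=%s mislabeled=%t\n", c.name, kind, bad)
	}
}
```

A result like this would only mark a file for review; deciding whether it is a packaging bug, a rare legitimate case, or something suspicious still needs the context described in the list above.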

Thanks.

@egibs
Member Author

egibs commented Dec 31, 2024

Do you have specific examples?

The example you found was the one I came across. TBD if there are others in other distros.

I like the idea of flagging files for further review, but the changes in this PR will at least avoid trying to extract archives that aren't valid.

pkg/archive/deb.go (review thread: outdated, resolved)
pkg/archive/tar.go (review thread: outdated, resolved)
@egibs egibs requested a review from stevebeattie December 31, 2024 22:27
@egibs egibs changed the title Address more gzip, tar, and tar.gz edge cases Address more extraction edge cases; improve naming and consistency Jan 1, 2025
Signed-off-by: egibs <[email protected]>
Member

@stevebeattie stevebeattie left a comment

Thanks for making these changes, this looks much better.

@stevebeattie stevebeattie merged commit 4c9eb73 into chainguard-dev:main Jan 3, 2025
8 checks passed
stevebeattie added a commit to stevebeattie/os that referenced this pull request Jan 10, 2025
In mimicking the behavior of upstream calico creating a tarball
of the felix bpf gpl components, a bug was duplicated that
named the tarball with a .tar.gz extension but didn't actually
compress the tarball. Upstream eventually fixed this as part
of their conversion to using xz compression on the tarball in
projectcalico/calico#9364

Fix in the calico builds by converting to use xz. Busybox's version
of tar does not support the --use-compress-program=COMMAND option
from gnu tar that upstream used, so instead use the -J option to get
xz compression.

(This also was tripping up malcontent scans, but a fix for that landed
upstream in chainguard-dev/malcontent#733 )

Signed-off-by: Steve Beattie <[email protected]>
@egibs egibs deleted the fix-tar-gz-edge-case branch January 17, 2025 23:03