Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Troubeshooting tools #524

Closed
jschwinger233 opened this issue May 23, 2024 · 13 comments · Fixed by #632
Closed

[Enhancement] Troubeshooting tools #524

jschwinger233 opened this issue May 23, 2024 · 13 comments · Fixed by #632
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@jschwinger233
Copy link
Member

Improvement Suggestion

Two ideas:

  1. Add dae sysdump to automatically collect network settings (routing, netfilter, sysctl, ...) and archive all info into a tar. This can save us from asking "would you mind showing me your netfilter using xxx cmd".
  2. Add dae trace --drop-only. This can make the tool much easier as users don't have specify ip or port.

Potential Benefits

An ideal bug report would be providing with sysdump.tar + trace.log if possible.

@jschwinger233 jschwinger233 added enhancement New feature or request good first issue Good for newcomers labels May 23, 2024
@dae-prow
Copy link
Contributor

dae-prow bot commented May 23, 2024

Thanks for opening this issue!

@linglilongyi
Copy link
Contributor

Should different command set be run based on different OS? Or just traverse the commands and collect the standard output into the pack?

@jschwinger233
Copy link
Member Author

@linglilongyi Hi thanks for asking!

I was wondering if calling syscall (wrapped by 3rd party library of course) rather than relying on external commands is a good idea.

For example, if we want to collect routing information on the system, instead of os.Exec("ip", "route"), we can call vishvananda/netlink#Handle.RouteList in the dae process, something like (demo code generated by ChatGPT ).

package main

import (
    "fmt"
    "log"
    "net"

    "github.com/vishvananda/netlink"
)

func main() {
    // Create a new netlink handle
    handle, err := netlink.NewHandle()
    if err != nil {
        log.Fatalf("Failed to create netlink handle: %v", err)
    }
    defer handle.Delete()

    // Retrieve all routes
    routes, err := handle.RouteList(nil, netlink.FAMILY_ALL)
    if err != nil {
        log.Fatalf("Failed to list routes: %v", err)
    }

    // Format and print each route
    for _, route := range routes {
        fmt.Println(formatRoute(route))
    }
}

func formatRoute(route netlink.Route) string {
    var dst, gw, dev string

    if route.Dst == nil {
        dst = "default"
    } else {
        dst = route.Dst.String()
    }

    if route.Gw != nil {
        gw = route.Gw.String()
    }

    link, err := netlink.LinkByIndex(route.LinkIndex)
    if err == nil {
        dev = link.Attrs().Name
    }

    flags := ""
    if route.Flags&netlink.FLAG_ONLINK != 0 {
        flags += " onlink"
    }

    if route.Flags&netlink.FLAG_PERMANENT != 0 {
        flags += " permanent"
    }

    return fmt.Sprintf("%s via %s dev %s%s", dst, gw, dev, flags)
}

This is faster and lower-costly.

@linglilongyi
Copy link
Contributor

I have some trouble.
First it seems that no 3rd party library can handle mission of listing xtable rule.
Second I have wrote a mod name sysdump.go under main/cmd , and I have wrote init function to add subcommand like the others. But there is not a sysdump subcommand showing in dae --help after building. Is there anything else I should modify besides the new go file?

@jschwinger233
Copy link
Member Author

jschwinger233 commented Jun 30, 2024

@linglilongyi

netfilter is notorious for the lack of programmable APIs:

4.5 Is there an C/C++ API for adding/removing rules?
The answer unfortunately is: No.

https://www.netfilter.org/documentation/FAQ/netfilter-faq-4.html#ss4.5

So feel free to exec command lines in this case.

As of how to add new subcommand, would you mind drafting a PR in your fork so I can look into what's missing?

@linglilongyi
Copy link
Contributor

I build up the ./dae and test the /bin/dae. lol...

@linglilongyi
Copy link
Contributor

Does the "drop-only" refer to conection with TcpFlag "S" ?

@jschwinger233
Copy link
Member Author

@linglilongyi Currently dae trace dumps huge amount of packets, most of them are not necessary. In most cases, we care about packets that are dropped by kernel, aka processed by kernel function kfree_skb_reason, or called by kfree_skbmem but without consume_skb. I can walk you through if you feel interested, but you don't have to push yourself into it. It's always okay (even recommended) to split big tasks into small parts and open PRs separately.

@linglilongyi
Copy link
Contributor

linglilongyi commented Jul 5, 2024

Currently dae trace dumps huge amount of packets, most of them are not necessary. In most cases, we care about packets that are dropped by kernel, aka processed by kernel function kfree_skb_reason, or called by kfree_skbmem but without consume_skb.

I think that the "packets" refers to bpfObjects in trace.go. And the key is capture features of droping bpfObjects.

Could I insert

// isDropOnly is true only when called with "--drop-only" 
if isDropOnly && NearestSymbol(event.Pc).Name != "kfree_skb_reason" {
    continue
}

in https://github.com/daeuniverse/dae/blob/main/trace/trace.go#L276 to achieve that? Or I should drop the accepted package more upstreamly?

@jschwinger233
Copy link
Member Author

@linglilongyi dae trace prints packets' (skbs') lifetime, --drop-only is supposed to print the lifetime of dropped packet. It's more like:

if dropOnly && NearestSymbol() == "kfree_skb_reason" {
    droppedSkbEvents := skbEvents[event.Addr]
    for _, event := range droppedSkbEvents {
        print()
    }
}

We can collect events by skb address, hold the output until we see kfree_skb_reason. We should also delete(skbEvents, skbAddr) for normal skbs, otherwise skbEvents map increases infinitely.

It also looks feasible (and fun) to deliver a bpf-only approach. We could use hash of array bpf map (BPF_MAP_TYPE_ARRAY_OF_MAPS?) to temporarily store events in kernel space and push events to userpace only if kfree_skb_reason is seen.

I can walk you though the details if you like.

@linglilongyi
Copy link
Contributor

Should I replace unix mod with dae/tree/main/pkg/ebpf_internal/internal/unix/types_linux.go in #572 ?

And for the drop-only feature, I finally find some func that generated by bpf2go in your repo. I still need some time to go through it.

@linglilongyi
Copy link
Contributor

@jschwinger233 I have drafted a PR in my own fork, would you pleased have a check on it. It is all written in go now.

@mzz2017
Copy link
Contributor

mzz2017 commented Jul 31, 2024

@linglilongyi Thank you! jschwinger233 and I are busy these days. We'll contact you as soon as free. FYI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants