Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: optimize TCP and UDP checksum handling with eBPF #220

Merged
merged 5 commits into from
May 30, 2024

Conversation

tzssangglass
Copy link
Contributor

udp and tcp ingress:

  1. use bpf_l4_csum_replace bpf_l3_csum_replace to recalculate the checksums;
  2. and bpf_skb_store_bytes to update dest ip and port;

Signed-off-by: tzssangglass [email protected]

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 7, 2024
@k8s-ci-robot k8s-ci-robot requested a review from aryan9600 April 7, 2024 17:42
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 7, 2024
@k8s-ci-robot k8s-ci-robot requested a review from shaneutt April 7, 2024 17:42
@k8s-ci-robot
Copy link
Contributor

Welcome @tzssangglass!

It looks like this is your first PR to kubernetes-sigs/blixt 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/blixt has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 7, 2024
@tzssangglass tzssangglass marked this pull request as ready for review April 7, 2024 17:43
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 7, 2024
@k8s-ci-robot k8s-ci-robot requested a review from mlavacca April 7, 2024 17:43
@shaneutt shaneutt requested review from astoycos and removed request for shaneutt April 8, 2024 14:51
@shaneutt shaneutt self-requested a review April 8, 2024 14:52
Copy link
Member

@astoycos astoycos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good @tzssangglass Thanks so much for doing this! Just some questions and comments, also looks like you need to run cargo fmt

Otherwise my general comments are

  • Try and keep the unsafe closures as minimal as possible
  • Can we condense alot of this code into a separate helper function with a good comment describing what it does? Something like (packet_redirect_cksum_update) or something like that?

dataplane/ebpf/src/ingress/tcp.rs Outdated Show resolved Hide resolved
}
}

update_tcp_conns(tcp_hdr_ref, &client_key, &mut lb_mapping)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be moved to after the cksum calculation code block as it was prior to this patch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, I found that I cannot put

    if tcp_hdr_ref.rst() == 1 {
        unsafe {
            LB_CONNECTIONS.remove(&client_key)?;
        }
    }

    let mut lb_mapping = LoadBalancerMapping {
        backend,
        backend_key,
        tcp_state,
    };

    update_tcp_conns(tcp_hdr_ref, &client_key, &mut lb_mapping)?;

behind

    let backend_ip = backend.daddr.to_be();
    let ret = set_ipv4_ip_dst(&ctx, TCP_CSUM_OFF, &original_daddr, backend_ip);
    if ret != 0 {
        return Ok(TC_ACT_OK);
    }

    let backend_port = (backend.dport as u16).to_be();
    let ret = set_ipv4_dest_port(&ctx, TCP_CSUM_OFF, &original_dport, backend_port);
    if ret != 0 {
        return Ok(TC_ACT_OK);
    }

otherwise eBPF program would load failed, eBPF verifier shows that access out of bounds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think accessing tcp_hdr after bpf_redirect_neigh will cause an out-of-bounds error, but I don't know why the original code did not have this issue.

TCP_CSUM_OFF,
original_daddr as u64,
backend_ip as u64,
IS_PSEUDO | (mem::size_of_val(&backend_ip) as u64),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the IS_PSEUDO const doing here?

Copy link
Contributor Author

@tzssangglass tzssangglass Apr 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to my understanding, IS_PSEUDO | (mem::size_of_val(&backend_ip) as u64) as the last argument of bpf_l4_csum_replace(), it can simultaneously set the IS_PSEUDO flag and specify the size of the field to be updated, allowing the function to correctly recalculate the checksum of the transport layer and consider the impact of the pseudo header.

see more:

  1. result of IS_PSEUDO | (mem::size_of_val(&backend_ip) as u64): 00010000 | 0100 -> 00010100

  2. in bpf_l4_csum_replace()

    	switch (flags & BPF_F_HDR_FIELD_MASK) {
    	case 0:
    		if (unlikely(from != 0))
    			return -EINVAL;
    
    		inet_proto_csum_replace_by_diff(ptr, skb, to, is_pseudo);
    		break;
    	case 2:
    		inet_proto_csum_replace2(ptr, skb, from, to, is_pseudo);
    		break;
    	case 4:
    		inet_proto_csum_replace4(ptr, skb, from, to, is_pseudo);
    		break;

    here BPF_F_HDR_FIELD_MASK is

    	enum {
    	BPF_F_HDR_FIELD_MASK		= 0xfULL,
    	};

    flags & BPF_F_HDR_FIELD_MASK: 00010100 & 00001111 -> 00000100, and would choose case 4 to update l4 checksum.

dataplane/ebpf/src/ingress/tcp.rs Outdated Show resolved Hide resolved
dataplane/ebpf/src/ingress/tcp.rs Outdated Show resolved Hide resolved
dataplane/ebpf/src/ingress/udp.rs Outdated Show resolved Hide resolved
@tzssangglass
Copy link
Contributor Author

ok, I will update based on the comments.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 14, 2024
@shaneutt
Copy link
Member

@tzssangglass needs a rebase on main

@astoycos when you're done with your review, if you'd like me to take a pass as well please do ping me

@shaneutt shaneutt removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 16, 2024
@shaneutt shaneutt removed their assignment Apr 16, 2024
udp ingress:
1. use `bpf_l4_csum_replace` `bpf_l3_csum_replace` to recalculate the checksums;
2. and `bpf_skb_store_bytes` to update dest ip and port;

Signed-off-by: tzssangglass <[email protected]>
Signed-off-by: tzssangglass <[email protected]>
…estination IP and port modifications for TCP and UDP eBPF programs, streamlining the code by removing duplicate checksum and byte storage operations.

Signed-off-by: tzssangglass <[email protected]>
Signed-off-by: tzssangglass <[email protected]>
@tzssangglass
Copy link
Contributor Author

needs a rebase on main

done

Copy link
Member

@astoycos astoycos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Huge improvement here thanks and sorry for the slowness @tzssangglass !!!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 30, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astoycos, tzssangglass

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 30, 2024
@k8s-ci-robot k8s-ci-robot merged commit 4b8278c into kubernetes-sigs:main May 30, 2024
12 checks passed
@tzssangglass tzssangglass deleted the checksum branch May 31, 2024 03:05
@tzssangglass tzssangglass restored the checksum branch May 31, 2024 03:06
@tzssangglass
Copy link
Contributor Author

oops, it looks like k8s-ci-robot didn't squash the commit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants