Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable nf_conntrack_tcp_be_liberal for Ubuntu 22.04 until kernel update #7860

Merged

Conversation

ritchxu
Copy link
Contributor

@ritchxu ritchxu commented Jul 5, 2023

Description

Azure's latest Ubuntu 22.04 image is lacking a kernel fix torvalds/linux@6e250dc.

GitHub Actions' ubuntu-22.04 image is built upon Azure's base image. Without the fix, when customers' workflows upload large-size artifacts to remote servers and remote server fell behind in responding with ACK TCP packets, the delayed ACK packets are counted as INVALID because they are out of the TCP window, leading to the client kernel panic and responding with RST. This results in the intermittent connection reset symptom that has been reported by multiple customers1234.

With permission, a more detailed analysis/history can be seen here.

Solution

Until the base image is updated to include the kernel fix, this PR proposes to enable the nf_conntrack_tcp_be_liberal kernel flag5. As a result, only out-of-window RST segments will be marked as INVALID while out-of-window ACK packets will be allowed.

Check list

  • Related issue / work item is attached
  • Tests are written (if applicable)
  • Documentation is updated (if applicable)
  • Changes are tested and related VM images are successfully generated

Footnotes

  1. https://github.com/github/c2c-actions-support/issues/2441

  2. https://github.com/github/c2c-actions-support/issues/2461

  3. https://github.com/github/c2c-package-registry/issues/6891

  4. https://github.com/github/c2c-package-registry/issues/7013

  5. https://www.kernel.org/doc/Documentation/networking/nf_conntrack-sysctl.txt

@samvantran samvantran requested a review from a team July 5, 2023 21:35
@mikhailkoliada
Copy link
Contributor

@ritchxu we should really move it somewhere else and modify sysctl.conf. Your changes only modify /proc, and all the changes will be lost after a machine reboot happens (which we do a lot during deployment too)

@ritchxu
Copy link
Contributor Author

ritchxu commented Jul 13, 2023

@ritchxu we should really move it somewhere else and modify sysctl.conf. Your changes only modify /proc, and all the changes will be lost after a machine reboot happens (which we do a lot during deployment too)

Thanks @mikhailkoliada do you have an effective way of persisting sysctl parameters? I have yet to find a good way:

ritchxu@RUIX-DESKTOP:~$ cat /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal
0
ritchxu@RUIX-DESKTOP:~$ sudo sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1
[sudo] password for ritchxu:
net.netfilter.nf_conntrack_tcp_be_liberal = 1
ritchxu@RUIX-DESKTOP:~$ sudo sysctl -p
ritchxu@RUIX-DESKTOP:~$ cat /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal
1
ritchxu@RUIX-DESKTOP:~$
[process exited with code 1 (0x00000001)]
You can now close this terminal with Ctrl+D, or press Enter to restart.
ritchxu@RUIX-DESKTOP:~$ cat /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal
0

@vpolikarpov-akvelon
Copy link
Contributor

Hey @ritchxu. There is a couple of sysctl adjustments for linux runner image here: https://github.com/actions/runner-images/blob/main/images/linux/scripts/installers/configure-environment.sh#L31-L37. You may try adding new flag the same way.

@ritchxu
Copy link
Contributor Author

ritchxu commented Jul 14, 2023

Hey @ritchxu. There is a couple of sysctl adjustments for linux runner image here: https://github.com/actions/runner-images/blob/main/images/linux/scripts/installers/configure-environment.sh#L31-L37. You may try adding new flag the same way.

I did notice that, but suspect the change wouldn't stick given my test in #7860 (comment), but let's give it a try 😄

execute_command = "sudo sh -c '{{ .Vars }} {{ .Path }}'"
inline = ["cat /proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal", "sysctl -a | grep nf_conntrack_tcp_be_liberal"]
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only temporary to verify if the net.netfilter.nf_conntrack_tcp_be_liberal change sticks after reboot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change didn't stick 😞 https://github.com/actions/runner-images/actions/runs/5556097720/jobs/10148164916?pr=7860#step:7:17962

==> azure-arm.build_vhd: Provisioning with shell script: /home/vsts/Agents/image-generation/_work/runner-images/runner-images/images/linux/scripts/base/reboot.sh
    azure-arm.build_vhd: Reboot VM
==> azure-arm.build_vhd: Provisioning with shell script: /tmp/packer-shell1639431296
    azure-arm.build_vhd: 0
    azure-arm.build_vhd: net.netfilter.nf_conntrack_tcp_be_liberal = 0

@vpolikarpov-akvelon
Copy link
Contributor

Well, I found solution finally.

Occasionally, we can't set this value in sysctl.conf as it is applied when nf_conntrack module isn't loaded yet, so value "doesn't stick" indeed. This behavior was discussed here: https://bugzilla.redhat.com/show_bug.cgi?id=312481

Also I didn't found appropriate nf_conntrack module option for that so we can't set it using modprobe.d rules.

But I able to set it using udev rules. I created a file /etc/udev/rules.d/50-netfilter.rules with this content:

ACTION=="add", SUBSYSTEM=="module", KERNEL=="nf_conntrack", RUN+="/usr/sbin/sysctl net.netfilter.nf_conntrack_tcp_be_liberal=1"

I'm not sure it's a good solution, but it works at least.

@ritchxu
Copy link
Contributor Author

ritchxu commented Jul 18, 2023

@vpolikarpov-akvelon Thanks for the suggestion 👍🏼. I've pushed a commit to try it out.

@vpolikarpov-akvelon
Copy link
Contributor

Hey @ritchxu. Could you please merge or rebase you branch onto main?

@ritchxu ritchxu force-pushed the ritchxu/nf_conntrack_tcp_be_liberal branch from 3791530 to f40e07e Compare July 20, 2023 17:23
@ritchxu
Copy link
Contributor Author

ritchxu commented Jul 20, 2023

Hey @ritchxu. Could you please merge or rebase you branch onto main?

Done 👍🏼

@vpolikarpov-akvelon
Copy link
Contributor

vpolikarpov-akvelon commented Jul 21, 2023

Hey, @ritchxu. I'm really sorry, but there is a couple more of important patches that should be merged for tests to complete successfully. Could you please rebase once more?

@ritchxu ritchxu force-pushed the ritchxu/nf_conntrack_tcp_be_liberal branch from f40e07e to 7c8d990 Compare July 21, 2023 15:15
@ritchxu
Copy link
Contributor Author

ritchxu commented Jul 21, 2023

Hey, @ritchxu. I'm really sorry, but there is a couple more of important patches that should be merged for tests to complete successfully. Could you please rebase once more?

No problem, it's done 👍🏼

@ritchxu
Copy link
Contributor Author

ritchxu commented Jul 21, 2023

@vpolikarpov-akvelon your workaround works! https://github.com/actions/runner-images/actions/runs/5624140452/job/15240333265?pr=7860#step:7:18196

    azure-arm.build_vhd: Reboot VM
==> azure-arm.build_vhd: Provisioning with shell script: /tmp/packer-shell1662180624
    azure-arm.build_vhd: 1
    azure-arm.build_vhd: net.netfilter.nf_conntrack_tcp_be_liberal = 1

I pushed another update to remove the temporary step that prints out ☝🏼. We should be good to go!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants