Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add error logging for that Antrea network's MTU exceeds Suricata's maximum supported value #5408

Merged
merged 1 commit into from
Sep 20, 2023

Conversation

hongliangl
Copy link
Contributor

@hongliangl hongliangl commented Aug 18, 2023

Fixes: #5398

When the MTU value of Antrea network is greater than the maximum supported MTU by
Suricata, which is used as L7 NetworkPolicy engine in Antrea. It may fail to start.
Currently, assuming that the page size is 4096, the maximum supported MTU by Suricata
is 32678 according to Suricata source code at
https://github.com/OISF/suricata/blob/49713ebaa0b8edb057d60f1cfe9126946645a848/src/source-af-packet.c#L1757C2-L1777C129.
This commit adds an error logging to warn users about that.

Signed-off-by: Hongliang Liu [email protected]

antoninbas
antoninbas previously approved these changes Aug 22, 2023
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@antoninbas antoninbas added kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release action/backport Indicates a PR that requires backports. kind/bug Categorizes issue or PR as related to a bug. and removed kind/cherry-pick Categorizes issue or PR as related to the cherry-pick of a bug fix from the main branch to a release labels Aug 22, 2023
@antoninbas
Copy link
Contributor

/test-all

@antoninbas antoninbas added the action/release-note Indicates a PR that should be included in release notes. label Aug 22, 2023
@tnqn
Copy link
Member

tnqn commented Aug 24, 2023

the minimum MTU value from all OVS ports will be used. The MTU value might be less than the MTU calculated by Antrea which is used in Antrea local gateway port and Pod ports, result in the unavailability of L7 NetworkPolicy.

What does it mean? I saw the bug was because one interface used 1500 while the other used 65485, neither of which seems not the minimum MTU of OVS ports.

@hongliangl
Copy link
Contributor Author

hongliangl commented Aug 24, 2023

the minimum MTU value from all OVS ports will be used. The MTU value might be less than the MTU calculated by Antrea which is used in Antrea local gateway port and Pod ports, result in the unavailability of L7 NetworkPolicy.

What does it mean? I saw the bug was because one interface used 1500 while the other used 65485, neither of which seems not the minimum MTU of OVS ports.

For example, there are two ports connected to the OVS:

  • port antrea-gw0, MTU 65485
  • port pod-1, MTU 65485

When creating an OVS internal port antrea-l7-tap0 without specifying MTU, its MTU will be also 65485, then we have these ports:

  • port antrea-gw0, MTU 65485
  • port pod-1, MTU 65485
  • port antrea-l7-tap0, MTU 65485

If a port unknown whose MTU is 1500 is attached to the OVS, then when creating an OVS internal port antrea-l7-tap1 without specifying MTU, its MTU will be 1500, then we have these ports:

  • port antrea-gw0, MTU 65485
  • port pod-1, MTU 65485
  • port antrea-l7-tap0, MTU 65485
  • port unknown, MTU 1500
  • port antrea-l7-tap1, MTU 1500

I'm not sure what happened during creating the OVS internal ports for L7 NetworkPolicy, which results in the fact that the ports for L7 NetworkPolicy have different MTU values, but I think the issue that they have different MTU values can be fixed by specifying MTU value when creating them or modifying them at every Antrea startup.

@tnqn
Copy link
Member

tnqn commented Aug 24, 2023

Aren't the MTUs of Pod interface and antrea-gw0 already set to 1500 or 1450? In which case they will be 65485? I understand setting suricata interfaces's MTU explicitly is right thing to do, but I don't understand the root cause of the issue.

@hongliangl
Copy link
Contributor Author

hongliangl commented Aug 24, 2023

Aren't the MTUs of Pod interface and antrea-gw0 already set to 1500 or 1450? In which case they will be 65485? I understand setting suricata interfaces's MTU explicitly is right thing to do, but I don't understand the root cause of the issue.

MTU 65485 is set on @tushartathgur local testbed when I had a live debugging with him last time. I saw that the MTU of every port connected to OVS is 65485, including antrea-gw0. Not like a port created by Antrea with specifying MTU value explicitly used by a Pod , we don't specify the MTU values of ports used by L7 NetworkPolicy explicitly. I'm not sure the root cause of the issue in @tushartathgur 's local testbed, but I believe we can fix the issue by specifying the MTU value.

@tnqn
Copy link
Member

tnqn commented Aug 24, 2023

Could you test if setting the MTU explicitly to the same MTU as Pod can fix the issue on a Node with MTU > 1500? From the issue report, I can't tell if it crashed because inconsistent MTU of MTU larger than 1500.

@hongliangl
Copy link
Contributor Author

Could you test if setting the MTU explicitly to the same MTU as Pod can fix the issue on a Node with MTU > 1500? From the issue report, I can't tell if it crashed because inconsistent MTU of MTU larger than 1500.

OK

@hongliangl
Copy link
Contributor Author

Could you test if setting the MTU explicitly to the same MTU as Pod can fix the issue on a Node with MTU > 1500? From the issue report, I can't tell if it crashed because inconsistent MTU of MTU larger than 1500.

After investigation, I found that the crash was caused by the MTU 65485. According to the error log Frame size bigger than block size, I have inspected the related code (Suricata 6.0.13) in this link https://github.com/OISF/suricata/blob/f09a6b562ec7f28f512e156d103284fa910307a1/src/source-af-packet.c#L1792

 /* Compute structure:
       Target is to store all pending packets
       with a size equal to MTU + auxdata
       And we keep a decent number of block

       To do so:
       Compute frame_size (aligned to be able to fit in block
       Check which block size we need. Blocksize is a 2^n * pagesize
       We then need to get order, big enough to have
       frame_size < block size
       Find number of frame per block (divide)
       Fill in packet_req

       Compute frame size:
       described in packet_mmap.txt
       dependant on snaplen (need to use a variable ?)
snaplen: MTU ?
tp_hdrlen determine_version in daq_afpacket
in V1:  sizeof(struct tpacket_hdr);
in V2: val in getsockopt(instance->fd, SOL_PACKET, PACKET_HDRLEN, &val, &len)
frame size: TPACKET_ALIGN(snaplen + TPACKET_ALIGN(TPACKET_ALIGN(tp_hdrlen) + sizeof(struct sockaddr_ll) + ETH_HLEN) - ETH_HLEN);

     */
    int tp_hdrlen = sizeof(struct tpacket_hdr);
    int snaplen = default_packet_size;

    if (snaplen == 0) {
        snaplen = GetIfaceMaxPacketSize(ptv->iface);
        if (snaplen <= 0) {
            SCLogWarning(SC_ERR_INVALID_VALUE,
                         "Unable to get MTU, setting snaplen to sane default of 1514");
            snaplen = 1514;
        }
    }

    ptv->req.v2.tp_frame_size = TPACKET_ALIGN(snaplen +TPACKET_ALIGN(TPACKET_ALIGN(tp_hdrlen) + sizeof(struct sockaddr_ll) + ETH_HLEN) - ETH_HLEN);
    ptv->req.v2.tp_block_size = getpagesize() << order;
    int frames_per_block = ptv->req.v2.tp_block_size / ptv->req.v2.tp_frame_size;
    if (frames_per_block == 0) {
        SCLogError(SC_ERR_INVALID_VALUE, "Frame size bigger than block size");
        return -1;
    }

According to the code:

  • The value of tp_block_size in the code is Linux page size << 3. Generally, the page size is 4096, then the tp_block_size is 32768.
  • The value of tp_frame_size is calculated from TPACKET_ALIGN(snaplen +TPACKET_ALIGN(TPACKET_ALIGN(tp_hdrlen) + sizeof(struct sockaddr_ll) + ETH_HLEN) - ETH_HLEN);.
    • From Suricata code, in the case of Antrea, snaplen is calculated from MTU of Antrea L7 NetworkPolicy port + 24.
    • If MTU of Antrea L7 NetworkPolicy port is 65485, the value of tp_frame_size is 65584, which is bigger than tp_block_size.
    • After calculation, the biggest value of MTU we can use currently is 32678.

@tnqn

@hongliangl hongliangl force-pushed the 20230818-fix-l7-mtu branch 4 times, most recently from 0a42524 to 4740a2a Compare August 28, 2023 05:00
pkg/agent/agent_linux.go Outdated Show resolved Hide resolved
pkg/agent/agent_linux.go Show resolved Hide resolved
The MTU of OVS ports for L7 NetworkPolicy should be set to the
calculated MTU value according to traffic mode at every startup.
For example, before this commit, assuming that feature gate
L7NetworkPolicy is enabled in encap mode, then the OVS ports for
L7 NetworkPolicy will be created and their MTU is 1420. If the
traffic mode is changed to noEncap, the MTU of the OVS ports is
still 1420. However, the MTU of Pods ports and Antrea local gateway
port is 1500 right now. Besides, when creating the L7 NetworkPolicy
ports for the first time in a Node, without specifying the MTU value,
the minimum MTU value from all OVS ports will be used.

From above, we can see that the MTU value might be smaller than the
MTU calculated by Antrea which is used in Antrea local gateway port
and Pod ports, which results in the unavailability of L7 NetworkPolicy
if the size of packet is greater than the value of L7 NetworkPolicy port
MTU.

Signed-off-by: Hongliang Liu <[email protected]>
Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Sep 20, 2023

/skip-all

@tnqn
Copy link
Member

tnqn commented Sep 20, 2023

@hongliangl I suppose the commit message and the PR description is out-of-date? Could you update the PR description which I will use as commit message.

@hongliangl hongliangl changed the title Set MTU of OVS ports for L7 NetworkPolicy at startup Add error logging for that Antrea network's MTU exceeds Suricata's maximum supported value Sep 20, 2023
@hongliangl
Copy link
Contributor Author

@hongliangl I suppose the commit message and the PR description is out-of-date? Could you update the PR description which I will use as commit message.

Added

@tnqn tnqn merged commit adb6fd1 into antrea-io:main Sep 20, 2023
@tnqn
Copy link
Member

tnqn commented Sep 20, 2023

Removing action/backport as it doesn't really prevent suricata from crashing if the MTU is greater than the maximum value.

@tnqn tnqn removed the action/backport Indicates a PR that requires backports. label Sep 20, 2023
@hongliangl hongliangl deleted the 20230818-fix-l7-mtu branch September 20, 2023 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/release-note Indicates a PR that should be included in release notes. kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suricata terminating briefly after initializing
3 participants