
Network struct reorganize #4

Open
wants to merge 12 commits into
base: linux-6.6.y

Conversation

opsiff
Owner

@opsiff opsiff commented Feb 10, 2025

Summary by Sourcery

Reorganize network structures for better cacheline alignment.

New Features:

  • Introduce cacheline annotations for tcp_sock, net_device, and netns_ipv4 structures.

Enhancements:

  • Reorganize the tcp_sock structure to group frequently accessed fields in the same cacheline, improving performance.

Tests:

  • Add compile-time checks to ensure cacheline alignment of the reorganized structures.

xli98 and others added 12 commits January 17, 2025 00:19
mainline inclusion
from mainline-v6.8-rc1
category: performance

Analyzed a few structs in the networking stack by looking at variables
within them that are used in the TCP/IP fast path.

Fast path is defined as TCP path where data is transferred from sender to
receiver unidirectionally. It doesn't include phases other than
TCP_ESTABLISHED, nor does it look at error paths.

We hope to reorganize the structs that span many cachelines and whose
fast-path variables are spread out, and this document can help future
developers keep networking fast-path cachelines small.

The Optimized_cacheline field is computed as
(Fastpath_Bytes / L3_cacheline_size_x86), not from the actual
reorganized layout (see the patches to come for those).

The investigation was done on 6.5.

Name                  Struct_Cachelines   Cur_fastpath_cache  Fastpath_Bytes  Optimized_cacheline
tcp_sock              42 (2664 bytes)     12                  396             8
net_device            39 (2240 bytes)     12                  234             4
inet_sock             15 (960 bytes)      14                  922             14
inet_connection_sock  22 (1368 bytes)     18                  1166            18
netns_ipv4 (sysctls)  12 (768 bytes)      4                   77              2
linux_mib             16 (1060 bytes)     6                   104             2

Note how there isn't much room for improvement in inet_sock and
inet_connection_sock: sk and icsk_inet, respectively, take up so much
of each struct that the remaining variables are only a small portion of
the struct size.

So, we decided to reorganize tcp_sock, net_device, and netns_ipv4.

Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Coco Li <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 14006f1)
Signed-off-by: Wentao Guan <[email protected]>
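For reference, the Optimized_cacheline estimate above is just a rounded-up division of the fast-path byte count by the 64-byte x86 cacheline. A minimal sketch of that arithmetic (the helper name is mine, not from the patch):

```c
#include <stddef.h>

/* 64-byte L3 cacheline on x86, as assumed in the analysis above. */
#define L3_CACHELINE_SIZE 64

/* Minimum number of cachelines needed to hold `fastpath_bytes` of
 * fast-path data, i.e. a rounded-up division. */
static inline size_t optimized_cachelines(size_t fastpath_bytes)
{
    return (fastpath_bytes + L3_CACHELINE_SIZE - 1) / L3_CACHELINE_SIZE;
}
```

For example, net_device's 234 fast-path bytes need at least 4 cachelines, matching the table.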
mainline inclusion
from mainline-v6.8-rc1
category: performance

Set up build time warnings to safeguard against future header changes of
organized structs.

Warning includes:

1) whether all variables are still in the same cache group
2) whether each cache group's size matches the sum of its members' sizes
   (in the maximal configuration, i.e. with all config-dependent members
   enabled)

The __cache_group* variables are ignored in kernel-doc check in the
various header files they appear in to enforce the cache groups.

Suggested-by: Daniel Borkmann <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: Coco Li <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit aeb9ce0)
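The kernel implements these checks with zero-sized marker arrays plus build-time offset assertions. Below is a userspace sketch of the same idea against a toy struct; the field names and byte budget are illustrative, not the real tcp_sock layout (zero-length arrays are a GNU extension, which the kernel also relies on):

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel's cacheline-group markers:
 * zero-sized arrays pin down where a group begins and ends without
 * occupying any space in the struct. */
#define cacheline_group_begin(g) char __cacheline_group_begin__##g[0]
#define cacheline_group_end(g)   char __cacheline_group_end__##g[0]

struct toy_sock {
    cacheline_group_begin(read_tx);
    unsigned int   max_window;
    unsigned int   notsent_lowat;
    unsigned short gso_segs;
    cacheline_group_end(read_tx);
    unsigned int   cold_field;  /* deliberately outside the group */
};

/* Build-time checks in the spirit of CACHELINE_ASSERT_GROUP_MEMBER and
 * CACHELINE_ASSERT_GROUP_SIZE: each member must lie between the markers,
 * and the whole group must not exceed its byte budget. */
_Static_assert(offsetof(struct toy_sock, max_window) >=
               offsetof(struct toy_sock, __cacheline_group_begin__read_tx),
               "max_window moved out of the read_tx group");
_Static_assert(offsetof(struct toy_sock, gso_segs) <
               offsetof(struct toy_sock, __cacheline_group_end__read_tx),
               "gso_segs moved out of the read_tx group");
_Static_assert(offsetof(struct toy_sock, __cacheline_group_end__read_tx) -
               offsetof(struct toy_sock, __cacheline_group_begin__read_tx)
               <= 64,
               "read_tx group grew past one cacheline");

/* Group size, computable at runtime as well. */
static inline size_t read_tx_group_size(void)
{
    return offsetof(struct toy_sock, __cacheline_group_end__read_tx) -
           offsetof(struct toy_sock, __cacheline_group_begin__read_tx);
}
```

If a future change inserts a field into the group or reorders it out, compilation fails with the message above, which is exactly the safeguard this patch describes.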
mainline inclusion
from mainline-v6.8-rc1
category: performance

Reorganize fast path variables on tx-txrx-rx order.
Fastpath cacheline ends after sysctl_tcp_rmem.
There are only read-only variables here. (write is on the control path
and not considered in this case)

Below data generated with pahole on x86 architecture.
Fast path variables span cache lines before change: 4
Fast path variables span cache lines after change: 2

Suggested-by: Eric Dumazet <[email protected]>
Reviewed-by: Wei Wang <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Coco Li <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Shakeel Butt <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 18fd64d)
mainline inclusion
from mainline-v6.8-rc1
category: performance

Reorganize fast path variables on tx-txrx-rx order
Fastpath variables end after npinfo.

Below data generated with pahole on x86 architecture.

Fast path variables span cache lines before change: 12
Fast path variables span cache lines after change: 4

Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Coco Li <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 43a71cd)
Signed-off-by: Wentao Guan <[email protected]>
mainline inclusion
from mainline-v6.8-rc1
category: performance

The variables are organized in the following way:

- TX read-mostly hotpath cache lines
- TXRX read-mostly hotpath cache lines
- RX read-mostly hotpath cache lines
- TX read-write hotpath cache line
- TXRX read-write hotpath cache line
- RX read-write hotpath cache line

Fastpath cachelines end after rcvq_space.

Cache line boundaries are enforced only between read-mostly and
read-write. That is, if read-mostly tx cachelines bleed into
read-mostly txrx cachelines, we do not care. We care about the
boundaries between read and write cachelines because we want
to prevent false sharing.

Fast path variables span cache lines before change: 12
Fast path variables span cache lines after change: 8

Suggested-by: Eric Dumazet <[email protected]>
Reviewed-by: Wei Wang <[email protected]>
Signed-off-by: Coco Li <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit d5fed5a)
Signed-off-by: Wentao Guan <[email protected]>
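To illustrate why only the read/write boundary needs enforcing: a writer dirties its entire cacheline, so frequently written fields must not share a line with read-mostly ones, while two read-mostly groups bleeding together costs nothing. A toy sketch, not kernel code, using C11 _Alignas to stand in for the kernel's alignment annotations:

```c
#include <stddef.h>
#include <stdint.h>

/* Read-mostly fields first; then an alignment break so the frequently
 * written fields start on their own 64-byte line. This keeps writers
 * from invalidating the line that fast-path readers hit (false sharing). */
struct toy_path {
    /* read-mostly: set at connection setup, read on every packet */
    uint32_t mss_cache;
    uint32_t snd_cwnd_clamp;
    /* read-write: updated on every packet, forced onto a new line */
    _Alignas(64) uint32_t snd_nxt;
    uint32_t rcv_nxt;
};

/* Returns 1 when the read-write region begins exactly on a cacheline
 * boundary after the read-mostly region. */
static inline int rw_group_starts_own_line(void)
{
    return offsetof(struct toy_path, snd_nxt) % 64 == 0 &&
           offsetof(struct toy_path, snd_nxt) >
           offsetof(struct toy_path, snd_cwnd_clamp);
}
```

The price is padding between the regions, which is why the patch only pays it at the read/write boundary and not between the tx/txrx/rx read-mostly groups.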
mainline inclusion
from mainline-v6.8-rc1
category: performance

dev->gso_partial_features is read from tx fast path for GSO packets.

Move it to appropriate section to avoid a cache line miss.

Fixes: 43a71cd ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Coco Li <[email protected]>
Cc: David Ahern <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 993498e)
mainline inclusion
from mainline-v6.8-rc1
category: performance

xdp_prog is used in receive path, both from XDP enabled drivers
and from netif_elide_gro().

This patch also removes two 4-bytes holes.

Fixes: 43a71cd ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Coco Li <[email protected]>
Cc: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit d3d344a)
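The "two 4-byte holes" removed here are alignment padding that the compiler inserts between misaligned members; moving fields so that same-size members sit together eliminates it. A standalone illustration, assuming a typical LP64 target with 8-byte pointers:

```c
#include <stddef.h>

/* On LP64, each int followed by a pointer leaves a 4-byte hole so the
 * pointer can stay 8-byte aligned, plus tail padding at the end. */
struct holey {
    void *p;   /* bytes 0..7   */
    int   a;   /* bytes 8..11, then a 4-byte hole */
    void *q;   /* bytes 16..23 */
    int   b;   /* bytes 24..27, then 4 bytes of tail padding */
};             /* sizeof == 32 */

/* Same members with the pointers grouped first: both holes disappear. */
struct packed_neatly {
    void *p;   /* bytes 0..7   */
    void *q;   /* bytes 8..15  */
    int   a;   /* bytes 16..19 */
    int   b;   /* bytes 20..23 */
};             /* sizeof == 24 */

static inline size_t bytes_saved(void)
{
    return sizeof(struct holey) - sizeof(struct packed_neatly);
}
```

Tools like pahole report these holes directly, which is how the layouts in this series were measured.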
mainline inclusion
from mainline-v6.8-rc5
category: performance

tp->scaling_ratio is a read mostly field, used in rx and tx fast paths.

Fixes: d5fed5a ("tcp: reorganize tcp_sock fast path variables")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Coco Li <[email protected]>
Cc: Wei Wang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 119ff04)
mainline inclusion
from mainline-v6.8-rc5
category: performance

[gwt backport note]: this field is reserved for future use.
tp->tcp_usec_ts is a read mostly field, used in rx and tx fast paths.

Fixes: d5fed5a ("tcp: reorganize tcp_sock fast path variables")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Coco Li <[email protected]>
Cc: Wei Wang <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit 666a877)
mainline inclusion
from mainline-v6.8-rc5
category: performance

dev->lstats is notably used from loopback ndo_start_xmit()
and other virtual drivers.

Per cpu stats updates are dirtying per-cpu data,
but the pointer itself is read-only.

Fixes: 43a71cd ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Coco Li <[email protected]>
Cc: Simon Horman <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
(cherry picked from commit c353c7b)
mainline inclusion
from mainline-v6.9-rc1
category: performance

dev->state can be read in rx and tx fast paths.

netif_running() which needs dev->state is called from
- enqueue_to_backlog() [RX path]
- __dev_direct_xmit()  [TX path]

Fixes: 43a71cd ("net-device: reorganize net_device fast path variables")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Coco Li <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
(cherry picked from commit f6e0a49)
mainline inclusion
from mainline-v6.13-rc1
category: performance

Commit d5fed5a ("tcp: reorganize tcp_sock fast path variables")
moved the fields around and misplaced the documentation for "lsndtime".
So, let's replace it in the proper place.

Signed-off-by: Menglong Dong <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
(cherry picked from commit 0a2cdeeae9ddc14d752173be6af98bc9fb45c6ad)

sourcery-ai bot commented Feb 10, 2025

Reviewer's Guide by Sourcery

This pull request reorganizes several networking structures to optimize cacheline usage by grouping related fields together. The changes include splitting the tcp_sock, net_device, and netns_ipv4 structures into cacheline groups marked by new macros, adding compile-time assertions to check proper cacheline alignment and sizes, and updating associated documentation and helper macros.

Updated class diagram for tcp_sock with cacheline groups

classDiagram
    class tcp_sock {
      <<cacheline group: tcp_sock_read_tx>>
      + max_window: u32
      + rcv_ssthresh: u32
      + reordering: u32
      + notsent_lowat: u32
      + gso_segs: u16
      + lost_skb_hint: sk_buff*
      + retransmit_skb_hint: sk_buff*

      <<cacheline group: tcp_sock_read_txrx>>
      + tsoffset: u32
      + snd_wnd: u32
      + mss_cache: u32
      + snd_cwnd: u32
      + prr_out: u32
      + lost_out: u32
      + sacked_out: u32
      + tcp_header_len: u16

      <<cacheline group: tcp_sock_read_rx>>
      + copied_seq: u32
      + rcv_tstamp: u32
      + snd_wl1: u32
      + tlp_high_seq: u32
      + rttvar_us: u32

      <<cacheline group: tcp_sock_write_tx>>
      + segs_out: u32
      + data_segs_out: u32
      + bytes_sent: u64

      <<cacheline group: tcp_sock_write_txrx>>
      + pred_flags: __be32
      + rcv_nxt: u32
      + snd_nxt: u32
      + snd_una: u32
      + window_clamp: u32
      + srtt_us: u32

      <<cacheline group: tcp_sock_write_rx>>
      + bytes_received: u64
      + segs_in: u32
      + data_segs_in: u32
      + rcv_wup: u32
    }

Updated class diagram for net_device with cacheline groups

classDiagram
    class net_device {
      <<cacheline group: net_device_read_tx>>
      + priv_flags: unsigned long long
      + netdev_ops: const struct net_device_ops*
      + header_ops: const struct header_ops*
      + _tx: netdev_queue*
      + mtu: unsigned int

      <<cacheline group: net_device_read_txrx>>
      + lstats/tstats/dstats: pointer
      + state: unsigned long
      + flags: unsigned int
      + hard_header_len: unsigned short
      + features: netdev_features_t
      + ip6_ptr: inet6_dev*

      <<cacheline group: net_device_read_rx>>
      + xdp_prog: bpf_prog*
      + ptype_specific: list_head
      + ifindex: int
      + real_num_rx_queues: unsigned int
      + _rx: netdev_rx_queue*
    }

Updated class diagram for netns_ipv4 with cacheline groups

classDiagram
    class netns_ipv4 {
      <<cacheline group: netns_ipv4_read_tx>>
      + sysctl_tcp_early_retrans: u8
      + sysctl_tcp_tso_win_divisor: u8
      + sysctl_tcp_tso_rtt_log: u8
      + sysctl_tcp_autocorking: u8
      + sysctl_tcp_min_snd_mss: int
      + sysctl_tcp_notsent_lowat: unsigned int

      <<cacheline group: netns_ipv4_read_txrx>>
      + sysctl_tcp_moderate_rcvbuf: u8

      <<cacheline group: netns_ipv4_read_rx>>
      + sysctl_ip_early_demux: u8
      + sysctl_tcp_early_demux: u8
      + sysctl_tcp_reordering: int
      + sysctl_tcp_rmem: int[3]
    }

File-Level Changes

Change Details Files
Reorganized the tcp_sock structure.
  • Added cacheline group markers (__cacheline_group_begin and __cacheline_group_end) to clearly partition tcp_sock fields based on hotpath usage (e.g., TX read-mostly, TXRX read-mostly, RX read-mostly, TX write, etc.).
  • Reordered and relocated several fields to optimize cache utilization and performance.
  • Removed or merged duplicated fields and updated comments to reflect the new organization.
include/linux/tcp.h
Reorganized the net_device structure.
  • Grouped net_device fields into distinct cacheline sections (TX read-mostly, TXRX read-mostly, RX read-mostly) using cacheline group macros.
  • Moved fields around to support efficient fast-path accesses.
  • Removed duplicate field declarations and simplified structure layout.
include/linux/netdevice.h
Reorganized the netns_ipv4 structure.
  • Partitioned sysctl and other configuration fields into cacheline groups for TX readonly, TXRX readonly, and RX readonly regions.
  • Updated inline documentation to reference the new cacheline groups.
include/net/netns/ipv4.h
Added compile-time cacheline assertions and structure checks.
  • Introduced tcp_struct_check() in tcp.c to verify the placement and sizes of tcp_sock cacheline groups.
  • Added net_dev_struct_check() in dev.c to assert correctness in net_device structure layout.
  • Implemented netns_ipv4_struct_check() in net_namespace.c to check cacheline alignment of netns_ipv4 fields.
net/ipv4/tcp.c
net/core/dev.c
net/core/net_namespace.c
Introduced new helper macros for cacheline groups.
  • Defined __cacheline_group_begin and __cacheline_group_end markers in include/linux/cache.h.
  • Added CACHELINE_ASSERT_GROUP_MEMBER and CACHELINE_ASSERT_GROUP_SIZE macros to enforce alignment and group size constraints at build time.
include/linux/cache.h
Updated documentation and build scripts.
  • Added new documentation files in Documentation/networking/net_cachelines for tcp_sock, net_device, inet_sock, netns_ipv4_sysctl, and others.
  • Updated maintainers and kernel-doc scripts to reflect the new structure layout.
Documentation/networking/index.rst
Documentation/networking/net_cachelines/index.rst
Documentation/networking/net_cachelines/inet_connection_sock.rst
Documentation/networking/net_cachelines/inet_sock.rst
Documentation/networking/net_cachelines/net_device.rst
Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst
Documentation/networking/net_cachelines/snmp.rst
Documentation/networking/net_cachelines/tcp_sock.rst
MAINTAINERS
scripts/kernel-doc


@sourcery-ai sourcery-ai bot left a comment


Hey @opsiff - I've reviewed your changes - here's some feedback:

Overall Comments:

  • It would be good to add a script to automatically check the cacheline layout.
  • Consider using __cacheline_aligned instead of ____cacheline_aligned for better readability.
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 2 issues found
  • 🟢 Documentation: all looks good


@@ -175,23 +175,122 @@ static inline struct tcp_request_sock *tcp_rsk(const struct request_sock *req)
#define TCP_RMEM_TO_WIN_SCALE 8

struct tcp_sock {

issue (complexity): Consider using dedicated sub-structures for cacheline groups to improve readability and reduce repetitive markers.

Consider grouping related fields into separate sub-structures rather than using many inline __cacheline_group markers. For example, instead of:

__cacheline_group_begin(tcp_sock_read_tx);
u32	max_window;
u32	rcv_ssthresh;
u32	reordering;
u32	notsent_lowat;
u16	gso_segs;
struct sk_buff *lost_skb_hint;
struct sk_buff *retransmit_skb_hint;
__cacheline_group_end(tcp_sock_read_tx);

you could define a dedicated sub-structure and include it in the parent, like:

struct tcp_sock_read_tx {
    u32 max_window;         /* Maximal window ever seen from peer */
    u32 rcv_ssthresh;       /* Current window clamp */
    u32 reordering;         /* Packet reordering metric */
    u32 notsent_lowat;      /* TCP_NOTSENT_LOWAT */
    u16 gso_segs;           /* GSO segments */
    struct sk_buff *lost_skb_hint;
    struct sk_buff *retransmit_skb_hint;
};

struct tcp_sock {
    struct inet_connection_sock inet_conn;
    __cacheline_group_begin(tcp_sock_read_tx);
    struct tcp_sock_read_tx read_tx;
    __cacheline_group_end(tcp_sock_read_tx);
    /* other cacheline groups... */
};

This modularizes the layout, makes the structure easier to read, and reduces repetitive markers. Similar refactoring can be applied to other cacheline groups.

@@ -11578,6 +11578,63 @@ static struct pernet_operations __net_initdata default_device_ops = {
.exit_batch = default_device_exit_batch,
};

static void __init net_dev_struct_check(void)

issue (complexity): Consider abstracting the repetitive CACHELINE_ASSERT_GROUP_MEMBER calls into a helper macro or function.

Consider abstracting these repetitive assertions into a helper macro so that you list the fields just once. For example, you could define a macro that maps a list of member names into individual assertions. One approach is:

/* Helper macro to call the assertion on a field */
#define CACHELINE_ASSERT_FIELD(struct_type, group, field) \
    CACHELINE_ASSERT_GROUP_MEMBER(struct_type, group, field)

/* C99 has no standard MAP macro, so the field list must still be
 * written out, but a group-specific wrapper removes the repeated
 * struct/group arguments:
 */
#define CACHELINE_ASSERT_TX_FIELD(field) \
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, field)

If the number of fields is fixed, you might “unroll” them:

static void __init net_dev_struct_check(void)
{
    /* TX read-mostly hotpath */
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, priv_flags);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, netdev_ops);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, header_ops);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, _tx);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, real_num_tx_queues);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, gso_max_size);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, gso_ipv4_max_size);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, gso_max_segs);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, gso_partial_features);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, num_tc);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, mtu);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, needed_headroom);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, tc_to_txq);
#ifdef CONFIG_XPS
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, xps_maps);
#endif
#ifdef CONFIG_NETFILTER_EGRESS
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, nf_hooks_egress);
#endif
#ifdef CONFIG_NET_XGRESS
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_tx, tcx_egress);
#endif
    CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_tx, 160);

    /* TXRX read-mostly hotpath */
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_txrx, lstats);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_txrx, state);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_txrx, flags);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_txrx, hard_header_len);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_txrx, features);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_txrx, ip6_ptr);
    CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_txrx, 46);

    /* RX read-mostly hotpath */
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, ptype_specific);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, ifindex);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, real_num_rx_queues);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, _rx);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, gro_flush_timeout);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, napi_defer_hard_irqs);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, gro_max_size);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, gro_ipv4_max_size);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, rx_handler);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, rx_handler_data);
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, nd_net);
#ifdef CONFIG_NETPOLL
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, npinfo);
#endif
#ifdef CONFIG_NET_XGRESS
    CACHELINE_ASSERT_FIELD(struct net_device, net_device_read_rx, tcx_ingress);
#endif
    CACHELINE_ASSERT_GROUP_SIZE(struct net_device, net_device_read_rx, 104);
}

If you wish to further reduce the repetition, consider building a small configuration table for each group and iterating over it. For example, you might create an array of field names or offsets (if the macros can work with computed offsets) and then loop through the array, invoking the assertion for each entry.

These changes encapsulate the repetitive logic. They preserve functionality while reducing the file’s complexity and making future modifications easier.
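The configuration-table idea suggested above can also be done entirely at compile time with an X-macro: write each group's field list once and expand it into per-field assertions. A sketch against a toy struct; the names mimic, but are not, the kernel's macros:

```c
#include <stddef.h>

/* Toy struct with zero-sized group markers (GNU extension). */
struct toy_dev {
    char __begin_read_tx[0];
    const void    *netdev_ops;
    unsigned int   mtu;
    unsigned short needed_headroom;
    char __end_read_tx[0];
};

/* X-macro: the field list is written exactly once... */
#define TOY_READ_TX_FIELDS(X) \
    X(netdev_ops)             \
    X(mtu)                    \
    X(needed_headroom)

/* ...and expanded into one membership check per field. */
#define ASSERT_IN_READ_TX(field)                                  \
    _Static_assert(offsetof(struct toy_dev, field) >=             \
                   offsetof(struct toy_dev, __begin_read_tx) &&   \
                   offsetof(struct toy_dev, field) <              \
                   offsetof(struct toy_dev, __end_read_tx),       \
                   #field " left the read_tx group");

TOY_READ_TX_FIELDS(ASSERT_IN_READ_TX)

/* Group size for a CACHELINE_ASSERT_GROUP_SIZE-style budget check. */
static inline size_t toy_read_tx_size(void)
{
    return offsetof(struct toy_dev, __end_read_tx) -
           offsetof(struct toy_dev, __begin_read_tx);
}
```

Adding a field then means touching only the list, and every assertion is regenerated automatically; runtime iteration over an offset table would work too, but loses the build-time failure the kernel patch aims for.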

opsiff pushed a commit that referenced this pull request Feb 24, 2025
[ Upstream commit c7b87ce0dd10b64b68a0b22cb83bbd556e28fe81 ]

libtraceevent parses and returns an array of argument fields, sometimes
larger than RAW_SYSCALL_ARGS_NUM (6) because it includes "__syscall_nr",
idx will traverse to index 6 (7th element) whereas sc->fmt->arg holds 6
elements max, creating an out-of-bounds access. This runtime error is
found by UBsan. The error message:

  $ sudo UBSAN_OPTIONS=print_stacktrace=1 ./perf trace -a --max-events=1
  builtin-trace.c:1966:35: runtime error: index 6 out of bounds for type 'syscall_arg_fmt [6]'
    #0 0x5c04956be5fe in syscall__alloc_arg_fmts /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:1966
    #1 0x5c04956c0510 in trace__read_syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2110
    #2 0x5c04956c372b in trace__syscall_info /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:2436
    #3 0x5c04956d2f39 in trace__init_syscalls_bpf_prog_array_maps /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:3897
    #4 0x5c04956d6d25 in trace__run /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:4335
    #5 0x5c04956e112e in cmd_trace /home/howard/hw/linux-perf/tools/perf/builtin-trace.c:5502
    #6 0x5c04956eda7d in run_builtin /home/howard/hw/linux-perf/tools/perf/perf.c:351
    #7 0x5c04956ee0a8 in handle_internal_command /home/howard/hw/linux-perf/tools/perf/perf.c:404
    #8 0x5c04956ee37f in run_argv /home/howard/hw/linux-perf/tools/perf/perf.c:448
    #9 0x5c04956ee8e9 in main /home/howard/hw/linux-perf/tools/perf/perf.c:556
    #10 0x79eb3622a3b7 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
    #11 0x79eb3622a47a in __libc_start_main_impl ../csu/libc-start.c:360
    #12 0x5c04955422d4 in _start (/home/howard/hw/linux-perf/tools/perf/perf+0x4e02d4) (BuildId: 5b6cab2d59e96a4341741765ad6914a4d784dbc6)

     0.000 ( 0.014 ms): Chrome_ChildIO/117244 write(fd: 238, buf: !, count: 1)                                      = 1

Fixes: 5e58fcf ("perf trace: Allow allocating sc->arg_fmt even without the syscall tracepoint")
Signed-off-by: Howard Chu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Namhyung Kim <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
opsiff pushed a commit that referenced this pull request Feb 24, 2025
commit c6ef3a7fa97ec823a1e1af9085cf13db9f7b3bac upstream.

If the uvc_status_init() function fails to allocate the int_urb, it will
free the dev->status pointer but doesn't reset the pointer to NULL. This
results in the kfree() call in uvc_status_cleanup() trying to
double-free the memory. Fix it by resetting the dev->status pointer to
NULL after freeing it.

Fixes: a31a405 ("V4L/DVB:usbvideo:don't use part of buffer for USB transfer #4")
Cc: [email protected]
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Laurent Pinchart <[email protected]>
Reviewed-by: Ricardo Ribalda <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>