Merge branch 'TC-Introduce-qevents'
Petr Machata says:

====================
TC: Introduce qevents

The Spectrum hardware allows execution of one of several actions as a
result of queue management decisions: tail-dropping, early-dropping,
ECN-marking a packet, or a packet crossing a configured latency
threshold or buffer size. Packets subject to such a decision can be
mirrored, trapped, or sampled.

Modeling the action to be taken simply as a TC action is very
attractive, but it is not obvious where to put these actions. At least
with ECN marking one could imagine a tree of qdiscs and classifiers that
effectively accomplishes this task, albeit in an impractically complex
manner. But there is just no way to match on the dropped-ness of a
packet, let alone dropped-ness due to a particular reason.

To allow configuring user-defined actions as a result of the inner
workings of a qdisc, this patch set introduces the concept of qevents:
attach points for TC blocks, where filters can be put that are executed
as a packet hits well-defined points in the qdisc algorithms. The
attached blocks can be shared in a manner similar to clsact ingress and
egress blocks, arbitrary classifiers with arbitrary actions can be put
on them, and so on.

For example (eth0 here stands for whichever port the qdisc is installed
on):

	tc qdisc add dev eth0 root red limit 500K avpkt 1K \
		qevent early_drop block 10
	tc filter add block 10 matchall \
		action mirred egress mirror dev eth1

The central patch #2 introduces several helpers to allow easy and uniform
addition of qevents to qdiscs: initialization, destruction, qevent block
number change validation, and qevent handling, i.e. dispatch of the filters
attached to the block bound to a qevent.
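
A rough sketch of how a qdisc might wire these helpers up (the "foo"
qdisc, its private struct and the TCA_FOO_EARLY_DROP_BLOCK attribute
are hypothetical, for illustration only):

	struct foo_sched_data {
		struct Qdisc *qdisc;		/* child qdisc */
		struct tcf_qevent qe_early_drop;
	};

	static int foo_init(struct Qdisc *sch, struct nlattr **tb,
			    struct netlink_ext_ack *extack)
	{
		struct foo_sched_data *q = qdisc_priv(sch);

		/* Bind a block if userspace passed a block index; a missing
		 * attribute simply leaves the qevent unused. */
		return tcf_qevent_init(&q->qe_early_drop, sch,
				       FLOW_BLOCK_BINDER_TYPE_RED_EARLY_DROP,
				       tb[TCA_FOO_EARLY_DROP_BLOCK], extack);
	}

	static int foo_change(struct Qdisc *sch, struct nlattr **tb,
			      struct netlink_ext_ack *extack)
	{
		struct foo_sched_data *q = qdisc_priv(sch);

		/* Rebinding to a different block is not supported. */
		return tcf_qevent_validate_change(&q->qe_early_drop,
						  tb[TCA_FOO_EARLY_DROP_BLOCK],
						  extack);
	}

	static int foo_dump(struct Qdisc *sch, struct sk_buff *skb)
	{
		struct foo_sched_data *q = qdisc_priv(sch);

		/* Report the bound block index back to userspace. */
		return tcf_qevent_dump(skb, TCA_FOO_EARLY_DROP_BLOCK,
				       &q->qe_early_drop);
	}

	static void foo_destroy(struct Qdisc *sch)
	{
		struct foo_sched_data *q = qdisc_priv(sch);

		tcf_qevent_destroy(&q->qe_early_drop, sch);
	}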

Patch #1 adds a root_lock argument to the qdisc enqueue op. The problem
it tackles is that if a qevent filter pushes packets to the same qdisc
tree that holds the qevent in the first place, an attempt to take the
qdisc root lock a second time would deadlock. To solve the issue, the
qevent handler needs to unlock and relock the root lock around the
filter processing. Passing root_lock around makes it possible to get
the lock where it is needed, and visibly so, such that it is obvious
the lock will be used when invoking a qevent.
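
To make the locking contract concrete, here is a sketch of an enqueue
path invoking the handler; it loosely mirrors what patch #3 ends up
doing in sch_red.c, reusing the hypothetical foo qdisc from above:

	static int foo_enqueue(struct sk_buff *skb, struct Qdisc *sch,
			       spinlock_t *root_lock, struct sk_buff **to_free)
	{
		struct foo_sched_data *q = qdisc_priv(sch);
		int ret;

		if (foo_should_early_drop(q, skb)) {
			/* tcf_qevent_handle() may drop and re-take root_lock
			 * around filter execution; NULL means the attached
			 * filters consumed (dropped or stole) the packet. */
			skb = tcf_qevent_handle(&q->qe_early_drop, sch, skb,
						root_lock, to_free, &ret);
			if (!skb)
				return NET_XMIT_CN | ret;

			qdisc_drop(skb, sch, to_free);
			return NET_XMIT_CN;
		}

		return qdisc_enqueue(skb, q->qdisc, root_lock, to_free);
	}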

The following two patches, #3 and #4, then add two qevents to the RED
qdisc: the "early_drop" qevent fires when a packet is early-dropped; the
"mark" qevent, when it is ECN-marked.
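
Binding a block to the "mark" qevent works the same way; for instance
(interface names again illustrative; "ecn" makes RED mark rather than
early-drop below the maximum threshold):

	tc qdisc add dev eth0 root red limit 500K avpkt 1K ecn \
		qevent mark block 11
	tc filter add block 11 matchall \
		action mirred egress mirror dev eth1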

Patch #5 contains a selftest. I mentioned this test when pushing the
RED ECN nodrop mode and said that "I have no confidence in its
portability to [...] different configurations". That still holds. The
backlog and packet size are tuned to make the test deterministic. But
it is better than nothing, and on the boxes that I ran it on it does
work and shows that qevents work the way they are supposed to, and that
their addition has not broken the other tested features.

This patch set does not deal with offloading. The idea there is that a
driver will be able to figure out that a given block is used in a
qevent context by looking at the binder type. A future patch set will
add a qdisc pointer to struct flow_block_offload, which a driver will
be able to consult to glean the TC or other relevant attributes.
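
Driver-side dispatch could then look roughly like this sketch (the
foo_* port structure and helpers are hypothetical):

	static int foo_setup_tc_block(struct foo_port *port,
				      struct flow_block_offload *f)
	{
		switch (f->binder_type) {
		case FLOW_BLOCK_BINDER_TYPE_RED_EARLY_DROP:
			/* Filters on this block fire on RED early drops. */
			return foo_bind_early_drop_block(port, f);
		case FLOW_BLOCK_BINDER_TYPE_RED_MARK:
			/* Filters on this block fire on RED ECN marks. */
			return foo_bind_mark_block(port, f);
		default:
			return -EOPNOTSUPP;
		}
	}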

Changes from RFC to v1:
- Move a "q = qdisc_priv(sch)" from patch #3 to patch #4.
- Fix deadlock caused by mirroring packet back to the same qdisc tree.
- Rename "tail" qevent to "tail_drop".
- Adapt to the new 100-column standard.
- Add a selftest.
====================

Signed-off-by: David S. Miller <[email protected]>
davem330 committed Jun 30, 2020
2 parents 5e701e4 + 6cf0291, commit 989d957
Showing 40 changed files with 822 additions and 84 deletions.
2 changes: 2 additions & 0 deletions include/net/flow_offload.h
@@ -424,6 +424,8 @@ enum flow_block_binder_type {
 	FLOW_BLOCK_BINDER_TYPE_UNSPEC,
 	FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS,
 	FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS,
+	FLOW_BLOCK_BINDER_TYPE_RED_EARLY_DROP,
+	FLOW_BLOCK_BINDER_TYPE_RED_MARK,
 };
 
 struct flow_block {
49 changes: 49 additions & 0 deletions include/net/pkt_cls.h
@@ -32,6 +32,12 @@ struct tcf_block_ext_info {
 	u32 block_index;
 };
 
+struct tcf_qevent {
+	struct tcf_block	*block;
+	struct tcf_block_ext_info info;
+	struct tcf_proto __rcu *filter_chain;
+};
+
 struct tcf_block_cb;
 bool tcf_queue_work(struct rcu_work *rwork, work_func_t func);
 
@@ -553,6 +559,49 @@ int tc_setup_cb_reoffload(struct tcf_block *block, struct tcf_proto *tp,
 			  void *cb_priv, u32 *flags, unsigned int *in_hw_count);
 unsigned int tcf_exts_num_actions(struct tcf_exts *exts);
 
+#ifdef CONFIG_NET_CLS_ACT
+int tcf_qevent_init(struct tcf_qevent *qe, struct Qdisc *sch,
+		    enum flow_block_binder_type binder_type,
+		    struct nlattr *block_index_attr,
+		    struct netlink_ext_ack *extack);
+void tcf_qevent_destroy(struct tcf_qevent *qe, struct Qdisc *sch);
+int tcf_qevent_validate_change(struct tcf_qevent *qe, struct nlattr *block_index_attr,
+			       struct netlink_ext_ack *extack);
+struct sk_buff *tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch, struct sk_buff *skb,
+				  spinlock_t *root_lock, struct sk_buff **to_free, int *ret);
+int tcf_qevent_dump(struct sk_buff *skb, int attr_name, struct tcf_qevent *qe);
+#else
+static inline int tcf_qevent_init(struct tcf_qevent *qe, struct Qdisc *sch,
+				  enum flow_block_binder_type binder_type,
+				  struct nlattr *block_index_attr,
+				  struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+static inline void tcf_qevent_destroy(struct tcf_qevent *qe, struct Qdisc *sch)
+{
+}
+
+static inline int tcf_qevent_validate_change(struct tcf_qevent *qe, struct nlattr *block_index_attr,
+					     struct netlink_ext_ack *extack)
+{
+	return 0;
+}
+
+static inline struct sk_buff *
+tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch, struct sk_buff *skb,
+		  spinlock_t *root_lock, struct sk_buff **to_free, int *ret)
+{
+	return skb;
+}
+
+static inline int tcf_qevent_dump(struct sk_buff *skb, int attr_name, struct tcf_qevent *qe)
+{
+	return 0;
+}
+#endif
+
 struct tc_cls_u32_knode {
 	struct tcf_exts *exts;
 	struct tcf_result *res;
6 changes: 4 additions & 2 deletions include/net/sch_generic.h
@@ -57,6 +57,7 @@ struct qdisc_skb_head {
 struct Qdisc {
 	int			(*enqueue)(struct sk_buff *skb,
 					   struct Qdisc *sch,
+					   spinlock_t *root_lock,
 					   struct sk_buff **to_free);
 	struct sk_buff *	(*dequeue)(struct Qdisc *sch);
 	unsigned int		flags;
@@ -241,6 +242,7 @@ struct Qdisc_ops {
 
 	int			(*enqueue)(struct sk_buff *skb,
 					   struct Qdisc *sch,
+					   spinlock_t *root_lock,
 					   struct sk_buff **to_free);
 	struct sk_buff *	(*dequeue)(struct Qdisc *);
 	struct sk_buff *	(*peek)(struct Qdisc *);
@@ -788,11 +790,11 @@ static inline void qdisc_calculate_pkt_len(struct sk_buff *skb,
 #endif
 }
 
-static inline int qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+static inline int qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
 				struct sk_buff **to_free)
 {
 	qdisc_calculate_pkt_len(skb, sch);
-	return sch->enqueue(skb, sch, to_free);
+	return sch->enqueue(skb, sch, root_lock, to_free);
 }
 
 static inline void _bstats_update(struct gnet_stats_basic_packed *bstats,
2 changes: 2 additions & 0 deletions include/uapi/linux/pkt_sched.h
@@ -257,6 +257,8 @@ enum {
 	TCA_RED_STAB,
 	TCA_RED_MAX_P,
 	TCA_RED_FLAGS,		/* bitfield32 */
+	TCA_RED_EARLY_DROP_BLOCK, /* u32 */
+	TCA_RED_MARK_BLOCK,	/* u32 */
 	__TCA_RED_MAX,
 };
 
4 changes: 2 additions & 2 deletions net/core/dev.c
@@ -3749,7 +3749,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 	qdisc_calculate_pkt_len(skb, q);
 
 	if (q->flags & TCQ_F_NOLOCK) {
-		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
+		rc = q->enqueue(skb, q, NULL, &to_free) & NET_XMIT_MASK;
 		qdisc_run(q);
 
 		if (unlikely(to_free))
@@ -3792,7 +3792,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		qdisc_run_end(q);
 		rc = NET_XMIT_SUCCESS;
 	} else {
-		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
+		rc = q->enqueue(skb, q, root_lock, &to_free) & NET_XMIT_MASK;
 		if (qdisc_run_begin(q)) {
 			if (unlikely(contended)) {
 				spin_unlock(&q->busylock);
119 changes: 119 additions & 0 deletions net/sched/cls_api.c
@@ -3748,6 +3748,125 @@ unsigned int tcf_exts_num_actions(struct tcf_exts *exts)
 }
 EXPORT_SYMBOL(tcf_exts_num_actions);
 
+#ifdef CONFIG_NET_CLS_ACT
+static int tcf_qevent_parse_block_index(struct nlattr *block_index_attr,
+					u32 *p_block_index,
+					struct netlink_ext_ack *extack)
+{
+	*p_block_index = nla_get_u32(block_index_attr);
+	if (!*p_block_index) {
+		NL_SET_ERR_MSG(extack, "Block number may not be zero");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int tcf_qevent_init(struct tcf_qevent *qe, struct Qdisc *sch,
+		    enum flow_block_binder_type binder_type,
+		    struct nlattr *block_index_attr,
+		    struct netlink_ext_ack *extack)
+{
+	u32 block_index;
+	int err;
+
+	if (!block_index_attr)
+		return 0;
+
+	err = tcf_qevent_parse_block_index(block_index_attr, &block_index, extack);
+	if (err)
+		return err;
+
+	if (!block_index)
+		return 0;
+
+	qe->info.binder_type = binder_type;
+	qe->info.chain_head_change = tcf_chain_head_change_dflt;
+	qe->info.chain_head_change_priv = &qe->filter_chain;
+	qe->info.block_index = block_index;
+
+	return tcf_block_get_ext(&qe->block, sch, &qe->info, extack);
+}
+EXPORT_SYMBOL(tcf_qevent_init);
+
+void tcf_qevent_destroy(struct tcf_qevent *qe, struct Qdisc *sch)
+{
+	if (qe->info.block_index)
+		tcf_block_put_ext(qe->block, sch, &qe->info);
+}
+EXPORT_SYMBOL(tcf_qevent_destroy);
+
+int tcf_qevent_validate_change(struct tcf_qevent *qe, struct nlattr *block_index_attr,
+			       struct netlink_ext_ack *extack)
+{
+	u32 block_index;
+	int err;
+
+	if (!block_index_attr)
+		return 0;
+
+	err = tcf_qevent_parse_block_index(block_index_attr, &block_index, extack);
+	if (err)
+		return err;
+
+	/* Bounce newly-configured block or change in block. */
+	if (block_index != qe->info.block_index) {
+		NL_SET_ERR_MSG(extack, "Change of blocks is not supported");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(tcf_qevent_validate_change);
+
+struct sk_buff *tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch, struct sk_buff *skb,
+				  spinlock_t *root_lock, struct sk_buff **to_free, int *ret)
+{
+	struct tcf_result cl_res;
+	struct tcf_proto *fl;
+
+	if (!qe->info.block_index)
+		return skb;
+
+	fl = rcu_dereference_bh(qe->filter_chain);
+
+	if (root_lock)
+		spin_unlock(root_lock);
+
+	switch (tcf_classify(skb, fl, &cl_res, false)) {
+	case TC_ACT_SHOT:
+		qdisc_qstats_drop(sch);
+		__qdisc_drop(skb, to_free);
+		*ret = __NET_XMIT_BYPASS;
+		return NULL;
+	case TC_ACT_STOLEN:
+	case TC_ACT_QUEUED:
+	case TC_ACT_TRAP:
+		__qdisc_drop(skb, to_free);
+		*ret = __NET_XMIT_STOLEN;
+		return NULL;
+	case TC_ACT_REDIRECT:
+		skb_do_redirect(skb);
+		*ret = __NET_XMIT_STOLEN;
+		return NULL;
+	}
+
+	if (root_lock)
+		spin_lock(root_lock);
+
+	return skb;
+}
+EXPORT_SYMBOL(tcf_qevent_handle);
+
+int tcf_qevent_dump(struct sk_buff *skb, int attr_name, struct tcf_qevent *qe)
+{
+	if (!qe->info.block_index)
+		return 0;
+	return nla_put_u32(skb, attr_name, qe->info.block_index);
+}
+EXPORT_SYMBOL(tcf_qevent_dump);
+#endif
+
 static __net_init int tcf_net_init(struct net *net)
 {
 	struct tcf_net *tn = net_generic(net, tcf_net_id);
4 changes: 2 additions & 2 deletions net/sched/sch_atm.c
@@ -374,7 +374,7 @@ static struct tcf_block *atm_tc_tcf_block(struct Qdisc *sch, unsigned long cl,
 
 /* --------------------------- Qdisc operations ---------------------------- */
 
-static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
 			  struct sk_buff **to_free)
 {
 	struct atm_qdisc_data *p = qdisc_priv(sch);
@@ -432,7 +432,7 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 #endif
 	}
 
-	ret = qdisc_enqueue(skb, flow->q, to_free);
+	ret = qdisc_enqueue(skb, flow->q, root_lock, to_free);
 	if (ret != NET_XMIT_SUCCESS) {
 drop: __maybe_unused
 		if (net_xmit_drop_count(ret)) {
2 changes: 1 addition & 1 deletion net/sched/sch_blackhole.c
@@ -13,7 +13,7 @@
 #include <linux/skbuff.h>
 #include <net/pkt_sched.h>
 
-static int blackhole_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+static int blackhole_enqueue(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
 			     struct sk_buff **to_free)
 {
 	qdisc_drop(skb, sch, to_free);
2 changes: 1 addition & 1 deletion net/sched/sch_cake.c
@@ -1687,7 +1687,7 @@ static u32 cake_classify(struct Qdisc *sch, struct cake_tin_data **t,
 
 static void cake_reconfigure(struct Qdisc *sch);
 
-static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
 			struct sk_buff **to_free)
 {
 	struct cake_sched_data *q = qdisc_priv(sch);
4 changes: 2 additions & 2 deletions net/sched/sch_cbq.c
@@ -356,7 +356,7 @@ cbq_mark_toplevel(struct cbq_sched_data *q, struct cbq_class *cl)
 }
 
 static int
-cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
 	    struct sk_buff **to_free)
 {
 	struct cbq_sched_data *q = qdisc_priv(sch);
@@ -373,7 +373,7 @@ cbq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		return ret;
 	}
 
-	ret = qdisc_enqueue(skb, cl->q, to_free);
+	ret = qdisc_enqueue(skb, cl->q, root_lock, to_free);
 	if (ret == NET_XMIT_SUCCESS) {
 		sch->q.qlen++;
 		cbq_mark_toplevel(q, cl);
18 changes: 9 additions & 9 deletions net/sched/sch_cbs.c
@@ -77,21 +77,21 @@ struct cbs_sched_data {
 	s64 sendslope; /* in bytes/s */
 	s64 idleslope; /* in bytes/s */
 	struct qdisc_watchdog watchdog;
-	int (*enqueue)(struct sk_buff *skb, struct Qdisc *sch,
+	int (*enqueue)(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
 		       struct sk_buff **to_free);
 	struct sk_buff *(*dequeue)(struct Qdisc *sch);
 	struct Qdisc *qdisc;
 	struct list_head cbs_list;
 };
 
 static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch,
-			     struct Qdisc *child,
+			     struct Qdisc *child, spinlock_t *root_lock,
 			     struct sk_buff **to_free)
 {
 	unsigned int len = qdisc_pkt_len(skb);
 	int err;
 
-	err = child->ops->enqueue(skb, child, to_free);
+	err = child->ops->enqueue(skb, child, root_lock, to_free);
 	if (err != NET_XMIT_SUCCESS)
 		return err;
 
@@ -101,16 +101,16 @@ static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	return NET_XMIT_SUCCESS;
 }
 
-static int cbs_enqueue_offload(struct sk_buff *skb, struct Qdisc *sch,
+static int cbs_enqueue_offload(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
 			       struct sk_buff **to_free)
 {
 	struct cbs_sched_data *q = qdisc_priv(sch);
 	struct Qdisc *qdisc = q->qdisc;
 
-	return cbs_child_enqueue(skb, sch, qdisc, to_free);
+	return cbs_child_enqueue(skb, sch, qdisc, root_lock, to_free);
 }
 
-static int cbs_enqueue_soft(struct sk_buff *skb, struct Qdisc *sch,
+static int cbs_enqueue_soft(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
 			    struct sk_buff **to_free)
 {
 	struct cbs_sched_data *q = qdisc_priv(sch);
@@ -124,15 +124,15 @@ static int cbs_enqueue_soft(struct sk_buff *skb, struct Qdisc *sch,
 		q->last = ktime_get_ns();
 	}
 
-	return cbs_child_enqueue(skb, sch, qdisc, to_free);
+	return cbs_child_enqueue(skb, sch, qdisc, root_lock, to_free);
 }
 
-static int cbs_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+static int cbs_enqueue(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
 		       struct sk_buff **to_free)
 {
 	struct cbs_sched_data *q = qdisc_priv(sch);
 
-	return q->enqueue(skb, sch, to_free);
+	return q->enqueue(skb, sch, root_lock, to_free);
 }
 
 /* timediff is in ns, slope is in bytes/s */
2 changes: 1 addition & 1 deletion net/sched/sch_choke.c
@@ -210,7 +210,7 @@ static bool choke_match_random(const struct choke_sched_data *q,
 	return choke_match_flow(oskb, nskb);
 }
 
-static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
 			 struct sk_buff **to_free)
 {
 	struct choke_sched_data *q = qdisc_priv(sch);
2 changes: 1 addition & 1 deletion net/sched/sch_codel.c
@@ -108,7 +108,7 @@ static struct sk_buff *codel_qdisc_dequeue(struct Qdisc *sch)
 	return skb;
 }
 
-static int codel_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
+static int codel_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch, spinlock_t *root_lock,
 			       struct sk_buff **to_free)
 {
 	struct codel_sched_data *q;
[Diff truncated: the remaining changed files of the 40 are not shown.]
