Implement serializing the state of packet traversal in "continuations".
One purpose of OpenFlow packet-in messages is to allow a controller to
interpose on the path of a packet through the flow tables.  If, for
example, the controller needs to modify a packet in some way that the
switch doesn't directly support, the controller should be able to
program the switch to send it the packet, then modify the packet and
send it back to the switch to continue through the flow table.

That's the theory.  In practice, this doesn't work with any but the
simplest flow tables.  Packet-in messages simply don't include enough
context to allow the flow table traversal to continue.  For example:

    * Via "resubmit" actions, an Open vSwitch packet can have an
      effective "call stack", but a packet-in can't describe it, and
      so it would be lost.

    * A packet-in can't preserve the stack used by NXAST_STACK_PUSH
      and NXAST_STACK_POP actions.

    * A packet-in can't preserve the OpenFlow 1.1+ action set.

    * A packet-in can't preserve the state of Open vSwitch mirroring
      or connection tracking.

This commit introduces a solution called "continuations".  A continuation
is the state of a packet's traversal through OpenFlow flow tables.  A
"controller" action with the "pause" flag, which is newly implemented in
this commit, generates a continuation and sends it to the OpenFlow
controller in a packet-in asynchronous message (only NXT_PACKET_IN2
supports continuations, so the controller must configure them with
NXT_SET_PACKET_IN_FORMAT).  The controller processes the packet-in,
possibly modifying some of its data, and sends it back to the switch with
an NXT_RESUME request, which causes flow table traversal to continue.  In
principle, a single packet can be paused and resumed multiple times.

Another way to look at it is:

    - "pause" is an extension of the existing OFPAT_CONTROLLER
      action.  It sends the packet to the controller, with full
      pipeline context (some of which is switch
      implementation-dependent, and may thus vary from switch to
      switch).

    - A continuation is an extension of OFPT_PACKET_IN, allowing for
      implementation-dependent metadata.

    - NXT_RESUME is an extension of OFPT_PACKET_OUT, with the
      semantics that the pipeline processing is continued with the
      original translation context from where it was left at the time
      it was paused.
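
The pause/resume cycle described above can be modeled in a few lines of C. This is only a toy sketch, not Open vSwitch code: struct toy_continuation, toy_run() and the three-table "pipeline" are hypothetical names invented for illustration. It shows the point the message stresses, namely that state a plain packet-in would drop (here a data stack and the table to resume at) travels with the continuation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_MAX 8

/* Hypothetical stand-in for the private state a continuation carries. */
struct toy_continuation {
    uint8_t next_table;             /* Table at which to (re)start. */
    uint32_t stack[TOY_STACK_MAX];  /* Models the push/pop data stack. */
    size_t stack_depth;
    uint32_t reg0;                  /* Metadata the controller may edit. */
};

enum toy_result { TOY_DONE, TOY_PAUSED };

/* Runs a three-table toy pipeline starting at 'c->next_table':
 *   table 0: push reg0 on the stack, go to table 1.
 *   table 1: "controller(pause)": freeze, to be resumed at table 2.
 *   table 2: pop the stack, add reg0, store the sum in '*out_port'. */
enum toy_result toy_run(struct toy_continuation *c, uint32_t *out_port)
{
    for (;;) {
        switch (c->next_table) {
        case 0:
            assert(c->stack_depth < TOY_STACK_MAX);
            c->stack[c->stack_depth++] = c->reg0;
            c->next_table = 1;
            break;
        case 1:
            c->next_table = 2;      /* Resume after the pausing action. */
            return TOY_PAUSED;
        default:                    /* Table 2. */
            assert(c->stack_depth > 0);
            *out_port = c->stack[--c->stack_depth] + c->reg0;
            return TOY_DONE;
        }
    }
}
```

A caller would run toy_run() until it returns TOY_PAUSED, hand the struct to the controller, let it edit reg0, and call toy_run() again; the value pushed in table 0 is still on the stack when table 2 pops it.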

Signed-off-by: Ben Pfaff <[email protected]>
Acked-by: Jarno Rajahalme <[email protected]>
blp committed Feb 20, 2016
1 parent 5d10476 commit 77ab5fd
Showing 29 changed files with 1,242 additions and 255 deletions.
5 changes: 4 additions & 1 deletion NEWS
@@ -6,7 +6,10 @@ Post-v2.5.0
* OpenFlow 1.1+ OFPT_QUEUE_GET_CONFIG_REQUEST now supports OFPP_ANY.
* OpenFlow 1.4+ OFPMP_QUEUE_DESC is now supported.
* New property-based packet-in message format NXT_PACKET_IN2 with support
for arbitrary user-provided data.
for arbitrary user-provided data and for serializing flow table
traversal into a continuation for later resumption.
* New extension message NXT_SET_ASYNC_CONFIG2 to allow OpenFlow 1.4-like
control over asynchronous messages in earlier versions of OpenFlow.
- ovs-ofctl:
* queue-get-config command now allows a queue ID to be specified.
- DPDK:
99 changes: 97 additions & 2 deletions include/openflow/nicira-ext.h
@@ -235,12 +235,106 @@ struct nx_packet_in {
};
OFP_ASSERT(sizeof(struct nx_packet_in) == 24);

/* NXT_PACKET_IN2.
/* NXT_PACKET_IN2
* ==============
*
* NXT_PACKET_IN2 is conceptually similar to OFPT_PACKET_IN but it is expressed
* as an extensible set of properties instead of using a fixed structure.
*
* Added in Open vSwitch 2.6. */
* Added in Open vSwitch 2.6
*
*
* Continuations
* -------------
*
* When a "controller" action specifies the "pause" flag, the controller action
* freezes the packet's trip through Open vSwitch flow tables and serializes
* that state into the packet-in message as a "continuation". The controller
* can later send the continuation back to the switch, which will restart the
* packet's traversal from the point where it was interrupted. This permits an
* OpenFlow controller to interpose on a packet midway through processing in
* Open vSwitch.
*
* Continuations fit into packet processing this way:
*
* 1. A packet ingresses into Open vSwitch, which runs it through the OpenFlow
* tables.
*
* 2. An OpenFlow flow executes a "controller" action that includes the "pause"
* flag. Open vSwitch serializes the packet processing state and sends it,
* as an NXT_PACKET_IN2 that includes an additional NXPINT_CONTINUATION
* property (the continuation), to the OpenFlow controller.
*
* (The controller must use NXAST_CONTROLLER2 to generate the packet-in,
* because only this form of the "controller" action has a "pause" flag.
* Similarly, the controller must use NXT_SET_PACKET_IN_FORMAT to select
* NXT_PACKET_IN2 as the packet-in format, because this is the only format
* that supports continuation passing.)
*
* 3. The controller receives the NXT_PACKET_IN2 and processes it. The
* controller can interpret and, if desired, modify some of the contents of
* the packet-in, such as the packet and the metadata being processed.
*
* 4. The controller sends the continuation back to the switch, using an
* NXT_RESUME message. Packet processing resumes where it left off.
*
* The controller might change the pipeline configuration concurrently with
* steps 2 through 4. For example, it might add or remove OpenFlow flows. If
* that happens, then the packet will experience a mix of processing from the
* two configurations, that is, the initial processing (before
* NXAST_CONTROLLER2) uses the initial flow table, and the later processing
* (after NXT_RESUME) uses the later flow table. This means that the
* controller needs to take care to avoid incompatible pipeline changes while
* processing continuations.
*
* External side effects (e.g. "output") of OpenFlow actions processed before
* NXAST_CONTROLLER2 is encountered might be executed during step 2 or step 4,
* and the details may vary among Open vSwitch features and versions. Thus, a
* controller that wants to make sure that side effects are executed must pass
* the continuation back to the switch, that is, must not skip step 4.
*
* Architecturally, continuations may be "stateful" or "stateless", that is,
* they may or may not refer to buffered state maintained in Open vSwitch.
* This means that a controller should not attempt to resume a given
* continuation more than once (because the switch might have discarded the
* buffered state after the first use). For the same reason, continuations
* might become "stale" if the controller takes too long to resume them
* (because the switch might have discarded old buffered state). Taken
* together with the previous note, this means that a controller should resume
* each continuation exactly once (and promptly).
*
* Without the information in NXPINT_CONTINUATION, the controller can (with
* careful design, and help from the flow cookie) determine where the packet is
* in the pipeline, but in the general case it can't determine which nested
* "resubmit"s may be in progress, what data is on the stack maintained by
* NXAST_STACK_PUSH and NXAST_STACK_POP actions, what is in the OpenFlow
* action set, etc.
*
* Continuations are expensive because they require a round trip between the
* switch and the controller. Thus, they should not be used to implement
* processing that needs to happen at "line rate".
*
* The contents of NXPINT_CONTINUATION are private to the switch, may change
* unpredictably from one version of Open vSwitch to another, and are not
* documented here. The contents are also tied to a given Open vSwitch process
* and bridge, so that restarting Open vSwitch or deleting and recreating a
* bridge will cause the corresponding NXT_RESUME to be rejected.
*
* In the current implementation, Open vSwitch forks the packet processing
* pipeline across patch ports. Suppose, for example, that the pipeline for
* br0 outputs to a patch port whose peer belongs to br1, and that the pipeline
* for br1 executes a controller action with the "pause" flag. This only
* pauses processing within br1, and processing in br0 continues and possibly
* completes with visible side effects, such as outputting to ports, before
* br1's controller receives or processes the continuation. This
* implementation maintains the independence of separate bridges and, since
* processing in br1 cannot affect the behavior of br0 anyway, should not cause
* visible behavioral changes.
*
* A stateless implementation of continuations may ignore the "controller"
* action max_len, always sending the whole packet, because the full packet is
* required to continue traversal.
*/
enum nx_packet_in2_prop_type {
/* Packet. */
NXPINT_PACKET, /* Raw packet data. */
@@ -255,6 +349,7 @@ enum nx_packet_in2_prop_type {
NXPINT_REASON, /* uint8_t, one of OFPR_*. */
NXPINT_METADATA, /* NXM or OXM for metadata fields. */
NXPINT_USERDATA, /* From NXAST_CONTROLLER2 userdata. */
NXPINT_CONTINUATION, /* Private data for continuing processing. */
};

/* Configures the "role" of the sending controller. The default role is:
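
Note on step 3 of the comment above: a controller has to locate the NXPINT_CONTINUATION property in the packet-in and hand it back untouched. The sketch below scans a property list for one type. It rests on assumptions the diff does not show: that each property starts with a 16-bit type and a 16-bit length (both big-endian, the length covering header plus payload but not padding) and that properties are padded to a multiple of 8 bytes. find_prop() and struct prop_view are hypothetical names, and the caller supplies the numeric property type because the hunk elides part of the enum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Payload of a located property, or NULL/0 if none was found. */
struct prop_view {
    const uint8_t *data;
    size_t len;
};

/* Scans 'len' bytes at 'buf' for the first property of 'type'.
 * Returns 0 on success ('view->data' stays NULL if there is no such
 * property), or -1 if a malformed property is encountered first. */
int find_prop(const uint8_t *buf, size_t len, uint16_t type,
              struct prop_view *view)
{
    size_t ofs = 0;

    assert(buf != NULL || len == 0);
    view->data = NULL;
    view->len = 0;
    while (ofs < len) {
        if (len - ofs < 4) {
            return -1;              /* Truncated property header. */
        }

        uint16_t t = (uint16_t) (buf[ofs] << 8 | buf[ofs + 1]);
        size_t plen = (size_t) buf[ofs + 2] << 8 | buf[ofs + 3];
        size_t padded = (plen + 7) / 8 * 8;

        if (plen < 4 || padded > len - ofs) {
            return -1;              /* Too short, or overruns the buffer. */
        }
        if (t == type) {
            view->data = buf + ofs + 4;
            view->len = plen - 4;
            return 0;
        }
        ofs += padded;
    }
    return 0;
}
```

Because the continuation's contents are private to the switch, a controller built along these lines would copy the located bytes verbatim into its NXT_RESUME rather than interpret them.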
3 changes: 1 addition & 2 deletions lib/learning-switch.c
@@ -511,7 +511,6 @@ static void
process_packet_in(struct lswitch *sw, const struct ofp_header *oh)
{
struct ofputil_packet_in pi;
size_t total_len;
uint32_t buffer_id;
uint32_t queue_id;
ofp_port_t out_port;
@@ -525,7 +524,7 @@ process_packet_in(struct lswitch *sw, const struct ofp_header *oh)
struct dp_packet pkt;
struct flow flow;

error = ofputil_decode_packet_in(oh, &pi, &total_len, &buffer_id);
error = ofputil_decode_packet_in(oh, true, &pi, NULL, &buffer_id, NULL);
if (error) {
VLOG_WARN_RL(&rl, "failed to decode packet-in: %s",
ofperr_to_string(error));
9 changes: 8 additions & 1 deletion lib/meta-flow.c
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2011, 2012, 2013, 2014, 2015 Nicira, Inc.
* Copyright (c) 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -171,6 +171,13 @@ mf_subvalue_shift(union mf_subvalue *value, int n)
}
}

/* Appends a formatted representation of 'sv' to 's'. */
void
mf_subvalue_format(const union mf_subvalue *sv, struct ds *s)
{
ds_put_hex(s, sv, sizeof *sv);
}

/* Returns true if 'wc' wildcards all the bits in field 'mf', false if 'wc'
* specifies at least one bit in the field.
*
3 changes: 2 additions & 1 deletion lib/meta-flow.h
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2011, 2012, 2013, 2014, 2015 Nicira, Inc.
* Copyright (c) 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -1940,6 +1940,7 @@ bool mf_subvalue_intersect(const union mf_subvalue *a_value,
union mf_subvalue *dst_mask);
int mf_subvalue_width(const union mf_subvalue *);
void mf_subvalue_shift(union mf_subvalue *, int n);
void mf_subvalue_format(const union mf_subvalue *, struct ds *);

/* An array of fields with values */
struct field_array {
28 changes: 24 additions & 4 deletions lib/ofp-actions.c
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016 Nicira, Inc.
* Copyright (c) 2008-2016 Nicira, Inc.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -628,12 +628,16 @@ struct nx_action_controller {
};
OFP_ASSERT(sizeof(struct nx_action_controller) == 16);

/* Properties for NXAST_CONTROLLER2. */
/* Properties for NXAST_CONTROLLER2.
*
* For more information on the effect of NXAC2PT_PAUSE, see the large comment
* on NXT_PACKET_IN2 in nicira-ext.h */
enum nx_action_controller2_prop_type {
NXAC2PT_MAX_LEN, /* ovs_be16 max bytes to send (default all). */
NXAC2PT_CONTROLLER_ID, /* ovs_be16 dest controller ID (default 0). */
NXAC2PT_REASON, /* uint8_t reason (OFPR_*), default 0. */
NXAC2PT_USERDATA, /* Data to copy into NXPINT_USERDATA. */
NXAC2PT_PAUSE, /* Flag to pause pipeline to resume later. */
};

/* Action structure for NXAST_CONTROLLER2.
@@ -717,6 +721,10 @@ decode_NXAST_RAW_CONTROLLER2(const struct nx_action_controller2 *nac2,
oc->userdata_len = ofpbuf_msgsize(&payload);
break;

case NXAC2PT_PAUSE:
oc->pause = true;
break;

default:
error = OFPPROP_UNKNOWN(false, "NXAST_RAW_CONTROLLER2", type);
break;
@@ -737,6 +745,7 @@ encode_CONTROLLER(const struct ofpact_controller *controller,
struct ofpbuf *out)
{
if (controller->userdata_len
|| controller->pause
|| controller->ofpact.raw == NXAST_RAW_CONTROLLER2) {
size_t start_ofs = out->size;
put_NXAST_CONTROLLER2(out);
@@ -754,6 +763,9 @@ encode_CONTROLLER(const struct ofpact_controller *controller,
ofpprop_put(out, NXAC2PT_USERDATA, controller->userdata,
controller->userdata_len);
}
if (controller->pause) {
ofpprop_put_flag(out, NXAC2PT_PAUSE);
}
pad_ofpat(out, start_ofs);
} else {
struct nx_action_controller *nac;
@@ -773,6 +785,7 @@ parse_CONTROLLER(char *arg, struct ofpbuf *ofpacts,
uint16_t controller_id = 0;
uint16_t max_len = UINT16_MAX;
const char *userdata = NULL;
bool pause = false;

if (!arg[0]) {
/* Use defaults. */
@@ -801,14 +814,16 @@ parse_CONTROLLER(char *arg, struct ofpbuf *ofpacts,
}
} else if (!strcmp(name, "userdata")) {
userdata = value;
} else if (!strcmp(name, "pause")) {
pause = true;
} else {
return xasprintf("unknown key \"%s\" parsing controller "
"action", name);
}
}
}

if (reason == OFPR_ACTION && controller_id == 0 && !userdata) {
if (reason == OFPR_ACTION && controller_id == 0 && !userdata && !pause) {
struct ofpact_output *output;

output = ofpact_put_OUTPUT(ofpacts);
@@ -821,6 +836,7 @@ parse_CONTROLLER(char *arg, struct ofpbuf *ofpacts,
controller->max_len = max_len;
controller->reason = reason;
controller->controller_id = controller_id;
controller->pause = pause;

if (userdata) {
size_t start_ofs = ofpacts->size;
@@ -853,7 +869,8 @@ format_hex_arg(struct ds *s, const uint8_t *data, size_t len)
static void
format_CONTROLLER(const struct ofpact_controller *a, struct ds *s)
{
if (a->reason == OFPR_ACTION && !a->controller_id && !a->userdata_len) {
if (a->reason == OFPR_ACTION && !a->controller_id && !a->userdata_len
&& !a->pause) {
ds_put_format(s, "CONTROLLER:%"PRIu16, a->max_len);
} else {
enum ofp_packet_in_reason reason = a->reason;
@@ -877,6 +894,9 @@ format_CONTROLLER(const struct ofpact_controller *a, struct ds *s)
format_hex_arg(s, a->userdata, a->userdata_len);
ds_put_char(s, ',');
}
if (a->pause) {
ds_put_cstr(s, "pause,");
}
ds_chomp(s, ',');
ds_put_char(s, ')');
}
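
Note on the parse_CONTROLLER/format_CONTROLLER hunks above: both now treat "pause" as one more option that rules out the compact form of the action. The following is a simplified, standalone sketch of that gating, not the code from this commit. It handles only the "id" and "pause" keys, and struct toy_controller, toy_parse() and toy_format() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down counterpart of struct ofpact_controller. */
struct toy_controller {
    uint16_t max_len;
    uint16_t controller_id;
    bool pause;
};

/* Parses a comma-separated argument such as "id=3,pause" into 'c'.
 * Returns true on success, false on an unknown key. */
bool toy_parse(const char *arg, struct toy_controller *c)
{
    assert(arg && c);
    c->max_len = UINT16_MAX;
    c->controller_id = 0;
    c->pause = false;

    while (*arg) {
        size_t n = strcspn(arg, ",");

        if (n == 5 && !strncmp(arg, "pause", 5)) {
            c->pause = true;
        } else if (n > 3 && !strncmp(arg, "id=", 3)) {
            c->controller_id = (uint16_t) strtoul(arg + 3, NULL, 10);
        } else {
            return false;
        }
        arg += n;
        if (*arg == ',') {
            arg++;
        }
    }
    return true;
}

/* Formats 'c' into 'buf'.  The compact "CONTROLLER:max_len" form is used
 * only when no extended option, including "pause", is set. */
void toy_format(const struct toy_controller *c, char *buf, size_t size)
{
    if (!c->controller_id && !c->pause) {
        snprintf(buf, size, "CONTROLLER:%u", (unsigned int) c->max_len);
    } else {
        char id[16] = "";

        if (c->controller_id) {
            snprintf(id, sizeof id, "id=%u", (unsigned int) c->controller_id);
        }
        snprintf(buf, size, "controller(%s%s%s)", id,
                 c->controller_id && c->pause ? "," : "",
                 c->pause ? "pause" : "");
    }
}
```

The design point is the same as in the diff: because the compact form has no way to express "pause", a paused action must round-trip through the long form, otherwise formatting and re-parsing would silently drop the flag.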
5 changes: 5 additions & 0 deletions lib/ofp-actions.h
@@ -246,6 +246,11 @@ struct ofpact_controller {
uint16_t controller_id; /* Controller ID to send packet-in. */
enum ofp_packet_in_reason reason; /* Reason to put in packet-in. */

/* If true, this action freezes packet traversal of the OpenFlow tables and
* adds a continuation to the packet-in message that a controller can use
* to resume that traversal. */
bool pause;

/* Arbitrary data to include in the packet-in message (currently, only in
* NXT_PACKET_IN2). */
uint16_t userdata_len;
16 changes: 13 additions & 3 deletions lib/ofp-errors.h
@@ -764,9 +764,19 @@ enum ofperr {
* to be mapped is the same as one assigned to a different field. */
OFPERR_NXTTMFC_DUP_ENTRY,

/* ## ------------------ ## */
/* ## OFPET_EXPERIMENTER ## */
/* ## ------------------ ## */
/* ## ---------- ## */
/* ## NXT_RESUME ## */
/* ## ---------- ## */

/* NX1.0-1.1(1,533), NX1.2+(34). This datapath doesn't support
* NXT_RESUME. */
OFPERR_NXR_NOT_SUPPORTED,

/* NX1.0-1.1(1,534), NX1.2+(35). Continuation is stale: Open vSwitch
* process has been restarted or bridge has been destroyed since
* continuation was generated, or continuation was not generated by this
* Open vSwitch instance. */
OFPERR_NXR_STALE,
};

const char *ofperr_domain_get_name(enum ofp_version);
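
Note on OFPERR_NXR_STALE above: the error text implies the switch can tell whether a continuation came from the current process and bridge. The visible hunks do not show how that check is done, so the sketch below is only one plausible shape for it, under the assumption that a continuation records an identifier of the bridge instance that produced it. All names (struct toy_bridge, struct toy_cont, toy_pause(), toy_resume()) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum toy_resume_status { TOY_RESUME_OK, TOY_RESUME_STALE };

/* Hypothetical identity that changes whenever the process restarts or the
 * bridge is deleted and recreated. */
struct toy_bridge {
    uint8_t instance_id[16];
};

/* Hypothetical continuation: remembers which bridge instance produced it. */
struct toy_cont {
    uint8_t instance_id[16];
};

/* Stamps 'cont' with the identity of the bridge that paused the packet. */
void toy_pause(const struct toy_bridge *br, struct toy_cont *cont)
{
    assert(br && cont);
    memcpy(cont->instance_id, br->instance_id, sizeof cont->instance_id);
}

/* Accepts 'cont' only if it was generated by this instance of 'br'. */
enum toy_resume_status toy_resume(const struct toy_bridge *br,
                                  const struct toy_cont *cont)
{
    assert(br && cont);
    return memcmp(br->instance_id, cont->instance_id,
                  sizeof br->instance_id)
           ? TOY_RESUME_STALE : TOY_RESUME_OK;
}
```

A controller that receives the stale error has no way to repair the continuation; under this model the only sensible reaction is to drop it.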
4 changes: 4 additions & 0 deletions lib/ofp-msgs.h
@@ -457,6 +457,9 @@ enum ofpraw {

/* NXT 1.0+ (26): struct nx_tlv_table_reply, struct nx_tlv_map[]. */
OFPRAW_NXT_TLV_TABLE_REPLY,

/* NXT 1.0+ (28): uint8_t[8][]. */
OFPRAW_NXT_RESUME,
};

/* Decoding messages into OFPRAW_* values. */
@@ -675,6 +678,7 @@ enum ofptype {
OFPTYPE_NXT_TLV_TABLE_MOD, /* OFPRAW_NXT_TLV_TABLE_MOD. */
OFPTYPE_NXT_TLV_TABLE_REQUEST, /* OFPRAW_NXT_TLV_TABLE_REQUEST. */
OFPTYPE_NXT_TLV_TABLE_REPLY, /* OFPRAW_NXT_TLV_TABLE_REPLY. */
OFPTYPE_NXT_RESUME, /* OFPRAW_NXT_RESUME. */

/* Flow monitor extension. */
OFPTYPE_FLOW_MONITOR_CANCEL, /* OFPRAW_NXT_FLOW_MONITOR_CANCEL. */