dpif-netdev: Forwarding optimization for flows with a simple match.
There are cases where users might want simple forwarding or drop rules
for all packets received from a specific port, e.g.::

  "in_port=1,actions=2"
  "in_port=2,actions=IN_PORT"
  "in_port=3,vlan_tci=0x1234/0x1fff,actions=drop"
  "in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3"

There are also cases where complex OpenFlow rules can be simplified
down to datapath flows with very simple match criteria.

In theory, for very simple forwarding, OVS doesn't need to parse
packets at all in order to follow these rules.  "Simple match" lookup
optimization is intended to speed up packet forwarding in these cases.

Design:

Due to various implementation constraints, the userspace datapath always
has the following flow fields in exact match (i.e. it is required to
match at least these fields of a packet even if the OF rule doesn't
need that):

  - recirc_id
  - in_port
  - packet_type
  - dl_type
  - vlan_tci (CFI + VID) - in most cases
  - nw_frag - for ip packets

Not all of these fields are related to the packet itself.  We already
know the current 'recirc_id' and the 'in_port' before starting the
packet processing.  It also seems safe to assume that we're working
with Ethernet packets.  So, for the simple OF rule we need to match
only on 'dl_type', 'vlan_tci' and 'nw_frag'.
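
As an illustration of how little parsing that requires, here is a minimal
sketch that reads only 'dl_type' and 'vlan_tci' from a raw Ethernet frame.
It is not the OVS code: 'struct l2_fields', 'read_be16()' and
'extract_l2_fields()' are hypothetical names, only a single 0x8100 tag is
handled, and 'nw_frag' extraction is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_TYPE_VLAN 0x8100   /* 802.1Q TPID. */
#define VLAN_CFI      0x1000   /* Marks "a tag is present" in 'vlan_tci'. */

/* Hypothetical container for the two L2 fields; not an OVS type. */
struct l2_fields {
    uint16_t dl_type;    /* EtherType in host byte order. */
    uint16_t vlan_tci;   /* 0 if untagged, else the TCI with VLAN_CFI set. */
};

/* Reads a big-endian 16-bit value from 'p'. */
static uint16_t
read_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Extracts only 'dl_type' and 'vlan_tci' from an Ethernet frame.
 * Returns false if the frame is too short to hold them. */
static bool
extract_l2_fields(const uint8_t *frame, size_t len, struct l2_fields *out)
{
    assert(frame && out);

    if (len < 14) {
        return false;
    }
    out->dl_type = read_be16(frame + 12);
    out->vlan_tci = 0;

    if (out->dl_type == ETH_TYPE_VLAN) {
        if (len < 18) {
            return false;
        }
        /* The tag is present, so CFI is forced on, as described above. */
        out->vlan_tci = read_be16(frame + 14) | VLAN_CFI;
        out->dl_type = read_be16(frame + 16);
    }
    return true;
}
```

Nothing beyond the first 18 bytes of the frame is touched here, which is
the point of the optimization.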

'in_port', 'dl_type', 'nw_frag' and 13 bits of 'vlan_tci' can be
combined into a single 64-bit integer (mark) that can be used as a
hash in a hash map.  We use only the VID and CFI from the 'vlan_tci';
flows that need to match on PCP will not qualify for the optimization.
The workaround for matching on non-existence of a vlan is updated to
match on CFI and VID only in order to qualify for the optimization.  CFI
is always set by OVS if a vlan is present in a packet, so there is no
need to match on PCP in this case.  'nw_frag' takes 2 bits of PCP inside
the simple match mark.
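
A minimal sketch of one way to pack such a mark follows.  The exact bit
layout is an assumption made for illustration ('in_port' in the upper
32 bits, 'dl_type' in the next 16, and the masked TCI with 'nw_frag' placed
in two PCP bit positions in the lowest 16), and 'simple_match_mark()' is a
hypothetical name; byte-order handling is ignored.

```c
#include <assert.h>
#include <stdint.h>

#define VLAN_VID_MASK 0x0fff   /* 12-bit VLAN ID. */
#define VLAN_CFI      0x1000   /* 1-bit CFI. */
#define NW_FRAG_SHIFT 13       /* Lowest PCP bit position in the TCI. */
#define NW_FRAG_MASK  0x3      /* 'nw_frag' occupies two bits. */

/* Combines the four fields into one 64-bit value.  PCP bits of
 * 'vlan_tci' are dropped, so two packets that differ only in PCP
 * produce the same mark. */
static uint64_t
simple_match_mark(uint32_t in_port, uint16_t dl_type,
                  uint8_t nw_frag, uint16_t vlan_tci)
{
    uint16_t low;

    assert(nw_frag <= NW_FRAG_MASK);

    low = (uint16_t) ((vlan_tci & (VLAN_VID_MASK | VLAN_CFI))
                      | (nw_frag << NW_FRAG_SHIFT));

    return ((uint64_t) in_port << 32) | ((uint64_t) dl_type << 16) | low;
}
```

Because the PCP bits are masked out before 'nw_frag' is stored in their
place, a flow that has to distinguish PCP values cannot be represented by
this mark.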

A new per-PMD flow table, 'simple_match_table', is introduced to store
simple match flows only.  'dp_netdev_flow_add' adds a flow to the
usual 'flow_table' and also to the 'simple_match_table' if the flow
meets the following constraints:

  - 'recirc_id' in flow match is 0.
  - 'packet_type' in flow match is Ethernet.
  - Flow wildcards contain only the minimal set of non-wildcarded fields
    (listed above).
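
The constraints above can be sketched as a single predicate.  This is an
illustration only: 'struct match_summary', 'PT_ETH' and
'flow_is_simple_match()' are hypothetical stand-ins, and the real check
works on the flow match and wildcards rather than on a summary like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_PCP_MASK 0xe000   /* Priority bits of the 802.1Q TCI. */
#define PT_ETH        0u       /* Hypothetical code for the Ethernet type. */

/* Hypothetical summary of a datapath flow match; not an OVS structure. */
struct match_summary {
    uint32_t recirc_id;       /* Matched 'recirc_id' value. */
    uint32_t packet_type;     /* Matched 'packet_type' value. */
    uint16_t vlan_tci_mask;   /* Bits of 'vlan_tci' the flow matches on. */
    bool has_extra_fields;    /* Any field outside the minimal set is
                               * non-wildcarded. */
};

/* Returns true if the flow may also go into 'simple_match_table'. */
static bool
flow_is_simple_match(const struct match_summary *m)
{
    assert(m);

    return m->recirc_id == 0
           && m->packet_type == PT_ETH
           && !(m->vlan_tci_mask & VLAN_PCP_MASK)
           && !m->has_extra_fields;
}
```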

If the number of flows for the current 'in_port' in the regular
'flow_table' equals the number of flows for the current 'in_port' in the
'simple_match_table', we may use the simple match optimization, because
all the flows we have are simple match flows.  This means that we only
need to parse 'dl_type', 'vlan_tci' and 'nw_frag' to perform packet
matching.  We then make the unique flow mark from the 'in_port',
'dl_type', 'nw_frag' and 'vlan_tci' and look for it in the
'simple_match_table'.  On a successful lookup we don't need to run the
full 'miniflow_extract()'.
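
The per-port eligibility check can be sketched as follows.  The counter
names 'n_flows' and 'n_simple_flows' follow this patch, but plain arrays
stand in for the ccmaps, and 'MAX_PORTS', 'flow_counters_add()' and
'simple_match_enabled()' are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PORTS 16   /* Hypothetical bound, for the sketch only. */

/* Per-PMD flow counters, one slot per in_port. */
struct flow_counters {
    uint32_t n_flows[MAX_PORTS];          /* Flows in 'flow_table'. */
    uint32_t n_simple_flows[MAX_PORTS];   /* Flows in 'simple_match_table'. */
};

/* Accounts for a newly added flow on 'in_port'. */
static void
flow_counters_add(struct flow_counters *c, uint32_t in_port, bool is_simple)
{
    assert(in_port < MAX_PORTS);

    c->n_flows[in_port]++;
    if (is_simple) {
        c->n_simple_flows[in_port]++;
    }
}

/* The optimization applies only while every flow installed for the
 * port is a simple match flow. */
static bool
simple_match_enabled(const struct flow_counters *c, uint32_t in_port)
{
    assert(in_port < MAX_PORTS);

    return c->n_flows[in_port] == c->n_simple_flows[in_port];
}
```

A single non-simple flow on a port is enough to turn the optimization off
for that port, while other ports are unaffected.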

An unsuccessful lookup technically means that we have no suitable flow
in the datapath and an upcall will be required.  So, in this case EMC and
SMC lookups are disabled.  We may optimize this path in the future by
bypassing the dpcls lookup too.
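
The resulting choice of lookup path can be sketched as below.  All names
are hypothetical, and a linear scan over an array of marks stands in for
the cmap lookup in 'simple_match_table'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum lookup_step {
    STEP_SIMPLE_HIT,    /* Flow found by mark; no miniflow_extract(). */
    STEP_DPCLS_ONLY,    /* Mark missed: skip EMC and SMC, go to dpcls. */
    STEP_FULL_LOOKUP    /* Port not eligible: usual EMC/SMC/dpcls path. */
};

/* Linear search standing in for the hash map lookup. */
static bool
marks_contain(const uint64_t *marks, size_t n, uint64_t mark)
{
    for (size_t i = 0; i < n; i++) {
        if (marks[i] == mark) {
            return true;
        }
    }
    return false;
}

/* Decides what to do with a packet whose mark is 'mark'. */
static enum lookup_step
choose_lookup_step(bool simple_enabled, const uint64_t *marks, size_t n,
                   uint64_t mark)
{
    assert(marks || n == 0);

    if (!simple_enabled) {
        return STEP_FULL_LOOKUP;
    }
    return marks_contain(marks, n, mark) ? STEP_SIMPLE_HIT : STEP_DPCLS_ONLY;
}
```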

The performance improvement of this solution on 'simple match' flows
should be comparable with partial HW offloading, because it parses the
same packet fields and uses a similar flow lookup scheme.
However, unlike partial HW offloading, it works for all port types
including virtual ones.

Performance results when compared to EMC:

Test setup:

             virtio-user   OVS    virtio-user
  Testpmd1  ------------>  pmd1  ------------>  Testpmd2
  (txonly)       x<------  pmd2  <------------ (mac swap)

Single stream of 64byte packets.  Actions:
  in_port=vhost0,actions=vhost1
  in_port=vhost1,actions=vhost0

Stats collected from pmd1 and pmd2, so there are 2 scenarios:
Virt-to-Virt   :     Testpmd1 ------> pmd1 ------> Testpmd2.
Virt-to-NoCopy :     Testpmd2 ------> pmd2 --->x   Testpmd1.
Here the packet sent from pmd2 to Testpmd1 is always dropped, because
the virtqueue is full since Testpmd1 is in txonly mode and doesn't
receive any packets.  This should be closer to the performance of a
VM-to-Phy scenario.

The test was performed on a machine with an Intel Xeon CPU E5-2690 v4 @ 2.60GHz.
The table below shows the improvement in throughput when compared to EMC.

 +----------------+------------------------+------------------------+
 |                |    Default (-g -O2)    | "-Ofast -march=native" |
 |   Scenario     +------------+-----------+------------+-----------+
 |                |     GCC    |   Clang   |     GCC    |   Clang   |
 +----------------+------------+-----------+------------+-----------+
 | Virt-to-Virt   |    +18.9%  |   +25.5%  |    +10.8%  |   +16.7%  |
 | Virt-to-NoCopy |    +24.3%  |   +33.7%  |    +14.9%  |   +22.0%  |
 +----------------+------------+-----------+------------+-----------+

For the Phy-to-Phy case the performance improvement should be even higher,
but it's not the main use-case for this functionality.  The performance
difference for the non-simple flows is within the margin of error.

Acked-by: Sriharsha Basavapatna <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
igsilya committed Jan 7, 2022
1 parent 46d44cf commit e7e9973
Showing 15 changed files with 386 additions and 83 deletions.
24 changes: 24 additions & 0 deletions Documentation/topics/dpdk/bridge.rst
@@ -81,6 +81,30 @@ using the following command::

$ ovs-vsctl get Interface <iface> statistics

+Simple Match Lookup
+-------------------
+
+There are cases where users might want simple forwarding or drop rules for all
+packets received from a specific port, e.g.::
+
+    in_port=1,actions=2
+    in_port=2,actions=IN_PORT
+    in_port=3,vlan_tci=0x1234/0x1fff,actions=drop
+    in_port=4,actions=push_vlan:0x8100,set_field:4196->vlan_vid,output:3
+
+There are also cases where complex OpenFlow rules can be simplified down to
+datapath flows with very simple match criteria.
+
+In theory, for very simple forwarding, OVS doesn't need to parse packets at all
+in order to follow these rules. In practice, due to various implementation
+constraints, userspace datapath has to match at least on a small set of packet
+fields. Some matching criteria (for example, ingress port) are not related to
+the packet itself and others (for example, VLAN tag or Ethernet type) can be
+extracted without fully parsing the packet. This allows OVS to significantly
+speed up packet forwarding for these flows with simple match criteria.
+Statistics on the number of packets matched in this way can be found in a
+`simple match hits` counter of `ovs-appctl dpif-netdev/pmd-stats-show` command.
+
EMC Insertion Probability
-------------------------

3 changes: 3 additions & 0 deletions NEWS
@@ -1,5 +1,8 @@
Post-v2.16.0
---------------------
+- Userspace datapath:
+* Optimized flow lookups for datapath flows with simple match criteria.
+See 'Simple Match Lookup' in Documentation/topics/dpdk/bridge.rst.
- DPDK:
* EAL argument --socket-mem is no longer configured by default upon
start-up. If dpdk-socket-mem and dpdk-alloc-mem are not specified,
3 changes: 2 additions & 1 deletion lib/dpif-netdev-avx512.c
@@ -198,7 +198,8 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
if (mfex_hit) {
pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf);
} else {
-pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
+pkt_meta[i].tcp_flags = parse_tcp_flags(packet,
+NULL, NULL, NULL);
}

pkt_meta[i].bytes = dp_packet_size(packet);
41 changes: 23 additions & 18 deletions lib/dpif-netdev-perf.c
@@ -232,10 +232,10 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
uint64_t busy_iter = tot_iter >= idle_iter ? tot_iter - idle_iter : 0;

ds_put_format(str,
-" Iterations: %12"PRIu64" (%.2f us/it)\n"
-" - Used TSC cycles: %12"PRIu64" (%5.1f %% of total cycles)\n"
-" - idle iterations: %12"PRIu64" (%5.1f %% of used cycles)\n"
-" - busy iterations: %12"PRIu64" (%5.1f %% of used cycles)\n",
+" Iterations: %12"PRIu64" (%.2f us/it)\n"
+" - Used TSC cycles: %12"PRIu64" (%5.1f %% of total cycles)\n"
+" - idle iterations: %12"PRIu64" (%5.1f %% of used cycles)\n"
+" - busy iterations: %12"PRIu64" (%5.1f %% of used cycles)\n",
tot_iter, tot_cycles * us_per_cycle / tot_iter,
tot_cycles, 100.0 * (tot_cycles / duration) / tsc_hz,
idle_iter,
@@ -244,23 +244,26 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
100.0 * stats[PMD_CYCLES_ITER_BUSY] / tot_cycles);
if (rx_packets > 0) {
ds_put_format(str,
-" Rx packets: %12"PRIu64" (%.0f Kpps, %.0f cycles/pkt)\n"
-" Datapath passes: %12"PRIu64" (%.2f passes/pkt)\n"
-" - PHWOL hits: %12"PRIu64" (%5.1f %%)\n"
-" - MFEX Opt hits: %12"PRIu64" (%5.1f %%)\n"
-" - EMC hits: %12"PRIu64" (%5.1f %%)\n"
-" - SMC hits: %12"PRIu64" (%5.1f %%)\n"
-" - Megaflow hits: %12"PRIu64" (%5.1f %%, %.2f "
-"subtbl lookups/hit)\n"
-" - Upcalls: %12"PRIu64" (%5.1f %%, %.1f us/upcall)\n"
-" - Lost upcalls: %12"PRIu64" (%5.1f %%)\n",
+" Rx packets: %12"PRIu64" (%.0f Kpps, %.0f cycles/pkt)\n"
+" Datapath passes: %12"PRIu64" (%.2f passes/pkt)\n"
+" - PHWOL hits: %12"PRIu64" (%5.1f %%)\n"
+" - MFEX Opt hits: %12"PRIu64" (%5.1f %%)\n"
+" - Simple Match hits:%12"PRIu64" (%5.1f %%)\n"
+" - EMC hits: %12"PRIu64" (%5.1f %%)\n"
+" - SMC hits: %12"PRIu64" (%5.1f %%)\n"
+" - Megaflow hits: %12"PRIu64" (%5.1f %%, %.2f "
+"subtbl lookups/hit)\n"
+" - Upcalls: %12"PRIu64" (%5.1f %%, %.1f us/upcall)\n"
+" - Lost upcalls: %12"PRIu64" (%5.1f %%)\n",
rx_packets, (rx_packets / duration) / 1000,
1.0 * stats[PMD_CYCLES_ITER_BUSY] / rx_packets,
passes, rx_packets ? 1.0 * passes / rx_packets : 0,
stats[PMD_STAT_PHWOL_HIT],
100.0 * stats[PMD_STAT_PHWOL_HIT] / passes,
stats[PMD_STAT_MFEX_OPT_HIT],
100.0 * stats[PMD_STAT_MFEX_OPT_HIT] / passes,
+stats[PMD_STAT_SIMPLE_HIT],
+100.0 * stats[PMD_STAT_SIMPLE_HIT] / passes,
stats[PMD_STAT_EXACT_HIT],
100.0 * stats[PMD_STAT_EXACT_HIT] / passes,
stats[PMD_STAT_SMC_HIT],
@@ -275,16 +278,18 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
stats[PMD_STAT_LOST],
100.0 * stats[PMD_STAT_LOST] / passes);
} else {
-ds_put_format(str, " Rx packets: %12d\n", 0);
+ds_put_format(str,
+" Rx packets: %12d\n", 0);
}
if (tx_packets > 0) {
ds_put_format(str,
-" Tx packets: %12"PRIu64" (%.0f Kpps)\n"
-" Tx batches: %12"PRIu64" (%.2f pkts/batch)\n",
+" Tx packets: %12"PRIu64" (%.0f Kpps)\n"
+" Tx batches: %12"PRIu64" (%.2f pkts/batch)\n",
tx_packets, (tx_packets / duration) / 1000,
tx_batches, 1.0 * tx_packets / tx_batches);
} else {
-ds_put_format(str, " Tx packets: %12d\n\n", 0);
+ds_put_format(str,
+" Tx packets: %12d\n\n", 0);
}
}

1 change: 1 addition & 0 deletions lib/dpif-netdev-perf.h
@@ -58,6 +58,7 @@ extern "C" {
enum pmd_stat_type {
PMD_STAT_PHWOL_HIT, /* Packets that had a partial HWOL hit (phwol). */
PMD_STAT_MFEX_OPT_HIT, /* Packets that had miniflow optimized match. */
+PMD_STAT_SIMPLE_HIT, /* Packets that had a simple match hit. */
PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */
PMD_STAT_SMC_HIT, /* Packets that had a sig match hit (SMC). */
PMD_STAT_MASKED_HIT, /* Packets that matched in the flow table. */
5 changes: 4 additions & 1 deletion lib/dpif-netdev-private-flow.h
@@ -87,6 +87,8 @@ struct dp_netdev_flow {
/* Hash table index by unmasked flow. */
const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */
/* 'flow_table'. */
+const struct cmap_node simple_match_node; /* In dp_netdev_pmd_thread's
+'simple_match_table'. */
const struct cmap_node mark_node; /* In owning flow_mark's mark_to_flow */
const ovs_u128 ufid; /* Unique flow identifier. */
const ovs_u128 mega_ufid; /* Unique mega flow identifier. */
@@ -100,7 +102,8 @@ struct dp_netdev_flow {
struct ovs_refcount ref_cnt;

bool dead;
-uint32_t mark; /* Unique flow mark assigned to a flow */
+uint32_t mark; /* Unique flow mark for netdev offloading. */
+uint64_t simple_match_mark; /* Unique flow mark for the simple match. */

/* Statistics. */
struct dp_netdev_flow_stats stats;
13 changes: 10 additions & 3 deletions lib/dpif-netdev-private-thread.h
@@ -26,6 +26,7 @@
#include <stdbool.h>
#include <stdint.h>

+#include "ccmap.h"
#include "cmap.h"

#include "dpif-netdev-private-dfc.h"
@@ -86,12 +87,18 @@ struct dp_netdev_pmd_thread {

/* Flow-Table and classifiers
*
-* Writers of 'flow_table' must take the 'flow_mutex'. Corresponding
-* changes to 'classifiers' must be made while still holding the
-* 'flow_mutex'.
+* Writers of 'flow_table'/'simple_match_table' and their n* ccmap's must
+* take the 'flow_mutex'. Corresponding changes to 'classifiers' must be
+* made while still holding the 'flow_mutex'.
*/
struct ovs_mutex flow_mutex;
struct cmap flow_table OVS_GUARDED; /* Flow table. */
+struct cmap simple_match_table OVS_GUARDED; /* Flow table with simple
+match flows only. */
+/* Number of flows in the 'flow_table' per in_port. */
+struct ccmap n_flows OVS_GUARDED;
+/* Number of flows in the 'simple_match_table' per in_port. */
+struct ccmap n_simple_flows OVS_GUARDED;

/* One classifier per in_port polled by the pmd */
struct cmap classifiers;
12 changes: 7 additions & 5 deletions lib/dpif-netdev-unixctl.man
@@ -11,10 +11,11 @@ Shows performance statistics for one or all pmd threads of the datapath
\fIdp\fR. The special thread "main" sums up the statistics of every non pmd
thread.

-The sum of "emc hits", "smc hits", "megaflow hits" and "miss" is the number of
-packet lookups performed by the datapath. Beware that a recirculated packet
-experiences one additional lookup per recirculation, so there may be
-more lookups than forwarded packets in the datapath.
+The sum of "phwol hits", "simple match hits", "emc hits", "smc hits",
+"megaflow hits" and "miss" is the number of packet lookups performed by the
+datapath. Beware that a recirculated packet experiences one additional lookup
+per recirculation, so there may be more lookups than forwarded packets in the
+datapath.

The MFEX Opt hits displays the number of packets that are processed by the
optimized miniflow extract implementations.
@@ -140,8 +141,9 @@ pmd thread numa_id 0 core_id 1:
Datapath passes: 3599415 (1.50 passes/pkt)
- PHWOL hits: 0 ( 0.0 %)
- MFEX Opt hits: 3570133 ( 99.2 %)
+- Simple Match hits: 0 ( 0.0 %)
- EMC hits: 336472 ( 9.3 %)
-- SMC hits: 0 ( 0.0 %)
+- SMC hits: 0 ( 0.0 %)
- Megaflow hits: 3262943 ( 90.7 %, 1.00 subtbl lookups/hit)
- Upcalls: 0 ( 0.0 %, 0.0 us/upcall)
- Lost upcalls: 0 ( 0.0 %)