Open vSwitch is often used to implement a firewall. The preferred
way to implement a firewall is ``connection tracking,'' that is,
to keep track of the connection state of individual TCP sessions.
The ct action described in this section, added in Open vSwitch
2.5, implements connection tracking. For new deployments, it is
the recommended way to implement firewalling with Open vSwitch.

Before ct was added, Open vSwitch did not have built-in support
for connection tracking. Instead, Open vSwitch supported the
learn action, which allows a received packet to add a flow to an
OpenFlow flow table. This could be used to implement a primitive
form of connection tracking: packets passing through the firewall
in one direction could create flows that allowed response packets
back through the firewall in the other direction. The additional
fin_timeout action allowed the learned flows to expire quickly
after TCP session termination.

The ct action
Syntax:
    ct([argument]...)
    ct(commit[, argument]...)
The action has two modes of operation, distinguished by whether
commit is present. The following arguments may be present in
either mode:

zone=value
    A zone is a 16-bit id that isolates connections into separate
    domains, allowing overlapping network addresses in different
    zones. If a zone is not provided, then the default is 0. The
    value may be specified either as a 16-bit integer literal or
    a field or subfield in the syntax described under ``Field
    Specifications'' above.
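
As a hedged illustration of the two ways to specify zone, the sketch below uses an integer literal for traffic from one port and a subfield for traffic from another. The port numbers, table numbers, zone 5, and the use of reg0 (assumed to have been loaded with a zone id by an earlier action) are illustrative assumptions, not values taken from this document.

```
# Literal zone: track IP traffic arriving on port 1 in zone 5.
table=0,priority=100,in_port=1,ip,ct_state=-trk,actions=ct(zone=5,table=1)
# Zone from a subfield: use the low 16 bits of reg0 (assumed to be set earlier).
table=0,priority=100,in_port=2,ip,ct_state=-trk,actions=ct(zone=NXM_NX_REG0[0..15],table=1)
```

Because zones isolate connections, a connection committed in zone 5 is not visible to a lookup performed in any other zone.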
Without commit, this action sends the packet through the
connection tracker. The connection tracker keeps track of the
state of TCP connections for packets passed through it. For each
packet through a connection, it checks that the packet satisfies
TCP invariants and signals the connection state to later actions
using the ct_state metadata field, which is documented in
ovs-fields(7).

In this form, ct forks the OpenFlow pipeline:

• In one fork, ct passes the packet to the connection tracker.
  Afterward, it reinjects the packet into the OpenFlow pipeline
  with the connection tracking fields initialized. The ct_state
  field is initialized with the connection state and ct_zone with
  the connection tracking zone specified on the zone argument.
  If the connection is one that is already tracked, ct_mark and
  ct_label are initialized to its existing mark and label,
  respectively; otherwise they are zeroed. In addition,
  ct_nw_proto, ct_nw_src, ct_nw_dst, ct_ipv6_src, ct_ipv6_dst,
  ct_tp_src, and ct_tp_dst are initialized appropriately for the
  original direction connection. See the resubmit action for a
  way to search the flow table with the connection tracking
  original direction fields swapped with the packet 5-tuple
  fields. See ovs-fields(7) for details on the connection
  tracking fields.

• In the other fork, the original instance of the packet
  continues independent processing following the ct action. The
  ct_state field and other connection tracking metadata are
  cleared.
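
The fork can be pictured with a single flow. This is only a sketch: port 3 is a hypothetical monitoring port, and the table numbers are assumptions.

```
# Fork 1: the packet is tracked and reinjected into table 1 with ct_state set.
# Fork 2: the original packet continues to the next action (output to port 3)
#         with its connection tracking fields cleared.
table=0,priority=100,ip,ct_state=-trk,actions=ct(table=1),output:3
```

Flows in table 1 can then match on ct_state, while the copy sent to port 3 is unaffected by whatever table 1 decides.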
Without commit, the ct action accepts the following arguments:

table=table
    Sets the OpenFlow table where the packet is reinjected. The
    table must be a number between 0 and 254 inclusive, or a
    table's name. If table is not specified, then the packet is
    not reinjected.
nat
nat(type=addrs[:ports][,flag]...)
    Specify address and port translation for the connection being
    tracked. The type must be src, for source address/port
    translation (SNAT), or dst, for destination address/port
    translation (DNAT). Setting up address translation for a new
    connection takes effect only if the connection is later
    committed with ct(commit...).
    The src and dst options take the following arguments:

    addrs
        The IP address addr or range addr1-addr2 from which the
        translated address should be selected. If only one
        address is given, then that address will always be
        selected; otherwise the address selection can be informed
        by the optional persistent flag as described below.
        Either IPv4 or IPv6 addresses can be provided, but both
        addresses must be of the same type, and the datapath
        behavior is undefined if an IPv4 address range is
        provided for an IPv6 packet, or an IPv6 address range for
        an IPv4 packet. IPv6 addresses must be bracketed with [
        and ] if a port range is also given.

    ports
        The L4 port or range port1-port2 from which the
        translated port should be selected. When a port range is
        specified, the datapath does not fall back to ephemeral
        ports; otherwise, it does. The port number selection can
        be informed by the optional random and hash flags
        described below. The userspace datapath only supports the
        hash behavior.
    The optional flags are:

    random
        The selection of the port from the given range should be
        done using a fresh random number. This flag is mutually
        exclusive with hash.

    hash
        The selection of the port from the given range should be
        done using a datapath specific hash of the packet's IP
        addresses and the other, non-mapped port number. This
        flag is mutually exclusive with random.

    persistent
        The selection of the IP address from the given range
        should be done so that the same mapping can be provided
        after the system restarts.
    If alg is specified for the committing ct action that also
    includes nat with a src or dst attribute, then the datapath
    tries to set up the helper to be NAT-aware. This
    functionality is datapath specific and may not be supported
    by all datapaths.

    A ``bare'' nat argument with no options will only translate
    the packet being processed in the way the connection has been
    set up with an earlier, committed ct action. A nat action
    with src or dst, when applied to a packet belonging to an
    established (rather than new) connection, will behave the
    same as a bare nat.
    For SNAT, there is a special case when the src IP address is
    configured as all 0's, i.e., nat(src=0.0.0.0). In this case,
    when a source port collision is detected during the commit,
    the source port will be translated to an ephemeral port. If
    there is no collision, no SNAT is performed.

    Open vSwitch 2.6 introduced nat. Linux 4.6 was the earliest
    upstream kernel that implemented ct support for nat.
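
The following flows sketch one way the nat argument might be combined with commit to perform SNAT. The topology (port 1 inside, port 2 outside), zone 1, and the address range are assumptions chosen for illustration; ARP handling for the translated addresses is deliberately omitted, so this is not a complete configuration.

```
# Send untracked IP packets through the tracker in zone 1. The bare nat
# applies any translation already set up for a committed connection.
table=0,priority=100,ip,ct_state=-trk,actions=ct(zone=1,table=1,nat)
# New connections from the inside: commit and choose a source address
# from the (assumed) range, then forward.
table=1,in_port=1,ip,ct_state=+trk+new,actions=ct(commit,zone=1,nat(src=10.1.1.240-10.1.1.255)),2
# Established connections pass in both directions, already translated.
table=1,in_port=1,ip,ct_state=+trk+est,actions=2
table=1,in_port=2,ip,ct_state=+trk+est,actions=1
table=0,priority=1,actions=drop
```

A port range and flags follow the same pattern, for example nat(src=10.1.1.240-10.1.1.255:32768-65535,random).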
With commit, the connection tracker commits the connection to the
connection tracking module. The commit flag should only be used
from the pipeline within the first fork of ct without commit.
Information about the connection is stored beyond the lifetime of
the packet in the pipeline. Some ct_state flags are only
available for committed connections.
The following options are available only with commit:

force
    A committed connection always has the directionality of the
    packet that caused the connection to be committed in the
    first place. This is the ``original direction'' of the
    connection, and the opposite direction is the ``reply
    direction''. If a connection is already committed, but it is
    in the wrong direction, force effectively terminates the
    existing connection and starts a new one in the current
    direction. This flag has no effect if the original direction
    of the connection is already the same as that of the current
    packet.
exec(action...)
    Perform each action within the context of connection
    tracking. Only actions that modify the ct_mark or ct_label
    fields are accepted within exec, and these fields may only be
    modified with this option. For example:

    set_field:value[/mask]->ct_mark
        Store a 32-bit metadata value with the connection.
        Subsequent lookups for packets in this connection will
        populate ct_mark when the packet is sent to the
        connection tracker with the table specified.

    set_field:value[/mask]->ct_label
        Store a 128-bit metadata value with the connection.
        Subsequent lookups for packets in this connection will
        populate ct_label when the packet is sent to the
        connection tracker with the table specified.
alg=alg
    Specify application layer gateway alg to track specific
    connection types. If subsequent related connections are sent
    through the ct action, then the rel flag in the ct_state
    field will be set. Supported types include:

    ftp
        Look for negotiation of FTP data connections. Specify
        this option for FTP control connections to detect related
        data connections and populate the rel flag for the data
        connections.

    tftp
        Look for negotiation of TFTP data connections. Specify
        this option for TFTP control connections to detect
        related data connections and populate the rel flag for
        the data connections.

    Related connections inherit ct_mark from that stored with the
    original connection (i.e. the connection created by
    ct(alg=...)).
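
As a sketch of how exec and alg might be used together, the flows below commit FTP control connections with the ftp helper and a mark, then admit the related data connections. The port numbers, table numbers, priorities, and mark value 1 are assumptions for the example; only data connections opened from port 2 are handled.

```
table=0,priority=100,ip,ct_state=-trk,actions=ct(table=1)
# New FTP control connections from port 1: commit, attach the helper,
# and store ct_mark=1 with the connection.
table=1,priority=20,in_port=1,tcp,tcp_dst=21,ct_state=+trk+new,actions=ct(commit,alg=ftp,exec(set_field:1->ct_mark)),2
# Packets of committed connections, in either direction.
table=1,priority=10,in_port=1,ip,ct_state=+trk+est,actions=2
table=1,priority=10,in_port=2,ip,ct_state=+trk+est,actions=1
# Data connections that the helper recognized as related; these inherit
# ct_mark from the control connection.
table=1,priority=10,in_port=2,tcp,ct_state=+trk+new+rel,actions=ct(commit),1
table=0,priority=1,actions=drop
```

A 128-bit label can be stored in the same way with set_field:value->ct_label inside exec.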
With the Linux datapath, global sysctl options affect ct
behavior. In particular, if net.netfilter.nf_conntrack_helper is
enabled, which it is by default until Linux 4.7, then application
layer gateway helpers may be executed even if alg is not
specified. For security reasons, the netfilter team recommends
users disable this option. For further details, please see
⟨http://www.netfilter.org/news.html#2012-04-03⟩.
The ct action may be used as a primitive to construct stateful
firewalls by selectively committing some traffic, then matching
ct_state to allow established connections while denying new
connections. The following flows provide an example of how to
implement a simple firewall that allows new connections from port
1 to port 2, and only allows established connections to send
traffic from port 2 to port 1:

    table=0,priority=1,action=drop
    table=0,priority=10,arp,action=normal
    table=0,priority=100,ip,ct_state=-trk,action=ct(table=1)
    table=1,in_port=1,ip,ct_state=+trk+new,action=ct(commit),2
    table=1,in_port=1,ip,ct_state=+trk+est,action=2
    table=1,in_port=2,ip,ct_state=+trk+new,action=drop
    table=1,in_port=2,ip,ct_state=+trk+est,action=1
If ct is executed on IPv4 (or IPv6) fragments, then the message
is implicitly reassembled before sending to the connection
tracker and refragmented upon output, to the original maximum
received fragment size. Reassembly occurs within the context of
the zone, meaning that IP fragments in different zones are not
assembled together. Pipeline processing for the initial fragments
is halted. When the final fragment is received, the message is
assembled and pipeline processing continues for that flow. Packet
ordering is not guaranteed by IP protocols, so it is not possible
to determine which IP fragment will cause message reassembly (and
therefore continue pipeline processing). As such, it is strongly
recommended that multiple flows should not execute ct to
reassemble fragments from the same IP message.
Conformance:
The ct action was introduced in Open vSwitch 2.5. Some of its
features were introduced later, noted individually above.

The ct_clear action
Syntax:
    ct_clear
Clears connection tracking state from the flow, zeroing ct_state,
ct_zone, ct_mark, and ct_label.

This action was introduced in Open vSwitch 2.6.90.
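
One possible use of ct_clear, sketched under assumed table and zone numbers, is to discard state from one zone before a later table tracks the packet again in another zone:

```
# Table 1 has finished with the zone-1 state: clear it and move on.
table=1,priority=100,ip,ct_state=+trk+est,ct_zone=1,actions=ct_clear,resubmit(,2)
# Table 2 now sees the packet as untracked and can track it in zone 2.
table=2,priority=100,ip,ct_state=-trk,actions=ct(zone=2,table=3)
```

After ct_clear, the match on ct_state=-trk in table 2 succeeds because ct_state has been zeroed.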

The learn action
Syntax:
    learn(argument...)
The learn action adds or modifies a flow in an OpenFlow table,
similar to ovs-ofctl --strict mod-flows. The arguments specify
the match fields, actions, and other properties of the flow to be
added or modified.
Match fields for the new flow are specified as follows. At least
one match field should ordinarily be specified:

field=value
    Specifies that field, in the new flow, must match the literal
    value, e.g. dl_type=0x800. Shorthand match syntax, such as ip
    in place of dl_type=0x800, is not supported.

field=src
    Specifies that field in the new flow must match src taken
    from the packet currently being processed. For example,
    udp_dst=udp_src, applied to a UDP packet with source port 53,
    creates a flow which matches udp_dst=53. field and src must
    have the same width.

field
    Shorthand for the previous form when field and src are the
    same. For example, udp_dst, applied to a UDP packet with
    destination port 53, creates a flow which matches udp_dst=53.

The field and src arguments above should be fields or subfields
in the syntax described under ``Field Specifications'' above.
Match field specifications must honor prerequisites for both the
flow that contains learn and the new flow that it creates.
Consider the following complete flow, in the syntax accepted by
ovs-ofctl. If the flow's match on udp were omitted, then the flow
would not satisfy the prerequisites for the learn action's use of
udp_src. If dl_type=0x800 or nw_proto were omitted from learn,
then the new flow would not satisfy the prerequisite for its
match on udp_dst. For more information on prerequisites, please
refer to ovs-fields(7):

    udp, actions=learn(dl_type=0x800, nw_proto=17, udp_dst=udp_src)
Actions for the new flow are specified as follows. At least one
action should ordinarily be specified:

load:value->dst
    Adds a load action to the new flow that loads the literal
    value into dst. The syntax is the same as the load action
    explained in the ``Header Modification'' section.

load:src->dst
    Adds a load action to the new flow that loads src, a field or
    subfield from the packet being processed, into dst.

output:field
    Adds an output action to the new flow's actions that outputs
    to the OpenFlow port taken from field, which must be a field
    as described above.

fin_idle_timeout=seconds
fin_hard_timeout=seconds
    Adds a fin_timeout action with the specified arguments to the
    new flow. This feature was added in Open vSwitch 1.5.90.
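
To illustrate how match and action specifications combine, the sketch below uses learn as a primitive reverse-path filter for TCP. Ports 1 and 2, the table numbers, and the timeout values are assumptions. Each outbound packet from port 1 installs a flow in table 1 that matches the reply direction (addresses and ports swapped) and outputs to the port the outbound packet arrived on.

```
# Outbound TCP from port 1: learn the reverse flow, then forward.
table=0,priority=100,in_port=1,tcp,actions=learn(table=1,idle_timeout=300,fin_idle_timeout=5,dl_type=0x800,nw_proto=6,nw_src=nw_dst,nw_dst=nw_src,tcp_src=tcp_dst,tcp_dst=tcp_src,output:NXM_OF_IN_PORT[]),2
# Inbound TCP from port 2: only packets matching a learned flow pass.
table=0,priority=100,in_port=2,tcp,actions=resubmit(,1)
table=1,priority=0,actions=drop
```

The literal dl_type=0x800 and nw_proto=6 satisfy the prerequisites of the learned flow's matches on tcp_src and tcp_dst, and the tcp match on the outer flow satisfies the prerequisites for reading those fields from the packet.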
The following additional arguments are optional:

idle_timeout=seconds
hard_timeout=seconds
priority=value
cookie=value
send_flow_rem
    These arguments have the same meaning as in the usual flow
    syntax documented in ovs-ofctl(8).

table=table
    The table in which the new flow should be inserted. Specify a
    decimal number between 0 and 254 inclusive or the name of a
    table. The default, if table is unspecified, is table 1 (not
    0).

delete_learned
    When this flag is specified, deleting the flow that contains
    the learn action will also delete the flows created by learn.
    Specifically, when the last learn action with this flag and
    particular table and cookie values is removed, the switch
    deletes all of the flows in the specified table with the
    specified cookie.

    This flag was added in Open vSwitch 2.4.

limit=number
    If the number of flows in the new flow's table with the same
    cookie exceeds number, the action will not add a new flow. By
    default, or with limit=0, there is no limit.

    This flag was added in Open vSwitch 2.8.

result_dst=field[bit]
    If learn fails (because the number of flows exceeds limit),
    the action sets field[bit] to 0; otherwise, it sets
    field[bit] to 1. field[bit] must be a single bit.

    This flag was added in Open vSwitch 2.8.
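
The optional arguments might be combined as in the following sketch, which caps the number of learned flows and reacts when the cap is reached. The cookie value, the limit of 100, the table numbers, and the use of reg0 bit 0 as the result bit are all assumptions for illustration.

```
# Learn at most 100 flows with cookie 0x1 in table 10; record success in
# reg0 bit 0. Removing this flow also removes the flows it learned.
table=0,priority=100,in_port=1,actions=learn(table=10,cookie=0x1,delete_learned,limit=100,result_dst=NXM_NX_REG0[0],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],output:NXM_OF_IN_PORT[]),resubmit(,1)
# Bit 0 of reg0 is 0 when the limit prevented the learn.
table=1,priority=100,reg0=0/1,actions=drop
table=1,priority=50,actions=normal
```

Dropping is only one possible reaction; a deployment could instead forward the packet without learning.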
By itself, the learn action can only put two kinds of actions
into the flows that it creates: load and output actions. If learn
is used in isolation, these are severe limits.

However, learn is not meant to be used in isolation. It is a
primitive meant to be used together with other Open vSwitch
features to accomplish a task. Its existing features are enough
to accomplish most tasks.
Here is an outline of a typical pipeline structure that allows
for versatile behavior using learn:

• Flows in table A contain a learn action, that populates flows
  in table L, that use a load action to populate register R with
  information about what was learned.

• Flows in table B contain two sequential resubmit actions: one
  to table L and another one to table B+1.

• Flows in table B+1 match on register R and act differently
  depending on what the flows in table L loaded into it.
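
The outline above can be sketched with MAC learning, loosely following the approach in the Open vSwitch tutorial. The concrete table numbers (A=2, L=10, B=3, B+1=4), the register (reg0), and the port numbers are assumptions for the example.

```
# Table A: learn "VLAN + source MAC -> ingress port" into table L.
table=2,actions=learn(table=10,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_OF_IN_PORT[]->NXM_NX_REG0[0..15]),resubmit(,3)
# Table B: look up the destination in table L, then continue in table B+1.
table=3,priority=50,actions=resubmit(,10),resubmit(,4)
# Table B+1: act on register R. reg0=0 means nothing was learned.
table=4,priority=50,reg0=0,actions=flood
table=4,priority=50,reg0=1,actions=1
table=4,priority=50,reg0=2,actions=2
```

The learned flows in table 10 contain only a load action, so the forwarding decision itself stays in ordinary flows in table 4.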
This approach can be used to implement many learn-based features.
For example:

• Resubmit to a table selected based on learned information, e.g.
  see
  ⟨https://mail.openvswitch.org/pipermail/ovs-discuss/2016-June/021694.html⟩.

• MAC learning in the middle of a pipeline, as described in the
  ``Open vSwitch Advanced Features Tutorial'' in the OVS
  documentation.

• TCP state based firewalling, by learning outgoing connections
  based on SYN packets and matching them up with incoming
  packets. (This is usually better implemented using the ct
  action.)

• At least some of the features described in T. A. Hoff,
  ``Extending Open vSwitch to Facilitate Creation of Stateful SDN
  Applications''.
Conformance:
The learn action is an Open vSwitch extension to OpenFlow added
in Open vSwitch 1.3. Some features of learn were added in later
versions, as noted individually above.

The fin_timeout action
Syntax:
    fin_timeout(key=value...)
This action changes the idle timeout or hard timeout, or both, of
the OpenFlow flow that contains it, when the flow matches a TCP
packet with the FIN or RST flag. When such a packet is observed,
the action reduces the rule's timeouts to those specified on the
action. If the rule's existing timeout is already shorter than
the one that the action specifies, then that timeout is
unaffected.
The timeouts are specified as key-value pairs:

idle_timeout=seconds
    Causes the flow to expire after the given number of seconds
    of inactivity.

hard_timeout=seconds
    Causes the flow to expire after the given number of seconds,
    regardless of activity. (seconds specifies time since the
    flow's creation, not since the receipt of the FIN or RST.)
This action is normally added to a learned flow by the learn
action. It is unlikely to be useful otherwise.
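
For illustration only, a flow of roughly the kind that learn might create with idle_timeout=300 and fin_idle_timeout=5 could look like the following. The addresses, ports, and table number are hypothetical.

```
# Idle timeout starts at 300 seconds; once a FIN or RST is seen on this
# flow, the idle timeout is reduced to 5 seconds.
table=1,priority=100,idle_timeout=300,tcp,nw_src=192.0.2.1,nw_dst=198.51.100.2,tcp_src=80,tcp_dst=40000,actions=fin_timeout(idle_timeout=5),output:1
```

Because the action only ever reduces a timeout, a flow whose idle timeout is already 5 seconds or less is left unchanged.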
Conformance:
This Open vSwitch extension action was added in Open vSwitch
1.5.90.