
Add IPv6 multicast support. #1705

Merged (4 commits, Jan 13, 2021)
Conversation

dceara
Collaborator

@dceara dceara commented Sep 15, 2020

Enable MLD Snoop and MLD relay to allow multicast connectivity across
nodes. Extend the IPv4 multicast network policies to make them
applicable to IPv6.

Signed-off-by: Dumitru Ceara [email protected]
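
For context, the MLD snooping and relay bits map to a couple of OVN northbound options. A minimal sketch (the switch name and exact option values here are illustrative, not necessarily what ovn-kubernetes configures):

# Enable IGMP/MLD snooping on a node logical switch.
ovn-nbctl set logical_switch node1 \
  other_config:mcast_snoop=true other_config:mcast_querier=false
# Enable multicast relay (routing between node switches) on the cluster router.
ovn-nbctl set logical_router ovn_cluster_router options:mcast_relay=true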

@dceara dceara force-pushed the mld_support branch 3 times, most recently from b683fdc to 3893e48 Compare September 16, 2020 07:34
@dceara dceara changed the title [WIP] Add IPv6 multicast support. Add IPv6 multicast support. Sep 16, 2020
@dceara
Collaborator Author

dceara commented Sep 17, 2020

For this to work properly the following commit must be included in the OVN package being used:
ovn-org/ovn@520189b

@aojea
Contributor

aojea commented Sep 30, 2020

@dceara give me some days and I will add the e2e tests for multicast so we can verify it works and that we do not regress
kubernetes/kubernetes#95198

@aojea aojea mentioned this pull request Oct 1, 2020
@aojea
Contributor

aojea commented Oct 2, 2020

@dceara give me some days and I will add the e2e tests for multicast so we can verify it works and that we do not regress
kubernetes/kubernetes#95198

You have it here, feel free to amend or reuse it: #1741

@dceara
Collaborator Author

dceara commented Oct 2, 2020

@dceara give me some days and I will add the e2e tests for multicast so we can verify it works and that we do not regress
kubernetes/kubernetes#95198

You have it here, feel free to amend or reuse it: #1741

Many thanks!

@trozet
Contributor

trozet commented Oct 22, 2020

Looks OK to me, @dceara. Can you rebase? Also, could we add e2e test cases to actually exercise sending IPv4 and IPv6 multicast traffic?

@trozet
Contributor

trozet commented Oct 23, 2020

@nerdalert maybe you could help here with the multicast e2e tests?

@aojea
Contributor

aojea commented Oct 23, 2020

@nerdalert maybe you could help here with the multicast e2e tests?

you just need to amend #1741 here, it works as is

@dceara dceara changed the title Add IPv6 multicast support. [WIP] Add IPv6 multicast support. Nov 27, 2020
@coveralls

coveralls commented Nov 27, 2020

Coverage Status

Coverage decreased (-0.04%) to 54.208% when pulling e6f2d17 on dceara:mld_support into c97eff0 on ovn-org:master.

@dceara dceara force-pushed the mld_support branch 3 times, most recently from cf75d6e to ec3f5bb Compare November 27, 2020 15:26
test/e2e/go.sum (review comment outdated, resolved)
@dceara dceara changed the title [WIP] Add IPv6 multicast support. Add IPv6 multicast support. Nov 27, 2020
@dceara
Collaborator Author

dceara commented Nov 27, 2020

@trozet I had to partially revert the changes @vishnoianil did because they were breaking multicast between pods that are on different nodes. It turns out we still need the global multicast deny port group. But it's also a good idea to keep the cluster port group. More details here:
94d54f4

Also, I cherry picked the commits from @aojea that introduce the e2e multicast tests. However, we had to disable them for IPv6 due to the fact that agnhost iperf version needs to be updated, otherwise it doesn't support running iperf in server mode for IPv6 multicast.

@vishnoianil
Collaborator

@trozet I had to partially revert the changes @vishnoianil did because they were breaking multicast between pods that are on different nodes. It turns out we still need the global multicast deny port group. But it's also a good idea to keep the cluster port group. More details here:
94d54f4

Also, I cherry picked the commits from @aojea that introduce the e2e multicast tests. However, we had to disable them for IPv6 due to the fact that agnhost iperf version needs to be updated, otherwise it doesn't support running iperf in server mode for IPv6 multicast.

@dceara will the global multicast deny port group contain all the logical ports, or only selected logical ports where we want to enable multicast? If it's the former, then it seems ClusterPortGroup is a subset of mcastPortGroupDeny. If that's the case, I'm wondering whether keeping both port groups makes sense (given that these port groups are not small in size and have transactional/replication implications). I think the answer is hidden somewhere in the "flow reinstallation" comment in the patch, but I didn't fully get that.

@dceara
Collaborator Author

dceara commented Dec 1, 2020

@trozet I had to partially revert the changes @vishnoianil did because they were breaking multicast between pods that are on different nodes. It turns out we still need the global multicast deny port group. But it's also a good idea to keep the cluster port group. More details here:
94d54f4
Also, I cherry picked the commits from @aojea that introduce the e2e multicast tests. However, we had to disable them for IPv6 due to the fact that agnhost iperf version needs to be updated, otherwise it doesn't support running iperf in server mode for IPv6 multicast.

@dceara will the global multicast deny port group contain all the logical ports, or only selected logical ports where we want to enable multicast? If it's the former, then it seems ClusterPortGroup is a subset of mcastPortGroupDeny. If that's the case, I'm wondering whether keeping both port groups makes sense (given that these port groups are not small in size and have transactional/replication implications). I think the answer is hidden somewhere in the "flow reinstallation" comment in the patch, but I didn't fully get that.

The global multicast deny port group will contain all POD logical ports. But it will only have 2 ACLs defined on it (to drop ingress/egress multicast).

The ClusterPortGroup only contains the mgmt logical ports so it's a subset of mcastPortGroupDeny. The difference however is that on the ClusterPortGroup we can have quite a few ACLs.

When ports are added to or removed from a port group, ovn-controller has to reprocess all flows referring to that port group. So, given that mcastPortGroupDeny has a fixed number of ACLs, it's better to keep it as a separate port group; if we added all ports to the ClusterPortGroup instead, ovn-controller would have to reprocess all the ACLs defined on it.
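
As a rough sketch (the priority and exact matches below are illustrative, not the ones used in this PR), the deny group boils down to:

# One port group holding every pod LSP, with exactly two drop ACLs attached to it.
ovn-nbctl pg-add mcastPortGroupDeny
ovn-nbctl acl-add mcastPortGroupDeny from-lport 1011 \
  'inport == @mcastPortGroupDeny && (ip4.mcast || ip6.mcast)' drop
ovn-nbctl acl-add mcastPortGroupDeny to-lport 1011 \
  'outport == @mcastPortGroupDeny && (ip4.mcast || ip6.mcast)' drop
# Pod ports get appended as pods are created; the ACL set never changes.
ovn-nbctl add port_group mcastPortGroupDeny ports <pod-lsp-uuid>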

@vishnoianil
Collaborator

@trozet I had to partially revert the changes @vishnoianil did because they were breaking multicast between pods that are on different nodes. It turns out we still need the global multicast deny port group. But it's also a good idea to keep the cluster port group. More details here:
94d54f4
Also, I cherry picked the commits from @aojea that introduce the e2e multicast tests. However, we had to disable them for IPv6 due to the fact that agnhost iperf version needs to be updated, otherwise it doesn't support running iperf in server mode for IPv6 multicast.

@dceara will the global multicast deny port group contain all the logical ports, or only selected logical ports where we want to enable multicast? If it's the former, then it seems ClusterPortGroup is a subset of mcastPortGroupDeny. If that's the case, I'm wondering whether keeping both port groups makes sense (given that these port groups are not small in size and have transactional/replication implications). I think the answer is hidden somewhere in the "flow reinstallation" comment in the patch, but I didn't fully get that.

The global multicast deny port group will contain all POD logical ports. But it will only have 2 ACLs defined on it (to drop ingress/egress multicast).

The ClusterPortGroup only contains the mgmt logical ports so it's a subset of mcastPortGroupDeny. The difference however is that on the ClusterPortGroup we can have quite a few ACLs.

When ports are added to or removed from a port group, ovn-controller has to reprocess all flows referring to that port group. So, given that mcastPortGroupDeny has a fixed number of ACLs, it's better to keep it as a separate port group; if we added all ports to the ClusterPortGroup instead, ovn-controller would have to reprocess all the ACLs defined on it.

Thanks for the clarification @dceara. From our older discussion, I believe that as long as the ACL match doesn't use inport, a port group with only the management ports and a port group with all logical ports behave the same. But it seems that in this case the multicast deny ACL matches on inport == @mcastPortGroupDeny, which is why it needs to be refreshed and why we need the port group with all the logical ports in it. Is that the right understanding?

The only concern is that with this patch the nbdb memory usage will jump from 500MB to 5G (at the scale we are currently testing), but I'm not sure there is a better solution here.

@dceara
Collaborator Author

dceara commented Dec 2, 2020

Thanks for the clarification @dceara. From our older discussion, I believe that as long as the ACL match doesn't use inport, a port group with only the management ports and a port group with all logical ports behave the same. But it seems that in this case the multicast deny ACL matches on inport == @mcastPortGroupDeny, which is why it needs to be refreshed and why we need the port group with all the logical ports in it. Is that the right understanding?

Unlike the reject ACLs that are applied on the cluster group, yes, the multicast deny ACLs must also match on inport/outport, because otherwise they would always drop multicast relay (routed) traffic going to the cluster router.

The only concern is that with this patch the nbdb memory usage will jump from 500MB to 5G (at the scale we are currently testing), but I'm not sure there is a better solution here.

I think there's some confusion here. The mcastPortGroupDeny port group was not the reason there was high memory usage before. The multicast port group just stores all pod ports in the cluster and always has exactly 2 ACLs.

If the DB is compacted regularly and that memory is reclaimed by the system (i.e., ovsdb-server/memory-trim-on-compaction is set to on) this should add up to a database file of a few MB for 10K pods. For example, on my machine I create a PG with 10K logical ports and made sure that the DB is compacted at the end of the test:

export OVN_NB_DAEMON=$(ovn-nbctl --detach)
ovn-nbctl ls-add ls
ovn-nbctl pg-add pg
for ((i=1; i < 10000; i++)); do
  ovn-nbctl -- --id=@id create logical_switch_port name=lsp$i \
    -- add logical_switch ls ports @id \
    -- add port_group pg ports @id
done
ovn-appctl -t ovnnb_db ovsdb-server/compact
ls -lh /etc/ovn/ovnnb_db.db                                              
-rw-r-----. 1 root root 1.6M Dec  3 00:20 /etc/ovn/ovnnb_db.db

Throughout the duration of the test above the NB ovsdb-server memory usage never goes above 100MB.
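
For reference, on ovsdb-server builds that support it, trim-on-compaction can be toggled and checked at runtime; for example:

ovn-appctl -t ovnnb_db ovsdb-server/memory-trim-on-compaction on
ovn-appctl -t ovnnb_db ovsdb-server/compact
ovn-appctl -t ovnnb_db memory/show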

@vishnoianil
Collaborator

Thanks for the clarification @dceara. From our older discussion, I believe that as long as the ACL match doesn't use inport, a port group with only the management ports and a port group with all logical ports behave the same. But it seems that in this case the multicast deny ACL matches on inport == @mcastPortGroupDeny, which is why it needs to be refreshed and why we need the port group with all the logical ports in it. Is that the right understanding?

Unlike the reject ACLs that are applied on the cluster group, yes, the multicast deny ACLs must also match on inport/outport, because otherwise they would always drop multicast relay (routed) traffic going to the cluster router.

The only concern is that with this patch the nbdb memory usage will jump from 500MB to 5G (at the scale we are currently testing), but I'm not sure there is a better solution here.

I think there's some confusion here. The mcastPortGroupDeny port group was not the reason there was high memory usage before. The multicast port group just stores all pod ports in the cluster and always has exactly 2 ACLs.

Yeah, not exactly mcastPortGroupDeny, but this is what I have seen. The initial problem with memory growth (in the 14-20G range) was that we were:

  1. Creating 1980 ACLs (we still do).
  2. Referencing each of these 1980 ACLs from every logical switch (100 at our scale).

So to improve it,

  1. I created the cluster port group and added all the logical ports (~16000) from all the logical switches to it, and
  2. referenced the 1980 ACLs from the cluster port group.

That reduced the memory usage from the 14-20G range to 5-6G. In the mcastPortGroupDeny case, you will have (1), but for (2) it's just 2 entries.

At this point I ran DB compaction and the size was pretty small, but nbdb memory usage didn't come down from 5G. I was not aware of the memory-trim-on-compaction flag.

To further improve it, I removed the logical ports from the cluster port group and just added the management ports, because that's all that was needed in that case. After that, it had:

  1. the cluster port group with 100 entries, and
  2. the 1980 ACLs referenced from this port group.

This reduced the memory usage from 5G to 500MB. I ran DB compaction, and the DB size came down, but nbdb memory usage didn't.
So I was just extrapolating from the scenario where the port group contained 16K entries and referenced 1980 ACLs (compared to 2 in the multicast case), and nbdb was using 5-6G of memory.

If the DB is compacted regularly and that memory is reclaimed by the system (i.e., ovsdb-server/memory-trim-on-compaction is set to on) this should add up to a database file of a few MB for 10K pods. For example, on my machine I create a PG with 10K logical ports and made sure that the DB is compacted at the end of the test:

export OVN_NB_DAEMON=$(ovn-nbctl --detach)
ovn-nbctl ls-add ls
ovn-nbctl pg-add pg
for ((i=1; i < 10000; i++)); do
  ovn-nbctl -- --id=@id create logical_switch_port name=lsp$i \
    -- add logical_switch ls ports @id \
    -- add port_group pg ports @id
done
ovn-appctl -t ovnnb_db ovsdb-server/compact
ls -lh /etc/ovn/ovnnb_db.db                                              
-rw-r-----. 1 root root 1.6M Dec  3 00:20 /etc/ovn/ovnnb_db.db

Throughout the duration of the test above the NB ovsdb-server memory usage never goes above 100MB.

Yeah this memory usage is comforting :).

@vishnoianil
Collaborator

Added minor comment.

LGTM

@dceara dceara force-pushed the mld_support branch 4 times, most recently from b7cdcee to 86ef5ee Compare December 4, 2020 14:34
@dceara
Collaborator Author

dceara commented Dec 4, 2020

/retest

1 similar comment
@dceara
Collaborator Author

dceara commented Dec 8, 2020

/retest

@dceara
Collaborator Author

dceara commented Dec 8, 2020

/retest

@@ -371,6 +371,19 @@ func (oc *Controller) SetupMaster(masterNodeName string) error {
return err
}

// Create a cluster-wide port group to deny all multicast traffic.
Contributor

I think it is probably more efficient to just add a multicast_allow_router port group + ACL with the ovn_cluster_router ports in it. Then we don't need a port group with all the ports in it.

Collaborator Author

I think you're right. I'll have to try it out first, though. There's no restriction in OVN w.r.t. adding logical switch ports connected to a router to a port group, but I don't think I've ever seen this used in practice, so I'm not sure if there are caveats.

Collaborator Author

@trozet It works fine, thanks for the suggestion. I just pushed an updated version.
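
Conceptually, the updated approach amounts to something like the following (the group name, priority, and matches are illustrative):

# Small port group containing only the per-node switch ports facing ovn_cluster_router.
ovn-nbctl pg-add clusterRtrPortGroup <switch-to-router-lsp-uuid>
ovn-nbctl acl-add clusterRtrPortGroup from-lport 1012 \
  'inport == @clusterRtrPortGroup && (ip4.mcast || ip6.mcast)' allow
ovn-nbctl acl-add clusterRtrPortGroup to-lport 1012 \
  'outport == @clusterRtrPortGroup && (ip4.mcast || ip6.mcast)' allow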

dceara and others added 4 commits December 21, 2020 13:30
Commit 40a90f0 removed the multicast deny port group and used
instead the cluster port group.  However, this breaks pod-to-pod
multicast when pods reside on different nodes.  That is because OVN ACLs
are applied on all logical switch ports, including logical switch ports
connected to router ports.

Hence, an ACL of the form "if ip.mcast then deny" applied on the
clusterPortGroup will drop all multicast traffic that would normally be
routed by the cluster router even when multicast is allowed for a
namespace.

Instead, add a new (smaller) cluster wide group that only contains the
node logical switch ports connected to the cluster router.  This allows us
to define two allow ACLs for multicast traffic from/to node switches to/from
cluster router, therefore not breaking the namespace multicast allow policy
if pods reside on different nodes.

Fixes: 40a90f0 ("Migrate default deny multicast policy to port-group")
Signed-off-by: Dumitru Ceara <[email protected]>
Enable MLD Snoop and MLD relay to allow multicast connectivity across
nodes. Extend the IPv4 multicast network policies to make them
applicable to IPv6.

Signed-off-by: Dumitru Ceara <[email protected]>
Create a multicast source in one node and two listeners in another
node.
One of the listeners should join the group and receive the multicast
traffic, and the other listener should try to join another group
that does not exist and fail to receive any multicast traffic.

Organize the e2e folder correctly to avoid issues with go modules and
imports.

Disable IPv6 multicast tests for now because iperf in agnhost images
doesn't support binding a server to an IPv6 multicast group:

  bash-4.3# iperf -s -B ff00:0:3:3::3 -u -t 30 -i 5 -V
  error: Try again

Co-authored-by: Dumitru Ceara <[email protected]>
Signed-off-by: Antonio Ojea <[email protected]>
Signed-off-by: Dumitru Ceara <[email protected]>
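
Roughly, the IPv4 part of the test boils down to agnhost/iperf invocations like these (group addresses and timings are illustrative):

# Sender pod on node A: stream UDP to a multicast group, TTL > 1 so it can be relayed.
iperf -c 224.3.3.3 -u -T 2 -t 30 -i 5
# Listener pod on node B: bind to the same group; should receive the traffic.
iperf -s -B 224.3.3.3 -u -t 30 -i 5
# Second listener pod on node B: bind to a group nobody sends to; should see nothing.
iperf -s -B 224.4.4.4 -u -t 30 -i 5
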
@@ -471,6 +483,31 @@ func (oc *Controller) SetupMaster(masterNodeName string) error {
return nil
}

func addNodeLogicalSwitchPort(logicalSwitch, portName, portType, addresses, options string) (string, error) {
stdout, stderr, err := util.RunOVNNbctl("--", "--may-exist", "lsp-add", logicalSwitch, portName,
Contributor

Should use go-ovn bindings since we have API support for that now. Look for similar code in addLogicalPort in pods.go

Collaborator Author

Thanks for the review! Yes, ideally we should use go-ovn bindings. The problem however is that we need to retrieve the UUID for logical switch ports connected to the cluster router. go-ovn doesn't allow users to explicitly create such ports because it doesn't expose any API to set the port type to "router", e.g., LSPSetType. Instead users should create the router ports with LinkSwitchToRouter().

That means we have three options:

  1. keep the current approach of using ovn-nbctl to create the node switches/routers, mgmt port, switch ports connected to the routers. This is what this PR does.
  2. refactor most of master.go to use go-ovn bindings and especially LinkSwitchToRouter().
  3. add support for LSPSetType in the go-ovn bindings.

I'd personally go for option 1 for now and choose between 2 and 3 later as they're out of the scope of multicast support. What do you think?
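
For what it's worth, with option 1 the UUID of a router-type switch port can still be looked up directly via nbctl, e.g. (port name is hypothetical):

ovn-nbctl --bare --columns=_uuid find logical_switch_port type=router name=stor-node1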

Contributor

That's fine. Let's do option 1 for now.

Contributor

@trozet trozet left a comment

/lgtm just one minor question around the nbctl cmds in first commit

@@ -471,6 +471,31 @@ func (oc *Controller) SetupMaster(masterNodeName string) error {
return nil
}

func addNodeLogicalSwitchPort(logicalSwitch, portName, portType, addresses, options string) (string, error) {
stdout, stderr, err := util.RunOVNNbctl("--", "--may-exist", "lsp-add", logicalSwitch, portName,
"--", "lsp-set-type", portName, portType,
Contributor

I'm guessing this is OK, since the tests passed...but just to make sure: is it OK that you always include these commands even when their value is empty? For example, setting port type to ""? Doesn't cause any errors?

Collaborator Author

@trozet, it should be fine. Port type "" is the default and means the LSP is a "regular" VIF.

For lsp-set-options too, passing an empty string is fine as it will generate an empty map.
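
For example (switch and port names are made up), the following succeeds and leaves the port as a plain VIF with an empty options map:

ovn-nbctl -- --may-exist lsp-add node1 lsp-test \
  -- lsp-set-type lsp-test "" -- lsp-set-options lsp-test
ovn-nbctl --bare --columns=type,options list logical_switch_port lsp-test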

framework.ExpectNoError(err, "Error updating Namespace %v: %v", ns, err)
})

ginkgo.It("should be able to send multicast UDP traffic between nodes", func() {
Contributor

Thanks @aojea and @dceara for adding these! Is there any ETA on when we could get the IPv6 tests enabled? Do we need a GitHub issue so we don't forget to add them later?

Contributor

It depends on the upstream agnhost image update, but there are some issues with the process.
I think that the new image should be ready for next week.

Contributor

OK created: #1949
