Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NetworkPolicyReference to controlplane NetworkPolicy #1258

Merged
merged 1 commit into from
Sep 22, 2020

Conversation

tnqn
Copy link
Member

@tnqn tnqn commented Sep 16, 2020

NetworkPolicy in controlplane API group is the object that consumed by antrea-agents. Both Antrea NetworkPolicy and K8s NetworkPolicy will be converted to it. Currently, the namespace and name of the original NetworkPolicy are copied to the controlplane NetworkPolicy and <Namespace>/<Name> is used as the key func. Therefore, one K8s NetworkPolicy may overwrite the controlplane NetworkPolicy mapping to a Antrea NetworkPolicy that has the same namespace and name.

To fix it completely, the original NetworkPolicy name will not be used as the controlplane NetworkPolicy name directly. However, antrea-agents still need to know the information about the original NetworkPolicy for multiple purposes, e.g. metrics and realization status report, and traceflow observation correlation.

To keep backward compatibility, this patch adds a new field "SourceRef" that includes the information about the original NetworkPolicy to the struct of controlplane NetworkPolicy. Instead of assuming controlplane NetworkPolicy has the same name as the original NetworkPolicy, antrea-agents consumes the new field to get the information about the original NetworkPolicy.

For #1173 and #1172

Compatibility impact: None

@antrea-bot
Copy link
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-windows-networkpolicy: to trigger windows networkpolicy tests.
  • /skip-windows-networkpolicy: to skip windows networkpolicy tests.
  • /test-hw-offload: to trigger ovs hardware offload test.
  • /skip-hw-offload: to skip ovs hardware offload test.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

@tnqn tnqn added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API. label Sep 16, 2020
@codecov-commenter
Copy link

codecov-commenter commented Sep 16, 2020

Codecov Report

Merging #1258 into master will decrease coverage by 1.25%.
The diff coverage is 56.52%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1258      +/-   ##
==========================================
- Coverage   56.29%   55.04%   -1.26%     
==========================================
  Files         109      110       +1     
  Lines       12045    10597    -1448     
==========================================
- Hits         6781     5833     -948     
+ Misses       4673     4191     -482     
+ Partials      591      573      -18     
Flag Coverage Δ
#integration-tests 44.89% <20.00%> (-1.22%) ⬇️
#unit-tests 41.90% <59.37%> (-0.64%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/agent/openflow/client.go 32.85% <ø> (-2.75%) ⬇️
pkg/apis/controlplane/helper.go 0.00% <0.00%> (ø)
pkg/apis/controlplane/v1beta1/helper.go 0.00% <0.00%> (ø)
pkg/agent/controller/networkpolicy/reconciler.go 66.50% <20.83%> (-2.57%) ⬇️
pkg/agent/openflow/network_policy.go 73.70% <50.00%> (+0.08%) ⬆️
pkg/agent/controller/networkpolicy/cache.go 79.34% <100.00%> (+2.02%) ⬆️
...ntroller/networkpolicy/networkpolicy_controller.go 61.26% <100.00%> (+2.72%) ⬆️
pkg/agent/types/networkpolicy.go 11.11% <100.00%> (-5.56%) ⬇️
...kg/controller/networkpolicy/antreanetworkpolicy.go 64.28% <100.00%> (+3.17%) ⬆️
...g/controller/networkpolicy/clusternetworkpolicy.go 63.41% <100.00%> (+2.74%) ⬆️
... and 112 more

@tnqn tnqn force-pushed the np-ref branch 2 times, most recently from dba53fe to 1e09de0 Compare September 16, 2020 15:11
@tnqn tnqn added this to the Antrea v0.10.0 release milestone Sep 16, 2020
@tnqn
Copy link
Member Author

tnqn commented Sep 16, 2020

/test-all

abhiraut
abhiraut previously approved these changes Sep 16, 2020
Copy link
Contributor

@abhiraut abhiraut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

pkg/controller/networkpolicy/antreanetworkpolicy_test.go Outdated Show resolved Hide resolved
pkg/controller/types/networkpolicy.go Outdated Show resolved Hide resolved
@abhiraut
Copy link
Contributor

/cc @suwang48404 .. note this changes the message of internal network policies

var rules []*rule
for _, obj := range objs {
rule := obj.(*rule)
if rule.PolicyRef.Name == npName {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is K8s NP and ANP differentiated here? If there were NP and ANP with same ns/name, wouldn't rules for both those policies be added to rules still?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I tried to fix it but it seems not a simple fix as antctl never expects Antrea Policies before. I filed #1262 and listed existing issues. As this interface is consumed by antctl only, I would prefer addressing it with a separate PR for antctl. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. cc @GraysonWu as he is working on antctl support for ClusterNetworkPolicy. This PR and issue #1262 should be relevant

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i noticed this too.. but since we have not yet implemented antctl so it won't make sense to handle it in this PR
@tnqn i already raised #927 issue long time back.. will dup your issue to it as Grayson is already assigned the initial PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Dyanngg @abhiraut @GraysonWu for taking care of this.

@tnqn
Copy link
Member Author

tnqn commented Sep 17, 2020

/test-all

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question - this is not a compatible API changes right? @tnqn

@abhiraut
Copy link
Contributor

Question - this is not a compatible API changes right? @tnqn

i think this is one case where we may need to handle versioning ?

@tnqn
Copy link
Member Author

tnqn commented Sep 18, 2020

@jianjuns @abhiraut I added compatibility impact to the PR's description and proposed a staged solution, please see if it makes sense for you. I think at least it's riskless to do 1 for now and we could decide whether it's worth to do 2 and 3 later.

@tnqn tnqn force-pushed the np-ref branch 3 times, most recently from b117ec1 to 9bec8cd Compare September 18, 2020 15:02
@tnqn
Copy link
Member Author

tnqn commented Sep 18, 2020

@jianjuns @abhiraut I'm going to update PR's description to reflect the latest compatibility impact, posting previous impact and proposal here:

Compatibility impact if we change controlplane NetworkPolicy to cluster scope and use UID as name directly:

  • This change is backward-compatible for antrea-agent.
    Older antrea-agents can communicate with new antrea-controller as nothing has changed to the rule and antrea-agents never filtered policies by namespace.
    A few logging and messages of Traceflow observations will be impacted as "Namespace" is no longer set and "Name" is changed to the UID of the original NetworkPolicy, for example:
# the original log
NetworkPolicy default/allow-dns applied to Pods on this Node
# older agents would become
NetworkPolicy /561e923c-a9c6-43b6-80d8-fda1427e4b98 applied to Pods on this Node
$ new agents would become
NetworkPolicy K8sNetworkPolicy:default/allow-dns applied to Pods on this Node
  • It's not completely backward-compatible for antctl. Filtering by namespace and/or name will be broken because of the namespace scope change.
# antctl-0.9.3 get netpol
NAMESPACE NAME                                 APPLIED-TO                           RULES
<NONE>    561e923c-a9c6-43b6-80d8-fda1427e4b98 c4c59cfe-9160-5de5-a85b-01a58d11963e 1
# antctl-0.9.3 get netpol 561e923c-a9c6-43b6-80d8-fda1427e4b98
Error: an empty namespace may not be set when a resource name is provided
# antctl-0.9.3 get netpol -n default
Error: Not Found: the server could not find the requested resource

If we want to keep complete backward-compatibility for all clients, maybe we could do the following:

  1. For this release, we only add sourceRef field to v1beta1 and do not make any change to the content of namespace, name, and namespace scope. So features that need to know kinds of NetworkPolicies can get what they need without any other clients affected.
  2. In next release, we add v1beta2 version that changes the resource scope to cluster and use the original UID as the resource's name.
  3. After v1beta1 clients are out of support windows, we remove v1beta1.

@tnqn tnqn changed the title Fix name collision between Antrea NetworkPolicy and K8s NetworkPolicy Add NetworkPolicyReference to controlplane NetworkPolicy Sep 18, 2020
@tnqn
Copy link
Member Author

tnqn commented Sep 18, 2020

/test-all

@jianjuns
Copy link
Contributor

@tnqn I agreed with the two-stages approach you proposed. We can decide later how to address other issues and whether to increase API group version.

@antoninbas
Copy link
Contributor

The approach looks good, but I'd like to understand how this will fix #1173 in v0.10? It seems that ANP and K8s NP can still overwrite each other in the store as we are still using the same Name & Namespace in the internal policy object and we didn't update the store key function?

@tnqn
Copy link
Member Author

tnqn commented Sep 21, 2020

The approach looks good, but I'd like to understand how this will fix #1173 in v0.10? It seems that ANP and K8s NP can still overwrite each other in the store as we are still using the same Name & Namespace in the internal policy object and we didn't update the store key function?

@antoninbas It cannot fix #1173 and is only the first step that makes current internal API consumers not rely on the name and namespace of controlplane networkpolicy, and a prerequisite of supporting NetworkPolicy metrics/status. Changing key function or the name of controlplane networkpolicy may introduce little troubles to upgrade and backward compatibility as listed in #1258 (comment), though none of them are really serious, and we don't know its impact on external API consumers (cloud case) yet. As ANP is still alpha, do you think we could defer the fix to next release?

@antoninbas
Copy link
Contributor

@tnqn yes, sounds good. I was just double-checking that this was not indeed a fix

antoninbas
antoninbas previously approved these changes Sep 21, 2020
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one suggestion to improve a comment, otherwise LGTM

Comment on lines 81 to 87
// Reference to the original NetworkPolicy that the rule belongs to.
// Note it has different meaning from PolicyUID, PolicyName, and
// PolicyNamespace which are the metadata of controlplane NetworkPolicy.
// Although they are same for now, it might change in future, features that
// need the information of the original NetworkPolicy should use SourceRef.
SourceRef *v1beta1.NetworkPolicyReference
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it has a different meaning
which are the metadata fields of the corresponding controlplane NetworkPolicy
it might change in the future

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thanks!

@abhiraut
Copy link
Contributor

@jianjuns @abhiraut I'm going to update PR's description to reflect the latest compatibility impact, posting previous impact and proposal here:

Compatibility impact if we change controlplane NetworkPolicy to cluster scope and use UID as name directly:

  • This change is backward-compatible for antrea-agent.
    Older antrea-agents can communicate with new antrea-controller as nothing has changed to the rule and antrea-agents never filtered policies by namespace.
    A few logging and messages of Traceflow observations will be impacted as "Namespace" is no longer set and "Name" is changed to the UID of the original NetworkPolicy, for example:
# the original log
NetworkPolicy default/allow-dns applied to Pods on this Node
# older agents would become
NetworkPolicy /561e923c-a9c6-43b6-80d8-fda1427e4b98 applied to Pods on this Node
$ new agents would become
NetworkPolicy K8sNetworkPolicy:default/allow-dns applied to Pods on this Node
  • It's not completely backward-compatible for antctl. Filtering by namespace and/or name will be broken because of the namespace scope change.
# antctl-0.9.3 get netpol
NAMESPACE NAME                                 APPLIED-TO                           RULES
<NONE>    561e923c-a9c6-43b6-80d8-fda1427e4b98 c4c59cfe-9160-5de5-a85b-01a58d11963e 1
# antctl-0.9.3 get netpol 561e923c-a9c6-43b6-80d8-fda1427e4b98
Error: an empty namespace may not be set when a resource name is provided
# antctl-0.9.3 get netpol -n default
Error: Not Found: the server could not find the requested resource

If we want to keep complete backward-compatibility for all clients, maybe we could do the following:

  1. For this release, we only add sourceRef field to v1beta1 and do not make any change to the content of namespace, name, and namespace scope. So features that need to know kinds of NetworkPolicies can get what they need without any other clients affected.
  2. In next release, we add v1beta2 version that changes the resource scope to cluster and use the original UID as the resource's name.
  3. After v1beta1 clients are out of support windows, we remove v1beta1.

I was really hoping we could fix the issue in this release but i understand that it may not be enough time for us to implement v1beta2 versioning here.. lets target it for 0.11 .. in the middle of reviewing this PR

Copy link
Contributor

@abhiraut abhiraut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one question. lgtm otherwise

func (r *CompletedRule) isAntreaNetworkPolicyRule() bool {
return r.PolicyPriority != nil
return r.SourceRef.Type != v1beta1.K8sNetworkPolicy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to be careful in terms of using this field right after upgrade .. ie what happens if the controller was not yet upgraded and agent was ? i.e. is sourceRef guaranteed to be not nil in such cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If upgrading by applying yaml, the old controller pod should be removed immediately so there should be no chance that new agent can talk to old controller. For other integrated solutions, we also assume controller is upgraded first. We may need a doc like https://kubernetes.io/docs/setup/release/version-skew-policy/#supported-component-upgrade-order to document it officially. It could be a part of #1273 which is tracking the version policy.

abhiraut
abhiraut previously approved these changes Sep 21, 2020
Copy link
Contributor

@abhiraut abhiraut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we recommend controller upgrade first followed by agents.. so we should be good

NetworkPolicy in controlplane API group is the object that consumed by
antrea-agents. Both Antrea NetworkPolicy and K8s NetworkPolicy will be
converted to it. Currently, the namespace and name of the original
NetworkPolicy are copied to the controlplane NetworkPolicy and
<Namespace>/<Name> is used as the key func. Therefore, one K8s
NetworkPolicy may overwrite the controlplane NetworkPolicy mapping to a
Antrea NetworkPolicy that has the same namespace and name.

To fix it completely, the original NetworkPolicy name will not be used
as the controlplane NetworkPolicy name directly. However, antrea-agents
still need to know the information about the original NetworkPolicy for
multiple purposes, e.g. metrics and realization status report, and
traceflow observation correlation.

To keep backward compatibility, this patch adds a new field "SourceRef"
that includes the information about the original NetworkPolicy to the
struct of controlplane NetworkPolicy. Instead of assuming controlplane
NetworkPolicy has the same name as the original NetworkPolicy,
antrea-agents consumes the new field to get the information about the
original NetworkPolicy.
@tnqn
Copy link
Member Author

tnqn commented Sep 22, 2020

/test-all

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member Author

tnqn commented Sep 22, 2020

/test-windows-conformance
/test-windows-networkpolicy

@lzhecheng
Copy link
Contributor

/test-windows-conformance

@tnqn
Copy link
Member Author

tnqn commented Sep 22, 2020

/test-windows-networkpolicy

1 similar comment
@tnqn
Copy link
Member Author

tnqn commented Sep 22, 2020

/test-windows-networkpolicy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants