Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to reapply changes #121

Merged
merged 1 commit into from
Apr 18, 2020
Merged

Try to reapply changes #121

merged 1 commit into from
Apr 18, 2020

Conversation

tyll
Copy link
Member

@tyll tyll commented Jun 13, 2019

This includes one shared commit with #119

Fixes #38


This change is Reviewable

@tyll tyll requested a review from thom311 June 13, 2019 22:25
@coveralls
Copy link

coveralls commented Jun 13, 2019

Pull Request Test Coverage Report for Build 330

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 247 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.3%) to 38.188%

Files with Coverage Reduction New Missed Lines %
/home/travis/build/linux-system-roles/network/library/network_connections.py 247 19.52%
Totals Coverage Status
Change from base Build 326: -0.3%
Covered Lines: 826
Relevant Lines: 2163

💛 - Coveralls


version_id = 0
flags = 0
return Util.call_async(device, "reapply", [connection, version_id, flags])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is right.

For one, the role should not pass a connection. Instead, it should first make sure that the connection profile is up to date, and then pass no connection.

Also, the selection of the device doesn't seem right. The interface name does not matter because you can have profile that are not tied to a device by interface-name. What matters is whether the profile you want to activate is already active (on the right device). So, you need to find the profile, then the device(s) on which it's active, and if that the right one, then call reapply.

@tyll tyll force-pushed the reapply branch 5 times, most recently from b9fb4a4 to 510744c Compare June 14, 2019 14:46
@tyll
Copy link
Member Author

tyll commented Jun 14, 2019

[citest bad]

@tyll
Copy link
Member Author

tyll commented Jun 14, 2019

[citest bad]

1 similar comment
@tyll
Copy link
Member Author

tyll commented Jun 24, 2019

[citest bad]

@pcahyna
Copy link
Member

pcahyna commented Aug 23, 2019

@thom311 I guess this is ready to be merged?
Also, does it require some new NM features, or will it work for all versions in RHEL 7 and RHEL 8?

@thom311
Copy link
Contributor

thom311 commented Aug 24, 2019

Also, does it require some new NM features, or will it work for all versions in RHEL 7 and RHEL 8?

well,optimally there are tests that confirm (somewhat) that it works well. In practice, I think it should work just fine on rhel-7 (and rhel-8 of course too).

I guess this is ready to be merged?

Branch looks good to me, but I didn't test it.

@pcahyna
Copy link
Member

pcahyna commented Jan 10, 2020

@tyll can this be merged?

@tyll
Copy link
Member Author

tyll commented Jan 10, 2020

@tyll can this be merged?

It is missing a specific test to verify that the implemented feature is really triggered which is a challenge with the current integration test framework. From the discussion I recall the idea was to call the python code without ansible on the target system to be able to monitor the link change at the same time on the tested host by using a function like https://github.com/nmstate/nmstate/blob/ca3450dc93a4dc3af9764c832214c2c250a92535/tests/integration/testlib/iproutelib.py#L37

So far I only tested this manually with a second shell. Maybe this could also be simulated with a async job.

@pcahyna
Copy link
Member

pcahyna commented Jan 10, 2020

@tyll ah, I haven't realized the test is missing, although from looking at the changes it should have been obvious. Sorry.

You say "call the python code without ansible on the target system to be able to monitor the link change at the same time on the tested host". Do you distinguish between "target system" and "tested host"? I.e. would it be a multihost test?

@pcahyna
Copy link
Member

pcahyna commented Jan 10, 2020

Also, isn't this intended to avoid disrupting services? Can't one just run a dummy service and check whether it has not been disrupted?
(I don't know right now in which way exactly does the device teardown and reactivation disrupt the service, this needs to be specified in order to detect it, as it is not described in detail in #38, just that it "is more disruptive than desired").

@tyll
Copy link
Member Author

tyll commented Jan 10, 2020

@tyll ah, I haven't realized the test is missing, although from looking at the changes it should have been obvious. Sorry.

You say "call the python code without ansible on the target system to be able to monitor the link change at the same time on the tested host". Do you distinguish between "target system" and "tested host"? I.e. would it be a multihost test?

I meant the target system with tested host. Basically use standard test roles in the original way to execute test code on the provisioned host instead of having the test completely defined in the playbook.

Also, isn't this intended to avoid disrupting services? Can't one just run a dummy service and check whether it has not been disrupted?
(I don't know right now in which way exactly does the device teardown and reactivation disrupt the service, this needs to be specified in order to detect it, as it is not described in detail in #38, just that it "is more disruptive than desired").

The disruption is that the interface link goes down for a short time. I guess the disruption is mainly visible in real-time data streaming services. When I tried to disrupt a simple service implemented by socat I did not see a problem with the original code. It might also be that this has a greater effect when switches are involved and a link state change might delay connectivity because of the Spanning Tree Protocol. In Nmstate we did not find a real-world test case for the CI.

@pcahyna
Copy link
Member

pcahyna commented Jan 10, 2020

The disruption is that the interface link goes down for a short time. I guess the disruption is mainly visible in real-time data streaming services. When I tried to disrupt a simple service implemented by socat I did not see a problem with the original code. It might also be that this has a greater effect when switches are involved and a link state change might delay connectivity because of the Spanning Tree Protocol. In Nmstate we did not find a real-world test case for the CI.

What if a socket is bound to the IP address that goes away, this has no effect on the socket?
If not, isn't this event broadcast on the routing socket?

@pcahyna
Copy link
Member

pcahyna commented Feb 20, 2020

if there is a link state change, wouldn't it be reported at the other end of a veth connection? Also, isn't such event reported to userspace on the routing socket (netlink)?

@tyll tyll added the Wait_Submitter The submitter needs to do something before this can move on label Apr 1, 2020
@tyll tyll force-pushed the reapply branch 3 times, most recently from 8a2ba89 to d6a1c59 Compare April 17, 2020 20:29
@tyll tyll removed the Wait_Submitter The submitter needs to do something before this can move on label Apr 17, 2020
@tyll
Copy link
Member Author

tyll commented Apr 17, 2020

To be able to move forward with the PR, I decided to add a check that will just look into the log output to see whether reapply was attempted. The exact effects of reapply can be tested as part of NetworkManager IMHO. If we have a request for a specific scenario for the role, we can still add a more specific test. Also when we have support for integration tests with pytest, we can improve/migrate this test.

@tyll tyll merged commit 20a20b6 into linux-system-roles:master Apr 18, 2020
@tyll tyll deleted the reapply branch April 18, 2020 09:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFE] with NetworkManager try to use Reapply as a graceful way to change configuration
4 participants