debugging Mesos and net-modules #90
Comments
@unclejack: Can you provide the slave logs corresponding to the failed launches? If possible, I'd like to take a look at the sequence of events in there. We can then talk about adding more logging to the module and see what would help streamline debugging even further. |
@karya0 Of course, you'll have the logs, the full JSON config of the application and all the relevant information in a few hours. |
The full log of the mesos-slave can be found below:
The JSON configuration I was using is:
I've used backoffSeconds to keep Marathon from spamming the logs with too many launch attempts.

Mesos: 0.26

Please let me know if there are any other bits of information I should provide.

update: /calico/calico_mesos isn't the Calico isolator, it's just a dummy isolator used for testing. It wasn't important to change that path during testing with a dummy isolator. |
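For anyone reproducing this, here is a rough sketch of how the backoff settings slow down relaunches. This is not the configuration from this report: the app id, command and values are made up for illustration. Only the field names backoffSeconds, backoffFactor and maxLaunchDelaySeconds come from Marathon's app definition, and the delay formula is an approximation of Marathon's behaviour that may differ between versions.

```python
import json


def launch_delays(backoff_seconds, backoff_factor, max_launch_delay_seconds, failures):
    """Approximate the delay before each relaunch after consecutive failures.

    Assumption: the delay starts at backoff_seconds, is multiplied by
    backoff_factor after every failure, and is capped at
    max_launch_delay_seconds.
    """
    delays = []
    delay = float(backoff_seconds)
    for _ in range(failures):
        delays.append(min(delay, float(max_launch_delay_seconds)))
        delay *= backoff_factor
    return delays


# Hypothetical app definition; only the three backoff fields matter here.
example_app = {
    "id": "/example-app",
    "cmd": "sleep 3600",
    "cpus": 0.1,
    "mem": 32,
    "instances": 1,
    "backoffSeconds": 30,
    "backoffFactor": 2.0,
    "maxLaunchDelaySeconds": 3600,
}

if __name__ == "__main__":
    print(json.dumps(example_app, indent=2))
    print(launch_delays(
        example_app["backoffSeconds"],
        example_app["backoffFactor"],
        example_app["maxLaunchDelaySeconds"],
        failures=5,
    ))
```

A larger backoffSeconds keeps the slave log readable while a task keeps failing, at the cost of slower retries once the underlying problem is fixed.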
@karya0 Have you had a chance to take a look at this yet? |
@unclejack: Sorry for the delay in getting back to you. Can you tell me a bit more about your dummy isolator implementation? Is it going to create veth pairs for the container? Without them, the executor won't be able to bind to the given IP address, so it would fail to connect to the agent and eventually exit. |
@karya0 My dummy isolator doesn't set up veth pairs, but this isn't documented in the net-modules API. Could you tell me more about this and any other requirements around these side effects, please? |
Okay, so once the isolator is activated, it asks the Mesos agent to create a network namespace for the container being launched. Thus, someone needs to set up the network for the new container. In the case of Calico, the calico binary does all that as part of the "isolation" command. It also creates the appropriate routes to make it possible for the executor to talk to the agent and any other nodes as needed. |
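To make the requirement concrete, here is a minimal sketch of the kind of setup a dummy isolator would have to perform for the new namespace. It is not what the calico binary actually runs and it is not part of the net-modules API: the function name, interface names and addresses are hypothetical. It only builds the standard iproute2 and nsenter commands (veth pair, move one end into the container's namespace, assign the IP, add routes) and prints them without executing anything. Actually running them would need root on the agent.

```python
import shlex


def veth_setup_commands(container_pid, container_ip, gateway_ip=None,
                        host_if="veth-host0", peer_if="veth-cont0",
                        container_if="eth0", prefix_len=24):
    """Build the commands that wire a container's network namespace to the host.

    All names and defaults are illustrative assumptions. The namespace is
    addressed through the PID of a process already running inside it.
    """
    pid = str(container_pid)
    in_ns = ["nsenter", "-t", pid, "-n"]  # run a command in the container's netns
    cmds = [
        # Create the veth pair in the host namespace.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", peer_if],
        # Move one end into the container's network namespace.
        ["ip", "link", "set", peer_if, "netns", pid],
        # Inside the namespace: rename, assign the IP, bring interfaces up.
        in_ns + ["ip", "link", "set", peer_if, "name", container_if],
        in_ns + ["ip", "addr", "add", f"{container_ip}/{prefix_len}", "dev", container_if],
        in_ns + ["ip", "link", "set", container_if, "up"],
        in_ns + ["ip", "link", "set", "lo", "up"],
        # Host side: bring the interface up and route the container IP to it.
        ["ip", "link", "set", host_if, "up"],
        ["ip", "route", "add", f"{container_ip}/32", "dev", host_if],
    ]
    if gateway_ip:
        # Default route so the executor can reach the agent and other nodes.
        cmds.append(in_ns + ["ip", "route", "add", "default", "via", gateway_ip])
    return cmds


if __name__ == "__main__":
    # Hypothetical PID and addresses; the commands are printed, not run.
    for cmd in veth_setup_commands(1234, "192.168.0.10", gateway_ip="192.168.0.1"):
        print(shlex.join(cmd))
```

Without an equivalent of these steps, the executor has an empty namespace and nothing to bind the assigned IP to, which would match the failure described above.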
We can work together on getting a README.md that is more catered towards writing a new isolation-service-provider. Does that sound reasonable? |
There's some info on it here, but it's a tad outdated: https://github.com/mesosphere/net-modules/blob/master/docs/api.md#network-virtualizer-api |
@karya0 That sounds good. I'll write my implementation and send some PRs to update the documentation. @djosborne It's unfortunate that the API documentation is a bit outdated. I've been using that documentation to implement an IPAM and isolator. Up-to-date documentation would have saved me many hours of debugging. |
SGTM. |
I am facing a similar issue. Can you point me to the latest docs? |
The logging provided by the Mesos slave, net-modules and Marathon doesn't provide many hints regarding the reason behind the startup failure of an application.
A setup which uses Marathon 0.14.0, Mesos 0.26 and the latest net-modules master exhibits the following behaviour:
Observations, things I've tried so far and things I've investigated:
Given the issue described above, I have a few questions:
I've read all the documentation I could find and I've also read some of the code to try to figure out if I'm missing something obvious.
Please let me know if I should also provide some logs.