-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd-nsc-vpp: issues with NOT default NSM_NAME #1826
Comments
By default, we are using the name of POD: https://github.com/networkservicemesh/deployments-k8s/blob/main/apps/nsc-memif/nsc.yaml#L29 |
@richardstone Could you share your diff in the deployment? |
Yes that is exactly how i'd like to use, but if I uncomment the mentioned part, it does not work. Here is my deployment:
|
OK, thanks! Could you also share logs from the container? |
Here is the log: cmd-nsc-vpp.txt |
@d-uzlov Could you have a look at this issue ASAP? |
@richardstone could you provide info about the cluster and operation system you are using? Also, which exact names did you test besides "endpoint-nsc"? Are the errors in logs the same for all of the names you tested? If not, could you post logs for those different cases? Am I understanding correctly that short names also don't work? Like, for example, just "nsc" as in the deployment config you posted. I wasn't able to reproduce the issue with the name you provided. However, I was able to get the same error when the name is too long. On my system "too long" is not applicable to names like "endpoint-nsc-7f9c9cddc9-hjsk5", I had to add ~15 symbols to it to make name too long, but maybe the limit is different for your system. |
Here are the logs I get when I try to set the NSM_NAME to the value from your logs:
|
Hi! Thanks for the fast response. |
Oh, it's good to know that we correctly identified the cause of the issue! The limitation comes from the fact that memif connection uses unix sockets, with file name containing connection id, and cmd-nsc-vpp uses the name from its config as part of the connection id. I guess we should either change the way we generate the socket name, or find some workaround. |
Yes, thanks a lot for the investigation! Is it possible that you'll change the method of socket name generation so there won't be differences in limits for the memif and kernel NSC names? |
Yeah, I believe that we will fix it to remove limitations. |
Very good, Thanks in advance! |
@richardstone @d-uzlov Good catch. @d-uzlov Is there any reason to set a Connection.ID at all in the cmd-nsc rather than letting the normal connection id generation kick in? |
@richardstone Looking more closely at this, while I wholeheartedly support your recommended workaround of not including the name in the connection id in cmd-nsc as a fix for NSM 1.0, we need to get a more comprehensive fix for post NSM 1.0 :) I am seeing some reports that using relative paths can provide a partial workaround. If true, that should give us a reliable way to work around the issue. Thoughts? |
@edwarnicke The best for me would be that I'd be able to use the same long name for the memif nsc that I use for the kernel one. As a workaround I started to look for a way to be able to take a substring of the full pod name as the NSM_NAME variable but I had no luck with finding a solution for that so far. |
I don't think there is any real benefit in manually setting the connection ID here. We already have path name for each path segment, which is usually used to identify where the connection came through. But we probably don't want to limit the users in which connection ids they can use, so I think memif should support long ids.
I was thinking about keeping a map of [connectionId -> UUID], and using these uuids for socket names instead of connection ids. Relative paths should work too, though I'm not sure if they will be convenient, since we would be changing current working directory of the program, and some clients may not expect this. Also, I didn't research this properly, but we could get issues with multithreading. |
@richardstone Net-net: are you OK for NSM 1.0 of we simply don't use the pod name in the connection id, and we can revisit for a better solution post NSM 1.0? |
This is a workaround for: networkservicemesh/deployments-k8s#1826 Signed-off-by: Ed Warnicke <[email protected]>
This is a workaround for: networkservicemesh/deployments-k8s#1826 Signed-off-by: Ed Warnicke <[email protected]>
This is a workaround for: networkservicemesh/deployments-k8s#1826 Signed-off-by: Ed Warnicke <[email protected]>
@edwarnicke It's okay for me. Thanks! |
This is a workaround for: networkservicemesh/deployments-k8s#1826 Signed-off-by: Ed Warnicke <[email protected]>
…cmd-nsc-vpp@main networkservicemesh/cmd-nsc-vpp# networkservicemesh/cmd-nsc-vpp PR link: https://github.com/networkservicemesh/cmd-nsc-vpp/pull/ networkservicemesh/cmd-nsc-vpp commit message: commit 43bb24c9953cf220ea08e52ab50fa853beb9084a Author: Ed Warnicke <[email protected]> Date: Fri Jun 18 08:39:34 2021 -0500 Fix use of nsc name in connection ignored (#174) This is a workaround for: #1826 Signed-off-by: Ed Warnicke <[email protected]> Signed-off-by: NSMBot <[email protected]>
@richardstone Temp fix is in. Lets leave this issue open to get a longer term fix. |
Hi!
When I set the NSM_NAME parameter of cmd-nsc-vpp to anything other than the default, I get this error:
Jun 17 16:19:42.714 [ERRO] [cmd:/bin/cmd-nsc-vpp] (19.1) proxyListener unable to listen on /tmp/memifproxy/endpoint-nsc-795886dc88-577t6-f96ec20f-ede0-499f-b0cc-819e8f735869/memif.socket: listen unixpacket /tmp/memifproxy/endpoint-nsc-795886dc88-577t6-f96ec20f-ede0-499f-b0cc-819e8f735869/memif.socket: bind: invalid argument
The interface seems to be in place for a second, but there are no neighbors and the pod keeps restarting.
vppctl show interface address
local0 (dn):
memif1/0 (up):
L3 172.16.1.96/32
I followed this guide: https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/use-cases/Memif2Memif
If I don't set the NSM_NAME parameter, it works correctly.
Is it possible that the given name is not handled correctly somewhere, or could you help me on where to look for issues?
The text was updated successfully, but these errors were encountered: