some problems with function ipc_dialer_dial #1964

Open
Alex1919810 opened this issue Dec 7, 2024 · 6 comments
@Alex1919810

This is my first time using nng. When I use the IPC transport, I run into a strange problem.
In posix_ipcdial.c:174, in the function ipc_dialer_dial:
if ((fd = socket(ss.ss_family, SOCK_STREAM | SOCK_CLOEXEC, 0)) < 0) {
    nni_aio_finish_error(aio, nni_plat_errno(errno));
    return;
}
I don't know why this socket is created so many times. In certain scenarios the sockets are never closed, and the file handles are eventually exhausted.
Is this the expected behavior?

@gdamore
Contributor

gdamore commented Dec 7, 2024

I'm not sure I understand your question or problem.

That function dials once, but it does so by creating a file descriptor. If that fails, then we won't have created a file descriptor, and you won't have exhausted any resource.

Are you using Linux or another POSIX platform, or Windows? On Windows it's a totally different set of logic, and a different function.

@Alex1919810
Author

Alex1919810 commented Dec 8, 2024

Thanks for your response, sir. I will describe my problem further.
The usage environment is Linux, Ubuntu 18. The code uses nng's publish-subscribe mode and IPC communication.

The publisher code is similar to

nng_socket socket;
nng_ipc_register();
nng_pub0_open(&socket);
nng_listen(socket, ipcurl.c_str(), NULL, NNG_FLAG_NONBLOCK);

If the listen fails, I close the socket.

The subscriber code is similar to

nng_socket socket;
nng_ipc_register();
nng_sub0_open(&socket);
nng_setopt(socket, NNG_OPT_SUB_SUBSCRIBE, "", 0);
nng_dial(socket, ipcurl.c_str(), NULL, NNG_FLAG_NONBLOCK);
nng_setopt_ms(socket, NNG_OPT_RECONNMINT, NNG_DURATION_INFINITE);
nng_getopt_int(socket, NNG_OPT_RECVFD, &fd);

I added a print statement to ipc_dialer_dial:
[Screenshot 2024-12-08 125100]

Then I started my program:
[Screenshot 2024-12-08 125434]
The output flooded my screen.

I found that when I started my program manually, although the sockets were constantly created, they were also constantly released, so the resources would not be exhausted in the end.

But when I restart the system and the program is started by the system auto-start script, it keeps creating sockets and never releases them, eventually running out of resources.

Is there something wrong with the way I use it, sir?

@Alex1919810
Author

More detail:
[Screenshot 2024-12-08 185110]
Under normal circumstances, a socket is created and then closed if it fails to connect.

[Screenshot 2024-12-08 182406]
But after I restart the system (see the path "/tmp/podmark" in the picture), each time one socket is closed, more than one is opened. The extra sockets are never released, and resources eventually run out.

@gdamore
Contributor

gdamore commented Dec 9, 2024

I think I know what might be happening.

When the dial fails -- in this case because of the missing path, we wind up retrying the dial. So we should be opening and closing sockets pretty frequently. I think there is a cool down, but it is only 10 milliseconds which is meant to allow the system to get other work done.

The fact that we're churning file descriptors should be benign. You should not run out of them, as the closed ones can be reused.
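
The reuse described above can be seen with a small standalone sketch. It assumes a POSIX system and a single-threaded process, where a newly opened descriptor gets the lowest free number; `churn_once` is a hypothetical name used only for this illustration.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Open a UNIX-domain stream socket, close it, then open and close
 * another one. Stores both descriptor numbers in *first and *second.
 * Returns 0 on success, -1 if either socket() call fails. */
static int churn_once(int *first, int *second)
{
    *first = socket(AF_UNIX, SOCK_STREAM, 0);
    if (*first < 0) {
        return -1;
    }
    close(*first);  /* releases the descriptor number for reuse */

    *second = socket(AF_UNIX, SOCK_STREAM, 0);
    if (*second < 0) {
        return -1;
    }
    close(*second);
    return 0;
}
```

When every open is paired with a close like this, the second socket receives the same descriptor number as the first, so the count of descriptors in use does not grow.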

Perhaps we should use some sort of exponential back off here, because the failure is not good.
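
As an illustration of the idea only, a capped exponential backoff can be computed as below. This is a generic sketch, not nng's implementation; the function name and parameters are assumptions made for the example.

```c
#include <stdint.h>

/* Delay in milliseconds before retry number `attempt` (0-based):
 * min_ms doubled once per previous attempt, capped at max_ms. */
static uint32_t backoff_delay_ms(uint32_t attempt, uint32_t min_ms,
                                 uint32_t max_ms)
{
    /* 64-bit accumulator so doubling cannot overflow before the cap. */
    uint64_t delay = min_ms;
    for (uint32_t i = 0; i < attempt && delay < max_ms; i++) {
        delay *= 2;
    }
    return (uint32_t) (delay > max_ms ? max_ms : delay);
}
```

With a 10 ms minimum and a 1000 ms cap, the delays run 10, 20, 40, 80 ms and so on, then stay at 1000 ms, so a missing path is retried far less often than with a fixed 10 ms cool down.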

Note that if you did not use NNG_FLAG_NONBLOCK in your dialing, then your application would get the failure immediately, and be able to deal with it, and we wouldn't keep retrying.

The NNG_FLAG_NONBLOCK you supplied to nng_dial() means that you're expecting NNG to keep trying in the background to connect (which is exactly what it is doing!)

Again, perhaps some more tunability about the retry rate and backoff could be used here. But if you care about this, I recommend not using NNG_FLAG_NONBLOCK for dialing.

(Actually, I almost never use that flag for anything, and I recommend avoiding it if at all possible. If you need asynchronous operations then using the aio methods will give superior results.)

@Alex1919810
Author

Thank you, sir. I now understand why sockets are created and destroyed all the time. Later I will try using aio.

But I'm still a little confused. If I understand correctly, when the connection cannot be established immediately, only one connection task should be inserted into the task queue. In the printout, this would show up as creating a socket, destroying it, and then creating another one.

In picture 2, it looks like multiple connection tasks were inserted, so the releases cannot keep up with the creations. Is that correct?

@Alex1919810
Author

I tried not using NNG_FLAG_NONBLOCK and moved the reconnection logic (if the dial fails, I manually dial again) into my own thread instead of letting nng control it, and the handle exhaustion problem disappeared.
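
The shape of such an application-controlled retry loop might look like the sketch below. It is not the poster's actual code: `dial_fn`, `sleep_fn`, `dial_with_retry` and `fake_dial` are hypothetical names, and the dial is abstracted behind a callback so the example compiles without nng. In a real program the dial callback would presumably wrap a blocking `nng_dial(socket, url, NULL, 0)` and the pause callback something like `nng_msleep()`.

```c
#include <stddef.h>

/* Hypothetical callback: performs one blocking dial attempt and
 * returns 0 on success or a nonzero error code. */
typedef int (*dial_fn)(void *ctx);

/* Hypothetical callback: sleeps for the given number of milliseconds. */
typedef void (*sleep_fn)(unsigned ms);

/* Call dial() up to max_attempts times, pausing delay_ms after each
 * failure except the last. Returns 0 on success, otherwise the last
 * error code (or -1 if max_attempts is 0). */
static int dial_with_retry(dial_fn dial, void *ctx, sleep_fn pause,
                           unsigned max_attempts, unsigned delay_ms)
{
    int rv = -1;
    for (unsigned i = 0; i < max_attempts; i++) {
        rv = dial(ctx);
        if (rv == 0) {
            return 0;
        }
        if (i + 1 < max_attempts && pause != NULL) {
            pause(delay_ms);
        }
    }
    return rv;
}

/* Stand-in for a real dial: fails with code 1 until the counter
 * pointed to by ctx reaches zero, then succeeds. */
static int fake_dial(void *ctx)
{
    int *failures_left = ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        return 1;
    }
    return 0;
}
```

Because each attempt is a separate blocking call, only one dial is in flight at a time, and the application decides how often and how many times to retry.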
