Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API in WiFiClient to save heap memory #8732

Closed
wants to merge 1 commit into from
Closed

API in WiFiClient to save heap memory #8732

wants to merge 1 commit into from

Conversation

RobertGnz
Copy link
Contributor

When used by a server, heap memory decrease drastically after numerous client acces (Client=Server.available()) because sockets in TIME_WAIT remains too much time after issuing Client.stop by the server in comparison with the ESP8266 memory capacity.
These TIME_WAIT sockets causes a server crash (and random reboot) after some hours or sometimes some minutes running due to lack of available heap memory.
Note that the proposed method tcpCleanup in #1923 does not work because the instruction "extern struct tcp_pcb* tcp_tw_pcbs;" does NOT initialise tcp_tw_pcbs to the one used by the Client. It is only a declaration and because it is only a declaration, tcp_tw_pcbs is set to NULL. Consecuently tcp_abort(tcp_tw_pcbs) is never called.
The proposed request creates an API for an effective call to tcp_abort.
It has to be called before issuing an Client.stop().
With this method heap memory's server does not decrease even in case of frequent clients access an will run hours and days without crashing.
Using this API is of course optional.

When used by a server heap memory decrease drastically after numero client acces. Sockets in TIME_WAIT stay after the server issues a Client.stop which causes a server crash (and reboot) after some hours or some minutes. Note that the proposed method tcpCleanup in #1923 does not work because the instruction "extern struct tcp_pcb* tcp_tw_pcbs;" does NOT initialise tcp_tw_pcbs to the one used by the Client. It is only a declaration and because it is only a declaration, tcp_tw_pcbs is set to NULL. Consecuently tcp_abort(tcp_tw_pcbs) is never called.
@mcspr
Copy link
Collaborator

mcspr commented Nov 26, 2022

Reading doc and actual impl, tcp_abort -> tcp_abandon will free pcb from under you. After that, pcb ptr is technically freed and returned to the shared memory poll. Anything doing anything on pcb pointer is operating on uninitialized mem.

Seems kind of dangerous to include as public api, unless we handle things ourselves in the proper order. Another thing is tcp_... API should preferably go to ClientContext, which actually holds the pcb

@RobertGnz
Copy link
Contributor Author

1- I did not found any doc on tcp_abandon. Could you please give me a link to that doc ?

2- Yes, tcp_pcb* is held by ClientContext and thats the reason why I called _client->getPCB() which has to be called from inside Client object and passed to tcp_abort(tcp_pcb*) after initialization with _client->getPCB().

@mcspr
Copy link
Collaborator

mcspr commented Nov 26, 2022

:oops:, sorry

https://github.com/lwip-tcpip/lwip/blob/STABLE-2_1_3_RELEASE/src/core/tcp.c
https://github.com/lwip-tcpip/lwip/blob/STABLE-2_1_3_RELEASE/src/core/tcp_in.c (aka input)
https://www.nongnu.org/lwip/2_1_x/group__tcp__raw.html#ga468c2260ddb01582e966ddcf3c25ce61 (/search/ through the code is often easier, most of the api description is right there in the .c comments)

what I mean is ctx object itself does not know about the reset, or anyone trying to use it after that call. manual approach is working, but I got a feeling this is supposed to work in the lwip layer

returning to the issue itself... quick search of linked thread, since 2018 we still carry the timewait patch that is supposed to limit the pcb total count to 5
https://github.com/d-a-v/esp82xx-nonos-linklayer/blob/e8654cd6b67a243d382f0d577fa4f01204a282e2/patches/time-wait.patch
https://github.com/d-a-v/esp82xx-nonos-linklayer/blob/e8654cd6b67a243d382f0d577fa4f01204a282e2/glue-lwip/arduino/lwipopts.h#L3664

@RobertGnz
Copy link
Contributor Author

Ok thanks for the links.
My problem was accessing to the tcp_pcb pointer value corresponding to the WiFiClient in use ( Client=Server.available() ) and held in the client context in order to pass it to tcp_abort as requiered by this function.
I found than the only way to access its value was to use an additional method from inside WiFiClient class which in turn was able to call _client->getPCB() (because getPCB is a function of client context). Then I could pass this pointer to tcp_abort.

But reading the links you sent to me, I found that it is possible to kill the oldest connection in TIME_WAIT by mean of calling tcp_kill_timewait(void). No pointer to pass to this function, so easy to use.

I will try it making some calls to this function from time to time when the server is running and see what happens.

Just an additional remark:

It is said in the doc that tcp_kill_timewait is called from tcp_alloc() if no more connections are available. So when a server is running accepting a lot of connections from clients and stopping them one after the other, tcp_alloc should call tcp_kill_timewait from time to time not to face with a malloc failure.
But I have to say that it seems not to be the case because after some tens of minutes accepting client connections and closing them, my server crashes because of lack of heap memory when trying to accept a new connection.

@RobertGnz
Copy link
Contributor Author

RobertGnz commented Nov 27, 2022

@mcspr

Hello, as mentionned in my previous response I made an experiment using tcp_kill_timewait().

Inside the server I made a call after every Client.close but after some minutes of frequent clients connections (4 connections randomly every 30 seconds) the server crashes (not enough heap memory available).
I also tried another strategy consisting in calling tcp_kill_timewait() every 2 seconds when no clients where connected to the server and after every Client.close but the result was the same: server crashes after 30 minutes -> insufficient heap memory.

Obviously it was not enough to kill sockets in TIME_WAIT state. So the only way I found avoid insufficient heap memory available is to systematically abort client's pcb. For this I did not found any other solution than;

WiFiClient.h:

bool stopAndAbortPcb(); // new API

WiFiClient.cpp;

// Api for heap saving. Optional use instead of WiFiClient::stop to systematically retreive some heap memory when closing a client
bool WiFiClient::stopAndAbortPcb()
{
tcp_pcb* p;
if (!_client) return(true);
p = _client->getPCB();
this->stop(); // close the client
if (p) tcp_abort(p); // wich in turn calls tcp_abandon()
return(true);
}

Using this optional API the server works hours and hours without crashing. No lack of heap memory even with very frequent clients accesses to the server.

What would be your opinion on adding such an API in WiFiClient class ? Would you accept such a pull request if I'd propose it ?

@mcspr
Copy link
Collaborator

mcspr commented Nov 28, 2022

Ok, lets add that. But

  • just void abort(); and call _client->abort(); notice ClientContext::abort()
  • comment difference between stop() and abort()
  • update doc/ for our readthedocs

@RobertGnz RobertGnz closed this by deleting the head repository Dec 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants