Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP32: Port net module to netconn API. #2047

Merged
merged 4 commits into from
Jul 22, 2017

Conversation

devsaurus
Copy link
Member

Fixes #1970, #2014, #2044.

  • This PR is compliant with the other contributing guidelines as well (if not, please describe why).
  • I have thoroughly tested my contribution.
  • The code changes are reflected in the documentation at docs/en/*.

This PR ports the net module from raw API to netconn API in order to safeguard the inter-process communication between the NodeMCU task and lwip's tcpip_thread tasks. Refer to the Multithreading section in http://git.savannah.nongnu.org/cgit/lwip.git/tree/doc/rawapi.txt.

While this generated a high number of changes in the source file, the overall code structure almost completely remained the same:

  • calls to tcp_* and udp_* functions were replaced with their netconn_ counterparts
  • events are instantly routed through the task interface to their l*_cb functions
  • high-level DNS functionality untouched

The netconn API provides only one single callback function for event handling to the application code in contrast to the dedicated callbacks of tcp_accept, tcp_send etc. This generated the main structural difference since all events are fed into this single callback function first and have to be de-muxed to the specific handlers (which already existed and were re-used).

I tried to consider @djphoenix's results regarding sockets in TIME_WAIT state from #1838 but could only implement one of the two aspects: LWIP_SO_REUSE is enabled by default in sdkconfig.defaults but there's currently no clean way to decrease TCP_MSL in the sdk source tree. There's pending PR espressif/esp-idf#783 to expose TCP_MSL in menuconfig - once it's done we can set it to 5000 in sdkconfig.defaults as well.
Thus expect the default timeout of ~2 minutes until a closed server socket is finally deleted and heap is freed. Manually modify sdk/esp32-esp-idf/components/lwip/include/lwip/lwip/priv/tcp_priv.h if you need a workaround.

@devsaurus devsaurus requested a review from jmattsson July 18, 2017 19:08
Copy link
Member

@jmattsson jmattsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great, just a handful of comments and questions. Looks like using netconn simplifies things in this module, which is a nice change. Dare I ask about the whole PLUS / MINUS thing of the events? Or am I better off not knowing? :)

{
lnet_event *ev = (lnet_event *)malloc (sizeof (lnet_event) + p->len);
lnet_event *ev = (lnet_event *)malloc (sizeof (lnet_event) /*+ p->len*/);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nuke the + p->len completely? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course!


switch (type) {
case TYPE_TCP_CLIENT:
ud->client.cb_connect_ref = LUA_NOREF;
ud->client.cb_reconnect_ref = LUA_NOREF;
ud->client.cb_disconnect_ref = LUA_NOREF;
ud->client.hold = 0;
ud->client.num_held = 0;
ud->client.connecting = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should ud->client.closing also be initialised here, or is that only relevant for UDP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's initialized for TCP by fall-through to UDP case ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, missed the fall-through, sorry!

{
SYS_ARCH_DECL_PROTECT(lev);

if (!netconn) return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest moving the early return up above the SYS_ARCH_DECL_PROTECT - at first reading it had me worried about leaving something locked.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

// connection established, trigger Lua callback
ud->client.connecting = false;
post_net_connected(ud);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line left intentionally blank? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, to take a deep breath 😁

// new connection available from netconn_listen()
if (ud->netconn &&
ud->self_ref != LUA_NOREF &&
ud->server.cb_accept_ref != LUA_NOREF) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we haven't got the self_ref or cb_accept_ref? Do we leak connections? Should we do accept+immediate-close if we can't accept for real?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, good point, maybe. Never checked that. Is it possible to trigger this corner case at all? Not with net.listen(), it enforces a callback. Probably with a race when a client connects while the sever is being closed...

I can't accept right here, would need to feed this event down the complete task pipeline and handle in laccept_cb(). I observed nasty things happening when netconn_ functions are called from within lwip's thread. Symptoms are

I (25309) wifi: bcn_timout,ap_probe_send_start
I (27819) wifi: ap_probe_send over, resett wifi status to disassoc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove the check for self_ref since it's not used at all in laccept_cb. It would open up a race condition in net_listen() when netconn_listen() was executed before the ref is actually set.

The check for cb_accept_ref fails when the server is in the process of being closed / shut down. Not accepting is ok in this case since the pending client request was stored in the server's netconn accept-mailbox and is going to be tcp_abort'ed & freed when the server netconn is finally closed and deleted.

if (ud->client.cb_sent_ref != LUA_NOREF) {
lua_rawgeti(L, LUA_REGISTRYINDEX, ud->client.cb_sent_ref);
lua_rawgeti(L, LUA_REGISTRYINDEX, ud->self_ref);
lua_call(L, 1, 0);
}
} else if (ud->type == TYPE_TCP_CLIENT) {
err = tcp_write(ud->tcp_pcb, data, datalen, TCP_WRITE_FLAG_COPY);
size_t bytes_written;
err = netconn_write_partly(ud->netconn, data, datalen, NETCONN_COPY, &bytes_written);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably bytes_written should be checked and we should do something if we can't send it all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to check what happens in this case.

if (ud->netconn && ud->type == TYPE_TCP_CLIENT && !ud->client.connecting) {
ud->client.connecting = true;
netconn_connect(ud->netconn, addr, ud->port);
} else if (!ud->netconn && ud->client.wait_dns == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you really want ud->netconn, not !ud->netconn here!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's existing logic, I just replaced !ud->pcb with !ud->netconn.
Not sure which corner case is covered here: no pcb/netconn and no pending dns request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I misread that line! I somehow saw it as !ud->netconn && ud->netconn.wait_dns and thus took the first part to be a null pointer guard.


if (p) {
netbuf_delete(p);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does payload need to be free'd explicitly, or is it linked to the netbuf?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latter. netbuf_data() just provides the data pointer and length of the embedded pbuf, while netbuf_delete() frees both the netbuf and the pbuf.


case NETCONN_EVT_ERROR:
post_net_err(ud, netconn_err(ud->netconn));
ud->client.closing = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this flag being set, but I don't see us acting on it to force a close. Is that not needed?

Copy link
Member Author

@devsaurus devsaurus Jul 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

netconn triggers the callback multiple times in case of an error and when the connection goes down in general. The flag is used some lines above to prevent us from further posting to the Lua task. Lua cbs would operate on outdated/closed/freed userdata otherwise.

if (!netconn) return;

SYS_ARCH_PROTECT(lev);
if (netconn->socket < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing initialises netconn->socket to -1 from what I can see? Or does the netconn layer do that implicitly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's even a "feature" of netconn to support the socket layer.
I borrowed this logic to record early receive events from there.

@devsaurus
Copy link
Member Author

devsaurus commented Jul 19, 2017

Thanks for the review, Johny. I'll follow up on the pending topics soon.

  • accept event triggered on incomplete userdata
  • stress/corner tests against server sockets
    use closing flag also for server sockets to block any callback activity upon error or intentional close
  • what if netconn_write_partly() actually writes partly?

Dare I ask about the whole PLUS / MINUS thing of the events?

The PLUS/MINUS event mimic appears to be specifically tailored to the needs of the socket API. netconn layer makes assumptions about the upper layer's implementation with select() on file descriptors etc. and delivers events accordingly.

  • SENDPLUS / SENDMINUS are signaled to toggle a binary flag in socket layer, upon specific events from raw API.
  • RCVPLUS / RCVMINUS shall increment / decrement an event counter in socket upon other specific events.

Raw events from tcp_error(), tcp_sent(), tcp_accept() etc. are muxed into this scheme and one needs to understand the socket layer implementation to make sense out of netconn's events.

Or am I better off not knowing? :)

Err, this was the worst part in the porting job, really 😃

@devsaurus
Copy link
Member Author

I've pushed the last update to your review feedback. net.send() will throw an error now if payload is not completely queued to lwip since this situation can't be handled cleanly with the current design. We would need to ref the data strings in a list and unref from the complementing sent callback.
This is overkill IMO and is not needed if the application sticks to the guideline to call the next net.send() from the callback and not call send multiple times in a row.
However, if a valid use case proves me wrong then net_send() would need to ref the string objects until the payload is consumed.

@jmattsson jmattsson merged commit 977dab2 into nodemcu:dev-esp32 Jul 22, 2017
@devsaurus devsaurus deleted the esp32-netconn branch July 22, 2017 08:55
@AadiMehta
Copy link

Can someone please point me towards the documentation of this newly committed net module for esp32. Thanks!

@jmattsson
Copy link
Member

It's the same Lua interface as before, just the underlying implementation that changed. Just keep on using net :)

@marcelstoer
Copy link
Member

marcelstoer commented Jul 25, 2017

Except there's no net.md in https://github.com/nodemcu/nodemcu-firmware/blob/dev-esp32/mkdocs.yml ... because it doesn't exist yet.

@devsaurus
Copy link
Member Author

devsaurus commented Jul 25, 2017

Except there's no net.md

See #2053.

@marcelstoer
Copy link
Member

@AadiMehta
Copy link

@devsaurus @marcelstoer Thanks a lot! I am in the process of migrating nodemcu lua code from ESP8266 to ESP32, and this helps a lot to debug issues. Also eagerly waiting for other modules support in ESP32!
Thanks a lot 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants