Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESP32 HardwareSerial Callbacks has variable time delay #2445

Open
Asanga-Viraj opened this issue Dec 10, 2021 · 12 comments
Open

ESP32 HardwareSerial Callbacks has variable time delay #2445

Asanga-Viraj opened this issue Dec 10, 2021 · 12 comments

Comments

@Asanga-Viraj
Copy link

Asanga-Viraj commented Dec 10, 2021

Hi,

My application is to implement a Serial2 handler to run serial protocol like ModBus on ESP32. My another microcontroller is polling for reply from ESP32 with baud rate 38400. The time between two packets is 25ms. I attached data received and transmit events like below.

HardwareSerial Serial2(2);

Serial2.setRxBufferSize(100);
Serial2.setTxBufferSize(100);
Serial2.setTxWait(true);
Serial2.onDataReceived(onDataCallback);
Serial2.onTransmitComplete(onTxCompleteDelegate);
Serial2.begin(38400);

I experienced that the receive complete event is not calling at the end of the receiving. I hooked up a oscilloscope and check the rx data line and receive event calling by toggling digital pin inside the event function. There is a variable delay from the end of the rx until calling the onDataCallback event. Also this issue is on transmit complete event as well.

One more thing. If I connected ESP32 RX2 with my other controller polling and initialise the Serial2 without 'Serial2.onDataReceived(onDataCallback);', ESP32 resets due to watch dog reset.

Thanks.

@Asanga-Viraj
Copy link
Author

The I went deep inside of the Sming framework. I think I found the issue. In the file Sming/Core/HardwareSrial.cpp line 184

	// If required, queue a callback
	if((status & serial->statusMask) != 0 && !serial->callbackQueued) {
		System.queueCallback(staticOnStatusChange, serial);
		serial->callbackQueued = true;
	}

The system quecallback takes variable time to call it. This calling time delay is too much for some protocol implementation like ModBus.

@teicors
Copy link

teicors commented Dec 10, 2021

If I'm not out of topic, I'm having a similar behavior with ModBus and a basic html page: when I "wget" the jquery.js.gz from spiffs, I'm having this result

jquery.js.gz 100%[=============================>] 29,45K 5,04KB/s in 5,8s

and this from the debug window

34209686 TCP 3fff2b28 onReadyToSendData: 2
34210643 TCP 3fff2b28 Written: 1024, Available: 1390, isFinished: 0, PushCount: 1
34212718 TCP 3fff2b28 Written: 366, Available: 366, isFinished: 0, PushCount: 2
34220475 TCP 3fff2b28 sent: 1390
34222476 TCP 3fff2b28 onReadyToSendData: 2
34236003 TCP 3fff2b28 Written: 1024, Available: 1390, isFinished: 0, PushCount: 1
34236938 TCP 3fff2b28 Written: 366, Available: 366, isFinished: 0, PushCount: 2
34241910 TCP 3fff2b28 sent: 1390
34243493 TCP 3fff2b28 onReadyToSendData: 2
34247954 TCP 3fff2b28 Written: 1024, Available: 1390, isFinished: 0, PushCount: 1
34254714 TCP 3fff2b28 Written: 366, Available: 366, isFinished: 0, PushCount: 2
34262567 TCP 3fff2b28 sent: 1390

I'm using an ESP8266 with ModBus connected to the first serial and the debug (as in the example) at the second serial. The code is a simple join of the modbus example and a spiffs area where are placed an index.html, jquery.js.gz and bootstrap.css.gz
I thought an error in my code, but looking at this message, I think is not my fault.

With another my project non using ModBus,I catch this result with the wget
jquery.js 100%[===========================================>] 84,29K 274KB/s in 0,3s

@mikee47
Copy link
Contributor

mikee47 commented Dec 10, 2021

I found a bug! Try this patch and see if it fixes your problem. Run make Sming-build to pick up the change.

diff --git a/Sming/Arch/Esp32/Components/driver/uart.cpp b/Sming/Arch/Esp32/Components/driver/uart.cpp
index b6cd59f0c..78a14e48c 100644
--- a/Sming/Arch/Esp32/Components/driver/uart.cpp
+++ b/Sming/Arch/Esp32/Components/driver/uart.cpp
@@ -236,7 +236,7 @@ void IRAM_ATTR uart_isr(smg_uart_instance_t* inst)
 			if(usis.rxfifo_ovf) {
 				uart_ll_disable_intr_mask(dev, UART_INTR_RXFIFO_OVF);
 			} else if(read == 0) {
-				uart_ll_ena_intr_mask(dev, UART_INTR_RXFIFO_FULL | UART_INTR_RXFIFO_TOUT);
+				uart_ll_disable_intr_mask(dev, UART_INTR_RXFIFO_FULL | UART_INTR_RXFIFO_TOUT);
 			}
 		}
 

@mikee47 mikee47 changed the title ESP32 HadwareSerial Callbacks has variable time delay ESP32 HardwareSerial Callbacks has variable time delay Dec 10, 2021
@mikee47
Copy link
Contributor

mikee47 commented Dec 10, 2021

If you need tight timing on the callback (e.g. switching modbus transceiver from TX to RX) then use interrupt callbacks by calling HardwareSerial::setUartCallback(mySerialCallbackHandler). The callback might look something like this:

void handleReceiveEvent(uint32_t status)
{
  // ...
}

void IRAM_ATTR mySerialCallbackHandler(smg_uart_t* uart, uint32_t status)
{
  // Revert back to receive when transmit completes
  if(status & UART_STATUS_TXFIFO_EMPTY) {
    DigitalWrite(MODBUS_DIR, IN);
  }

  // Handle receive events
  if(status & (UART_STATUS_RXFIFO_FULL | UART_STATUS_RXFIFO_TOUT | UART_STATUS_RXFIFO_OVF)) {
    System.queueCallback(handleReceiveEvent);
  }
}

@mikee47
Copy link
Contributor

mikee47 commented Dec 10, 2021

Also worth doing a search for 'modbus' in previous issues, there's been some discussion about this issue.

@slaff
Copy link
Contributor

slaff commented Dec 10, 2021

I found a bug!

@mikee47 Can you send a PR with the fix?

@Asanga-Viraj
Copy link
Author

Asanga-Viraj commented Dec 13, 2021

Thanks everyone for the support.

I replaced default Sming uart call back using HardwareSerial::setUartCallback(mySerialCallbackHandler). Still got the same variable delay (sometimes more than 30ms) when calling of System.queueCallback(handleReceiveEvent); handleReceiveEvent.
The same delay issues gets because the function handleReceiveEvent is calling over System que task. Then I replaced system que with timer calling eventTimer.initializeUs(400, handleReceiveEvent).startOnce();. The serial ModBus replies as I expected.

According to my application, the ESP32 serve a web page over WiFi STA mode. When handleReceiveEvent is not calling over system queue event (over a timer), I did browsing web page over LAN. The ESP resets and shows like below. If I turned of ModBus polling, ESP doesn't reset and web page works well.

1751276 '/ajax/data' registered
1751362 '/ajax/hrbeat' registered
1751455 '/favicon.ico' registered
1753854 '*' registered
1756173 === STA WEB SERVER STARTED ===
1759358 10.150.0.189
1761274 ==========================

I (2189) esp_netif_handlers: sta ip: 10.150.0.189, mask: 255.255.254.0, gw: 10.150.0.254
19468701 open('device_logo.png.gz'): NOT_FOUND (-11002)
19469104 File 'device_logo.png.gz' open error: NOT_FOUND
Re-enable cpu cache.
Guru Meditation Error: Core  1 panic'ed (IllegalInstruction). Exception was unhandled.
Memory dump at 0x401140e8: f01d2028 82004136 889c3102
Core  1 register dump:
PC      : 0x401140ee  PS      : 0x00060031  A0      : 0x80086f1c  A1      : 0x3ffb0b80
A2      : 0x00000100  A3      : 0x00000111  A4      : 0x3ffccc51  A5      : 0x000000ff
A6      : 0x00000000  A7      : 0x00000004  A8      : 0x80085da1  A9      : 0x3ffb0b60
A10     : 0x00000000  A11     : 0x00000000  A12     : 0x00000190  A13     : 0x00000000
A14     : 0x40085d08  A15     : 0x00000000  SAR     : 0x00000000  EXCCAUSE: 0x00000000
EXCVADDR: 0x00000000  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0x00000000

Backtrace:0x401140eb:0x3ffb0b80 0x40086f19:0x3ffb0ba0 0x40081d39:0x3ffb0c50 0x400825ce:0x3ffafee0 0x400866fc:0x3ffaff00


ELF file SHA256: 46373f51be37f735

Entering gdb stub now.
$T04#b8

I think (not sure), the problem is not in the UART ISR, it is in the System queues. System queue takes more than 30ms to call a CallBack function and it is varying even we have fast CPU on the board.

@slaff
Copy link
Contributor

slaff commented Dec 13, 2021

@mikee47
Copy link
Contributor

mikee47 commented Dec 13, 2021

@Asanga-Viraj Or run in the debugger https://smingdev.readthedocs.io/en/latest/arch/esp32/debugging/index.html. Re-build first though:

make clean components-clean
make -j ENABLE_GDB=1
make flashapp gdb

@mikee47
Copy link
Contributor

mikee47 commented Dec 13, 2021

I've done some basic checks on system task vs. timer and cannot as yet see any difference. @Asanga-Viraj Can you push your app. or a simplified version to github? I can then investigate further...

@Asanga-Viraj
Copy link
Author

Asanga-Viraj commented Dec 13, 2021

Thanks everyone. Sorry for my late reply.

I called UART interrupts Tx complete delegate and RX complete delegate directly without going through the system queues. This was happened when ESP trying to send binary files to the web page on requests. I re-run the issue and got this back trace.

0x40115eef: HardwareUart::getStatus() at ??:?
0x40086e9d: (anonymous namespace)::uart_isr((anonymous namespace)::smg_uart_instance_t*) at c:/tools/Sming/Sming/Arch/Esp32/Components/driver/uart.cpp:278
0x40081d39: _xt_lowint1 at C:/tools/esp-idf/v4.3/components/freertos/port/xtensa/xtensa_vectors.S:1105
0x400825ce: spi_flash_op_block_func at C:/tools/esp-idf/v4.3/components/spi_flash/cache_utils.c:114
0x40086680: ipc_task at C:/tools/esp-idf/v4.3/components/esp_ipc/ipc.c:62

This make sense. Uart interrupted the flash operations because Uarts interrupt is not going over System queue.
I fixed this issue by stopping the Uart interrupts (disabling Serial) for 2 second when HTTP server requests files.

Sorry, I don't have ESP debug tool now. I will buy one.

My initial question was System queues are not fast. It takes average time 30ms and it varies between 5ms-50ms. I think this is nothing to do with Sming framework. It is inside of the ESP FreeRTOS. For now, I can handle it.

My app consist of lots of things. I will upload simplified version later.

@mikee47
Copy link
Contributor

mikee47 commented Nov 19, 2024

So I'm working on an esp32-s2 application which services interrupts via the task queue, and running into the 'slow task queue' issue here.

When the interrupt service routine is triggered I queue a task to defer processing, and this appears to take exactly 10ms. This is exactly one FreeRTOS tick. The best resolution is 1ms which can be changed running make sdk-menuconfig and setting FREERTOS_HZ to the maximum 1000.

The alternative is to use a short-lived SimpleTimer instead. These software timers are serviced from a hardware timer interrupt so do not depend on scheduling. They do! The hardware timer interrupt uses the task queue as well, and yet it doesn't have the latency issues. I wonder if it's something to do with direct-to-task notifications which the IDF uses for the timer task.

https://github.com/espressif/esp-idf/blob/master/components/esp_timer/src/esp_timer.c#L456
https://www.freertos.org/Documentation/02-Kernel/02-Kernel-features/03-Direct-to-task-notifications/01-Task-notifications

With PR #2913 networked applications behave as described in this issue, whilst non-networked applications do not. So could it be the tcpip task which is responsible for this difference? The wifi task runs at the highest priority so it seems more likely that it is the cause of this delay. But that doesn't explain why timers don't have the delay!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants