Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Random freezes during printing over serial connection (skr 1.4 turbo) #21010

Open
m-kozlowski opened this issue Feb 6, 2021 · 255 comments
Open

Comments

@m-kozlowski
Copy link

Bug Description

I'm experiencing radom freezes when printing with Octoprint over serial connection.
Using SKR 1.4 turbo, RPI4, directly connected to TFT serial interface (SERIAL_PORT 0).
Running bugfix-2x 004bed8
I am not able to identify any specific circumstances of the error - sometimes it happens on first layer, and sometimes after few hours of printing the same gcode.
Print head stops, screen (ulti controler) stops refreshing, communication freezes (there is "timeout" message in octoprint terminal window).

I'm suspecting it's related to recent serial changes, because:

  • I'm not experiencing this issue when printing from SD card
  • I've reverted to 811bb79 and so far it seems to be working fine.

Configuration Files

Configuration.zip

@rhapsodyv
Copy link
Member

Can you test using another cable and another computer?

Skr uses usb serial, that is a lot stable than normal hardware serial. In the past, some users reported something similar and the issue was in the host.

@m-kozlowski
Copy link
Author

Sure, will try. But it will be another rpi4 as I don't have anything else with hardware serial at hand.

However I'm not convinced if it's host to blame. It has printed hundreds of hours in this setup. And right now it's finishing ~18 hour print running downgraded Marlin.

@rhapsodyv
Copy link
Member

@X-Ryl669 can it be related with #20783 and #20932 ?

@rhapsodyv
Copy link
Member

Which version of marlin is working fine for you?

@X-Ryl669
Copy link
Contributor

X-Ryl669 commented Feb 8, 2021

What happen when the serial is freezing ? Do you have emergency parser enabled (previously) ? If yes, can you send some emergency commands to see if it react ?

Can you post the last serial output log so we can see what's the last message that worked ?

@X-Ryl669
Copy link
Contributor

X-Ryl669 commented Feb 8, 2021

I've read your configuration file.

I'm trying to validate some hypothesis here:

  1. Checking if the printer is crashed when it happens
  2. Checking if the g-code parser is waiting for more data and is struck
  3. Checking if my changes have changed the behavior of the printer

For 1:

  • Please pull Add post mortem debugging #20492 PR and build it (you'll need to enable MARLIN_DEV_MODE and POSTMORTEM_DEBUGGING in Configuration_adv.h) then when it crashes, copy the output from the serial output here

For 2:

  • Enable emergency parser in your Configuration_adv.h
  • When it happens, try to send a M108 manually and see if it's answering

For 3:

  • If none of the previous test are successful, please say so, I'll try to contact you directly to send you modified version of sources file to spot what happens.

@m-kozlowski
Copy link
Author

Applied changes as described above and started ~4 hours print with serial communication logging enabled.
Will report back.

In the meantime: I ran few short prints overnight, and was unable to trigger issue. Both on new-rpi4 and old-rpi4.
The only thing that changed is that i plugged USB cable in, while still printing on hardware serial.
Not much to base any conclusions, but it might be some grounding issue (?). Gonna investigate further.

@m-kozlowski
Copy link
Author

Today I haven't been able to repeat the problem on any of the two raspberry pi's and any version of Marlin.
I'll leave MARLIN_DEV_MODE and EMERGENCY_PARSER enabled and reopen the ticket if I capture anything.
Meanwhile, I am closing the ticket, recognizing that it could be a wiring problem.

@X-Ryl669
Copy link
Contributor

X-Ryl669 commented Feb 8, 2021

Or it's a bug that's disappearing when EMERGENCY_PARSER is enabled... Feel free to reopen if it re-appear once you disable emergency parser.

@m-kozlowski
Copy link
Author

Happened again.
Bad news is I was running 9d24ee8 without "post mortem" PR.
Better news is that EMERGENCY_PARSER and MARLIN_DEV_MODE was on.

Printer does not react to M108.
Last lines of serial communication:

Send: N119024 M117 98% L=225/240*98
Recv: //action:notification 98% L=225/240
Recv: ok N119024 P0 B3
Send: N119025 G1 F3840.000*97
Recv: ok N119025 P1 B3
Send: N119026 G1 X109.294 Y110.515 E-0.19172*120
Recv: ok N119026 P0 B3
Send: N119027 G1 X109.518 Y110.953 E-0.18245*114
Recv: ok N119027 P0 B3
Send: N119028 G1 X109.663 Y111.429 E-0.18478*123
Recv: ok N119028 P0 B3
Send: N119029 G1 X109.706 Y111.711 E-0.10580*126
Recv: ok N119029 P0 B3
Send: N119030 G1 X109.722 Y112.000 E-0.10737*122
Recv: ok N119030 P0 B3
Send: N119031 G1 X109.706 Y112.289 E-0.10737*126
Recv: ok N119031 P0 B3
Send: N119032 G1 X109.663 Y112.571 E-0.10580*113
Recv: ok N119032 P0 B3
Send: N119033 G1 X109.518 Y113.047 E-0.18478*112
Recv: ok N119033 P0 B3
Send: N119034 G1 X109.294 Y113.485 E-0.18245*118
Recv: ok N119034 P0 B3
Send: N119035 G1 X109.129 Y113.721 E-0.10706*117
Recv: ok N119035 P0 B3
Send: N119036 G1 X108.938 Y113.941 E-0.10789*112
Recv: ok N119036 P0 B3
Send: N119037 G1 X108.727 Y114.138 E-0.10724*119
Recv: ok N119037 P0 B3
Send: N119038 G1 X108.489 Y114.318 E-0.11077*127
Recv: ok N119038 P0 B3
Send: N119039 G1 X108.248 Y114.463 E-0.10426*127
Recv: ok N119039 P0 B3
Send: N119040 G1 X107.790 Y114.659 E-0.18490*112
Recv: ok N119040 P0 B3
Send: N119041 G1 X107.507 Y114.735 E-0.10892*112
Recv: ok N119041 P0 B3
Send: N119042 G1 X107.218 Y114.781 E-0.10849*115
Recv: ok N119042 P0 B3
Send: N119043 G1 X106.925 Y114.796 E-0.10868*115
Recv: ok N119043 P0 B3
Send: N119044 G1 X106.633 Y114.781 E-0.10868*122
Recv: ok N119044 P0 B3
Send: N119045 G1 X106.348 Y114.736 E-0.10704*123
Recv: ok N119045 P0 B3
Send: N119046 G1 X105.875 Y114.591 E-0.18365*122
Recv: ok N119046 P0 B3
Send: N119047 G1 X105.613 Y114.469 E-0.10731*126
Recv: ok N119047 P0 B3
Send: N119048 G1 X105.365 Y114.320 E-0.10717*123
Recv: ok N119048 P0 B3
Send: N119049 G1 X104.980 Y114.008 E-0.18388*121
Communication timeout while printing, trying to trigger response from printer. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves.
Send: N119050 M105*27
Communication timeout while printing, trying to trigger response from printer. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves.
Send: N119051 M105*26
Communication timeout while printing, trying to trigger response from printer. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves.
Send: N119052 M105*25
Communication timeout while printing, trying to trigger response from printer. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves.
Send: N119053 M105*24
Communication timeout while printing, trying to trigger response from printer. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves.
Send: N119054 M105*31
No response from printer after 6 consecutive communication timeouts, considering it dead. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves.
Changing monitoring state from "Printing" to "Offline (Error: Too many consecutive timeouts, printer still connected and alive?)"
Connection closed, closing down monitor

I have st-link v2 plugged in that I'm using to flash firmware, but I guess it can be used to do some debugging if someone guide me. I can leave the printer in current state for a while.

@m-kozlowski m-kozlowski reopened this Feb 11, 2021
@X-Ryl669
Copy link
Contributor

Ok, so I hope you've openocd installed. In that case, you'll have to start the server:

$ ~/.platformio/packages/tool-openocd/bin/openocd -d2 -s ~/.platformio/packages/tool-openocd/scripts -f interface/stlink.cfg  -f target/stm32f1x.cfg

Replace stm32f1x.cfg by the configuration for your CPU

Then in another terminal, run gdb like this:

$ ~/.platformio/packages/toolchain-gccarmnoneeabi/bin/arm-none-eabi-gdb /path/to/your/marlin/repository/.pio/build/your_env/debug/firmware.elf

While in GDB, type:

(gdb) tar ext:3333
Remote debugging using :3333
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
0x0801b272 in ?? ()
(gdb) bt

And post the output of the commands above

If you are a bit lost, I've written a post about how to debug Marlin and there for setting up OpenOCD.
Skip the part about unlocking the flash since you don't have a STM32.

@m-kozlowski
Copy link
Author

m-kozlowski commented Feb 11, 2021

(there's no debug subdir, so I'm using /usr/src/Marlin/.pio/build/LPC1769/firmware.elf)

(gdb) bt
#0  0x000019a8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) info frame
Stack level 0, frame at 0x10007fc0:
 pc = 0x19a8; saved pc = <not saved>
 Outermost frame: previous frame identical to this frame (corrupt stack?)
 Arglist at 0x10007fc0, args:
 Locals at 0x10007fc0, Previous frame's sp is 0x10007fc0

(gdb) info symbol 0x000019a8
No symbol matches 0x000019a8.
(gdb) disassemble /m 0x000019a8
No function contains specified address.

@X-Ryl669
Copy link
Contributor

So, it's useless... 😢 Your printer has crashed. You'll have to use #20492 to get a stack trace to print on the serial port (you can directly pull this, it was updated with bugfix recently). Also, please compile your firmware with platformio debug -e your_env instead of platformio run -e so the debug symbol are built in the elf file and will be useful for debugging.

@X-Ryl669
Copy link
Contributor

Just a note for developers: by default, the crash handlers in the framework don't remember the crash's stack trace. So until #20492 is merged (which does remember the stack trace and allow backtracing), we're blind when such crash happens.

@Snargaloo
Copy link

I am experiencing a similar if not exact issue. It took me a few weeks to isolate this. I know if I use 2.0.x-bugfix at commit c929fb5 everything prints perfectly.

I am unable to successfully compile again until commit c74f972. From this commit forward I experience issues with serial communication that ultimately cause some prints to fail. Not every type of print will fail. Models with straight lines and sharp corners will complete. Models with curves or circles will experience many pauses and exhibit blobs on the surface of the model.

I have used the Arc Welder OctoPrint plugin for a few months and have experienced pleasing results with it. If I enable Arc Welder compression, the print will most likely fail.

In order to find which commit was causing the issue, I had to find a situation that would duplicate the failure consistently.

This is the description of the hardware I am using:

I have a BTT SKR 1.4 Turbo connected to a BTT TFT-35 E3 V3.0 which is connected to an RPi4. I have the RPi4 Bluetooth device disabled and the first PL011 (UART0) is the primary UART. The RPi4 is connected to the TFT from internal GPIO pins on RPi4 to WIFI connector on TFT. The WIFI connector on the TFT automatically echos all data coming to and from the serial connection from the TFT to the SKR 1.4 Turbo.

For software, I do not use the OctoPi image on the RPi4. I use a current build of RaspberryPi OS with a pure Python3 environment. I connect to the printer using /dev/ttyAMA0. I use the current release of the Arc Welder plugin (1.1.0rc2). I have Meatpack enabled in Marlin, however, I have it disabled in OctoPrint since it won't allow OctoPrint to connect to the printer with my setup (See OctoPrint-Meatpack Issue #12).

These are the configuration files that are working:
BTT SKR 1.4 Turbo: Marlin bugfix-2.0.x c929fb5 with these changes: changes_for_c929fb5.zip
BTT TFT-35 E3 V3.0: BIGTREETECH-TouchScreenFirmware 923c5e0 with these changes: changes_for_923c5e0.zip

All the commits after the above for Marlin will not allow my printer to function properly. I am attaching the most recent configuration I tested that failed:
BTT SKR 1.4 Turbo: Marlin bugfix-2.0.x cbf325a with these changes: changes_for_cbf325a.zip
BTT TFT-35 E3 V3.0: BIGTREETECH-TouchScreenFirmware 923c5e0 with these changes: changes_for_923c5e0.zip

I will attach the model I was using for testing along with the sliced gcode and the arc-welded sliced gcode:

This is the model STL: Z bracket spacer V3.zip
I use the current development branch of Cura that I compile from source. This is the sliced model g-code: Z_bracket_spacer_V3_eSun_PETG_(Translucent_Orange)_FR8939-.zip
This is the Cura sliced model after it is processed through Arc-Welder: Z_bracket_spacer_V3_eSun_PETG_(Translucent_Orange)_FR8939-.aw.zip

To duplicate the failure, I simply load and print the Arc-Welded gcode in OctoPrint. The result is around line 1200, the serial communication gets out of sync and eventually results in serial timeout. I have attached a terminal log file showing the situation from the terminal view of OctoPrint. The error occurs at line 1212 this time (not always same line): Terminal.zip. The TFT will show a popup with the line mismatch and will beep continuously. I need to cycle power at the printer to recover.

I can provide any additional information or perform testing as requested.

@X-Ryl669
Copy link
Contributor

X-Ryl669 commented Feb 14, 2021

Right, from the log, it seems that Octoprint continue feeding lines to the printer while the printer asked to resend. At some point, it fails. Can you post the same terminal output with the previous version (c929fb5) ?
The new code shouldn't have changed much from the serial parsing and processing.

The strange thing in the log are:

  1. The printer asks to resend N1213, (it told octoprint to wait when it was sent initially, but octoprint ignored the message)
  2. It then ACK a old line multiple time (N1181) with different G-code each time, so it's like it's not making any progress to the receiving buffer...

Meanwhile, if you can pull #20492 and build that with both "MARLIN_DEV_MODE" and "POSTMORTEM_DEBUGGING" enabled in Configuration_adv.h, when the crash will happen, it should dump the backtrace of the printer's crash so we'll have a better idea about where it failed.

Please notice that the OP had his printer crashed and not doing anything anymore, yours seems to run an emergency/error condition but it still run, so the above might seem less relevant.

@X-Ryl669
Copy link
Contributor

Ok, I think I have an idea. I'll send you a file to replace and test with it later today.

@X-Ryl669
Copy link
Contributor

X-Ryl669 commented Feb 15, 2021

Please apply the following patch on the current bugfix:

diff --git a/Marlin/src/core/language.h b/Marlin/src/core/language.h
index 923ad903cb..a813ec4ba1 100644
--- a/Marlin/src/core/language.h
+++ b/Marlin/src/core/language.h
@@ -131,6 +131,7 @@
 #define STR_WATCHDOG_FIRED                  "Watchdog timeout. Reset required."
 #define STR_ERR_KILLED                      "Printer halted. kill() called!"
 #define STR_ERR_STOPPED                     "Printer stopped due to errors. Fix the error and use M999 to restart. (Temperature is reset. Set it after restarting)"
+#define STR_ERR_SERIAL_MISMATCH             "Serial status mismatch"
 #define STR_BUSY_PROCESSING                 "busy: processing"
 #define STR_BUSY_PAUSED_FOR_USER            "busy: paused for user"
 #define STR_BUSY_PAUSED_FOR_INPUT           "busy: paused for input"
diff --git a/Marlin/src/gcode/queue.cpp b/Marlin/src/gcode/queue.cpp
index c49247912c..063b9b3dfb 100644
--- a/Marlin/src/gcode/queue.cpp
+++ b/Marlin/src/gcode/queue.cpp
@@ -324,29 +324,19 @@ void GCodeQueue::flush_and_request_resend() {
   ok_to_send();
 }

-inline bool serial_data_available() {
-  byte data_available = 0;
-  if (MYSERIAL0.available()) data_available++;
-  #ifdef SERIAL_PORT_2
-    const bool port2_open = TERN1(HAS_ETHERNET, ethernet.have_telnet_client);
-    if (port2_open && MYSERIAL1.available()) data_available++;
-  #endif
-  return data_available > 0;
-}
-
-inline int read_serial(const uint8_t index) {
-  switch (index) {
-    case 0: return MYSERIAL0.read();
-    case 1: {
-      #if HAS_MULTI_SERIAL
-        const bool port2_open = TERN1(HAS_ETHERNET, ethernet.have_telnet_client);
-        if (port2_open) return MYSERIAL1.read();
-      #endif
+// Multiserial already handle the dispatch to/from multiple port by itself
+inline bool serial_data_available(uint8_t index = SERIAL_ALL) {
+  if (index == SERIAL_ALL) {
+    for (index = 0; index < NUM_SERIAL; index++) {
+      if (SERIAL_IMPL.available(index)) return true;
     }
-    default: return -1;
+    return false;
   }
+  return SERIAL_IMPL.available(index);
 }

+inline int read_serial(const uint8_t index) { return SERIAL_IMPL.read(index); }
+
 void GCodeQueue::gcode_line_error(PGM_P const err, const serial_index_t serial_ind) {
   PORT_REDIRECT(SERIAL_PORTMASK(serial_ind)); // Reply to the serial port that sent the command
   SERIAL_ERROR_START();
@@ -468,14 +458,32 @@ void GCodeQueue::get_serial_commands() {
     }
   #endif

-  /**
-   * Loop while serial characters are incoming and the queue is not full
-   */
-  while (length < BUFSIZE && serial_data_available()) {
+  // Loop while serial characters are incoming and the queue is not full
+  bool hasData = true;
+  while (length < BUFSIZE && hasData) {
+    // Unless a serial port has data, this will exit on next iteration
+    hasData = false;
+
     LOOP_L_N(p, NUM_SERIAL) {
+      // Is there data available on this specific port ?
+      if (!serial_data_available(p)) continue;
+
+      hasData = true; // Let's make progress

       const int c = read_serial(p);
-      if (c < 0) continue;
+      if (c < 0) {
+        // This should never happen, let's log it
+        PORT_REDIRECT(SERIAL_PORTMASK(p));     // Reply to the serial port that sent the command
+        #if BOTH(MARLIN_DEV_MODE, POSTMORTEM_DEBUGGING)
+          // Crash here to get more information why it failed
+          BUG_ON("SP available but read -1");
+        #else
+          SERIAL_ERROR_START();
+          SERIAL_ECHOLNPGM(STR_ERR_SERIAL_MISMATCH);
+          SERIAL_FLUSH();
+          continue;
+        #endif
+      }

       #if ENABLED(MEATPACK)
         meatpack.handle_rx_char(uint8_t(c), p);

Test it and let me know if it fixes it.
Please beware that this patch will not work if used with #20492 (it's missing some code I haven't submitted yet), so please test with the patch directly applied on bugfix, and if it does not work, checkout #20492, update your config_adv and reproduce the crash and post the crash trace.

Thanks!

@Snargaloo
Copy link

I ran the same Arc-Welded g-code again to firmware c929fb5 with the aforementioned changes. The print completed with no issues so I have attached a terminal log showing lines 1200 - 1300 as comparison: Terminal (working).zip.

I will now compile and run the pull #20492 and report results back here.

I received an ST-Link/V2 yesterday if this will be needed for further debugging.

@Snargaloo
Copy link

I just read your previous comments about the patch. I have applied this patch to bugfix 31a434b and noticed that there are some lines of code in the patch that read:
+ #if BOTH(MARLIN_DEV_MODE, POSTMORTEM_DEBUGGING)
+ // Crash here to get more information why it failed

This bugfix commit does not have POSTMORTEM_DEBUGGING option in Configuration_adv.h. Will this patch still perform as expected? Please advise if I should continue.

@X-Ryl669
Copy link
Contributor

X-Ryl669 commented Feb 15, 2021

If it does not apply feel free to remove the #if to #else part. It's not important right now, as I haven't sent the BUG_ON macro yet for inclusion.

By looking at the log, it seems that the new code is lagging behind the host a lot compared to the old code. I can't explain it yet. BTW, thank you for your report, it's very detailed.

@Snargaloo
Copy link

Snargaloo commented Feb 16, 2021

I ran the same Arc-welded g-code to commit 31a434b with your patch applied for the total changes of: changes_for_31a434b.zip

The print failed in the typical fashion. This is the terminal log starting at line 1200: Terminal (Failure with patched 31a434b).zip

For a more thorough analysis, I decided to enable serial logging in OctoPrint and ran the same g-code again. An anomaly occurred during g29 probing and BL-Touch reported an error near center of bed. The firmware threw an error to OctoPrint which which closed the terminal connection. However, since I have G29_Retry_and_Recover enabled, the G29 process started over again and completed before the firmware executed the Halt function. This is the serial log from this run: serial (BL-Touch Error patched 31a434b).zip

I restarted the printer and repeated the print. This is the serial log: serial (Failure with patched 31a434b).zip

For comparison, I flashed back to the firmware compiled from c929fb5 and ran the same g-code. The print completed with no issues. This is the serial log: serial (Success with c929fb5).zip

Comparing the serial logs, it is apparent that the 31a434b commit (with patch) starts asking to host to wait during idle even before the print is sent over. The serial log from commit c929fb5 does not have the word wait in it even once.

@X-Ryl669
Copy link
Contributor

In the good log, there is this:

2021-02-15 15:02:07,721 - Recv: e��&m�¦üNot SD printing

and this:

2021-02-15 15:02:37,755 - Recv: Not SD printing
2021-02-15 15:02:37,757 - Communication timeout while printing, trying to trigger response from printer. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves.

It seems that they are some issue with the communication reliability, and the new code does not succeed dealing with it.
Also, while playing the "match the 7 difference" game between the good and bad version, the major difference is that, in the good logs, the printer ACK each line almost immediately, while in the bad logs it's ACK'ing the line with 20 lines of lag.

Can you send me your firmware.elf file ?

As far as I understand, wait message comes from this:

  #if NO_TIMEOUTS > 0
    static millis_t last_command_time = 0;
    const millis_t ms = millis();
    if (length == 0 && !serial_data_available() && ELAPSED(ms, last_command_time + NO_TIMEOUTS)) {
      SERIAL_ECHOLNPGM(STR_WAIT);
      last_command_time = ms;
    }
  #endif

So length is zero when the host has not send any comment yet. last_command_time is zero too (until it receives a command). So it all boils down to serial_data_available() returning 0 in the new code and not in the previous code. I need the firmware.elf of the previous and new code so I can compare the generated binary and figure out the difference.

@X-Ryl669
Copy link
Contributor

X-Ryl669 commented Feb 16, 2021

Another remark:

2021-02-15 12:53:56,362 - Send: N1 M115*39
[...]
2021-02-15 12:53:56,783 - Recv: ok N1 P63 B31
then
2021-02-15 12:53:56,803 - Send: M21
2021-02-15 12:54:00,820 - Recv: echo:No SD card
2021-02-15 12:54:00,826 - Recv: ok P63 B31
2021-02-15 12:54:00,828 - Recv: wait

If we look at the given timestamp, the first exchange is correct, the answer coming ~400ms after the query. In the second exchange, the answer comes 4 seconds after the query, so it's a lot larger than the timeout time (set to 1s in your config), so it's logical that the wait message is generated on next serial loop iteration. The real question comes down to why does processing M21 command took 4 seconds ?

EDIT: Seems like SD card initialization is slow, so it's expected.

@X-Ryl669
Copy link
Contributor

X-Ryl669 commented Feb 16, 2021

In the good log, the SD card detection also takes 4s to run:

2021-02-15 15:00:28,479 - Send: M21
2021-02-15 15:00:32,503 - Recv: echo:No SD card
2021-02-15 15:00:32,511 - Recv: ok P63 B31

2021-02-15 15:00:32,515 - Send: M155 S2
2021-02-15 15:00:32,575 - Recv:  T:29.37 /0.00 (3870.0000) B:35.16 /0.00 (3824.0000) @:0 B@:0
2021-02-15 15:00:32,578 - Recv: Not SD printing
2021-02-15 15:00:32,594 - Recv: ok P63 B31

Yet no wait is generated, and from the logic of the code, it should have been.

@Snargaloo
Copy link

It seems compiling triggered the new sanity check. Here is the log: Terminal (Failure with patched a73cff8).txt

@X-Ryl669
Copy link
Contributor

X-Ryl669 commented Mar 12, 2021

Yep, it means that you've made a mistake in your configuration. You're defining MP on serial port 2 but you don't have a serial_port_2 defined. You can skip the patch, it's now merged in bugfix.

@Snargaloo
Copy link

In Configuration.h, I have:
#define SERIAL_PORT 0
#define SERIAL_PORT_2 3

@X-Ryl669
Copy link
Contributor

I was under the impression that LCD_SERIAL was for displays that required Marlin to draw graphics on them.

Maybe you are right, I don't know what it's used for (too many configuration for me to grasp). In that case, I'll have to add a SERIAL_PORT_3 macro to the configuration... 😞

@Snargaloo
Copy link

Snargaloo commented Mar 12, 2021

This is my current git stash: changes_for_a73cff8.zip

Is there something mis-configured? I had both TFT and RPI communicating.

@X-Ryl669
Copy link
Contributor

There's something mixed up. The test for the error is "MP_on_sp2 && !SP2"

@Snargaloo
Copy link

I was under the impression that LCD_SERIAL was for displays that required Marlin to draw graphics on them.

Maybe you are right, I don't know what it's used for (too many configuration for me to grasp). In that case, I'll have to add a SERIAL_PORT_3 macro to the configuration... 😞

On the bright side, this will benefit users with an SKR 1.4/Turbo that are suffering from a different issue. The SKR 1.4 Turbo kit I purchased came with an ESP-032 module that plugs into the WIFI socket of the SKR. It was intended for the host to connect to USB, TFT-35 to connect to TFT port, and ESP-032 to connect to WIFI port. I believe most people gave up on the ESP-032 module early since Marlin could not support all three ports at once.

@Snargaloo
Copy link

There's something mixed up. The test for the error is "MP_on_sp2 && !SP2"

I removed the patch and rebased to the current bugfix. Perhaps something happened during the patch apply. Compilation did not trigger the new sanity check this time. But it did fail. Here is the log: Terminal (Failure with 3107d8a).txt

1 similar comment
@Snargaloo
Copy link

There's something mixed up. The test for the error is "MP_on_sp2 && !SP2"

I removed the patch and rebased to the current bugfix. Perhaps something happened during the patch apply. Compilation did not trigger the new sanity check this time. But it did fail. Here is the log: Terminal (Failure with 3107d8a).txt

@X-Ryl669
Copy link
Contributor

Try #21336 it should fix the build issue.

@Snargaloo
Copy link

PR #21336 successfully compiled. I will now attempt to connect with Meatpack enabled and try a test print.

@Snargaloo
Copy link

I have TFT-35 connected to SERIAL_PORT and RPi connected to SERIAL_PORT_2. I have MEATPACK_ON_SERIAL_PORT_2 enabled and MP_DEBUG enabled. Meatpack plug-in enabled in OctoPrint. I performed a test print with the troublesome print I used previously. The print completed successfully.

This is the serial.log: serial (Success with PR_21336).log

The TFT did not lock up and remained functional.

I presume the odd output of the serial.log is due to MP_DEBUG?

I did notice some sort of corruption at:
2021-03-12 11:28:08,099 - Recv: RB: h�H%¥�RRz­�9å�AÙÍ� ÍÅ)ÿRB: N

The corruption occurred between the time the bed finished heating and the hot end began heating. This also occurred in a previous test during the same process in this previous log: serial (Success with c929fb5).log:
2021-02-15 15:02:07,721 - Recv: e��&m�¦üNot SD printing

I will see if I can isolate it to a hardware issue.

I noticed that the free buffer indicator of ADVANCED_OK actually achieved lower values up to 29 at one point. Previously, the indicator would stay stuck at B31. Perhaps I will try the BufferBuddy plug-in to verify functionality.

The only drawback/bug I noticed is that the M117 display progress updates are not being echoed to SERIAL_PORT for the TFT. Perhaps other things are not echoing to the other serial port, however, this is all I observed during this test.

@thinkyhead
Copy link
Member

Please test the bugfix-2.0.x branch to see where it stands. If the problem has been resolved then we can close this issue. If the issue isn't resolved yet, then we should investigate further.

@jongough
Copy link

jongough commented Sep 1, 2021

I am getting the same random serial port disconnects when using a Raspberry Pi 3A running octoprint talking to an Ender-6 with Creality V4.3.1 STM32F1 board. I am currently using bugfix-2.0.x 01d1192a with the attached config files:
configuration.zip
If I revert to the standard creality build for this machine, currently 'Ender-6-V1.0.4.9-Endstop.bin', I do not get the error.

When the error occurs I get this in the octoprint log:

Recv: ok N2860 P0 B3
Send: N2861 M73 P12*89
Recv: echo:Unknown command: "M73 P12"
Unexpected error while reading serial port, please consult octoprint.log for details: SerialException: 'device reports readiness to read but returned no data (device disconnected or multiple access on port?)' @ comm.py:_readline:3860
Please see https://faq.octoprint.org/serialerror for possible reasons of this.
Changing monitoring state from "Printing" to "Offline after error"
Connection closed, closing down monitor

Here is the relevant bit from the octoprint serial log:

2021-09-02 07:58:08,268 - Unexpected error while reading serial port, please consult octoprint.log for details: SerialException: 'device reports readiness to read but returned no data (device disconnected or multiple access on port?)' @ comm.py:_readline:3860
2021-09-02 07:58:08,268 - Please see https://faq.octoprint.org/serialerror for possible reasons of this.
2021-09-02 07:58:08,289 - Changing monitoring state from "Printing" to "Offline after error"
2021-09-02 07:58:08,315 - Connection closed, closing down monitor

and the relevant part from octoprint.log:

2021-09-02 07:58:08,083 - octoprint.plugin - DEBUG - Calling on_event on printscheduler
2021-09-02 07:58:08,234 - octoprint.util.comm - ERROR - Unexpected error while reading from serial port
Traceback (most recent call last):
  File "/home/pi/oprint/lib/python3.7/site-packages/octoprint/util/comm.py", line 3860, in _readline
    ret = self._serial.readline()
  File "/home/pi/oprint/lib/python3.7/site-packages/octoprint/util/comm.py", line 6488, in readline
    c = self.read(1)
  File "/home/pi/oprint/lib/python3.7/site-packages/serial/serialposix.py", line 596, in read
    'device reports readiness to read but returned no data '
serial.serialutil.SerialException: device reports readiness to read but returned no data (device disconnected or multiple access on port?)
2021-09-02 07:58:08,268 - octoprint.util.comm - ERROR - Please see https://faq.octoprint.org/serialerror for possible reasons of this.
2021-09-02 07:58:08,271 - octoprint.events.fire - DEBUG - Firing event: UpdatedFiles (Payload: {'type': 'printables'})
2021-09-02 07:58:08,272 - octoprint.events - DEBUG - Sending action to <bound method PrinterStateConnection._onEvent of <octoprint.server.util.sockjs.PrinterStateConnection object at 0x661f9050>>
2021-09-02 07:58:08,275 - octoprint.plugin - DEBUG - Calling on_event on action_command_notification

I need to make use of the new facilities in Marlin 2.0 (filament change, better bed leveling, etc.) and this is stopping this occurring.

@4lfamyk
Copy link

4lfamyk commented Oct 2, 2021

Hi
Is there any update to this? I seem to be having similar issues, with the printhead freezing mid-print for a couple of seconds and then resuming the print. It happens randomly, on a recently upgraded Ender3 with BTT SKR 1.4 Turbo with TMC2209 steppers coupled with Raspberry Pi 3B+ and Octoprint.
For Marlin, I am using bugfix-2.0.x as of f2ba845
Do let me know what all additional info I should provide?

@BastianPoe
Copy link

May this be related: raspberrypi/linux#2406? The suggested "workaround" is to reduce USB to USB1.1 which seems to work around most (all?) of the problems. It certainly worked for me.

@nrietsch
Copy link

nrietsch commented Dec 22, 2021

I am experiencing the same issue.
I have been running Marlin 2.0.7.2 with octoprint, SKR 1.4 Turbo and Ender 3 Pro without issues.
I used the marlin-config program to switch to Marlin 2.0.9.2 and started to get random communication errors between the Pi and and the SKR for no apparent reason at all.
I'm even having those connection errors when the printer is idle.

Reverted back to Marlin 2.0.7.2 and no more communication errors at all.

Given that the status of the issue is Open, I'm guessing that it's not part of Marlin 2.0.9.3, correct?

@RipperDrone
Copy link

RipperDrone commented Jan 16, 2022

Getting similar issue on USB comm to my Ender 3 V2 driven by SKR Mini E3 2.0: Random mid-print stops/freezes with heaters + fans (thanks God) remaining ON, however display showing 'continue' button and the printer does still accept commands. However, Cura shows '...printing' as status. Not sure what's going on, but I know for sure this hasn't happened on previous version of Marlin I had flashed before (think it was 2.0.8)...

My setup: No Octoprint - direct USB connection from my PC (Cura on Win11x64) to the Ender 3 v2.

Serial comm / buffer management seems flawed somehow?

@humberry
Copy link

I can confirm that it is not a good idea to establish the serial connection after starting to print from the SD card. My Prusa MK3S (firmware 3.10.1-4697) stopped immediately, although i wasn't sending any command, only reading the incomming messages. Tested with a Raspberry Pi Zero, Raspberry Pi OS Buster, Python3 and pyserial. Can't say on which Marlin version (1.0.x) this firmware is based, sorry. I hope this is no pure Marlin 2.x issue thread. Let me know if i can check something.

@Quas7
Copy link
Contributor

Quas7 commented Oct 26, 2022

Could be related.

I ran into a communication issue with Octoprint with an older 2.1.6-bugfix (c5af435)
Hardware is a Flsun Q5 delta with MKS Robin Nano 1.2 (ST controller).

This happend two times in a row now after a few hours of printing but I also updated Octoprint to 1.8.6 right before. ;(

Resendrate reported by Octoprint is "75/140k lines (0%)" so the com is somewhat stable at 250kbaud and I had no issues like that before. But I can not exclude having a kind of burst on the USB link that results in multiple signal corruptions at once (but its a shielded 20cm USB cable...).

Nevertheless, the firmware is requesting resending 034497 instead of 134497 and Marlin also sends quite corrupted error messages before that. It is looking almost like sending and receiving got scrambled together.

I will update now to the latest Marlin bugfix 2.1.x (6b4d7b9) and run some trials. If the error persists, I will start a seperate issue report.
Octoprint issue filed as well OctoPrint/OctoPrint#4671 but it looks like I will sit in between chairs here. ;)

Send: N134496 G1 X20.229 Y-54.548 E10.7819*119
Recv: ok N134496 P16 B31
Send: N134497 G1 X18.64 Y-54.548 E10.8357*72
Recv:  T:235.48 /236.00 B:61.42 /60.00 @:79 B@:0
Recv: ErrorNchecksum mism1
Recv: ch, Last Line: 134496
Recv: Resend: 034497
Should resend line 34497 but no sufficient history is available, can't resend
Changing monitoring state from "Printing" to "Error"
Changing monitoring state from "Error" to "Offline after error"
Connection closed, closing down monitor
Closing down send loop

With Octoprint in safemode I got this print stall event with 115200 baud.
Somehow the "ok message" was appended to the temperature line and the "ok" never reached Octoprint to continue sending.

2022-10-26 12:50:55,592 - Send: N2567 G1 X26.535 Y-30.681 E250.89253*112
2022-10-26 12:50:55,602 - Recv: ok N2567 P1 B31
2022-10-26 12:50:55,605 - Send: N2568 G1 X56.342 Y-.873 E252.49882*118
2022-10-26 12:50:57,035 - Recv:  T:235.94 /236.00 B:60.34 /60.00 @:53 B@:00ok N2568 P0 B31
2022-10-26 12:50:58,881 - Recv:  T:236.11 /236.00 B:60.43 /60.00 @:50 B@:0
2022-10-26 12:51:00,882 - Recv:  T:235.92 /236.00 B:59.82 /60.00 @:53 B@:0
2022-10-26 12:51:02,882 - Recv:  T:236.00 /236.00 B:60.10 /60.00 @:51 B@:0
2022-10-26 12:51:04,882 - Recv:  T:235.91 /236.00 B:60.04 /60.00 @:53 B@:127
...
2022-10-26 12:55:52,867 - Recv:  T:235.72 /236.00 B:60.28 /60.00 @:53 B@:0
2022-10-26 12:55:54,866 - Recv:  T:235.91 /236.00 B:60.34 /60.00 @:50 B@:0
2022-10-26 12:55:56,867 - Recv:  T:236.09 /236.00 B:60.38 /60.00 @:48 B@:0
2022-10-26 12:55:57,513 - Recv: Hotend Idle Timeout
2022-10-26 12:55:57,520 - Recv: //action:notification Hotend Idle Timeout

@christiancg
Copy link

christiancg commented Jan 3, 2024

Configuration.zip
Has this issue been solved? I am facing something similar. The connection between the printer and Octoprint just fails at some point with a serial error and the printer stops printing. No issues while printing through the SD card. The printer still reacts to commands sent with the display and Octoprint auto re connects to the printer (as I have the auto connect plugin), but the print has already failed, which is very annoying. Seems to be a random issue, as I cannot predict when it will happen.

My setup is a BTT SKR 1.4 Turbo running Marlin 2.1.2.1 with a BTT TFT 3.5 B1 display. The raspberry I'm using to run octoprint is a 3 model B+, running version 1.9.3.

I've lowered the baud rate to 115200 to both the host and the display because I thought that might be the problem.

I'm attaching the configuration files, but I think it is pretty standard. The only thing a little bit different is that I have 2 extruders with a Chimera clone hotend. The printer is a heavily customized Hellbot Magna Dual.

@EvilGremlin
Copy link
Contributor

EvilGremlin commented Jan 3, 2024

With native USB baudrate is ignored. What resend rate octoprint show?
All USB connection issues are due to bad cable and bad pi power. Get better and shorter cable, set jmper on boart to not use USB power. Also raspi PSU might be underpowered, because raspi power design is garbage which they stubbornly refuse to fix.

@christiancg
Copy link

christiancg commented Jan 3, 2024

Resend rate is always 0. I've bought a special power brick for the Raspberry PI to be sure that the power delivery is ok, but it is still stating that there are power issues. I've had no luck with that, specially because I live in Argentina and getting good stuff from abroad is not always an option. I had forgotten to attach the configuration files, so I edited the original comment and added it there

@Quas7
Copy link
Contributor

Quas7 commented Jan 3, 2024

I always power directly to the rpi header 5V pin (losing the input protection if you do not add a Zener or TVS diode) or I use a USB supply that delivers over 5.2V - everything lower made issues.

But regarding the serial connection I switched to SD card upload with 1Mbaud for my mks nano board. With this board it was not possible to get a reliable connection. I would have exchanged the PSU of the printer as my next step as some cheap psu glitch under load changes (heat bed). Maybe try printing without heating the bed first.

@benjmharless
Copy link

benjmharless commented Nov 20, 2024

Has a solution to the random mid-print freezing been found? I am having the same issue with a Ender 3 with SKR 1.4 turbo with TMC2209 stepper drivers and BTT TFT35. No Octoprint. I also have a H2 500C hotend added. I am running marlin 2.1.2. I am currently trying to rule out power and heat issues.
If anyone has a possible solution I would appreciate it. Thanks!

@benjmharless
Copy link

I fixed my issue by enabling RX_BUFFER_SIZE and TX_BUFFER SIZE and setting them to the highest option, and by changing the serial ports to match this: https://user-images.githubusercontent.com/54359396/132397091-d596abcf-750f-422a-bb59-afafc246dc58.jpg
I am also now running Marlin 2.1.2.5
Hope that helps!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests