[BUG] Random freezes during printing over serial connection (skr 1.4 turbo) #21010
Comments
Can you test using another cable and another computer? The SKR uses USB serial, which is a lot more stable than normal hardware serial. In the past, some users reported something similar and the issue was in the host. |
Sure, will try. But it will be another rpi4, as I don't have anything else with hardware serial at hand. However, I'm not convinced the host is to blame. It has printed for hundreds of hours in this setup. And right now it's finishing a ~18-hour print running a downgraded Marlin. |
Which version of Marlin is working fine for you? |
What happens when the serial freezes? Do you have the emergency parser enabled (previously)? If yes, can you send some emergency commands to see if it reacts? Can you post the last serial output log so we can see the last message that worked? |
I've read your configuration file. I'm trying to validate some hypothesis here:
For 1:
For 2:
For 3:
|
Applied the changes as described above and started a ~4-hour print with serial communication logging enabled. In the meantime: I ran a few short prints overnight and was unable to trigger the issue, both on the new rpi4 and the old rpi4. |
Today I haven't been able to reproduce the problem on either of the two Raspberry Pis or with any version of Marlin. |
Or it's a bug that disappears when EMERGENCY_PARSER is enabled... Feel free to reopen if it reappears once you disable the emergency parser. |
Happened again. Printer does not react to M108.
I have an ST-Link V2 plugged in that I'm using to flash firmware, but I guess it can be used to do some debugging if someone guides me. I can leave the printer in its current state for a while. |
Ok, so I hope you have OpenOCD installed. In that case, you'll have to start the server:
Replace Then in another terminal, run gdb like this:
While in GDB, type:
And post the output of the commands above. If you are a bit lost, I've written a post about how to debug Marlin and another about setting up OpenOCD. |
(there's no debug subdir, so I'm using /usr/src/Marlin/.pio/build/LPC1769/firmware.elf)
|
So, it's useless... 😢 Your printer has crashed. You'll have to use #20492 to get a stack trace to print on the serial port (you can directly pull this, it was updated with bugfix recently). Also, please compile your firmware with |
Just a note for developers: by default, the crash handlers in the framework don't remember the crash's stack trace. So until #20492 is merged (which does remember the stack trace and allows backtracing), we're blind when such a crash happens. |
I am experiencing a similar, if not identical, issue. It took me a few weeks to isolate it. If I use 2.0.x-bugfix at commit c929fb5, everything prints perfectly. I am unable to compile successfully again until commit c74f972. From that commit forward, I experience issues with serial communication that ultimately cause some prints to fail. Not every type of print fails: models with straight lines and sharp corners complete, while models with curves or circles experience many pauses and show blobs on the surface of the model.

I have used the Arc Welder OctoPrint plugin for a few months with pleasing results. If I enable Arc Welder compression, the print will most likely fail. To find which commit was causing the issue, I had to find a situation that would reproduce the failure consistently.

This is the hardware I am using: a BTT SKR 1.4 Turbo connected to a BTT TFT-35 E3 V3.0, which is connected to an RPi4. I have the RPi4 Bluetooth device disabled, and the first PL011 (UART0) is the primary UART. The RPi4 is connected to the TFT from the internal GPIO pins on the RPi4 to the WIFI connector on the TFT. The WIFI connector on the TFT automatically echoes all data going to and from the serial connection between the TFT and the SKR 1.4 Turbo.

For software, I do not use the OctoPi image on the RPi4. I use a current build of Raspberry Pi OS with a pure Python3 environment. I connect to the printer using /dev/ttyAMA0. I use the current release of the Arc Welder plugin (1.1.0rc2). I have Meatpack enabled in Marlin; however, I have it disabled in OctoPrint since it won't allow OctoPrint to connect to the printer with my setup (see OctoPrint-Meatpack Issue #12).

These are the configuration files that are working: None of the Marlin commits after the above allow my printer to function properly.
I am attaching the most recent configuration I tested that failed: I will attach the model I was using for testing along with the sliced gcode and the arc-welded sliced gcode: This is the model STL: Z bracket spacer V3.zip

To reproduce the failure, I simply load and print the Arc-Welded gcode in OctoPrint. The result is that around line 1200 the serial communication gets out of sync, which eventually results in a serial timeout. I have attached a terminal log file showing the situation from the terminal view of OctoPrint. The error occurs at line 1212 this time (not always the same line): Terminal.zip.

The TFT will show a popup with the line mismatch and will beep continuously. I need to cycle power at the printer to recover. I can provide any additional information or perform testing as requested. |
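For readers unfamiliar with the "line mismatch" mentioned above, here is a minimal host-side sketch of the numbered-line protocol. It assumes the usual framing of `N<line> <gcode>*<checksum>`, where the checksum is the XOR of every byte before the `*`, and a `Resend: <line>` reply from the firmware. The helper names (`marlin_checksum`, `frame_line`, `parse_resend`, `next_line_to_send`) are made up for illustration and are not OctoPrint or Marlin APIs.

```python
import re
from typing import Optional

# Assumed reply shape when the firmware rejects a line: "Resend: 1212"
RESEND_RE = re.compile(r"^Resend:\s*N?(\d+)")


def marlin_checksum(payload: str) -> int:
    """XOR of every byte that precedes the '*' separator."""
    checksum = 0
    for byte in payload.encode("ascii"):
        checksum ^= byte
    return checksum


def frame_line(line_number: int, gcode: str) -> str:
    """Wrap a G-code command with a line number and checksum."""
    payload = f"N{line_number} {gcode}"
    return f"{payload}*{marlin_checksum(payload)}"


def parse_resend(response: str) -> Optional[int]:
    """Return the requested line number, or None if this is not a resend request."""
    match = RESEND_RE.match(response.strip())
    return int(match.group(1)) if match else None


def next_line_to_send(last_sent: int, response: str) -> int:
    """After a resend request the host must rewind, not keep streaming."""
    requested = parse_resend(response)
    return requested if requested is not None else last_sent + 1


print(frame_line(1, "G28"))
print(next_line_to_send(1215, "Resend: 1212"))
```

If the host keeps sending lines past the requested one, every following line arrives with an unexpected number, which would match the kind of out-of-sync cascade described in this comment.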
Right, from the log, it seems that OctoPrint continues feeding lines to the printer after the printer has asked for a resend. At some point, it fails. Can you post the same terminal output with the previous version (c929fb5)? The strange things in the log are:
Meanwhile, if you can pull #20492 and build that with both "MARLIN_DEV_MODE" and "POSTMORTEM_DEBUGGING" enabled in Please note that the OP's printer crashed and stopped doing anything, whereas yours seems to hit an emergency/error condition but still runs, so the above might be less relevant. |
Ok, I think I have an idea. I'll send you a file to replace and test with it later today. |
Please apply the following patch on the current bugfix:
Test it and let me know if it fixes it. Thanks! |
I ran the same Arc-Welded g-code again to firmware c929fb5 with the aforementioned changes. The print completed with no issues so I have attached a terminal log showing lines 1200 - 1300 as comparison: Terminal (working).zip. I will now compile and run the pull #20492 and report results back here. I received an ST-Link/V2 yesterday if this will be needed for further debugging. |
I just read your previous comments about the patch. I have applied this patch to bugfix 31a434b and noticed that there are some lines of code in the patch that read: This bugfix commit does not have POSTMORTEM_DEBUGGING option in Configuration_adv.h. Will this patch still perform as expected? Please advise if I should continue. |
If it does not apply feel free to remove the By looking at the log, it seems that the new code is lagging behind the host a lot compared to the old code. I can't explain it yet. BTW, thank you for your report, it's very detailed. |
I ran the same Arc-welded g-code on commit 31a434b with your patch applied, for total changes of: changes_for_31a434b.zip The print failed in the typical fashion. This is the terminal log starting at line 1200: Terminal (Failure with patched 31a434b).zip

For a more thorough analysis, I decided to enable serial logging in OctoPrint and ran the same g-code again. An anomaly occurred during G29 probing, and the BL-Touch reported an error near the center of the bed. The firmware threw an error to OctoPrint, which closed the terminal connection. However, since I have G29_Retry_and_Recover enabled, the G29 process started over again and completed before the firmware executed the Halt function. This is the serial log from this run: serial (BL-Touch Error patched 31a434b).zip

I restarted the printer and repeated the print. This is the serial log: serial (Failure with patched 31a434b).zip

For comparison, I flashed back to the firmware compiled from c929fb5 and ran the same g-code. The print completed with no issues. This is the serial log: serial (Success with c929fb5).zip

Comparing the serial logs, it is apparent that the 31a434b commit (with patch) starts asking the host to wait during idle even before the print is sent over. The serial log from commit c929fb5 does not contain the word wait even once. |
In the good log, there is this:
and this:
It seems that there is some issue with communication reliability, and the new code does not succeed in dealing with it. Can you send me your firmware.elf file? As far as I understand, the wait message comes from this:
So length is zero when the host has not sent any command yet. last_command_time is zero too (until it receives a command). So it all boils down to serial_data_available() returning 0 in the new code and not in the previous code. I need the firmware.elf of both the previous and the new code so I can compare the generated binaries and figure out the difference. |
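As a rough model of the condition described in this comment (not the actual Marlin source, which is C++), the logic can be sketched as follows. The names `WaitState` and `should_send_wait` are hypothetical, and the 1000 ms default only mirrors the 1 s timeout mentioned elsewhere in this thread.

```python
from dataclasses import dataclass


@dataclass
class WaitState:
    # Starts at zero and stays there until a command (or a wait) updates it.
    last_command_time_ms: int = 0


def should_send_wait(state: WaitState, now_ms: int, queue_length: int,
                     data_available: bool, timeout_ms: int = 1000) -> bool:
    """Emit 'wait' only when idle: empty queue, nothing pending on the
    serial port, and the timeout elapsed since the last command."""
    idle = queue_length == 0 and not data_available
    if idle and now_ms - state.last_command_time_ms >= timeout_ms:
        state.last_command_time_ms = now_ms  # rate-limit the next 'wait'
        return True
    return False


state = WaitState()
print(should_send_wait(state, 4000, 0, data_available=False))  # idle for 4 s
print(should_send_wait(state, 4500, 0, data_available=False))  # only 0.5 s later
print(should_send_wait(state, 9000, 0, data_available=True))   # bytes pending
```

In this model, with an empty queue and an expired timeout, the only input left to decide the outcome is `data_available`, which is why the comparison between the two builds focuses on serial_data_available().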
Another remark:
If we look at the given timestamps, the first exchange is correct, with the answer coming ~400ms after the query. In the second exchange, the answer comes 4 seconds after the query, which is a lot longer than the timeout (set to 1s in your config), so it's logical that the wait message is generated on the next serial loop iteration. The real question comes down to why processing the M21 command took 4 seconds. EDIT: It seems SD card initialization is slow, so it's expected. |
In the good log, the SD card detection also takes 4s to run:
Yet no wait is generated, and from the logic of the code, it should have been. |
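The timestamp comparison above can be repeated on a full log with a small script. This is only a sketch: it assumes each serial.log line looks like `YYYY-MM-DD HH:MM:SS,mmm - Send: ...` or `... - Recv: ...`, and the sample lines below are synthetic, written for illustration rather than taken from the attached logs.

```python
from datetime import datetime
from typing import List, Tuple


def parse_line(line: str) -> Tuple[datetime, str]:
    """Split an assumed 'timestamp - message' log line."""
    stamp, _, rest = line.partition(" - ")
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"), rest


def reply_latencies(lines: List[str]) -> List[Tuple[str, float]]:
    """Pair each Send with the first Recv that follows it."""
    pending = None
    latencies = []
    for line in lines:
        ts, rest = parse_line(line)
        if rest.startswith("Send:"):
            pending = (ts, rest[len("Send:"):].strip())
        elif rest.startswith("Recv:") and pending is not None:
            sent_at, command = pending
            latencies.append((command, (ts - sent_at).total_seconds()))
            pending = None
    return latencies


# Synthetic sample, not real log data.
sample = [
    "2021-01-01 10:00:00,000 - Send: M105",
    "2021-01-01 10:00:00,400 - Recv: ok",
    "2021-01-01 10:00:01,000 - Send: M21",
    "2021-01-01 10:00:05,000 - Recv: ok",
]
timeout_s = 1.0  # the 1 s timeout mentioned in the discussion
slow = [cmd for cmd, secs in reply_latencies(sample) if secs > timeout_s]
print(slow)
```

Running this over both the good and the bad log would show whether the same commands exceed the timeout in each, which is the discrepancy being questioned here.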
It seems compiling triggered the new sanity check. Here is the log: Terminal (Failure with patched a73cff8).txt |
Yep, it means that you've made a mistake in your configuration. You're defining MP on serial port 2 but you don't have a serial_port_2 defined. You can skip the patch, it's now merged in bugfix. |
In Configuration.h, I have: |
Maybe you are right, I don't know what it's used for (too many configuration options for me to grasp). In that case, I'll have to add a SERIAL_PORT_3 macro to the configuration... 😞 |
This is my current git stash: changes_for_a73cff8.zip Is there something mis-configured? I had both TFT and RPI communicating. |
There's something mixed up. The test for the error is "MP_on_sp2 && !SP2" |
On the bright side, this will benefit users with an SKR 1.4/Turbo that are suffering from a different issue. The SKR 1.4 Turbo kit I purchased came with an ESP-032 module that plugs into the WIFI socket of the SKR. It was intended for the host to connect to USB, TFT-35 to connect to TFT port, and ESP-032 to connect to WIFI port. I believe most people gave up on the ESP-032 module early since Marlin could not support all three ports at once. |
I removed the patch and rebased to the current bugfix. Perhaps something happened during the patch apply. Compilation did not trigger the new sanity check this time. But it did fail. Here is the log: Terminal (Failure with 3107d8a).txt |
Try #21336 it should fix the build issue. |
PR #21336 successfully compiled. I will now attempt to connect with Meatpack enabled and try a test print. |
I have TFT-35 connected to SERIAL_PORT and RPi connected to SERIAL_PORT_2. I have MEATPACK_ON_SERIAL_PORT_2 enabled and MP_DEBUG enabled. Meatpack plug-in enabled in OctoPrint. I performed a test print with the troublesome print I used previously. The print completed successfully. This is the serial.log: serial (Success with PR_21336).log The TFT did not lock up and remained functional. I presume the odd output of the serial.log is due to MP_DEBUG? I did notice some sort of corruption at: The corruption occurred between the time the bed finished heating and the hot end began heating. This also occurred in a previous test during the same process in this previous log: serial (Success with c929fb5).log: I will see if I can isolate it to a hardware issue. I noticed that the free buffer indicator of ADVANCED_OK actually achieved lower values up to 29 at one point. Previously, the indicator would stay stuck at B31. Perhaps I will try the BufferBuddy plug-in to verify functionality. The only drawback/bug I noticed is that the M117 display progress updates are not being echoed to SERIAL_PORT for the TFT. Perhaps other things are not echoing to the other serial port, however, this is all I observed during this test. |
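To track the free-buffer indicator mentioned above across a whole log, a short parser can help. This sketch assumes ADVANCED_OK replies have the shape `ok N<line> P<planner slots free> B<command buffer slots free>`; the function names and the sample replies are hypothetical.

```python
import re
from typing import Dict, List, Optional

FIELD_RE = re.compile(r"\b([NPB])(-?\d+)\b")


def parse_advanced_ok(line: str) -> Optional[Dict[str, int]]:
    """Return the N/P/B fields of an 'ok' reply, or None for other lines."""
    line = line.strip()
    if not line.startswith("ok"):
        return None
    return {key: int(value) for key, value in FIELD_RE.findall(line[2:])}


def lowest_free_buffer(lines: List[str]) -> Optional[int]:
    """Smallest B value seen, i.e. the fullest the command buffer got."""
    values = []
    for line in lines:
        fields = parse_advanced_ok(line)
        if fields and "B" in fields:
            values.append(fields["B"])
    return min(values) if values else None


# Synthetic replies for illustration only.
replies = ["ok N10 P15 B31", "ok N11 P14 B29", "ok", "echo:busy: processing"]
print(lowest_free_buffer(replies))
```

A value that never drops below B31 would suggest the host is sending one command at a time, while lower values indicate the command buffer is actually being filled.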
Please test the |
I am getting the same random serial port disconnects when using a Raspberry Pi 3A running octoprint talking to an Ender-6 with Creality V4.3.1 STM32F1 board. I am currently using bugfix-2.0.x 01d1192a with the attached config files: When the error occurs I get this in the octoprint log:
Here is the relevant bit from the octoprint serial log:
and the relevant part from octoprint.log:
I need to make use of the new facilities in Marlin 2.0 (filament change, better bed leveling, etc.), and this issue is preventing that. |
Hi |
Could this be related to raspberrypi/linux#2406? The suggested "workaround" is to reduce USB to USB1.1, which seems to work around most (all?) of the problems. It certainly worked for me. |
I am experiencing the same issue. I reverted to Marlin 2.0.7.2 and have no more communication errors at all. Given that the status of the issue is Open, I'm guessing that a fix is not part of Marlin 2.0.9.3, correct? |
Getting a similar issue on USB comms to my Ender 3 V2 driven by an SKR Mini E3 2.0: random mid-print stops/freezes with heaters + fans (thank God) remaining ON, while the display shows a 'continue' button and the printer still accepts commands. However, Cura shows '...printing' as the status. Not sure what's going on, but I know for sure this didn't happen on the previous version of Marlin I had flashed (I think it was 2.0.8)... My setup: no OctoPrint, direct USB connection from my PC (Cura on Win11x64) to the Ender 3 V2. Serial comm / buffer management seems flawed somehow? |
I can confirm that it is not a good idea to establish the serial connection after starting to print from the SD card. My Prusa MK3S (firmware 3.10.1-4697) stopped immediately, although I wasn't sending any commands, only reading the incoming messages. Tested with a Raspberry Pi Zero, Raspberry Pi OS Buster, Python3 and pyserial. I can't say which Marlin version (1.0.x) this firmware is based on, sorry. I hope this thread isn't only for Marlin 2.x issues. Let me know if I can check something. |
Could be related. I ran into a communication issue with OctoPrint on an older 2.1.6-bugfix (c5af435). This has now happened twice in a row after a few hours of printing, but I also updated OctoPrint to 1.8.6 right before. ;( The resend rate reported by OctoPrint is "75/140k lines (0%)", so the communication is fairly stable at 250 kbaud, and I had no issues like that before. But I cannot rule out some kind of burst on the USB link that results in multiple signal corruptions at once (though it's a shielded 20cm USB cable...). Nevertheless, the firmware is requesting a resend of 034497 instead of 134497, and Marlin also sends quite corrupted error messages before that. It looks almost as if sending and receiving got scrambled together. I will now update to the latest Marlin bugfix 2.1.x (6b4d7b9) and run some trials. If the error persists, I will open a separate issue report.
With Octoprint in safemode I got this print stall event with 115200 baud.
|
Configuration.zip My setup is a BTT SKR 1.4 Turbo running Marlin 2.1.2.1 with a BTT TFT 3.5 B1 display. The raspberry I'm using to run octoprint is a 3 model B+, running version 1.9.3. I've lowered the baud rate to 115200 to both the host and the display because I thought that might be the problem. I'm attaching the configuration files, but I think it is pretty standard. The only thing a little bit different is that I have 2 extruders with a Chimera clone hotend. The printer is a heavily customized Hellbot Magna Dual. |
With native USB, the baud rate is ignored. What resend rate does OctoPrint show? |
The resend rate is always 0. I've bought a special power brick for the Raspberry Pi to be sure that the power delivery is OK, but it still reports power issues. I've had no luck with that, especially because I live in Argentina and getting good stuff from abroad is not always an option. I had forgotten to attach the configuration files, so I edited the original comment and added them there. |
I always supply power directly to the rpi header 5V pin (losing the input protection if you do not add a Zener or TVS diode), or I use a USB supply that delivers over 5.2V; anything lower caused issues. But regarding the serial connection, I switched to SD card upload at 1 Mbaud for my MKS Nano board. With this board it was not possible to get a reliable connection. My next step would have been to replace the printer's PSU, as some cheap PSUs glitch under load changes (heated bed). Maybe try printing without heating the bed first. |
Has a solution to the random mid-print freezing been found? I am having the same issue with an Ender 3 with an SKR 1.4 Turbo, TMC2209 stepper drivers and a BTT TFT35. No OctoPrint. I have also added an H2 500C hotend. I am running Marlin 2.1.2. I am currently trying to rule out power and heat issues. |
I fixed my issue by enabling RX_BUFFER_SIZE and TX_BUFFER_SIZE and setting them to the highest option, and by changing the serial ports to match this: https://user-images.githubusercontent.com/54359396/132397091-d596abcf-750f-422a-bb59-afafc246dc58.jpg |
Bug Description
I'm experiencing random freezes when printing with OctoPrint over a serial connection.
Using SKR 1.4 turbo, RPI4, directly connected to TFT serial interface (SERIAL_PORT 0).
Running bugfix-2x 004bed8
I am not able to identify any specific circumstances of the error: sometimes it happens on the first layer, and sometimes after a few hours of printing the same gcode.
The print head stops, the screen (Ulti controller) stops refreshing, and communication freezes (there is a "timeout" message in the OctoPrint terminal window).
I suspect it's related to recent serial changes, because:
Configuration Files
Configuration.zip