[FR] Softer, disconnect-less HALT option. #20365

Closed
AndKe opened this issue Dec 3, 2020 · 22 comments
Labels
C: Safety T: Feature Request Features requested by users.

Comments

@AndKe
Contributor

AndKe commented Dec 3, 2020

Description

SKR and many other newer boards do not reset when the serial port is opened (DTR), as older Arduino/ATmega-based boards do.
Any HALT, even for non-dangerous situations like failed probing/levelling, makes the printer go into a brick mode, until power-cycled.
This is extremely inconvenient when printing remotely.

Feature Workflow

Add option to "Soft HALT/KILL"

When "soft halted", respond to commands with "!!", "error:" or "fatal:" as usual
This will stop any decent software, like OctoPrint
Then, accept only M999 command (restart)

This would provide a safe way of stopping a print, without disconnecting from the host.
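
The workflow proposed above can be sketched as a small command gate. This is only an illustration of the idea, not Marlin's actual KILL code: the class `SoftHaltGate`, its methods, and the exact reply strings are hypothetical, and a real implementation would also have to switch off heaters and steppers when halting.

```cpp
#include <string>

// Hypothetical sketch of the proposed "soft halt"; not Marlin's implementation.
class SoftHaltGate {
public:
  // Enter the soft-halted state. Real firmware would also switch off
  // heaters and steppers here; that part is outside this sketch.
  void halt(const std::string& reason) {
    halted_ = true;
    reason_ = reason;
  }

  // Returns the serial reply for one incoming G-code line.
  std::string handle(const std::string& line) {
    if (!halted_) return "ok";          // normal operation (command execution omitted)
    if (is_m999(line)) {                // the only command accepted while halted
      halted_ = false;
      reason_.clear();
      return "ok";
    }
    return "!! " + reason_;             // every other command gets the error again
  }

  bool halted() const { return halted_; }

private:
  // Accept "M999" alone or followed by a space; reject e.g. "M9990".
  static bool is_m999(const std::string& line) {
    const std::string cmd = "M999";
    if (line.compare(0, cmd.size(), cmd) != 0) return false;
    return line.size() == cmd.size() || line[cmd.size()] == ' ';
  }

  bool halted_ = false;
  std::string reason_;
};
```

Because the error is repeated on every rejected command, a host that reconnects or loses state would still see why the printer stopped.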

Additional Information

This is related to:
#14300

@AndKe AndKe added the T: Feature Request Features requested by users. label Dec 3, 2020
@thisiskeithb
Member

You should track down why your printer is halting.

It’s a safety feature since Marlin controls things like heaters. The point of halting is to shut down everything and alert you that your printer did something unsafe and needs attention.

@AndKe
Contributor Author

AndKe commented Dec 3, 2020

Absolutely. It also happens on a probing error (not so dangerous), but I have now disabled OctoPrint from disconnecting on !!/Error.
Still, some halts are not dangerous:
Example: bed heating that is too slow in an unusually cold storage room (winter time) may be treated as thermal runaway.
In reality, it is better to let it run and fail once than to configure a crazy runaway threshold like 2°C/5 minutes.

@thisiskeithb
Member

There’s good reason to watch for slow heat ups and those settings are configurable:

```cpp
/**
 * Thermal Protection parameters for the bed are just as above for hotends.
 */
#if ENABLED(THERMAL_PROTECTION_BED)
  #define THERMAL_PROTECTION_BED_PERIOD     20 // Seconds
  #define THERMAL_PROTECTION_BED_HYSTERESIS  2 // Degrees Celsius

  /**
   * As described above, except for the bed (M140/M190/M303).
   */
  #define WATCH_BED_TEMP_PERIOD             60 // Seconds
  #define WATCH_BED_TEMP_INCREASE            2 // Degrees Celsius
#endif
```
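
To make the two watch settings concrete, here is a simplified model of what a period/increase pair demands. It assumes the check is "the bed must warm by at least the configured increase within the configured period"; the function names are hypothetical, and Marlin's real logic has more detail (re-arming, hysteresis, different handling per heater).

```cpp
// Simplified, hypothetical model of the heating watch; not Marlin's code.

// True if the bed warmed by at least `increase_c` degrees over one watch period.
bool heating_watch_passes(double start_c, double temp_after_period_c, double increase_c) {
  return temp_after_period_c - start_c >= increase_c;
}

// Minimum average heating rate (degrees Celsius per second) that a
// period/increase pair demands from the bed.
double min_heating_rate(double increase_c, double period_s) {
  return increase_c / period_s;
}
```

With the defaults shown (2°C in 60 s) the bed must average about 0.033°C/s. Stretching the period to 500 s, a figure that comes up later in this thread, lowers the demand to 0.004°C/s, roughly eight times slower, which is why a very long period is also slow to catch a real heater fault.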

@AndKe
Contributor Author

AndKe commented Dec 3, 2020

@thisiskeithb
I know, and again (since you derailed this issue to be about safety):
If enclosed printers are in an unheated room (think -5°C),
then I can reduce the target increase per period to be veeeery slow. But this is not the case most of the year, or when the printer was recently used and the printer and enclosure are not below freezing.
If I did that, the protection would be very slow to react if something went wrong the rest of the year.
So, on the coldest days, it is better to preheat, let it fail if needed, then resume.

Now let's rather discuss the issue:
There is no adverse effect or security risk in accepting M999, as no normal print job would have it.
Still afraid a job may somehow have it?
Then: After failure, accept M999 only after at least three seconds with no serial input (thus make sure the server stopped feeding gcode.)
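
The quiet-period rule suggested here could look roughly like the following. It is a sketch under stated assumptions: the class `QuietPeriodGate` and its methods are hypothetical, timestamps are milliseconds from some monotonic counter, and whether a line is an M999 is decided elsewhere.

```cpp
#include <cstdint>

// Hypothetical helper: after a halt, honor M999 only if the serial line
// has been silent for at least `quiet_ms` (three seconds by default).
class QuietPeriodGate {
public:
  explicit QuietPeriodGate(std::uint32_t quiet_ms = 3000) : quiet_ms_(quiet_ms) {}

  // Call when the halt occurs; the quiet timer starts here.
  void on_halt(std::uint32_t now_ms) { last_input_ms_ = now_ms; }

  // Call for every received line. Returns true only for an M999 that
  // arrives after at least quiet_ms_ of silence.
  bool accept(bool is_m999, std::uint32_t now_ms) {
    // Unsigned subtraction stays correct across counter wraparound.
    const bool quiet = (now_ms - last_input_ms_) >= quiet_ms_;
    last_input_ms_ = now_ms;  // any input, accepted or not, restarts the timer
    return is_m999 && quiet;
  }

private:
  std::uint32_t quiet_ms_;
  std::uint32_t last_input_ms_ = 0;
};
```

A host that is still streaming a job keeps resetting the timer, so an M999 buried in that stream would be rejected; one sent deliberately after the stream has stopped would go through.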

So, with no adverse effects, this would save a lot of frustration for lots of users of the newer controllers.

@thisiskeithb
Member

Halting your printer is about safety. It's why the feature exists.

@AndKe
Contributor Author

AndKe commented Dec 3, 2020

Yes, I am not saying anything against halting - but it can be done in a more convenient, equally safe way.

You could also have a relay to make sure it halted by shorting the mains supply, but one usually takes the more sane/convenient solution to a problem.
There is no reason to "saw off the branch you sit on" (drop the serial link) just to halt safely.

@ellensp
Contributor

ellensp commented Dec 4, 2020

This is a safety feature.
If it halts, you're meant to go to your printer and sort it out, not just reboot and ignore it.

@AndKe
Contributor Author

AndKe commented Dec 4, 2020

My suggestion increases safety in cold environments (by not pushing users toward too liberal runaway settings).
No one has commented on how more convenient handling is incompatible with safety.

This applies especially to users who use big print beds, and toggle between thin powder-covered steel sheet, and heavy glass plate with PEI.

You are telling me that, in order to punish the user as hard as possible for a false positive on the thermal runaway (or a probing error), you prefer that a user configures a 2°C/500 s threshold for a frozen glass bed, rather than a safe 2°C/60 s that is sufficient with thin steel during most of the year (and then resets remotely the few times it may be needed).

Opposing this suggestion gives users an incentive to have too sloppy thermal runaway settings (because the penalty is too inconvenient).

I won't nag and discuss this anymore. If you oppose improvements and convenient safety features on principle, and prefer to annoy users into configuring them too liberally or commenting out THERMAL_PROTECTION_*, then that is an active choice to decrease safety.

@ChlorideCull

Fundamentally, I think halting everything on errors can cause more safety issues than it solves, compared to killing everything and just keeping the serial processing open so Marlin can respond with the error on any command until actually reset.

For example, in any situation where the device connected to the serial port loses state, you lose the actual error message. On boards which reset on DTR, you just end up without the ability to see the issue, and on boards which don't, you just get a dead connection.

@thisiskeithb
Member

Preventing a possible fire is more important than keeping an error message on the screen/over serial which is why Marlin completely halts.

@rhapsodyv
Member

Preventing a possible fire is more important than keeping an error message on the screen/over serial which is why Marlin completely halts.

My hotend melted and I got a circuit break (because the insulation of the wires melted too) when I ignored THERMAL_PROTECTION.... The problem was a loose temp sensor, but I was trying to force Marlin to work.

So, I learned that safety features should not be ignored.

@AndKe
Contributor Author

AndKe commented Dec 8, 2020

Switching off heaters may prevent fire.
You do not prevent fire by taking down communications. You DO prevent the user from seeing an overtemperature, and the G-code server from being able to react.

IF we were actually interested in safety:
If the FET that controls a heater burns out and shorts internally (or a 220 V relay sticks), Marlin can't stop the heating from continuing.
OctoPrint could:
if temperature > set_temp*1.1 then:
shut down the power to the whole printer by switching the power.
But we are apparently more interested in brainless communication-stopping than actual safety.
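
The host-side rule in this comment can be written out as a small check. This is a sketch only: `should_cut_power` and its parameters are hypothetical names, and the fallback limit for an idle heater is an added assumption, since the raw `set_temp*1.1` rule would trip at any positive reading when the target is 0. How the host actually cuts mains power (smart plug, relay) is outside the sketch.

```cpp
// Hypothetical host-side watchdog check based on "temperature > set_temp*1.1".
// `idle_limit_c` is an assumption added for the heater-off case (target 0),
// where a stuck FET or relay would otherwise go unnoticed or always trip.
bool should_cut_power(double measured_c, double target_c,
                      double factor = 1.1, double idle_limit_c = 50.0) {
  const double limit = target_c > 0.0 ? target_c * factor : idle_limit_c;
  return measured_c > limit;
}
```

Such a check only works while the firmware keeps reporting temperatures, which is the point being argued: a halted board that stays silent gives a secondary system nothing to act on.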

@rhapsodyv
Member

rhapsodyv commented Dec 8, 2020

The point is: every time Marlin has halted for me, I needed to do something manually on the machine. At least for me.

But we are apparently more interested in brainless communication-stopping than actual safety.

Sarcasm?! Really??

@thisiskeithb
Member

Halting doesn't just shut down communications. It kills everything and that's why it requires user intervention.

Your machine did something outside the norm and needs you to investigate.

I don't see how it would ever be safer to let a printer continue to run for the sake of a user message instead of completely shutting everything down.

@AndKe
Contributor Author

AndKe commented Dec 8, 2020

Now explain how it would be less safe to switch off steppers and heaters, keep communicating and reporting temperatures, and only accept M999.

@rhapsodyv
Member

Now explain how it would be less safe to switch off steppers and heaters, keep communicating and reporting temperatures, and only accept M999.

Simple: M999 will allow you to reset the printer and keep trying to print, over and over, instead of going to investigate what is causing the halt.

@AndKe
Contributor Author

AndKe commented Dec 8, 2020

So... it would take M999, a restart, and setting temperatures (which does not enable heaters if the temperature is already too high).
None of this is done automatically by a G-code job.
However, if "runaway" is triggered because it failed to heat a frozen printer fast enough, one could, and should, just restart.
Or: do you seriously suggest that a user should have extremely liberal runaway settings for detecting failure to heat, so that they allow "anything" during summer or with a hot printer?

Allowing M999 also "allows" the user to light a fire manually. Disconnecting communications does nothing good, and it certainly does not allow a secondary system to save the day.

@rhapsodyv
Member

rhapsodyv commented Dec 8, 2020

Let's compare the views; I think it should explain why it isn't safer to let the user restart the printer remotely after it has halted.

Marlin now:

  1. Marlin can't know the user's environment. What it can know is that a thermal runaway happened.

  2. A small group of users who use the printer in a very cold environment will get false thermal runaways and will be annoyed by that.

What you are suggesting:

  1. Allow M999 so any user with any issue can just remotely restart the printer and keep pushing it until something bad happens.

  2. But the user with a very specific environment that leads Marlin to a false thermal runaway will benefit from this.

Do you get it now?

We can't create something that incentivizes users to just ignore thermal runaway (or any other halt). That is the point.

@AndKe
Contributor Author

AndKe commented Dec 8, 2020

@rhapsodyv
You are asking one essential question here:
"what you are suggesting, part 1"
And the obvious answer is:
"Yes, absolutely" - allow the user control; this is what we allow on any ATmega-based controller by reconnecting. Whether the user willingly does stupid things is another issue entirely. We do not prevent people from having alcohol-soaked paper in their trash bin nearby.

So YES, anyone should be able to regain control, just like reconnecting allows it for most users anyway.

And for the final time:
By stopping communication you can NOT guarantee that a crazy user won't continue anyway after a remote power cycle or a simple reconnect. You do, however, prevent additional failsafes or a redundant failsafe system.

The printer is in an equally safe state by just switching off steppers/heaters and accepting only M999.
It is not the engineer's job to make it perfectly idiot-proof; nature will ALWAYS come up with a better idiot.
...
On the other hand, you can always make it even less convenient for users by writing a flag to EEPROM that requires a full re-flash of the firmware too, if, as I see it, the highest desire is to punish for such events, way beyond sane handling of them.

@thinkyhead
Member

If we could find some guaranteed way to reboot, that would be something to try. But, the locking up of the board to force physical interaction with the machine is for now a very intentional part of KILL.

@thisiskeithb
Member

#21652 adds “soft reset” abilities. Closing.

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 24, 2021