Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] 2.0.9.2 PID #22893

Closed
rq3 opened this issue Oct 6, 2021 · 87 comments
Closed

[BUG] 2.0.9.2 PID #22893

rq3 opened this issue Oct 6, 2021 · 87 comments

Comments

@rq3
Copy link

rq3 commented Oct 6, 2021

Did you test the latest bugfix-2.0.x code?

Yes, and the problem still exists.

Bug Description

2.0.9.2 will not settle on a hot end temp. PID settings does not fix the issue.

Bug Timeline

new with release 2.0.9.2. OK with 2.0.9.0

Expected behavior

Temps should settle

Actual behavior

Temps wander rapidly, and will not settle at assigned value. Print will not start.

Steps to Reproduce

Compile and install Marlin 2.0.9.2 using released version and examples.

Version of Marlin Firmware

2.0.9.2

Printer model

Anycubic Predator

Electronics

stock Trigorilla Pro

Add-ons

No response

Bed Leveling

No response

Your Slicer

No response

Host Software

No response

Additional information & file uploads

Attached are screen shots of the effect from Repetier Host
Temp_Compare.docx

@KayIO
Copy link

KayIO commented Oct 29, 2021

+1
On Trigorilla Pro board. the temperatures are instantly jumping +/- 5 ...therefore the print will not start due to missing settled temprature.
Works fine up to 2.0.9.1 ... but not in 2.0.9.2.

@Alexgrosy
Copy link

+1
Same issue here with Trigorilla Pro. Temperature reading of hotend is totaly going nuts in 2.0.9.2.
2.0.9.1 is good.

@ellensp
Copy link
Contributor

ellensp commented Nov 8, 2021

If you enable SHOW_TEMP_ADC_VALUES and watch temperature in a console, does the raw ADC values also jump around?

@Alexgrosy
Copy link

Just checked with latest bugfix code. The raw ADC values are equally jumpy.
https://pastebin.com/ECveATw7

@github-actions
Copy link

github-actions bot commented Jan 8, 2022

This issue has had no activity in the last 60 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 10 days.

@rq3
Copy link
Author

rq3 commented Jan 15, 2022

Problem still exists as of January 14, 2022 bugfix. Applies to both user interfaces ( classic and color ui), and is particularly bad with any RTD (platinum resistance sensors) where temps and ADC readings jump as much as +/- 15C between readings.

Appears to be an ADC issue with low impedance sensors? Perhaps lack of oversampling for the RTD routine?

@KayIO
Copy link

KayIO commented Jan 15, 2022

Problem still exists as of January 14, 2022 bugfix.

Confirmed ! I'm wondering why it worked before, but not now anymore

@Michi1901
Copy link

Same here. My PT1000 is reading anywhere within +/- 5C of the target temperature. I am on 2.0.9.2.

@Arakon
Copy link

Arakon commented Jan 18, 2022

While I'm not seeing jumps that badly, I've been having huge issues with my SKR E3 Mini 2.0 on 2.9.0.3. The temperature is jumping +/- 2° with sometimes unexplained sudden drops by up to 8°, causing a thermal runaway error. 4 out of 5 PID tune runs result in completely bizarre values that cause huge oscillations, too.
The bed is also unstable, but by far not that badly (+/- 0.8°).

2.9.0.3:
2022-01-13 12_25_47-

I just flashed the BTT 2.8.0.2 Marlin version for an unrelated issue, and all of a sudden, after a single PID tuning run, both bed and hotend are absolutely smooth, bed staying within 0.1°C of the set temperature, hotend within 1°, usually less than 0.5° off.

2.8.0.2:
2022-01-19 00_25_18-

Note that I'm using the standard MK8 hotend on this printer, with the usual glass bead sensor (Type 1). Both heating cartridge and sensor have been replaced twice, including all wiring, to be sure that it's not a hardware issue.

@Thinkersbluff
Copy link

Thinkersbluff commented Feb 5, 2022

FWIW - I had a problem that sounds like this one, on my CR6-SE, when I flashed a "bugfix"-like developmental version of the CR6Community extui branch, which was recently merged with Marlin 2.9.0.1. I could not get Load Filament to work, and the nozzle temperature was jumping around the target value, seemingly sometimes heating or cooling faster than should be possible.
Reverting to the released v6.1 of that firmware solved the problem.
I have a BTT SKR CR6 v1 board, a PT1000 thermistor and a Dragon HF hotend.

@ceinem
Copy link

ceinem commented Feb 23, 2022

I'm observing a similar issue on the Anet A8+ with the latest Marlin firmware release.

@descipher
Copy link
Contributor

FWIW - I had a problem that sounds like this one, on my CR6-SE, when I flashed a "bugfix"-like developmental version of the CR6Community extui branch, which was recently merged with Marlin 2.9.0.1. I could not get Load Filament to work, and the nozzle temperature was jumping around the target value, seemingly sometimes heating or cooling faster than should be possible. Reverting to the released v6.1 of that firmware solved the problem. I have a BTT SKR CR6 v1 board, a PT1000 thermistor and a Dragon HF hotend.

Does adding the following to configuration.h change the behavior?
#define ADC_RESOLUTION 10

2.0.9.1 -> 2.0.9.2 ADC resolution changed to 12 bit from 10 bits

@Thinkersbluff
Copy link

Does adding the following to configuration.h change the behavior?
#define ADC_RESOLUTION 10

Sorry for the delayed response. Deeply into a project the past 2 weeks...

Unfortunately, no, adding that line did not solve the problem.
I still could not get Load/Unload to work, with the temperature fluctuating constantly above & below the target of 200C, with the PT1000 thermistor and the BTT CR6 SKR v1.0 motherboard.

It would not surprise me if the BTT SKR board itself has ADC issues if there is noise on the signal ground. They do seem to have that issue on other SKR boards, and I am not about to start de-soldering and replacing SMD components on a board I can no longer replace, if I don't have to.

My problem does seem to be solved, though, without adding the line "#define ADC_RESOLUTION 10" to my Configuration.h, now that I have removed the PT1000 thermistor and I have re-installed the PT100.

@InsanityAutomation
Copy link
Contributor

On the couple machines I have with temp fluctuations like this where it was swinging a little due to the sensor resolution, I opened up #define TEMP_WINDOW to around 5c as the swing was 3c

@tombrazier
Copy link
Contributor

tombrazier commented Mar 3, 2022

I would be very interested to know how my model predictive control algorithm handles such noisy sensor readings. It is an alternative to PID which should do better. PR #23751.

Not that PID is at fault for bad input. But I'd still be interested.

@rq3
Copy link
Author

rq3 commented Mar 4, 2022

I would be very interested to know how my model predictive control algorithm handles such noisy sensor readings. It is an alternative to PID which should do better. PR #23751.

Not that PID is at fault for bad input. But I'd still be interested.

It compiled and installed fine, and does work, but the nozzle temp seems to wander a bit more. Oddly, in both cases (your algorithm and PID), when starting a print the printer seems to stick in an infinite settling loop, waiting for the nozzle temp to stabilize. If I then change the requested nozzle temp by just one degree, either up or down, the print starts instantly.

@tombrazier
Copy link
Contributor

If I then change the requested nozzle temp by just one degree, either up or down, the print starts instantly.

I guess this behavior is an accidental byproduct of changing the target temp. Handy though!

If you're up for more experimentation...

Did you measure and set your own values for AMBIENT_FOR_CALIBRATION, TEMPERATURE_AT_T10, TEMPERATURE_AT_T20 and PWM_AT_200C? What values did you get? It sounds like sensor_xfer_coeff, which is calculated from them, might be too large.

Another thought is to try changing the value 10.0 in line 1227 of Marlin/src/module/temperature.cpp. Higher values will give greater immunity to sensor noise. However I think the problem is more likely to be with sensor_xfer_coeff.

@rq3
Copy link
Author

rq3 commented Mar 4, 2022

If I then change the requested nozzle temp by just one degree, either up or down, the print starts instantly.

I guess this behavior is an accidental byproduct of changing the target temp. Handy though!

If you're up for more experimentation...

Did you measure and set your own values for AMBIENT_FOR_CALIBRATION, TEMPERATURE_AT_T10, TEMPERATURE_AT_T20 and PWM_AT_200C? What values did you get? It sounds like sensor_xfer_coeff, which is calculated from them, might be too large.

Another thought is to try changing the value 10.0 in line 1227 of Marlin/src/module/temperature.cpp. Higher values will give greater immunity to sensor noise. However I think the problem is more likely to be with sensor_xfer_coeff.

I did, and I'm always up for experimentation. Heater is 40 watts, 24 volts. Ambient temp is 19C. T10 was 46C, T20 was 88C. Using Repetier Host, my best guess for PWM was 125 (about 50%). Temperature plot at 200C oscillated slowly and repeatably +/- 3C.

@tombrazier
Copy link
Contributor

Wow, that's fast heating. I take it you started at T0 with ambient temperature?

The value of 125 will be wrong. Max PWM is 127. On my 36W E3D V6 clone, PWM_AT_200C is 45. M105 will give you the value (after the @ at the endish of the line). e.g. I get

ok T:21.05 /0.00 B:20.10 /0.00 P:19.53 /0.00 @:0 B@:0

The first @ is the one - in this case 0 because I was not heating. The B@ is for the bed.

Still, I would expect that to give a fixed offset, not oscillations. Will ponder and play a bit. In addition to getting a better PWM value, you might want to try fiddling with that 10.0 I mentioned above.

In the meantime, it occurred to me that you could narrow down the commit between 2.0.9.0 and 2.0.9.2 where your PID behaviour changed by doing a binary search through the revision tree. The revision in Configuration.h changed to 02000901 in commit 7726af9. So start about halfway between then and when you first noticed the change and see whether it is present, rinse repeat until you have the exact commit.

@tombrazier
Copy link
Contributor

Hang on, just a thought, MPCTEMP is disabled by default in my PR. You did enabled it right?

@rq3
Copy link
Author

rq3 commented Mar 4, 2022

Yes, MPCTEMP was enabled. I didn't know that max PWM was 127, so assuming it was 255, I used 125 as a roughly 50% power level. That probably explains why the temperature settled about 10C too high!

After dropping to your default PWM of 56, things went better, and the printer started well, but then the nozzle temperature slowly sagged from the assigned temperature of 195. I stopped the print after several minutes when the temp got down to 180.

For what it's worth, the Repetier temp plot is much more "ragged" looking than PID, yet the printer LCD display readout appears much more stable. Which may explain why the print actually started without having to force a start by monkeying with the nozzle temp.

Looking at the Repetier temp plots again while running M105, I'm getting PWM values of about 65 (no fans), and about 75 with fans on 100%. These are visual interpolations, as the numbers jump pretty drastically between readings, which is indicative of Marlin's new behavior when it comes to temp readings from the ADC's.

I have a fairly unique extruder/hotend. The extruder itself is my own design (on YouTube as the "VDE Extruder"). The heater is a standard E3D 40 watt unit, but the heat block is pure silver with no sock, and yes, it's very fast. The temp sensor is a PT1000 platinum RTD feeding an amplifier of my own design, with a custom thermistor table. The nozzle is diamond.

@tombrazier
Copy link
Contributor

tombrazier commented Mar 5, 2022

Getting back to the original issue, it looks to me like the problem is more about the measurement of temperature than PID. Even with dead smooth temperature, the fluctuations in measurement will look like instability to the logic that allows the print to start. Nailing down the commit in the history of bugfix-2.0.x where the behaviour changed would be helpful.

@Arakon
Copy link

Arakon commented Mar 5, 2022

For me, it actually turned out to be a failing power supply. I'm not sure why it was better with 2.0.8.2 vs. 2.0.9.3, but I eventually found that my PSU was dropping to 19V after a few mins under load and that caused the issues.

@tombrazier
Copy link
Contributor

it actually turned out to be a failing power supply. I'm not sure why it was better with 2.0.8.2 vs. 2.0.9.3

Hmm, I wonder whether some firmware change increased hardware noise. A bit edge case but when you consider the 3D printers have long wires leading to sensors running in parallel with long wires leading to heaters and steppers, possible. @rq3 do you have an oscilloscope? Could you measure the actual electrical signal and see if there is a difference between versions of Marlin?

@rq3
Copy link
Author

rq3 commented Mar 5, 2022

it actually turned out to be a failing power supply. I'm not sure why it was better with 2.0.8.2 vs. 2.0.9.3

Hmm, I wonder whether some firmware change increased hardware noise. A bit edge case but when you consider the 3D printers have long wires leading to sensors running in parallel with long wires leading to heaters and steppers, possible. @rq3 do you have an oscilloscope? Could you measure the actual electrical signal and see if there is a difference between versions of Marlin?

I do have a 500MHz scope, and used it extensively during development of the PT1000 amplifier. The amplifier feeds the ADC on the control board, and has a typical noise of about 20 microvolts.

Marlin started having ADC issues around release 2.0.9.2, when the ADC routines went from 10 bit to the correct 12 bit.

@tombrazier
Copy link
Contributor

tombrazier commented Mar 5, 2022

Commit 5d8ca7c removed a 10 bit bitmask. What happens if you insert a 12 bit one? i.e.

HAL_adc_result = (HAL_adc_results[(int)pin_index] >> (12 - HAL_ADC_RESOLUTION)) & 0xFFF; // shift out unused bits

@tombrazier
Copy link
Contributor

Compiles and works with

#if ENABLED(HAL_ADC_FILTERED)
  #define OVERSAMPLENR 1
#elif HAL_ADC_RESOLUTION > 12
  #define OVERSAMPLENR (1 << (16 - HAL_ADC_RESOLUTION))
#else
  #define OVERSAMPLENR 16
#endif

in thermisters.h? I'm still expecting that #error statement to kick in.

@rq3
Copy link
Author

rq3 commented Mar 8, 2022

Compiles and works with

#if ENABLED(HAL_ADC_FILTERED)
  #define OVERSAMPLENR 1
#elif HAL_ADC_RESOLUTION > 12
  #define OVERSAMPLENR (1 << (16 - HAL_ADC_RESOLUTION))
#else
  #define OVERSAMPLENR 16
#endif

in thermisters.h? I'm still expecting that #error statement to kick in.

No. Thermistors.h currently looks like:
#define THERMISTOR_TABLE_ADC_RESOLUTION 10
#define THERMISTOR_TABLE_SCALE (HAL_ADC_RANGE / _BV(THERMISTOR_TABLE_ADC_RESOLUTION))
#if ENABLED(HAL_ADC_FILTERED)
#define OVERSAMPLENR 1
#else
#define OVERSAMPLENR 16
#endif

Stand-by while I try the change.

@tombrazier
Copy link
Contributor

Oh wait have you applied my uint16_t patch?

@rq3
Copy link
Author

rq3 commented Mar 8, 2022

Oh wait have you applied my uint16_t patch?

Yes, I have. I try to keep up ;-)
And it compiles fine either way. The ADC is set to 12 bits all the way back in pins_arduino.h
I had mistakenly thought that the STM32F1 HAL would handle that.

@tombrazier
Copy link
Contributor

Great. That is what I expect then. PR to make all STM boards 12 bits on its way.

@rq3
Copy link
Author

rq3 commented Mar 8, 2022

Great. That is what I expect then. PR to make all STM boards 12 bits on its way.

Just for grins, I compiled under maple to force the 12 bit STM32F1 HAL. Compiles and loads fine, but printer throws a bed max temp error.

@tombrazier
Copy link
Contributor

Hmm. And with PR #23871?

@rq3
Copy link
Author

rq3 commented Mar 8, 2022

Hmm. And with PR #23871?

Doesn't compile under the maple environment. I get: 'adc result' was not declared in this scope, for /STM32F1/HAL.cpp.

@tombrazier
Copy link
Contributor

You need to get a more recent bugfix-2.0.x. The variable was renamed some time back.

@rq3
Copy link
Author

rq3 commented Mar 8, 2022

You need to get a more recent bugfix-2.0.x. The variable was renamed some time back.

Will do. Although I shudder at the thought of editing all of these patches into a new bugfix!

I've also enable reporting of the ADC results for M105 in Repetier, and the hotend is hovering right around count 1640 at 200C, which is certainly 12 bit data and just about where it should be.

Oddly, although the heated bed is working fine, the reported counts are negative, and get increasingly negative as the temperature goes up.

@tombrazier
Copy link
Contributor

tombrazier commented Mar 9, 2022

I know you have a PTC device on the hotend, but I guess you have NTC on the bed. That would explain why the number decreases as temperature increases. The negative, however, looks like an integer overflow. What actual -ve number do you have? And what arrangement for the sensor? (e.g. 4k7 pullup with 100k NTC?) And do you get a -ve number with latest bugfix plus the PRs #23867 and #23871?

Edit: and is it -ve with maple / without maple / both?

@rq3
Copy link
Author

rq3 commented Mar 9, 2022

I know you have a PTC device on the hotend, but I guess you have NTC on the bed. That would explain why the number decreases as temperature increases. The negative, however, looks like an integer overflow. What actual -ve number do you have? And what arrangement for the sensor? (e.g. 4k7 pullup with 100k NTC?) And do you get a -ve number with latest bugfix plus the PRs #23867 and #23871?

Edit: and is it -ve with maple / without maple / both?

What does -ve mean in this context?

To be clear, the hotend sensor is a standard 1000 ohm platinum RTD, which is extremely linearized (within 0.05%), by a low noise amplifier. The amplifier runs from the ADC 3.3 volt Vref, so is ratiometric. There is no pull-up. The hot end is running under MPC control.

The bed sensor is a Type 1 NTC thermistor stock to the printer, and uses a 4.7k pull-up to the 3.3V Vref. The ADC count starts at about -200 at room temperature, and heads towards -500 at 60C. Temperatures were somewhat verified with an IR gun, and heating seems stable and reasonable. The bed is running under PID control.

@rq3
Copy link
Author

rq3 commented Mar 9, 2022

I know you have a PTC device on the hotend, but I guess you have NTC on the bed. That would explain why the number decreases as temperature increases. The negative, however, looks like an integer overflow. What actual -ve number do you have? And what arrangement for the sensor? (e.g. 4k7 pullup with 100k NTC?) And do you get a -ve number with latest bugfix plus the PRs #23867 and #23871?

Edit: and is it -ve with maple / without maple / both?

With the very latest bugfix and the above two PR's applied, I get a successful compile, and completely dead display under maple. Non-maple yields a grey screen (back ight on), but no graphics.

@tombrazier
Copy link
Contributor

What does -ve mean in this context?

It means what you meant when you said

the reported counts are negative, and get increasingly negative as the temperature goes up

And following up on:

I shudder at the thought of editing all of these patches into a new bugfix!

I don't know how you do it from VSCODE but from a Linux command line:

git clone https://github.com/MarlinFirmware/Marlin.git
cd Marlin
git checkout bugfix-2.0.x
git checkout -b testbranch
git remote add tombrazier https://github.com/tombrazier/Marlin.git
git pull tombrazier modelpredictivecontrol
git pull tombrazier 22893
git pull tombrazier stm32adc12bitfix
git remote remove tombrazier

That gives you a branch off bugfix-2.0.x called testbranch with all three patches merged as individual commits. git rocks.

@tombrazier
Copy link
Contributor

With the very latest bugfix and the above two PR's applied, I get a successful compile, and completely dead display under maple. Non-maple yields a grey screen (back ight on), but no graphics.

Well that's discouraging. Am guessing it's something in bugfix-2.0.x causing a hang during boot. Can't think why the PRs might have caused that.

@rq3
Copy link
Author

rq3 commented Mar 9, 2022

With the very latest bugfix and the above two PR's applied, I get a successful compile, and completely dead display under maple. Non-maple yields a grey screen (back ight on), but no graphics.

Well that's discouraging. Am guessing it's something in bugfix-2.0.x causing a hang during boot. Can't think why the PRs might have caused that.

I doubt it's the PR's, as the screen is blank just loading a freshly compiled, latest, bugfix, i.e., no PR's involved at all.
Bug report filed.

@tombrazier
Copy link
Contributor

Well, we just need bugfix up to 44eff9a. Let's hope that commit is not where the problem lies.

@rq3
Copy link
Author

rq3 commented Mar 9, 2022

Well, we just need bugfix up to 44eff9a. Let's hope that commit is not where the problem lies.

Never mind. Issue deleted. My mistake. Amazing what a simple negative sign can do in the wrong place. Any chance you could zip all the files associated with MPC and the 12 bit issue? I have a running version of the latest bugfix at the most bare-bones level, and would like to insert them manually.

@tombrazier
Copy link
Contributor

@rq3
Copy link
Author

rq3 commented Mar 10, 2022

https://github.com/tombrazier/Marlin/archive/eedf4c17ecae1fc55ea23fbf992848fe0b06201e.zip

Thank you for that. I have an error somewhere in conffiguration.h that causes a boot loop on printer start, but I'll find it. Using my earlier config works fine, and all ADC data is normal and correct.

You have been a very busy fellow, and I pray that your efforts are quickly merged into the next Marlin release.

@rq3
Copy link
Author

rq3 commented Mar 10, 2022

OK. Currently using latest bugfix with new configuration files. All is well. My only comment at this point is that changing:
adc_result = (adc_results[(int)pin_index] & 0xFFF) >> (12 - HAL_ADC_RESOLUTION); // shift out unused bits
to:
adc_result = (adc_results[(int)pin_index] & 0xFFF) >> (24 - HAL_ADC_RESOLUTION); // shift out unused bits
in /src/HAL/STM32F1/HAL.CPP
makes a distinct improvement in the stability of the ADC counts, as reported by M155 in Repetier Host.

@tombrazier
Copy link
Contributor

That will definitely do one of two things. With non-maple, it will do nothing as it is in the maple code. With maple, it will make adc_result permanently zero.

@rq3
Copy link
Author

rq3 commented Mar 10, 2022

That will definitely do one of two things. With non-maple, it will do nothing as it is in the maple code. With maple, it will make adc_result permanently zero.

I believe you, yet there is a VERY distinct difference in the reported ADC data (non-maple build). I can only report what I see, and don't claim to understand it. I'm just passing on my observations.

Another interesting point. Marlin users need to be aware that a new PID tune is in order with the "new" 12 bit ADC capabilty.
The resulting PID parameters are, of course, drastically different, with a noticeable difference in bed stability (not that it was bad before).

And one other oddity. During this entire process, I have noticed that the ADC counts become more stable after the steppers are homed and still engaged. I suspect an increase in ADC Vref stability when a load is placed on the printer's primary power supply, (or the supply switching frequency goes high enough for the ADC input low pass function to become effective), but don't care enough to dismantle the printer and break out the scope to probe for confirmation.

Having done analog design all my life, I pity the board designers when 24 bit ADC's become the norm. They're in enough trouble at 12 bits.

@thisiskeithb
Copy link
Member

#23871 has been merged.

@github-actions
Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators May 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests