work around interlock hardware issue #117

Closed
hartytp opened this issue Jan 5, 2021 · 32 comments

@hartytp

hartytp commented Jan 5, 2021

Due to #97 the interlock thresholds are not accurate enough to be useful. It appears highly likely that the underlying issue is a hardware issue. That needs to be understood and resolved, so I've opened sinara-hw/Booster#375

Whatever the underlying cause, there are too many Booster units currently in active use to start applying hardware fixes, so it would be great to find a software workaround for this.

The previous firmware had separately calibrated transformations for the interlock and the power readout. IIRC we decided not to do that in this firmware implementation because it should not be needed (in the sense that the additional accuracy gained by the calibration at one operating point should be small compared with the expected variation over power/frequency/etc -- although I have not thought too carefully about this / tested it on hardware so that may not be totally correct, but I'd be surprised if it were far off). With separate calibrations, the interlock did work well, suggesting that whatever the cause of this effect, it can be removed by an additional two-point calibration.
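As a rough illustration of what a separate two-point calibration for the interlock could look like, here is a minimal sketch. It assumes the detector response is linear in dBm versus volts; the type and its `from_two_points` constructor are hypothetical, and only the `map`/`invert` naming mirrors the calls quoted later in this thread.

```rust
/// Linear map between detector voltage (V) and power (dBm).
/// Hypothetical type for illustration; not copied from the Booster firmware.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LinearTransform {
    /// Slope in dB per volt.
    pub slope: f32,
    /// Power in dBm at zero volts.
    pub offset: f32,
}

impl LinearTransform {
    /// Fit a line through two (voltage, power) calibration points.
    /// Returns `None` if both points share the same voltage.
    pub fn from_two_points(a: (f32, f32), b: (f32, f32)) -> Option<Self> {
        let dv = b.0 - a.0;
        if dv == 0.0 {
            return None;
        }
        let slope = (b.1 - a.1) / dv;
        Some(Self {
            slope,
            offset: a.1 - slope * a.0,
        })
    }

    /// Voltage -> power (dBm).
    pub fn map(&self, voltage: f32) -> f32 {
        self.slope * voltage + self.offset
    }

    /// Power (dBm) -> voltage.
    pub fn invert(&self, power: f32) -> f32 {
        (power - self.offset) / self.slope
    }
}
```

With two such transforms, one fitted against the power readout and one against measured trip points, the threshold DAC voltage would come from the second transform's `invert`.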

@ryan-summers could you look into adding an additional transform that I can set? (or if you have a better idea, let me know!)

@hartytp
Author

hartytp commented Jan 5, 2021

NB AFAICT this is the only "critical" issue currently. The rest is documentation / usability or lower priority enhancements.

@ryan-summers
Member

ryan-summers commented Jan 6, 2021

I think adding an additional transform doesn't actually address the problem - there is logically only a single power detector per channel. If we supplied two different transforms, we would then have two means of interpreting the same voltage level. This would essentially just hide the issue, which is what was happening in the previous firmware and why we are just now finding it.

I think what adding a second transform would ultimately result in (and what the previous firmware likely does) is that the interlock threshold transform would have an additional 6 dB offset from the power measurement readout. That would have (essentially) the effect of always setting the interlock threshold 6 dB higher than what the power detector actually detects.

It's likely easier for us to hard-code that offset value into the firmware and make a note of the hardware fault until we can implement a suitable fix.

That being said, ultimately I'd like to hold off on any firmware changes until the problem is better understood. For example:

  • Was this investigation completed for the reflected interlock threshold? This is now removed and fixed to 30 dBm. Does this occur for the output power threshold?
  • You mention that you are increasing the synthesizer input power in 1 dB steps to generate the trip. This has the effect of "stepping" the power. Step functions have very high frequency components associated with them, so is it possible that these are false-positive failures? Does this still occur if the power signal smoothly increases towards the interlock threshold?

@hartytp
Author

hartytp commented Jan 6, 2021

@ryan-summers thanks for the response

You mention that you are increasing the synthesizer input power in 1 dB steps to generate the trip. This has the effect of "stepping" the power. Step functions have very high frequency components associated with them, so is it possible that these are false-positive failures? Does this still occur if the power signal smoothly increases towards the interlock threshold?

This is not the case (see my various measurements on the issue threads). In brief:

  1. I looked at the synth output on a scope and verified that there are no glitches (high frequency components); it always transitions smoothly (which most high-end T&M equipment does IME). Remember that (a) the detector response is filtered quite heavily so it isn't sensitive to really high frequency stuff (b) we're talking about 6dB which would be an absolutely huge transient
  2. I probed the detector output during the synth transitions. I confirmed that when the interlock threshold is set > ~6dB above the synth power there are no glitches on the detector whatsoever. Large glitches do appear when the threshold is set < ~6dB above the higher synth power, although it's not clear if that is a cause of the interlock problems or just an effect. Due to the comparator hysteresis a glitch after the interlock trips is expected. I wasn't able to probe enough signals to determine the time ordering of the various events, so it's not clear how to interpret this.

@hartytp
Author

hartytp commented Jan 6, 2021

Was this investigation completed for the reflected interlock threshold? This is now removed and fixed to 30 dBm. Does this occur for the output power threshold?

The issue may well also affect the reflected power interlock. I'll have a look at that shortly. The threshold for the reflected power is much less critical, so it may not actually be a problem (at least for us); otherwise the fix will need applying there as well.

@hartytp
Author

hartytp commented Jan 6, 2021

That being said, ultimately I'd like to hold off on any firmware changes until the problem is better understood. For example:

I fully agree that this issue needs resolving, ideally before any more hardware ships. However, I don't think that waiting until the issue is understood before implementing a workaround is going to work for us.

  1. Whatever the actual hardware issue is, I believe we have enough data at this stage to be confident what the workaround will end up being (adding some form of offset to the interlock threshold), so I don't think we benefit from waiting
  2. It's not going to be practical to patch the hardware issue on all the Booster channels currently in the wild. There are too many of them built into experiments that are in constant use, and removing, resoldering, etc is just too disruptive. So we do need to find some form of workaround (otherwise people are just going to stick with the old firmware for a long time)
  3. Debugging this hardware issue is likely to take some time, but I need to get some Boosters commissioned in the next week or so.
  4. Until we have a workaround for this, I can't use the new firmware in anger, so can't check that there aren't any other issues. I don't want to put further testing on hold while we debug this issue

@hartytp
Author

hartytp commented Jan 6, 2021

I think what adding a second transform would ultimately result in (and what the previous firmware likely does) is that the interlock threshold transform would have an additional 6 dB offset from the power measurement readout. That would have (essentially) the effect of always setting the interlock threshold 6 dB higher than what the power detector actually detects.

From the data I already have, it does provisionally look like an offset will be enough, but we should take a little more data to confirm that's enough. We also need to verify that the detector -> threshold offset is constant between channels. If both of those work out, then I agree that hard-coding an offset is the most effective resolution here. Assuming this is fixed for the next hw revision we should probably do a runtime check based on the hw-rev pins to determine whether or not to apply the offset, but that can be left for a later stage.
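A possible shape for the runtime check mentioned above, as a sketch only. How the hw-rev pins are read and which code first carries the hardware fix are not stated in this thread, so `hw_rev` and `FIRST_FIXED_REVISION` are placeholders, and the 6.0 dB figure is the provisional value under discussion.

```rust
/// Provisional offset between the interlock threshold and the power readout, in dB.
pub const INTERLOCK_OFFSET_DB: f32 = 6.0;

/// PLACEHOLDER: first hw-rev pin code assumed to include the hardware fix.
/// The real value is not known from this discussion.
pub const FIRST_FIXED_REVISION: u8 = 4;

/// Offset to apply for a given hw-rev pin code (hypothetical encoding).
pub fn interlock_offset_db(hw_rev: u8) -> f32 {
    if hw_rev < FIRST_FIXED_REVISION {
        INTERLOCK_OFFSET_DB
    } else {
        0.0
    }
}

/// Threshold (dBm) to program so that the trip lands at the requested power.
pub fn corrected_threshold_dbm(requested_dbm: f32, hw_rev: u8) -> f32 {
    requested_dbm + interlock_offset_db(hw_rev)
}
```

Keeping the decision in one small function means the offset can be dropped for fixed hardware without touching the rest of the threshold path.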

@ryan-summers would you mind measuring the detector -> threshold offset (output power only) on a couple of channels of one of your Boosters at a few different powers? I'll do the same and then we can decide what the lowest effort work around is.

@jordens
Member

jordens commented Jan 6, 2021

I agree that CTI/TS or you guys at Oxford are best equipped to debug the hardware issue. I don't want to break the seals and don't consider the work around required or critical for the correct operation of the firmware. The easiest work around is to just choose between accurately calibrated monitoring and some level of accurately calibrated thresholds. The latter transformation can just as well be implemented by the user on the host side using the same code they'd develop for calibration. My impression given the am-am keying distortion is that the latter will be ambiguous in any case.

@jordens
Member

jordens commented Jan 6, 2021

Hardcoding the heuristic 6 dB offset is also fine as a workaround. It would (and would have been in the specification phase) be very useful to confirm it on some existing devices and its accuracy over a range of powers.

@hartytp
Author

hartytp commented Jan 6, 2021

I think adding an additional transform doesn't actually address the problem - there is logically only a single power detector per channel. If we supplied two different transforms, we would then have two means of interpreting the same voltage level. This would essentially just hide the issue, which is what was happening in the previous firmware and why we are just now finding it.

This isn't totally correct AFAICT (but correct me if my reasoning is not right here). There are two voltages: the comparator reference and the detector voltage. The slopes of the DAC and ADC will not be identical. The calibration we currently do includes the ADC reference voltage/ADC gain errors as well as the detector slope, coupler etc. However, the DAC reference/gain errors are independent from the ADC. So, if the ADC reference/gain were far off, the calibration process could actually make the interlock threshold less accurate rather than more.

The assumption of course is that the ADC/DAC reference/gain errors are small compared with other errors, so it's fine to just use one calibration for everything. That's probably true, but it's not a priori obvious and needs verification, which I'm not aware of being done.
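To make the ADC/DAC argument concrete, here is a toy model, not a description of the actual signal chain. It assumes a linear calibration fitted against ADC readings, pure gain errors `adc_gain` and `dac_gain` (1.0 meaning ideal), and an ideal comparator; all names and numbers are illustrative.

```rust
/// Calibrated readout: power (dBm) = slope * adc_voltage + offset.
/// Illustrative values only; not taken from a real device.
pub struct Calibration {
    pub slope: f64,
    pub offset: f64,
}

/// Power readout (dBm) at the moment the comparator trips, for a requested
/// threshold `p_set`, under the toy gain-error model described above.
pub fn readout_at_trip(cal: &Calibration, p_set: f64, adc_gain: f64, dac_gain: f64) -> f64 {
    // Voltage the firmware asks the DAC for, using the single calibration.
    let v_cmd = (p_set - cal.offset) / cal.slope;
    // Comparator reference actually produced by the DAC.
    let v_ref = dac_gain * v_cmd;
    // The comparator trips when the detector voltage reaches v_ref;
    // the ADC then reports that voltage scaled by its own gain.
    let v_adc = adc_gain * v_ref;
    cal.slope * v_adc + cal.offset
}
```

In this model the discrepancy works out to `(adc_gain * dac_gain - 1) * (p_set - offset)`, so a pure gain mismatch produces an error that scales with the threshold rather than staying fixed.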

@hartytp
Author

hartytp commented Jan 6, 2021

My impression given the am-am keying distortion is that the latter will be ambiguous in any case.

I do plan to have another look at that at some point soon. But, to put things in perspective, the level of distortion (presumably thermal transients in the amp) that you saw was only 0.6dB. So worst-case the interlock should still be good to ~1dB, which is fine for all the applications I have in mind. 6dB errors are not.

@hartytp
Author

hartytp commented Jan 6, 2021

Hardcoding the heuristic 6 dB offset is also fine as a workaround. It would (and would have been in the specification phase) be very useful to confirm it on some existing devices and its accuracy over a range of powers.

Agreed. I think this will probably be an appropriate workaround, but let's confirm first. As I said above, I'm happy to test on a few channels that I have access to right now at a few powers, but it would also be great if @ryan-summers is able to do the same on his hw.

@ryan-summers
Member

ryan-summers commented Jan 6, 2021

I'm happy to test on a few channels that I have access to right now at a few powers, but it would also be great if @ryan-summers is able to do the same on his hw.

Sorry, I realize I forgot to mention it here - I don't have a signal generator capable of generating the input RF signals to test this at my office. Previously, Robert was doing additional testing to verify the interlocks, but I believe he no longer has access to a Booster at his office.

@hartytp
Author

hartytp commented Jan 6, 2021

Okay, if you can't easily test then don't worry. If I test a selection of channels that should give us enough information to decide on an acceptable workaround for now.

@jordens
Member

jordens commented Jan 6, 2021

The slopes of the DAC and ADC will not be identical.

AFAWK (#97) relative ADC/DAC gain/offset errors are not a problem and it's not relevant to this problem, right?

The assumption of course is that the ADC/DAC reference/gain errors are small compared with other errors, so it's fine to just use one calibration for everything. That's probably true, but it's not a priori obvious and needs verification, which I'm not aware of being done.

You appeared convinced that it would work and the specification/design relies on it. Lesson for next time: If there was doubt, this should have been verified long ago (taking a look at calibrations of existing devices).

@hartytp
Author

hartytp commented Jan 6, 2021

AFAWK (#97) relative ADC/DAC gain/offset errors are not a problem and it's not relevant to this problem, right?

I don't think that contradicts anything I said. I don't expect any issues here and am not advocating changes.

@hartytp
Author

hartytp commented Jan 6, 2021

You appeared convinced that it would work and the specification/design relies on it. Lesson for next time: If there was doubt, this should have been verified long ago (taking a look at calibrations of existing devices).

I'm not sure that's a productive line to go down.

Let me have a look at the hw I have here and see how reliable the offset is. If it's something we can hard code into the firmware then the software workaround becomes pretty trivial. I don't want to spend more time discussing this than the fix will take to implement.

@hartytp
Author

hartytp commented Jan 6, 2021

Okay, I have some code written to programmatically scan the synth power in 0.1dB steps and read out the highest value of the output power (as measured by Booster's power detectors) that does not result in an interlock trip.

I'm currently blocked by being unable to enable my Booster, but once I get past that I'll run this for interlock thresholds of [13dBm, 23dBm, 33dBm] on a couple of channels and see how things look.
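The scan described above could be structured roughly as follows. This is a sketch and not the script actually used: the `Bench` trait and its methods are hypothetical stand-ins for whatever drives the synthesizer and queries Booster.

```rust
/// Hypothetical test-bench interface; none of these methods come from a real driver.
pub trait Bench {
    /// Set the synthesizer output power (dBm).
    fn set_synth_power(&mut self, dbm: f64);
    /// Output power as reported by Booster's own detector (dBm).
    fn read_output_power(&mut self) -> f64;
    /// True once the interlock has tripped.
    fn interlock_tripped(&mut self) -> bool;
}

/// Step the synth from `start_dbm` towards `stop_dbm` in `step_db` increments
/// and return the highest detector reading seen before the interlock trips.
/// Returns `None` if the very first step trips. Panics if `step_db <= 0`.
pub fn highest_power_before_trip<B: Bench>(
    bench: &mut B,
    start_dbm: f64,
    stop_dbm: f64,
    step_db: f64,
) -> Option<f64> {
    assert!(step_db > 0.0, "step_db must be positive");
    let steps = ((stop_dbm - start_dbm) / step_db).floor() as i64;
    let mut highest: Option<f64> = None;
    for i in 0..=steps {
        // Derive each setpoint from the index to avoid accumulating rounding error.
        bench.set_synth_power(start_dbm + i as f64 * step_db);
        let reading = bench.read_output_power();
        if bench.interlock_tripped() {
            // Discard the reading taken at the step that tripped.
            break;
        }
        highest = Some(match highest {
            Some(h) => h.max(reading),
            None => reading,
        });
    }
    highest
}
```

Running this once per interlock threshold (13 dBm, 23 dBm and 33 dBm here) gives one trip point per threshold, which is enough to see whether the discrepancy is constant.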

@hartytp
Author

hartytp commented Jan 6, 2021

okay...worked around that by software enabling and re-running the tune script.

P_set (dBm)   P_trip (dBm)   error (dB)
33            26.7           6.3
23            16.8           6.2
13             6.8           6.2

So on this channel the issue does seem to induce a fixed 6.2dB discrepancy between the interlock trip and the power readout

@hartytp
Author

hartytp commented Jan 6, 2021

Now looking at another channel in the same Booster. NB I haven't calibrated this channel yet, but I don't expect that to matter since this is a differential measurement (only involving Booster's detector) and the calibration seems to only make a small difference to the reading anyway (at least it did on the other channel I looked at)...

P_set (dBm)   P_trip (dBm)   error (dB)
33            26.9           6.1
23            16.9           6.1
13             6.9           6.1
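For reference, a small sketch that reduces the two tables to a mean offset and a peak-to-peak spread per channel. The data points are the ones posted in this thread; the `CHANNEL_A`/`CHANNEL_B` labels and the helper name are made up for the example.

```rust
/// (threshold set, highest reading without a trip), both in dBm.
/// Channel labels are arbitrary; the values are the measurements posted above.
pub const CHANNEL_A: [(f64, f64); 3] = [(33.0, 26.7), (23.0, 16.8), (13.0, 6.8)];
pub const CHANNEL_B: [(f64, f64); 3] = [(33.0, 26.9), (23.0, 16.9), (13.0, 6.9)];

/// Mean and peak-to-peak spread of `P_set - P_trip`, in dB.
/// Returns `None` for an empty slice.
pub fn offset_stats(points: &[(f64, f64)]) -> Option<(f64, f64)> {
    if points.is_empty() {
        return None;
    }
    let errors: Vec<f64> = points.iter().map(|(set, trip)| set - trip).collect();
    let mean = errors.iter().sum::<f64>() / errors.len() as f64;
    let min = errors.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = errors.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    Some((mean, max - min))
}
```

For these six points the mean offsets come out at roughly 6.23 dB and 6.1 dB, with a spread of about 0.1 dB or less within each channel.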

@hartytp
Author

hartytp commented Jan 6, 2021

Okay, so still a pretty small sample size, but I find that pretty convincing.

@ryan-summers are you okay to add a fixed 6.0dB offset in the interlock as a work around for this?

@jordens
Member

jordens commented Jan 6, 2021

Ryan's suggestion is a potential workaround if it is supported by analysis and data.
But this is a tiny sample size. I presume this is just one hardware revision. And the underlying issue is still not understood.

@hartytp
Author

hartytp commented Jan 6, 2021

I presume this is just one hardware revision.

Yes, this is just on the current v1.5 release. That's the only release I have access to for now so I don't see us realistically being able to test on any other revision any time soon. However, it's highly likely that the behaviour will be the same for v1.4 since there were no relevant changes between releases (basically just the TVS diodes). Revisions before v1.4 are not currently planned to be supported by this firmware anyway (there are quite a few changes to the InAmps etc which would prevent the firmware from working as is).

Obviously, more testing is always better, but given the above I'm happy with only testing on v1.5 hardware

@hartytp
Author

hartytp commented Jan 6, 2021

And the underlying issue is still not understood.

sure, but that's going to take some time and it's not unreasonable to want a work around until then.

Why so much opposition to adding an offset into the firmware? It's not a large amount of work to implement and it will make it easier for me to start using the Boosters in the lab and do further testing. If further testing reveals that this offset doesn't work robustly on multiple boards we can always revert the commit.

@jordens
Member

jordens commented Jan 7, 2021

I have to disagree. It is a significant amount of work to implement, test, review, understand, verify, document, deploy, revert, maintain, support, revise. We can't stop half-way into the "implement" aspect and ignore the rest.
For initial usage and testing an easy work around appears to be simply setting the threshold 6.2 dB high.
Is there a chance to determine the offset on a 1.4 board and a 1.4/1.5 board of the other manufacturer, CTI vs TS? Looking at the existing calibration data (online or the calibration reports) would be sufficient. Then I'm happy.

jordens changed the title from "work around interlock issue" to "work around interlock hardware issue" on Jan 9, 2021
@hartytp
Author

hartytp commented Jan 11, 2021

@ryan-summers am I right in thinking that replacing:

self.settings.data.output_power_transform.invert(power),
with power + 6.0 should work as a quick hack?
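One way to read this hack is that the offset is added to the requested power before it goes through the inverse transform, so the DAC still receives a voltage. A minimal sketch under that assumption follows; the struct is a stand-in and not the firmware's actual type.

```rust
/// Stand-in for the firmware's power transform: power (dBm) = slope * V + offset.
pub struct LinearTransform {
    pub slope: f32,
    pub offset: f32,
}

impl LinearTransform {
    /// Power (dBm) -> detector/DAC voltage.
    pub fn invert(&self, power: f32) -> f32 {
        (power - self.offset) / self.slope
    }
}

/// Offset under discussion in this thread, in dB.
pub const INTERLOCK_OFFSET_DB: f32 = 6.0;

/// DAC voltage for a requested interlock threshold, with the offset applied
/// in the power domain (assumed interpretation of the quick hack).
pub fn interlock_dac_voltage(transform: &LinearTransform, threshold_dbm: f32) -> f32 {
    transform.invert(threshold_dbm + INTERLOCK_OFFSET_DB)
}
```

The sign follows the measurements above: a threshold set to 33 dBm tripped at a readout of about 27 dBm, so the programmed value has to be raised for the trip to land at the requested power.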

@hartytp
Author

hartytp commented Jan 11, 2021

For initial usage and testing an easy work around appears to be simply setting the threshold 6.2 dB high.

Yes, but that does mean that there will be inconsistencies for example when users query the set interlock threshold. That feels like an unnecessary foot gun that I don't want to have to explain to every student who uses Booster.

Is there a chance to determine the offset on a 1.4 board and a 1.4/1.5 board of the other manufacturer, CTI vs TS? Looking at the existing calibration data (online or the calibration reports) would be sufficient. Then I'm happy.

I can easily get someone from the Uni to post calibration numbers from the old firmware, but it's such a spaghetti mess that I would have trouble extracting meaningful information from it. Based on past experience, I'm also not overly optimistic about getting wizath et al to help us here. If you're okay to interpret the data then I'm more than happy to gather it.

@hartytp
Author

hartytp commented Jan 11, 2021

hmmm...the change I suggested above doesn't quite achieve the desired effect because the interlock threshold that is reported from the device tries to account for the DAC quantization

self.settings.data.output_power_transform.map(voltage);

IMHO we should report the value set by the user, rather than trying to back out the exact setting from the DAC value. The quantization here is much smaller than the other uncertainties (variation over frequency/temperature of the various RF components, offsets in the DAC which we don't calibrate) so accounting for DAC quantization is false precision. I'll remove that in our local checkout next time I reflash.

@ryan-summers
Member

IMHO we should report the value set by the user, rather than trying to back out the exact setting from the DAC value. The quantization here is much smaller than the other uncertainties (variation over frequency/temperature of the various RF components, offsets in the DAC which we don't calibrate) so accounting for DAC quantization is false precision. I'll remove that in our local checkout next time I reflash.

I agree that the quantization noise is somewhat unsavory - the original intent here is to show the user the actual setting instead of what was requested. I was originally using this as a means of verifying proper functionality of the firmware. E.g. if I requested a threshold of 10dBm and the firmware then reported something like a 20dBm set threshold, it was obviously not working as intended. We can revert this now since it may no longer be necessary, but we will lose a form of feedback from the firmware for the peace of mind of seeing a nice round number. I'm okay with either direction (the fix is quite small as you note).

@hartytp
Author

hartytp commented Jan 11, 2021

I agree that the quantization noise is somewhat unsavory - the original intent here is to show the user the actual setting instead of what was requested. I was originally using this as a means of verifying proper functionality of the firmware. E.g. if I requested a threshold of 10dBm and the firmware then reported something like a 20dBm set threshold, it was obviously not working as intended. We can revert this now since it may no longer be necessary, but we will lose a form of feedback from the firmware for the peace of mind of seeing a nice round number. I'm okay with either direction (the fix is quite small as you note).

I see, that makes sense.

Ideally one should either get a clear error or the interlock threshold one asked for to within the accuracy tolerance. I'm not sure how that works in MQTT, but if I ask for 10dBm and get 20dBm without error then that's a big issue. This is a safety feature primarily so 20dBm could result in catastrophic damage.

@ryan-summers
Member

Ideally one should either get a clear error or the interlock threshold one asked for to within the accuracy tolerance. I'm not sure how that works in MQTT, but if I ask for 10dBm and get 20dBm without error then that's a big issue. This is a safety feature primarily so 20dBm could result in catastrophic damage.

As I mentioned, that was just how I was using it during firmware development. That being said, we could theoretically implement a verification step in firmware to check the quantized setting against the requested one to verify they're within a reasonable limit (e.g. 0.5 dB or so) and then report the user-provided value to gain the same level of certainty.
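A sketch of what such a verification step might look like. The DAC resolution and full-scale voltage are assumptions chosen for the example, as are the function and error names; only the 0.5 dB tolerance comes from the comment above.

```rust
/// ASSUMED DAC parameters for illustration; the real hardware may differ.
pub const DAC_FULL_SCALE_V: f32 = 2.5;
pub const DAC_MAX_CODE: u16 = 4095; // 12-bit, assumed

/// Largest accepted difference between requested and quantized threshold, in dB.
pub const TOLERANCE_DB: f32 = 0.5;

#[derive(Debug, PartialEq)]
pub enum ThresholdError {
    /// The requested threshold maps to a voltage outside the DAC range.
    OutOfRange,
    /// The quantized setting differs too much from the request.
    OutOfTolerance { actual_dbm: f32 },
}

/// Convert a requested threshold to a DAC code, check the quantized result
/// against the request, and report the user-provided value on success.
/// `slope` (dB/V) and `offset` (dBm) describe power = slope * voltage + offset.
pub fn set_threshold(
    slope: f32,
    offset: f32,
    requested_dbm: f32,
) -> Result<(u16, f32), ThresholdError> {
    let voltage = (requested_dbm - offset) / slope;
    // A NaN voltage also fails this range check.
    if !(0.0..=DAC_FULL_SCALE_V).contains(&voltage) {
        return Err(ThresholdError::OutOfRange);
    }
    let code = (voltage / DAC_FULL_SCALE_V * DAC_MAX_CODE as f32).round() as u16;
    // Back out the power that the quantized code actually corresponds to.
    let actual_v = code as f32 / DAC_MAX_CODE as f32 * DAC_FULL_SCALE_V;
    let actual_dbm = slope * actual_v + offset;
    if (actual_dbm - requested_dbm).abs() > TOLERANCE_DB {
        return Err(ThresholdError::OutOfTolerance { actual_dbm });
    }
    // Report what the user asked for, not the quantized value.
    Ok((code, requested_dbm))
}
```

This keeps the sanity check that catches a 10 dBm request turning into 20 dBm, while the reported threshold stays the round number the user set.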

@hartytp
Author

hartytp commented Jan 11, 2021

Up to you. So long as you're happy things are now working reliably I'm happy dropping this diagnostic

@jordens
Member

jordens commented Dec 1, 2021

Current Boosters are being manufactured with this hardware issue addressed in hardware. Older Boosters should (a) be reworked/upgraded, (b) use a recalibrated transform or (c) apply a hack in firmware that offsets the interlock DAC voltage with a guesstimate of the glitches.
