support reading sensors that need to be bridged to ipmb address #6

Merged: 10 commits, Jan 19, 2024


richardstephens
Contributor

@richardstephens richardstephens commented Jan 17, 2024

On some newer generations of HP hardware (the output below is from an ML110 Gen10) we are not able to read all the sensors because they are not reachable directly from the BMC: we have to ask the BMC to bridge the request to the device that actually owns the sensor.

...
 WARN  get_info > No reading for 31-PCI 6
 WARN  get_info > No reading for 32-PCI 7
 WARN  get_info > No reading for 33-PCI 8
 INFO  get_info > 34-PCI 1 Zone: 29.00 °C
 INFO  get_info > 35-PCI 2 Zone: 30.00 °C
 INFO  get_info > 36-PCI 3 Zone: 30.00 °C
 INFO  get_info > 37-PCI 4 Zone: 30.00 °C
 INFO  get_info > 38-PCI 5 Zone: 34.00 °C
 INFO  get_info > 39-PCI 6 Zone: 36.00 °C
 INFO  get_info > 40-PCI 7 Zone: 35.00 °C
 INFO  get_info > 41-PCI 8 Zone: 34.00 °C
 INFO  get_info > 43-Battery Zone: 33.00 °C
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ParsingFailed { error: Failed(RequestedDatapointNotPresent), netfn: SensorEvent, cmd: 45, completion_code: 203, data: [] }', examples/get-info.rs:92:18
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: core::result::Result<T,E>::unwrap
             at /home/buildozer/aports/community/rust/src/rustc-1.71.1-src/library/core/src/result.rs:1076:23
   4: get_info::main
             at ./examples/get-info.rs:90:25
   5: core::ops::function::FnOnce::call_once
             at /home/buildozer/aports/community/rust/src/rustc-1.71.1-src/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

With this change we are now able to read all the sensors successfully.

...
 INFO  get_info > 36-PCI 3 Zone: 30.00 °C
 INFO  get_info > 37-PCI 4 Zone: 30.00 °C
 INFO  get_info > 38-PCI 5 Zone: 34.00 °C
 INFO  get_info > 39-PCI 6 Zone: 36.00 °C
 INFO  get_info > 40-PCI 7 Zone: 35.00 °C
 INFO  get_info > 41-PCI 8 Zone: 34.00 °C
 INFO  get_info > 43-Battery Zone: 33.00 °C
 INFO  get_info > 44-P/S 1 Inlet: 0.00 °C
 INFO  get_info > 45-P/S 2 Inlet: 35.00 °C
 INFO  get_info > 46-P/S 1: 0.00 °C
 INFO  get_info > 47-P/S 2: 46.00 °C
 INFO  get_info > 48-E-Fuse: 35.00 °C
 INFO  get_info > 49-P/S 2 Zone: 38.00 °C
 WARN  get_info > No reading for 50-AHCI HD Max
...

(There's something wrong with power supply 1 on this system; ipmitool shows the same numbers.)
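For reference, the `completion_code: 203` in the panic above is 0xCB, which the IPMI specification defines as "requested sensor, data, or record not present". A minimal sketch of reporting that case per sensor instead of unwrapping could look like the following. The `Failure` enum and the `classify` and `report` functions are hypothetical names for illustration only, not the ipmi-rs error types, and the message format just mirrors the log lines above.

```rust
/// Hypothetical, simplified failure classification (not the ipmi-rs error type).
#[derive(Debug, PartialEq, Eq)]
enum Failure {
    /// Completion code 0xCB: requested sensor, data, or record not present.
    NotPresent,
    /// Any other non-zero completion code.
    Other(u8),
}

/// Map a raw IPMI completion code to a coarse failure class.
fn classify(completion_code: u8) -> Failure {
    match completion_code {
        0xCB => Failure::NotPresent,
        other => Failure::Other(other),
    }
}

/// Format one log line per sensor, so a single failed read does not abort the loop.
/// `result` stands in for whatever the real read returns: a value or a completion code.
fn report(name: &str, result: Result<f64, u8>) -> String {
    match result {
        Ok(value) => format!("{name}: {value:.2} °C"),
        Err(code) => match classify(code) {
            Failure::NotPresent => {
                format!("No reading for {name} (sensor not present at this address)")
            }
            Failure::Other(code) => {
                format!("No reading for {name} (completion code 0x{code:02X})")
            }
        },
    }
}
```

With something like this, the example would log a warning for the unreachable sensor and carry on with the next one rather than panicking.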

Questions

  • I'm not super happy with the way we decide what type of address to use in send(). Normally I'd create an enum and match on the variants, but I've run into issues before with unsafe code and stuff getting moved unexpectedly, so I wouldn't be confident it was sound.
  • Should we keep for_sensor() around at all? At least one of our test systems (the Gen10 ML110 system again) re-uses the same sensor number for two different target addresses so I'm not entirely sure it's sound to just query the BMC with just the sensor number.
  • Is the handling of channels and LUNs correct? I'll dig through the specification a bit more to see how these are used. None of the test systems we have show any sensors on a non-zero channel or LUN.
  • ipmitool will use PICMG or VITA, if they are available, to determine what base address to use. None of the test systems I have access to seem to have these available; some googling suggests they are related to blade chassis.
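To make the first question concrete, here is a rough sketch of what an enum-based address choice could look like. All names (`Address`, `address_for`, `describe`, `BMC_SLAVE_ADDR`) are hypothetical and not taken from the PR. It assumes the BMC answers at the conventional IPMB slave address 0x20 and that `owner_addr` is already the 8-bit slave address of the sensor owner.

```rust
/// Hypothetical address type; names are illustrative, not the ipmi-rs API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Address {
    /// Talk to the BMC directly over the system interface.
    SystemInterface { lun: u8 },
    /// Ask the BMC to bridge the request to a device on the IPMB.
    Ipmb { channel: u8, slave_addr: u8, lun: u8 },
}

/// Assumption: the BMC itself answers at the conventional IPMB address 0x20.
const BMC_SLAVE_ADDR: u8 = 0x20;

/// Pick the address variant for a sensor from its owner address, channel and LUN.
fn address_for(owner_addr: u8, channel: u8, lun: u8) -> Address {
    if owner_addr == BMC_SLAVE_ADDR && channel == 0 {
        Address::SystemInterface { lun }
    } else {
        Address::Ipmb { channel, slave_addr: owner_addr, lun }
    }
}

/// Matching on the variants keeps the per-address-type handling in one place.
fn describe(addr: Address) -> String {
    match addr {
        Address::SystemInterface { lun } => format!("system interface, lun {lun}"),
        Address::Ipmb { channel, slave_addr, lun } => {
            format!("IPMB channel {channel}, slave 0x{slave_addr:02x}, lun {lun}")
        }
    }
}
```

The sketch leaves out the unsafe part entirely; it only shows the decision and the match, which is where a helper function would sit.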

Thanks again for taking the time to review - happy to address any comments or feedback.

@datdenkikniet
Owner

datdenkikniet commented Jan 17, 2024

I am an absolute fan of these contributions, thank you! Very cool to see that you can get it to work this well.

I'm not super happy with the way we decide what type of address to use in send()...

I've made some comments regarding this, hopefully helping a little :) A helper function and/or a well-placed match statement may do the trick.

Should we keep for_sensor() around at all? ..

I don't think there are any soundness issues (hopefully IPMI just returns errors if we're sending incorrect stuff), but keeping it around seems relatively nonsensical. Would opt for migrating to GetSensorReading::new_raw(SensorNumber, Option<(u8, u8)> /* prefer those newtypes! */) and GetSensorReading::new(SensorKey). That would give both flexibility and clarity w.r.t. what information the command uses.
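A sketch of how the two suggested constructors might fit together is below. Only the names `GetSensorReading`, `SensorNumber`, `SensorKey`, `new_raw` and `new` come from the comment above; the fields, the `OwnerAddress` and `Channel` newtypes, and the guess that the two `u8` values are an owner address and a channel are assumptions for illustration, not the merged code.

```rust
/// Newtypes instead of bare `u8`s, so arguments cannot be swapped by accident.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SensorNumber(u8);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct OwnerAddress(u8); // assumed meaning of the first `u8`
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Channel(u8); // assumed meaning of the second `u8`

/// Hypothetical shape of a sensor key: everything needed to address one sensor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SensorKey {
    owner: OwnerAddress,
    channel: Channel,
    number: SensorNumber,
}

/// Hypothetical command struct; `bridge` is `None` when the BMC is asked directly.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct GetSensorReading {
    number: SensorNumber,
    bridge: Option<(OwnerAddress, Channel)>,
}

impl GetSensorReading {
    /// Flexible constructor: the caller decides whether to bridge.
    fn new_raw(number: SensorNumber, bridge: Option<(OwnerAddress, Channel)>) -> Self {
        Self { number, bridge }
    }

    /// Clear constructor: everything comes from the sensor's own key.
    fn new(key: &SensorKey) -> Self {
        Self::new_raw(key.number, Some((key.owner, key.channel)))
    }
}
```

The point of the split is that `new` cannot be called with only a sensor number, so the ambiguous case has to be requested explicitly through `new_raw`.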

Is the handling of channels and LUNs correct? I'll dig through the specification a bit more to see how these are used.

Unfortunately I don't know either. It looks fine to me, so I suggest we just roll with this for now, and bug-fix when/if someone complains!

I've lost access to my IPMI-capable system so I have a hard time checking, too.

Also I have been slightly pedantic with the review, but hope the comments are insightful!

@datdenkikniet
Owner

Oh yeah, and I've added a CHANGELOG.md now, would appreciate it if you could add an entry for IPMB support for GetSensorReading to that 😄

richardstephens added a commit to gallium-cloud/ipmi-rs that referenced this pull request Jan 18, 2024
@richardstephens
Contributor Author

richardstephens commented Jan 18, 2024

I am an absolute fan of these contributions, thank you!

I'm a huge fan of the library! We would have likely ended up shelling out to ipmitool otherwise, which I wasn't happy with for a whole host of reasons.

Would opt for migrating to GetSensorReading::new_raw...

Done. I didn't mean soundness issues in the UB/safety sense, more that calling GetSensorReading for a duplicate sensor number would give you a reading for a different sensor that probably even has a different unit.
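To illustrate the duplicate-number problem, here is a small sketch with made-up data: two sensors share a sensor number but live behind different owner addresses. The function names, sensor names and addresses are invented for the example and are not readings from the test system.

```rust
use std::collections::HashMap;

/// Index sensors by number only: entries with the same number overwrite each other.
/// Each tuple is (owner address, sensor number, sensor name).
fn index_by_number<'a>(sensors: &[(u8, u8, &'a str)]) -> HashMap<u8, &'a str> {
    sensors
        .iter()
        .map(|&(_owner, number, name)| (number, name))
        .collect()
}

/// Index sensors by (owner address, number): both entries survive.
fn index_by_key<'a>(sensors: &[(u8, u8, &'a str)]) -> HashMap<(u8, u8), &'a str> {
    sensors
        .iter()
        .map(|&(owner, number, name)| ((owner, number), name))
        .collect()
}

/// Made-up example data: sensor number 1 exists behind two different owners.
fn example_sensors() -> Vec<(u8, u8, &'static str)> {
    vec![(0x20, 1, "Inlet Ambient"), (0x72, 1, "Fan 1")]
}
```

Looking up number 1 in the first index returns whichever entry was inserted last, which is the "reading for a different sensor, probably with a different unit" case described above.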

Unfortunately I don't know either. It looks fine to me, so I suggest we just roll with this for now, and bug-fix when/if someone complains!

We're expanding our catalog of test hardware and we'll take a look as soon as we see it

I've lost access to my IPMI-capable system so I have a hard time checking, too.

We might be able to help with that.

richardstephens added a commit to gallium-cloud/ipmi-rs that referenced this pull request Jan 18, 2024
@richardstephens
Contributor Author

In 0c3cad0 I've refactored send() to pull the two types of addresses into an enum. It works - and I tested both a debug and release build - but I'm not 100% confident in the soundness.

@richardstephens
Contributor Author

This should be ready for re-review now. If you're happy with these changes as they are, the only other outstanding issue I'm aware of is RMCP support. Unfortunately the box I'm testing on doesn't have its BMC LAN connected and that probably won't be addressable until after the weekend.

@datdenkikniet
Owner

Had a few more minor comments, this is coming along very nicely! One more round and it should be good to go.

Done. I didn't mean soundness issues in the UB/safety sense, more that calling GetSensorReading for a duplicate sensor number would give you a reading for a different sensor that probably even has a different unit.

Ah alright, yes I agree 100%.

We're expanding our catalog of test hardware and we'll take a look as soon as we see it

Awesome!

We might be able to help with that.

You had my curiosity... but now you have my attention! If this is a serious offer (I imagine it's remote access or something, not hoping for you to send me hardware 😛 ), please contact me at [email protected]

@datdenkikniet
Owner

datdenkikniet commented Jan 18, 2024

In 0c3cad0 I've refactored send() ...

I think we have it the right way around, for now. As far as I understand it, calling drop() on the relevant data after we're done IOCTL-ing ensures that the lifetime of the data that we send in unsafely is extended to include the IOCTL call. Time will tell :)
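A stripped-down sketch of that pattern follows, with a stand-in function in place of the real IOCTL. Everything here (`OwnedAddress`, `as_ptr_len`, `fake_ioctl`, `send`) is hypothetical and only meant to show the ordering. As far as I understand it, a named binding lives until the end of its scope anyway, so the explicit `drop()` mostly documents the intent and turns an accidental earlier move into a compile error.

```rust
/// Hypothetical owned address data; the enum owns the bytes the raw pointer refers to.
enum OwnedAddress {
    System([u8; 2]),
    Ipmb([u8; 4]),
}

impl OwnedAddress {
    /// Raw pointer and length of the bytes owned by `self`.
    fn as_ptr_len(&self) -> (*const u8, usize) {
        match self {
            OwnedAddress::System(bytes) => (bytes.as_ptr(), bytes.len()),
            OwnedAddress::Ipmb(bytes) => (bytes.as_ptr(), bytes.len()),
        }
    }
}

/// Stand-in for the real IOCTL: reads `len` bytes through the pointer and sums them.
///
/// # Safety
/// `ptr` must be valid for reads of `len` bytes for the duration of the call.
unsafe fn fake_ioctl(ptr: *const u8, len: usize) -> u32 {
    let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
    bytes.iter().map(|&b| u32::from(b)).sum()
}

fn send(addr: OwnedAddress) -> u32 {
    let (ptr, len) = addr.as_ptr_len();
    // SAFETY: `ptr` points into `addr`, which is neither moved nor dropped
    // until after the call returns.
    let result = unsafe { fake_ioctl(ptr, len) };
    // Explicit drop after the call: if `addr` had been moved earlier,
    // this line would no longer compile.
    drop(addr);
    result
}
```

The raw pointer carries no lifetime, so the compiler cannot check this by itself; keeping the owner alive past the call is the part that has to be gotten right by hand.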

@datdenkikniet datdenkikniet merged commit 0305d84 into datdenkikniet:main Jan 19, 2024