
PersistentStorageDelegate SyncGetKeyValue not returning CHIP_ERROR_BUFFER_TOO_SMALL #16958

Closed
mlepage-google opened this issue Apr 1, 2022 · 4 comments

Comments

@mlepage-google
Contributor

Problem

When using chip::Server::GetPersistentStorage on Linux (e.g. in chip-all-clusters-app), whatever storage implementation is being used under the hood does not return CHIP_ERROR_BUFFER_TOO_SMALL (as documented) when SyncGetKeyValue is called with a buffer that is too small; it just silently truncates the data.

For example, I wrote 4 bytes and passed in a buffer of size 2, and got back no error and a size of 2 (but because the stored value is actually 4 bytes, I should have received an error).

Proposed Solution

Audit the code to ensure each implementation follows the published API.
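For reference, a minimal sketch of the documented contract using a toy in-memory store (the error constants, class name, and map-backed storage here are stand-ins for illustration, not the real SDK types):

```cpp
#include <cstdint>
#include <cstring>
#include <map>
#include <string>

// Stand-in error codes (the real SDK uses CHIP_ERROR values).
constexpr int kNoError        = 0;
constexpr int kBufferTooSmall = 1; // models CHIP_ERROR_BUFFER_TOO_SMALL
constexpr int kKeyNotFound    = 2; // models "value not found"

// Toy in-memory key/value store illustrating the documented
// SyncGetKeyValue contract: if the caller's buffer is too small,
// copy what fits but report kBufferTooSmall instead of silently truncating.
class ToyStorage
{
public:
    int SyncSetKeyValue(const std::string & key, const void * value, uint16_t size)
    {
        mStore[key].assign(static_cast<const char *>(value), size);
        return kNoError;
    }

    int SyncGetKeyValue(const std::string & key, void * buffer, uint16_t & size)
    {
        auto it = mStore.find(key);
        if (it == mStore.end())
            return kKeyNotFound;

        const std::string & value = it->second;
        const bool tooSmall       = value.size() > size;
        uint16_t toCopy = tooSmall ? size : static_cast<uint16_t>(value.size());
        std::memcpy(buffer, value.data(), toCopy);
        size = toCopy;
        // Report truncation to the caller rather than hiding it.
        return tooSmall ? kBufferTooSmall : kNoError;
    }

private:
    std::map<std::string, std::string> mStore;
};
```

With this behavior, the reproduction above (write 4 bytes, read into a 2-byte buffer) yields kBufferTooSmall with size 2, which is what the published API calls for.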

@bzbarsky-apple
Contributor

I expect this is src/include/platform/KvsPersistentStorageDelegate.h which calls into KeyValueStoreManager::Get, which eventually lands in src/platform/Linux/KeyValueStoreManagerImpl.cpp where we have, in _Get:

    if ((err != CHIP_NO_ERROR) && (err != CHIP_ERROR_BUFFER_TOO_SMALL))
    {
        return err;
    }

and otherwise we copy however much data we can and return CHIP_NO_ERROR.

That is definitely a violation of the KeyValueStoreManager API documentation.
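A hypothetical sketch of the shape the fix could take (function name and stand-in error codes are mine; the real _Get goes through the Ini-file backend, this just isolates the error propagation):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr int kChipNoError             = 0;
constexpr int kChipErrorBufferTooSmall = 1; // stand-in for CHIP_ERROR_BUFFER_TOO_SMALL

// Sketch of corrected _Get propagation: copy what fits, but pass the
// underlying BUFFER_TOO_SMALL through instead of swallowing it.
int GetSketch(const std::vector<uint8_t> & stored, void * buffer, size_t bufferSize,
              size_t * readBytesSize)
{
    // Model the underlying read reporting BUFFER_TOO_SMALL when the
    // stored value does not fit the caller's buffer.
    int err = (stored.size() > bufferSize) ? kChipErrorBufferTooSmall : kChipNoError;

    // Copy as much as fits either way, so partial reads remain possible...
    size_t toCopy = std::min(stored.size(), bufferSize);
    std::memcpy(buffer, stored.data(), toCopy);
    if (readBytesSize != nullptr)
        *readBytesSize = toCopy;

    // ...but do NOT mask the error: returning kChipNoError here on a
    // truncated read is exactly the bug described above.
    return err;
}
```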

This varies by platform as follows:

  • Linux: broken
  • Darwin: looks like it gets this right
  • Zephyr: I can't tell at first glance
  • qpg: gets this right, I think
  • k32w0: gets the error return right, but I think not the "write as much as will fit" part
  • cc13x2_26x2: looks wrong to me
  • CYW30739: can't tell
  • mbed: looks correct
  • ESP32: can't tell
  • P6: can't tell
  • Tizen: seems correct
  • Ameba: broken
  • Android: looks correct
  • EFR32: looks correct
  • BL602: looks broken

@mlepage-google Maybe we need separate issues here for the different buggy platforms?

@bzbarsky-apple
Contributor

I suspect we need to fix this to behave correctly: various other code will not work right if this is wrong, and we could end up with serious security problems.

@tehampson tehampson self-assigned this Jun 23, 2022
tcarmelveilleux added a commit to tcarmelveilleux/connectedhomeip that referenced this issue Jun 23, 2022
tcarmelveilleux added a commit that referenced this issue Jun 24, 2022
* Implement shadow fail-safe data in FabricTable

- FabricTable management during commissioning did not properly
  handle the fact that committing must only be done on
  CommissioningComplete, which prevented the AddTrustedRootCertificate,
  UpdateNOC and AddNOC command semantics from being implemented properly
  and prevented proper state observation of the Operational Credentials
  cluster server

Fixes #7695
Issue #8905
Fixes #18633
Issue #17208
Fixes #15585

This PR:

- Remove direct access to FabricInfo from everywhere, which caused
  possibly stale FabricInfo references during commissioning.
- Remove immediate committing of fabric table on UpdateNOC.
- Make Fabrics, NOCs and TrustedRootCertificates attributes reflect
  proper partial state during fail-safe, by using the shadow data
  capabilities of OperationalCertificateStore and by updates to
  FabricInfo
- Make it possible to unit test fabric table by providing the necessary
  lifecycle public APIs to test every modality of the commissioning flow
- Make Server and DeviceController use OperationalCertificateStore to
  allow proper external lifecycle management of the operational cert
  chain.
- Update all examples/controller code to new API
- Remove dangerous internal APIs from FabricTable and replace with
  direct accessors where needed
- Add more of the necessary spec validations to the UpdateNOC and AddNOC
  flows

Testing done:
- Updated all unit tests, all pass
- Cert tests still pass as before
- Working on further integration tests and unit tests as a follow-up
  noting that current state has not regressed on existing test coverage,
  and that new usage of OperationalCertificateStore class in FabricTable
  gains a large amount of additional coverage transitively via some
  of the existing tests making use of FabricTable.

* Restyled by clang-format

* Fixes after merging master

* Rename callback of FabricTable::Delegate

- Remove needless ones
- Make the callbacks do nothing by default to avoid more
  "Intentionally left blank"
- Renamed to actually reflect what is happening

* Rekick restyle

* restyle

* Align last known good time commit / revert to shadow fail-safe approach

Fabric commit / revert is being refactored to only persist fabric
data when we actually commit, on receipt of CommissioningComplete.
This reworks Last Known Good Time to use the same strategy.

Now, instead of persisting both the last known good time and a fail-safe
backup at the time of certificate installation, a single, updated
last known good time is stored in RAM at the time of certificate
installation. Then, on commit, it is persisted, but never before.
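The commit/revert pattern described above can be sketched as follows (class and method names are hypothetical; the pending value lives in RAM until Commit persists it, and Revert simply discards it without touching storage):

```cpp
#include <cstdint>
#include <optional>

// Hypothetical sketch of the shadow fail-safe pattern for Last Known
// Good Time: updates during commissioning are staged in RAM only, and
// persisted storage is written only on commit.
class LastKnownGoodTimeSketch
{
public:
    // Called at certificate installation: stage the new time in RAM.
    void UpdatePending(uint64_t newTime) { mPending = newTime; }

    // Called on CommissioningComplete: persist the staged value.
    void Commit()
    {
        if (mPending.has_value())
        {
            mPersisted = *mPending; // stands in for a storage write
            mPending.reset();
        }
    }

    // Called on fail-safe expiry: drop the staged value; storage untouched.
    void Revert() { mPending.reset(); }

    uint64_t PersistedTime() const { return mPersisted; }

private:
    std::optional<uint64_t> mPending;
    uint64_t mPersisted = 0;
};
```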

* Fix CI

* Fix TV app

* Reduce flash usage on K32W0 by disabling detail (verbose) logs

* Disable progress logging in TI platform lock example to save flash

* Disable progress logging in TI platform shell example to save flash

* Reduce stack usage of TestEventLogging

* Apply review comments from @msandstedt

* Save stack for Nordic build

* Save stack for Nordic build, some more

* Added unit tests for all basic operations

* Fix merge conflict

* Restyle

* Change nrf connect stack limit up by 512 bytes for fake unit test comparisons

* Revert "Save stack for Nordic build"

This reverts commit 52699c4.

* Revert "Save stack for Nordic build, some more"

This reverts commit e62391e.

* Fix UpdateLabel

* Apply review comments, add ACL fabric removal

* Apply review comments

* per bzbarsky, clear in-memory Last Known Good Time if revert fails

* Apply fixes from @bzbarsky-apple

* Apply more review comments

* Restyled

* Fix tests

* Add unit test for iterator

* Iterator now works

* Fix semantic merge conflict

* Restyled

* Fix predicates for storage presence temporarily

issue #16958

* Fix Darwin build

* Add missing docs, move methods to private that shouldn't be public

* Change stack warning temporarily to pass on nRFConnect

* Make MatterControllerFactory use const FabricInfo

* Restyle

* Fix semantic conflict on SessionManager

* Fix an init ordering issue in TestSessionManager.cpp

* Fix SessionManager shutdown

* Restyled

Co-authored-by: Restyled.io <[email protected]>
Co-authored-by: Michael Sandstedt <[email protected]>
Co-authored-by: Boris Zbarsky <[email protected]>
@turon turon added the security label Jun 29, 2022
@kkasperczyk-no
Contributor

kkasperczyk-no commented Jun 30, 2022

@bzbarsky-apple I would say it should be fine for the Zephyr/nRFConnect platform. The _Get implementation requests a value read from the Zephyr subsystem, and the destination buffer size is then verified in the Zephyr callback: https://github.com/project-chip/connectedhomeip/blob/master/src/platform/Zephyr/KeyValueStoreManagerImpl.cpp#L72. The result assigned in the callback is then returned by the _Get method: https://github.com/project-chip/connectedhomeip/blob/master/src/platform/Zephyr/KeyValueStoreManagerImpl.cpp#L125

@tehampson
Contributor

tehampson commented Aug 23, 2022

Findings based on the survey results

Based on the responses to the survey, these are the conclusions we have made for Matter 1.0:

  • There is no pressing reason to lower PersistentStorageDelegate::kKeyLengthMax from 32 to a smaller size.
  • There is no need to restrict the characters used in keys.
  • Based on the survey results, no platform reported being limited to some low fixed number of total simultaneously stored key/value pairs. That said, we are still working on a stress test to ensure platforms can actually handle enough simultaneous key/value pairs to allow multiple fabrics to be commissioned with certificates crafted to be of maximum size.
  • All platforms are capable of adhering to the API contract of SyncGetKeyValue as currently defined. Some platforms might need to perform additional work, which can be quite costly if the stored value is very large, but fundamentally nothing prevents them from doing so.

Considerations to explore for PersistentStorageDelegate post 1.0:

  • While all platforms can implement SyncGetKeyValue as currently defined, the cost is high on some platforms, where a buffer the size of the stored value must be allocated to perform a full read before partially copying into the smaller provided buffer. We may therefore want to fine-tune the API contract. For example, if the size of the value in storage is under a certain threshold, implementations of SyncGetKeyValue would be required to do what the contract describes today; if the size is beyond the threshold, an implementation may choose to do the partial read, but if the operation is too costly it could return an error like CHIP_ERROR_PERSISTED_STORAGE_FAILED. The threshold would be a value chosen so that platforms could use stack space instead of being forced to allocate a costly large buffer just to be compliant with the API contract.
  • Explore adding the ability to attach attributes to key/value pairs. An example attribute would be whether a key/value pair should be stored securely or not.
  • Define a minimum number of simultaneous key/value pairs that should be supported, and add some sort of audit to support testing that. We feel TC-RR-1.1 is the baseline test for this.
  • Enforce a minimum value length that must be supported.
    • Based on the survey responses, all platforms claim to have no problems supporting Credentials::kMaxCHIPCertLength * 2, which is why we defer making a decision on this until after 1.0.
  • It would be helpful for the spec to state a minimum recommended persistent storage size.
  • Some platforms apply a hashing operation to the key before storing it, making it impossible to enumerate all keys today. If we think it is important for keys to be enumerable, to allow features like removing all keys with a certain prefix (e.g. f/a1/* to remove all keys associated with fabric index 0xa1), we need to formalize the requirement and possibly assist with a solution for platforms that have already gone down the path of hashing keys.
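The threshold idea in the first bullet could look roughly like this (function name, threshold value, and error constants are all hypothetical stand-ins, not proposed SDK identifiers):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr int kNoError                 = 0;
constexpr int kBufferTooSmall          = 1; // stand-in for CHIP_ERROR_BUFFER_TOO_SMALL
constexpr int kPersistedStorageFailed  = 2; // stand-in for CHIP_ERROR_PERSISTED_STORAGE_FAILED
constexpr size_t kPartialReadThreshold = 256; // hypothetical cutoff

// Sketch of a threshold-tuned contract: below the threshold, the current
// contract holds (partial copy + BUFFER_TOO_SMALL); above it, an
// implementation for which the full read is too costly may bail out.
int ThresholdGet(const std::vector<uint8_t> & stored, void * buffer, size_t bufferSize,
                 size_t * readSize, bool partialReadTooCostly)
{
    if (stored.size() <= bufferSize)
    {
        std::memcpy(buffer, stored.data(), stored.size());
        if (readSize != nullptr) *readSize = stored.size();
        return kNoError;
    }
    if (stored.size() > kPartialReadThreshold && partialReadTooCostly)
        return kPersistedStorageFailed; // allowed escape hatch for large values

    std::memcpy(buffer, stored.data(), bufferSize);
    if (readSize != nullptr) *readSize = bufferSize;
    return kBufferTooSmall;
}
```

Platforms that can do the partial read cheaply would simply never take the escape hatch, keeping today's behavior for all sizes.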

Uses of PersistentStorageDelegate in SDK post 1.0 to explore:

  • All code in the SDK that uses PersistentStorageDelegate should avoid raw packed structs. Raw packed structs are not future-proof and don't age well; Matter TLV structures should be used instead.
  • Consider changing resumption ID key names to not be randomized values; the resumption ID for Sigma session resumption doesn't actually need to be in the key name. Doing this would help with dimensioning and debugging for some persistent storage backends.

The ability for platforms to quickly audit their PersistentStorageDelegate has been created. Based on https://github.com/project-chip/connectedhomeip/projects/86#card-83751232, many platforms have passed the audit. At this point it is on the platforms themselves to pass it. If a platform is not adhering to the definitions and fails the audit, a separate issue can be created for that platform specifically.
