Improve the power managing actor initialization #966

llucax · 2024-06-10T10:32:25Z

This PR was done while chasing an issue with tests complaining that some tasks were pending while the loop was destroyed.

I guess the issue is in the pytest_asyncio plugin, because those tasks seem to be properly cancelled according to my debugging, and we had similar issues with pytest-asyncio before. The weird thing is that the errors show up in a different test from the one that actually creates the tasks, which makes me think that the plugin is not properly cleaning up after itself.

In any case, while debugging this, I found a few things that could be improved in the code in terms of synchoronization during initialization and cleanup.

This PR:

Adds a unique_id to LatestValueCache to improve the debugging experience (using the new name unique_id instead of name, as suggested in Rename BackgroundService's name argument to unique_id frequenz-core-python#7).
Adds __str__ and __repr__ to LatestValueCache to use this unique_id
Cancels LatestValueCache's task if deleted (just in case)
Adds a wait_for_value() method to LatestValueCache so we can wait for the cache to be filled with data
Adds a wait_for_data() method to ComponentManager to wait for all caches to be filled with data
Add @override to PowerDistributingActor (because why not?)
Improve waiting for data in the PowerDistributingActor by using the new wait_for_data() method instead of blidly wait for the whole wait time

When debugging and having some task in the stack coming from a `LastValueCache` it is very difficult to identify which one is it. This commit add a new `unique_id` that's used to create the task name. Signed-off-by: Leandro Lucarella <[email protected]>

The `str` just returns the value, and `repr` gets all the info, including the new `unique_id`. Signed-off-by: Leandro Lucarella <[email protected]>

This shouldn't be strictly necessary as the even loop only store weak references to tasks, and when a task is gone it should be removed from the event loop anyway, but this makes the process more explicit and might help cleaning up more properly if the task was still running when deleted, as the `CancelledError` might be still raised inside the task, giving it the opportunity to clean up. Signed-off-by: Leandro Lucarella <[email protected]>

This methods allows to wait until the cache is filled, to ensure working with a valid value at some particular point in time. Signed-off-by: Leandro Lucarella <[email protected]>

This method will wait that the manager actually received some data to ensure it can make informed decisions, and it is not just working on a partially initialized system. Signed-off-by: Leandro Lucarella <[email protected]>

Signed-off-by: Leandro Lucarella <[email protected]>

Instead of blindly waiting for some time in the hopes that the underlying manager got some data, which can be a longer-than-needed wait when the data is available pretty quickly, we ask the manager to wait until the data is ready by using the new `wait_for_data()` method. Signed-off-by: Leandro Lucarella <[email protected]>

llucax · 2024-06-10T10:33:05Z

@jh2007github FYI as it is (slightly) related to frequenz-floss/frequenz-core-python#7.

llucax · 2024-06-10T10:34:00Z

Skipping release notes as the resulting effect should be barely noticeable for end users, but I can mention the improvement in the release notes if you prefer.

shsms · 2024-06-19T11:15:19Z

The waiting for data at power distributor startup can be dropped instead: #971, so I guess some of the commits from this PR can go away.

llucax · 2024-06-20T13:33:44Z

Dropped in favor of:

llucax · 2024-06-20T13:34:15Z

We leave the wait for data for LastValueCache for feature, if necessary.

llucax added 7 commits June 10, 2024 12:21

Add __str__ and __repr__ to LatestValueCache

73fd25f

The `str` just returns the value, and `repr` gets all the info, including the new `unique_id`. Signed-off-by: Leandro Lucarella <[email protected]>

Add a wait_for_value() method to LatestValueCache

0ed87b5

This methods allows to wait until the cache is filled, to ensure working with a valid value at some particular point in time. Signed-off-by: Leandro Lucarella <[email protected]>

Add a wait_for_data() method to ComponentManager

8e48b49

This method will wait that the manager actually received some data to ensure it can make informed decisions, and it is not just working on a partially initialized system. Signed-off-by: Leandro Lucarella <[email protected]>

Add @override to PowerDistributingActor

81cb369

Signed-off-by: Leandro Lucarella <[email protected]>

llucax requested a review from a team as a code owner June 10, 2024 10:32

llucax requested a review from shsms June 10, 2024 10:32

llucax self-assigned this Jun 10, 2024

github-actions bot added part:data-pipeline Affects the data pipeline part:actor Affects an actor ot the actors utilities (decorator, etc.) part:core Affects the SDK core components (data structures, etc.) labels Jun 10, 2024

llucax added the cmd:skip-release-notes It is not necessary to update release notes for this PR label Jun 10, 2024

llucax added this to the v1.0.0-rc700 milestone Jun 10, 2024

llucax closed this Jun 20, 2024

llucax added the resolution:wontfix This will not be worked on label Jun 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve the power managing actor initialization #966

Improve the power managing actor initialization #966

llucax commented Jun 10, 2024

llucax commented Jun 10, 2024

llucax commented Jun 10, 2024

shsms commented Jun 19, 2024

llucax commented Jun 20, 2024

llucax commented Jun 20, 2024 •

edited

Loading

Improve the power managing actor initialization #966

Improve the power managing actor initialization #966

Conversation

llucax commented Jun 10, 2024

llucax commented Jun 10, 2024

llucax commented Jun 10, 2024

shsms commented Jun 19, 2024

llucax commented Jun 20, 2024

llucax commented Jun 20, 2024 • edited Loading

llucax commented Jun 20, 2024 •

edited

Loading