Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/hard reset recovery #1608

Merged
merged 8 commits into from
Dec 5, 2022
Merged

Fix/hard reset recovery #1608

merged 8 commits into from
Dec 5, 2022

Conversation

0xArdi
Copy link
Collaborator

@0xArdi 0xArdi commented Dec 2, 2022

Fixes

This PR fixes 2 issues with hard reset when it is used as a recovery mechanism for broken agent <-> tendermint communication. Namely:

  1. The initial height and genesis time should not be changed when resetting.
  2. We should not wait for 1/2 the observation interval before and after reset when recovering.

Types of changes

What types of changes does your code introduce? (A breaking change is a fix or feature that would cause existing functionality and APIs to not work as expected.)
Put an x in the box that applies

  • Non-breaking fix (non-breaking change which fixes an issue)
  • Breaking fix (breaking change which fixes an issue)
  • Non-breaking feature (non-breaking change which adds functionality)
  • Breaking feature (breaking change which adds functionality)
  • Refactor (non-breaking change which changes implementation)
  • Messy (mixture of the above - requires explanation!)

Checklist

Put an x in the boxes that apply.

  • I have read the CONTRIBUTING doc
  • I am making a pull request against the main branch (left side). Also you should start your branch off our main.
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have locally run services that could be impacted and they do not present failures derived from my changes

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc...

@codecov
Copy link

codecov bot commented Dec 2, 2022

Codecov Report

Base: 94.89% // Head: 94.89% // No change to project coverage 👍

Coverage data is based on head (fccdb41) compared to base (6acbca3).
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1608   +/-   ##
=======================================
  Coverage   94.89%   94.89%           
=======================================
  Files          68       68           
  Lines        2411     2411           
=======================================
  Hits         2288     2288           
  Misses        123      123           
Flag Coverage Δ
unittests 94.89% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

# in case of successful reset we store the reset params in the shared state,
# so that in the future if the communication with tendermint breaks, and we need to
# perform a hard reset to restore it, we can use these as the right ones
self.context.shared_state["reset_params"] = reset_params
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shared_state is not conceptually the right place. There we store information which is relevant cross skills. I'd save it in a model on the skill.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@angrybayblade could rename shared_state for v2 and to call it skills_stared_state or similar. It gets often misinterpreted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 1610 to 1630
@property
def sleep_before_hard_reset(self) -> float:
"""
Amount of time to sleep before performing a hard reset.

We sleep for half the observation interval as there are no immediate transactions on either side of the reset.

:returns: the amount of time to sleep in seconds
"""
return self.params.observation_interval / 2

@property
def sleep_after_hard_reset(self) -> float:
"""
Amount of time to sleep after performing a hard reset.

We sleep for half the observation interval as there are no immediate transactions on either side of the reset.

:returns: the amount of time to sleep in seconds
"""
return self.params.observation_interval / 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just make this as a one variable since they both return the same value

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I did it like this to make it more readable. It seemed weird to have a property for half the observation interval.

# in case of successful reset we store the reset params in the shared state,
# so that in the future if the communication with tendermint breaks, and we need to
# perform a hard reset to restore it, we can use these as the right ones
self.context.shared_state["reset_params"] = reset_params
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Conflicts:
#	autonomy/constants.py
#	docs/guides/quick_start.md
#	docs/package_list.md
#	packages/packages.json
#	packages/valory/agents/hello_world/aea-config.yaml
#	packages/valory/agents/register_reset/aea-config.yaml
#	packages/valory/agents/register_termination/aea-config.yaml
#	packages/valory/agents/registration_start_up/aea-config.yaml
#	packages/valory/agents/test_abci/aea-config.yaml
#	packages/valory/services/hello_world/service.yaml
#	packages/valory/services/register_reset/service.yaml
#	packages/valory/skills/hello_world_abci/skill.yaml
#	packages/valory/skills/register_reset_abci/skill.yaml
#	packages/valory/skills/register_termination_abci/skill.yaml
#	packages/valory/skills/registration_abci/skill.yaml
#	packages/valory/skills/reset_pause_abci/skill.yaml
#	packages/valory/skills/safe_deployment_abci/skill.yaml
#	packages/valory/skills/termination_abci/skill.yaml
#	packages/valory/skills/test_abci/skill.yaml
#	packages/valory/skills/transaction_settlement_abci/skill.yaml
@DavidMinarsch DavidMinarsch merged commit faabe62 into main Dec 5, 2022
@0xArdi 0xArdi deleted the fix/hard-reset-recovery branch December 5, 2022 11:59
@Karrenbelt Karrenbelt mentioned this pull request Dec 6, 2022
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants