Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

End of night shutdown improvements. #407

Merged
merged 9 commits into from
Jan 26, 2018

Conversation

wtgee
Copy link
Member

@wtgee wtgee commented Jan 22, 2018

A number of fixes and changes to account for situation when no valid
observations are found but we don't want to shut down the state machine.

  • If scheduler has a list of observations then it has_valid_observations.
    which is True when starting.

Process:

  • scheduling state calls observatory.get_observation()
  • get_observation first checks if has_valid_observations is False
    if so it will reread the field list.
  • get_observation either returns an observation or, if none found,
    calls clear_available_observations, making has_valid_observations = False.
  • If no observation is returned to scheduling state, send to parking (then to sleeping).
  • In sleeping, if has_valid_observations is False then sleep for an hour
    then try again (or shut down if run_once is specified).
    If has_valid_observations is True and it is safe and dark, shut down because
    of unknown error.

Other changes:

  • Remove reread_fields_file from dispatch scheduler as it is handled at
    the observatory level by observatory.get_observation.
  • Misc small fixes

⚠️ Note: I think I've gotten all the logic correct so that Bad Things™ won't happen. I think. :)

Closes #390

A number of fixes and changes to account for situation when on valid
observations are found but we don't want to shut down the state machine.

* If scheduler has a list of `observations` then it `has_valid_observations`.
	which is True when starting.

Process:
* `scheduling` state calls `observatory.get_observation()`
* `get_observation` first checks if `has_valid_observations is False`
	if so it will reread the field list.
* `get_observation` either returns an observation or, if none found,
	calls `clear_available_observations`, making `has_valid_observations = False`.
* If no observation is returned to `scheduling` state, send to `parking` (then to `sleeping`).
* In `sleeping`, if `has_valid_observations is False` then sleep for an hour
	then try again (or shut down if `run_once` is specified).
  If `has_valid_observations is True` and it is safe and dark, shut down because
  	of unknown error.

Other changes:
* Remove `reread_fields_file` from dispatch scheduler as it is handled at
the `observatory` level by `observatory.get_observation`.
* Misc small fixes
@wtgee wtgee requested a review from a team January 22, 2018 01:24
@codecov
Copy link

codecov bot commented Jan 22, 2018

Codecov Report

Merging #407 into develop will increase coverage by 0.05%.
The diff coverage is 83.63%.

Impacted file tree graph

@@            Coverage Diff             @@
##           develop    #407      +/-   ##
==========================================
+ Coverage    69.25%   69.3%   +0.05%     
==========================================
  Files           60      60              
  Lines         4774    4806      +32     
  Branches       659     665       +6     
==========================================
+ Hits          3306    3331      +25     
- Misses        1292    1298       +6     
- Partials       176     177       +1
Impacted Files Coverage Δ
pocs/scheduler/dispatch.py 98.03% <100%> (ø) ⬆️
pocs/observatory.py 83.66% <100%> (+0.14%) ⬆️
pocs/core.py 80.32% <100%> (+0.48%) ⬆️
pocs/scheduler/scheduler.py 96.07% <100%> (+1.39%) ⬆️
pocs/state/states/default/sleeping.py 62.5% <50%> (-37.5%) ⬇️
pocs/state/machine.py 88.27% <66.66%> (+0.08%) ⬆️
pocs/state/states/default/parked.py 75% <70.58%> (-25%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2cff654...e943bad. Read the comment docs.

Copy link
Contributor

@jamessynge jamessynge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, besides some nit and questions.

@@ -4,11 +4,17 @@ def on_enter(event_data):
pocs.next_state = 'ready'

# If it is dark and safe we shouldn't be in sleeping state
if pocs.is_dark() and pocs.is_safe():
if pocs.observatory.scheduler.has_valid_observations and \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8 strongly prefers wrapping with parens around expressions rather than using a backslash.

@@ -4,11 +4,17 @@ def on_enter(event_data):
pocs.next_state = 'ready'

# If it is dark and safe we shouldn't be in sleeping state
if pocs.is_dark() and pocs.is_safe():
if pocs.observatory.scheduler.has_valid_observations and \
pocs.is_dark() and pocs.is_safe():
if pocs.should_retry is False:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Property should_retry is very problematic. It looks like a property, but accessing it modifies its value!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks for calling this out. It's also problematic in that it is never reset without stopping and starting whole system. So if it retries once per night by the third night it is out of retry attempts no matter what. Will fix.

@@ -119,11 +119,12 @@ def test_get_observation_reread(field_list, observer, temp_file, constraints):
f.write(yaml.dump([{
'name': 'New Name',
'position': '20h00m43.7135s +22d42m39.0645s',
'priority': 50
'priority': 5000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why the change in priority?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out too. I'm not sure doing what I did is quite what we want. Changing now.

@@ -184,9 +184,16 @@ def get_observation(self, *args, **kwargs):
"""

self.logger.debug("Getting observation for observatory")

# If observation list is empty or a reread is requested
if self.scheduler.has_valid_observations is False or \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep8 strongly prefers wrapping with parens around expressions rather than using a backslash.

wtgee added 3 commits January 23, 2018 13:35
Note: This changes some of the state behavior for how parking and retrying
is handled. Housekeeping state will now only be called at the very end
of an observing run (either in the morning when unsafe or when too many
retry attempts). This moves some behavior from `sleeping` into `parked`.

* Fix the `should_retry` so it does not modify values
* Create a `reset_observing_run` method. Currenlty only resets number
of retries
* Other PR fixes
* Adding back the ability to reread fields file from the `get_observation`
call so that a user can do it manually rather than relying on an empty
observations list. But do it cleaner.
* Test the rereading of a fields with same entires only with a new higher
priority target
* Skip duplicate observation entries rather than stopping with assertion error.
@wtgee wtgee mentioned this pull request Jan 23, 2018
3 tasks
it for no reason and is also not testing what it says it should be testing.
@wtgee
Copy link
Member Author

wtgee commented Jan 23, 2018

I've removed a whole test in test_pocs.py that was hanging. Looking at the test I can't figure out how it was any different from the test below it. Based off of the name of the test, it was designed to be testing some of the interrupt code that a student added about a year ago (via Huntsman) and which has been removed during repository merge and cleanup because it needs some independent work.

Feel weird about just removing a test but I think it is the right course of action.

if self.scheduler.has_valid_observations is False or \
kwargs.get('reread_fields_file', False):
if (self.scheduler.has_valid_observations is False or
kwargs.get('reread_fields_file', False)):
self.scheduler.read_field_list()

self.scheduler.get_observation(*args, **kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it should return something. Why not return what get_observation returns?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to make it explicit, but get_observation actually also sets current_observation, which is then returned. Should maybe think about renaming this.

@@ -129,7 +130,8 @@ def run(self, exit_when_done=False, run_once=False):
try:
state_changed = self.goto_next_state()
except Exception as e:
self.logger.warning("Problem going to next state, exiting loop [{}]".format(e))
self.logger.warning("Problem going from {} to {}, exiting loop [{!r}]".format(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine for now, but it would be nice to have a recovery mechanism, so that the system automatically tries to get to mount-parked/dome-closed/state-sleeping if things go bad, and then resumes... or waits until the next noon to restart.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The state_changed is False it will do as you describe, which should be what happens under normal operations. We should probably do as you describe although I would want to do more testing and focus on it. Created #419 to address.

pocs.next_state = 'housekeeping'
has_valid_observations = pocs.observatory.scheduler.has_valid_observations

if (has_valid_observations and pocs.is_safe() is False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parentheses not needed here.

if (has_valid_observations and pocs.is_safe() is False):
pocs.say("Cleaning up for the night!")
pocs.next_state = 'housekeeping'
elif (has_valid_observations and pocs.is_safe()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excess parens.

pocs.next_state = 'housekeeping'
has_valid_observations = pocs.observatory.scheduler.has_valid_observations

if (has_valid_observations and pocs.is_safe() is False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps:

if has_valid_observations:
  if not pocs.is_safe():
    ...
  else:  # pocs.is_safe
    ...
else: # not has_valid_observations
  ...

* cleaning up code so parking logic is clearer
@jamessynge jamessynge merged commit e2dd720 into panoptes:develop Jan 26, 2018
@wtgee wtgee deleted the no-obs-clean-shutdown-390 branch January 26, 2018 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

End of night shutdown
2 participants