Pre-run Julia-based Guard policies before environment reset #26

Open
rallen10 opened this issue Jan 5, 2025 · 1 comment

@rallen10
Collaborator

rallen10 commented Jan 5, 2025

In current versions of the Julia-based guard bot policies (e.g. LBG1_LG3, LBG1_LG4, LBG1_LG5), the Julia-based iLQGames solver is only run after the environment is initialized. This can cause very long delays before the guard policy first executes, especially if the computer running the scenario is also running many other processes (e.g. a computationally expensive bandit policy). In turn, this causes the effective behavior of the guard to vary widely between systems.

Instead, the guard policy should be pre-run prior to environment initialization. This may look like:

  1. initialize and reset environment
  2. wait until first execution of guard policy completes
  3. reset environment again

This will likely require changes to ksp_interface.py, which calls the env.reset() function. Changes made at this fundamental level would affect all environments, so every environment would need a version increment (e.g. V1 -> V2).

@rallen10
Collaborator Author

rallen10 commented Jan 6, 2025

Did some work on this, but I think it would need a deeper dive to implement correctly. The fundamental problem with a "quick fix" as described above is that iLQGames runs in a separate process (i.e. burn_scheduler_loop, defined in LBG1_LG5_ParentEnv). If that process is started and then stopped during the reset process, as it is in the quick fix, then I think the solved trajectory is dropped from memory. It then can't be used to speed up first execution once the full reset function is complete and the episode has started, which defeats the purpose of the pre-run.
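To illustrate the suspected failure mode, here is a minimal standalone sketch. It does not use the real burn_scheduler_loop or iLQGames; `solver_loop`, `prerun`, and the placeholder trajectory are all hypothetical names. It assumes a Unix-like system, since it uses the "fork" start method to stay short. A result held only inside the child process is gone once that process is terminated, unless it was copied out to the parent first.

```python
import multiprocessing as mp
import time

# Assumption: Unix-like system ("fork" is not available on Windows).
_ctx = mp.get_context("fork")


def solver_loop(result_queue):
    """Hypothetical stand-in for a solver process: solve once, then keep running."""
    trajectory = [0.1 * k for k in range(5)]  # placeholder for an expensive solve
    if result_queue is not None:
        result_queue.put(trajectory)  # copy the result out to the parent
    while True:
        time.sleep(0.1)  # `trajectory` lives only as long as this process does


def prerun(publish):
    """Start the solver, stop it as a reset would, and return what the parent still holds."""
    result_queue = _ctx.Queue() if publish else None
    proc = _ctx.Process(target=solver_loop, args=(result_queue,), daemon=True)
    proc.start()
    if publish:
        cached = result_queue.get(timeout=10.0)  # wait for the first solve
    else:
        time.sleep(0.5)  # the solve finishes, but only inside the child
        cached = None
    proc.terminate()  # mimics stopping the process during reset
    proc.join()
    return cached
```

With `publish=False` the parent ends up with nothing to reuse after the process is stopped. With `publish=True` the trajectory survives in the parent, though whether a cached solution would actually warm-start the real solver is a separate question.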

Here is a hack attempt at LBG1_LG5_ParentEnv.reset that doesn't really work:

import time  # needed for time.sleep below

def reset(self):
    """Enable iLQGames first execution during episode reset process"""

    # first perform a normal reset that should call KSPDGBaseEnv.reset
    self.logger.info("PRE-RUNNING ENVIRONMENT FOR GUARD INSTANTIATION")
    super().reset()

    # wait for iLQGames first execution to complete
    while self.burn_sched is None:
        time.sleep(1.0)
        self.logger.info("Waiting for Guard initialization to complete...")

    # close environment and do a fresh restart
    self.logger.info("CLOSING PRE-RUN ENVIRONMENT")
    self.close()
    self.logger.info("RESETTING ENVIRONMENT TO START EPISODE")
    return super().reset()

Beyond the fundamental problem described above, this reset function has several other issues with the current architecture. It is called by ksp_interface.ksp_interface_loop, and it hangs that thread until the full reset completes. This prevents the observation_handshake from executing in a timely fashion, which causes runner.policy_loop in the main thread to time out. Thus the main thread has unraveled even before the reset process completes.
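The blocking behavior can be reproduced in isolation with a small sketch. This is not the actual ksp_interface code; `interface_loop` and `handshake` are hypothetical stand-ins, and plain queues replace whatever mechanism the real handshake uses. One thread performs a blocking reset before it services any requests, so a caller with a shorter timeout gives up first.

```python
import queue
import threading
import time


def interface_loop(requests, replies, reset_duration):
    """Hypothetical single-threaded loop: blocking reset first, then serve handshakes."""
    time.sleep(reset_duration)  # stands in for a long reset; nothing is serviced meanwhile
    while True:
        msg = requests.get()
        if msg == "stop":
            return
        replies.put("observation")


def handshake(reset_duration, timeout):
    """Request one observation and return it, or None if the loop did not answer in time."""
    requests, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=interface_loop, args=(requests, replies, reset_duration), daemon=True
    )
    worker.start()
    requests.put("handshake")
    try:
        result = replies.get(timeout=timeout)
    except queue.Empty:
        result = None  # the caller timed out while the loop was still resetting
    requests.put("stop")
    worker.join()
    return result
```

Calling `handshake(0.5, 0.1)` times out because the reset outlasts the caller's patience, while `handshake(0.0, 0.5)` gets its reply. Any pre-run that lengthens reset inside the interface thread would need either a longer handshake timeout or a way to keep servicing requests while waiting.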
