This code models the standard Door-Key problem as a Markov Decision Process and solves it using a Dynamic Programming algorithm. The problem is split into two parts. The objective for both parts is to implement a Dynamic Programming algorithm that minimizes the cost of reaching the goal.
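One common way to write this objective is the deterministic shortest-path form below. The symbols are assumptions for illustration and not necessarily the notation used in this project: $x_t$ is the state, $u_t$ the action, $f$ the motion model, $\ell$ the stage cost, $q$ the terminal cost, and $T$ the planning horizon.

$$
\min_{u_0,\dots,u_{T-1}} \; q(x_T) + \sum_{t=0}^{T-1} \ell(x_t, u_t)
\quad \text{subject to} \quad x_{t+1} = f(x_t, u_t)
$$

Dynamic Programming would then solve it backward in time:

$$
V_T(x) = q(x), \qquad
V_t(x) = \min_{u} \big[ \ell(x, u) + V_{t+1}(f(x, u)) \big], \quad t = T-1, \dots, 0
$$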
Standard maps of size 5x5, 6x6 and 8x8 are given with random goal and key positions. Their placement is also provided as input in the environment variable. There is only one door, which may or may not block the path to the key. The goal is to run DP on every input map and find an optimal path sequence. Examples of the maps are given below:
doorkey-5x5 | doorkey-6x6 | doorkey-8x8 |
---|---|---|
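
As a rough illustration of running DP on a single known map, here is a minimal sketch. It is not the project's implementation: the layout `GRID`, the helpers `find`, `step`, `solve` and `rollout`, the four-direction moves and the unit cost per move are all assumptions made for the example. In particular, the agent here picks up the key by stepping onto it and passes the door whenever it holds the key, which is simpler than the real environment.

```python
# Hypothetical layout: '#' wall, 'S' start, 'K' key, 'D' locked door, 'G' goal.
GRID = [
    "#####",
    "#S#G#",
    "#.D.#",
    "#K#.#",
    "#####",
]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def find(ch):
    """Return the (row, col) of the first cell containing ch."""
    for r, row in enumerate(GRID):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(ch)


def step(state, action):
    """Deterministic motion model; returns None when the move is blocked."""
    r, c, has_key = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    cell = GRID[nr][nc]
    if cell == "#" or (cell == "D" and not has_key):
        return None
    # Assumption: stepping onto the key cell picks the key up.
    return (nr, nc, has_key or cell == "K")


def solve():
    """Value iteration over states (row, col, has_key) with unit move cost."""
    goal = find("G")
    states = [
        (r, c, k)
        for r, row in enumerate(GRID)
        for c, ch in enumerate(row)
        if ch != "#"
        for k in (False, True)
    ]
    V = {s: (0 if s[:2] == goal else float("inf")) for s in states}
    policy = {}
    for _ in range(len(states)):  # enough sweeps for a shortest-path problem
        changed = False
        for s in states:
            if s[:2] == goal:
                continue
            for a in MOVES:
                nxt = step(s, a)
                if nxt is not None and 1 + V[nxt] < V[s]:
                    V[s], policy[s], changed = 1 + V[nxt], a, True
        if not changed:
            break
    return V, policy


def rollout(policy, start):
    """Follow the policy from start until the goal cell is reached."""
    goal = find("G")
    state, seq = start, []
    while state[:2] != goal:
        action = policy[state]
        seq.append(action)
        state = step(state, action)
    return seq


V, policy = solve()
start = find("S") + (False,)
print(V[start], rollout(policy, start))
```

Including `has_key` in the state is what lets the value function distinguish "door still blocks the way" from "door can be passed", so the detour to the key appears in the optimal sequence only when it is needed.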
We are given a set of 36 environments and run DP once, computing one general policy for all of the environments.
doorkey-8x8-normal |
---|
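
One way to obtain a single policy for many environments is to fold the environment configuration into the state, so that one DP pass covers every configuration at once. The sketch below shows that idea on a deliberately tiny stand-in, a 1D corridor whose goal position varies. The function `build_general_policy`, the corridor and the candidate goals are hypothetical and unrelated to the actual 36 maps.

```python
def build_general_policy(length, goal_candidates):
    """Value iteration over the augmented state (position, goal).

    The goal position is part of the state, so the resulting table
    serves every environment that differs only in its goal.
    """
    V = {
        (p, g): (0 if p == g else float("inf"))
        for p in range(length)
        for g in goal_candidates
    }
    policy = {}
    for _ in range(length):  # longest shortest path has length - 1 moves
        for (p, g) in V:
            if p == g:
                continue
            for name, delta in (("L", -1), ("R", 1)):
                q = p + delta
                if 0 <= q < length and 1 + V[(q, g)] < V[(p, g)]:
                    V[(p, g)] = 1 + V[(q, g)]
                    policy[(p, g)] = name
    return V, policy


# DP runs once; each environment then only needs a table lookup.
V, policy = build_general_policy(length=6, goal_candidates=[1, 4])
for goal in (1, 4):
    print(goal, [policy.get((p, goal), "-") for p in range(6)])
```

The cost of this approach is a larger state space, since every varying element of the environment multiplies the number of states, but the policy does not have to be recomputed when a new environment from the same family is loaded.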
Some of the results obtained are shown below:
doorkey-5x5 | doorkey-6x6 | doorkey-8x8 |
---|---|---|
Random Map 3 | Random Map 4 | Random Map 22 |
---|---|---|
- Install Python version 3.7 ~ 3.10
- Install dependencies: `pip install -r requirements.txt`
This is the main entry point for the algorithm
Implements a class to solve problem part A.
Implements a class to solve problem part B.
Utility functions for operating on the gym environment are implemented here:
- step(): Move your agent
- generate_random_env(): Generate a random environment for debugging
- load_env(): Load the test environments
- save_env(): Save the environment for reproducing results
- plot_env(): For a quick visualization of your current env, including: agent, key, door, and the goal
- draw_gif_from_seq(): Draw and save a gif image from a given action sequence.
This folder contains the results as GIF files.
This folder contains the .env files, which serve as input to the algorithm to construct and operate on the environment.