gamante91/Q-Learning-openAI-gym-Taxi-V2

The aim of this repo is to solve the OpenAI Gym Taxi-v2 environment using Q-learning.

Environment:

There are 4 locations (labeled by different letters) and the agent's job is to pick up the passenger at one location and drop them off at another.

+---------+
|R: | : :G|   R, G, B, Y are locations on this map
| : | : : |   The pipe symbol (|) represents a wall
| : : : : |   The colon symbol (:) represents an open space
| | : | : |
|Y| : |B: |
+---------+
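For reference, the environment can be instantiated and rendered directly. A minimal sketch, assuming an older gym release in which Taxi-v2 is still registered (newer releases ship Taxi-v3 instead):

import gym

env = gym.make('Taxi-v2')   # available in older gym releases
env.reset()
env.render()                # prints the ASCII map above, with the taxi and passenger highlighted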

State space and action space:

The state space is finite and has 25 x 5 x 4 = 500 possible values:

  • 25 is the number of possible taxi positions in a 5x5 grid
  • 5 is the number of possible passenger positions (R, G, B, Y and inside the taxi)
  • 4 is the number of possible target positions (where the passenger wants to go)

The action space is finite and has 6 possible values, as the agent can perform 6 different actions:

  • move south
  • move north
  • move west
  • move east
  • pick up the passenger
  • drop off the passenger
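
Both sizes can be read directly from the environment. A minimal sketch (again assuming an older gym release where Taxi-v2 is registered):

import gym

env = gym.make('Taxi-v2')
print(env.observation_space.n)   # 500 possible states
print(env.action_space.n)        # 6 possible actions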

Reward signal:

The agent receives +20 points for a successful drop-off and loses 1 point for every timestep it takes. There is also a 10 point penalty for illegal pick-up and drop-off actions. An optimal algorithm will therefore learn to complete each drop-off in as few timesteps as possible while avoiding illegal pick-up and drop-off actions.
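
As an illustration (assuming an older gym release, where step() returns a 4-tuple), a single pick-up attempt looks like this:

import gym

env = gym.make('Taxi-v2')
state = env.reset()
# Action 4 is "pickup"; if the taxi is not at the passenger's location the
# environment returns the -10 penalty, otherwise the usual -1 time cost.
next_state, reward, done, info = env.step(4)
print(reward)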

Algorithm:

The algorithm used to train the agent is the popular Q-learning (also known as SARSA-max), which makes use of a lookup table of size n_states x n_actions, called the Q-table.
Each Q-table entry Q[s][a] contains a so-called Q-value, which represents the expected cumulative reward if the agent takes action a from state s.
These entries are learned by the agent while exploring the environment, using the following update formula:
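
Q[s][a] <- Q[s][a] + alpha * ( reward + gamma * max_a' Q[s'][a'] - Q[s][a] )

where alpha is the learning rate, gamma is the discount factor, s' is the state reached after taking action a, and max_a' Q[s'][a'] is the value of the best action available from s'.

Put together, the training loop looks roughly like the sketch below. This is a minimal illustration with made-up hyperparameter values (alpha, gamma, epsilon, episode count), not necessarily the exact code or settings used in this repo:

import gym
import numpy as np

env = gym.make('Taxi-v2')                # older gym releases; newer ones use Taxi-v3
n_states = env.observation_space.n       # 500
n_actions = env.action_space.n           # 6
Q = np.zeros((n_states, n_actions))      # the Q-table

alpha, gamma, epsilon = 0.1, 0.99, 0.1   # illustrative hyperparameters

for episode in range(10000):
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy exploration: mostly exploit the Q-table, sometimes act randomly
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(Q[state])
        next_state, reward, done, _ = env.step(action)
        # Q-learning (SARSA-max) update
        Q[state][action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state][action])
        state = next_state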
