An actuated multirotor unmanned aerial vehicle (UAV) in the Quad-X configuration as described by ArduPilot and PX4. It consists of four motors with implementations for cascaded PID controllers. This environment corresponds to the QuadXHover-v1 environment included in the PyFlyt package. It differs from the original in the following ways:
- The reward has been changed to a cost. This was done by negating the reward so that it is always positive definite.
- A health penalty has been added. This penalty is applied when the quadrotor moves outside the flight dome or crashes. The penalty equals either the maximum number of episode steps minus the number of steps taken, or a user-defined value.
- The `max_duration_seconds` argument has been removed. Instead, the `max_episode_steps` parameter of the `gym.wrappers.TimeLimit` wrapper is used to limit the episode duration.
The rest of the environment is the same as the original QuadXHover environment. Below, the modified cost is described. For more information about the environment (e.g. observation space, action space, episode termination, etc.), please refer to the PyFlyt package documentation.
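The cost function of this environment is designed to minimize the Euclidean distance between the quadrotor's current position and a desired hover position (i.e. $p=x_{x,y,z}=[0,0,1]$), together with the quadrotor's roll and pitch angles. A health penalty can also be included in the cost. It is applied when the quadrotor moves outside the flight dome or crashes, and it equals `max_episode_steps` minus the number of steps taken in the episode or a fixed value. The cost is computed as:

$$
cost = \| p_{drone} - p_{hover} \| + \| \theta_{roll,pitch} \| + p_{health}
$$

Where: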
- $p_{drone}$ - is the current quadrotor position (i.e. x,y,z).
- $p_{hover}$ - is the desired hover position (i.e. x,y,z).
- $\theta_{roll,pitch}$ - is the current quadrotor roll and pitch.
- $p_{health}$ - is a penalty for being unhealthy (i.e. if the quadrotor moves outside the flight dome or crashes).
The health penalty is optional and can be disabled using the `include_health_penalty` environment argument.
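The terms listed above can be combined in a few lines of Python. This is only a minimal sketch, not the package's implementation: it assumes the position error, the roll and pitch magnitude, and the health penalty are summed with unit weights. The function name `hover_cost` and the parameters `healthy`, `fixed_penalty`, and `steps_taken` are hypothetical names chosen for illustration; only `include_health_penalty` and `max_episode_steps` appear in the text above.

```python
import math


def hover_cost(
    position,
    hover_position,
    roll_pitch,
    healthy=True,
    include_health_penalty=True,
    fixed_penalty=None,      # hypothetical: user-defined penalty value
    max_episode_steps=500,   # assumed episode limit, for illustration only
    steps_taken=0,           # hypothetical: steps elapsed in the episode
):
    """Illustrative cost: position error + roll/pitch magnitude + health penalty."""
    # Euclidean distance between the current and the desired hover position.
    position_error = math.dist(position, hover_position)
    # Magnitude of the (roll, pitch) vector; zero when the quadrotor is level.
    attitude_error = math.hypot(*roll_pitch)

    penalty = 0.0
    if include_health_penalty and not healthy:
        if fixed_penalty is not None:
            # A fixed, user-defined penalty.
            penalty = float(fixed_penalty)
        else:
            # The remaining number of steps in the episode.
            penalty = float(max_episode_steps - steps_taken)

    return position_error + attitude_error + penalty


# A level quadrotor 5 m away from the hover position, still healthy.
print(hover_cost((3.0, 4.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0)))
# The same quadrotor hovering on target but flagged unhealthy at step 120.
print(hover_cost((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0),
                 healthy=False, steps_taken=120))
```

Under these assumptions, a penalty based on the remaining steps makes an early crash more expensive than a late one, which discourages ending the episode quickly to avoid accumulating cost.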
In addition to the observations, the cost and a termination and truncation boolean, the environment also returns an info dictionary:
[observation, cost, termination, truncation, info_dict]
Compared to the original QuadXHover-v1 environment, the following keys were added to this info dictionary:
- `reference`: The reference that the quadrotor is tracking (i.e. the desired hover position $p=x_{x,y,z}=[0,0,1]$).
- `state_of_interest`: The state that should track the reference (SOI).
- `reference_error`: The error between SOI and the reference.
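To make the relationship between these three keys concrete, here is a small sketch of how they could be derived from the quadrotor position. It rests on two assumptions that the text above does not confirm: the state of interest is taken to be the position (x, y, z), and the error is computed as SOI minus reference. The helper name `tracking_info` is hypothetical.

```python
def tracking_info(position, hover_position=(0.0, 0.0, 1.0)):
    """Build the three tracking-related info keys for a given position."""
    reference = list(hover_position)
    # Assumption: the state of interest is the quadrotor position.
    state_of_interest = list(position)
    # Assumption: the error is SOI minus reference (sign convention may differ).
    reference_error = [s - r for s, r in zip(state_of_interest, reference)]
    return {
        "reference": reference,
        "state_of_interest": state_of_interest,
        "reference_error": reference_error,
    }


info = tracking_info((0.5, 0.0, 1.5))
print(info["reference_error"])
```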
This environment is part of the Stable Gym package. It is therefore registered as the `stable_gym:QuadXHoverCost-v1` gymnasium environment when you import the Stable Gym package. If you want to use the environment in stand-alone mode, you can register it yourself.
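A minimal usage sketch follows, assuming the `gymnasium`, Stable Gym, and PyFlyt packages are installed. It relies on gymnasium's `module:EnvId` syntax, which imports the named module before creating the environment, and on the `max_episode_steps` argument of `gym.make`, which applies the time limit. The function name `run_random_episode` and the chosen step limit are illustrative only.

```python
ENV_ID = "stable_gym:QuadXHoverCost-v1"


def run_random_episode(max_episode_steps=200, seed=0):
    """Run one episode with random actions and return the accumulated cost."""
    # Imported lazily so that defining this function does not require the
    # simulation dependencies to be installed.
    import gymnasium as gym

    # The "stable_gym:" prefix makes gymnasium import the package, which
    # registers the environment; max_episode_steps sets the time limit.
    env = gym.make(ENV_ID, max_episode_steps=max_episode_steps)
    observation, info = env.reset(seed=seed)

    total_cost = 0.0
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.action_space.sample()
        observation, cost, terminated, truncated, info = env.step(action)
        total_cost += float(cost)

    env.close()
    return total_cost
```

Calling `run_random_episode()` steps the environment until it terminates (for example, on a crash) or is truncated by the time limit.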