This repository has been archived by the owner on May 6, 2021. It is now read-only.

Add Bit Flipping Environment [WIP] #116

Merged
findmyway merged 2 commits into JuliaReinforcementLearning:master on Jan 1, 2021

Conversation

sriram13m (Contributor):

Add Bit Flipping Environment, inspired by Hindsight Experience Replay (https://arxiv.org/pdf/1707.01495.pdf).

Add BitFlipping Environment
@sriram13m sriram13m requested a review from findmyway December 31, 2020 15:16
@sriram13m sriram13m marked this pull request as draft December 31, 2020 15:23
@@ -0,0 +1,8 @@
@testset "bit_flipping_env" begin

env = BitFlippingEnv(; N = 7)
findmyway (Member):

Use an independent rng here, like

rng = StableRNG(123)
obs_prob = 0.85
env = TigerProblemEnv(; rng = rng, obs_prob = obs_prob)

to avoid polluting GLOBAL_RNG.
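For this environment the same pattern could look roughly like the sketch below (the rng keyword on the constructor is an assumption, since the constructor is not shown in this diff):

using StableRNGs                          # reproducible RNG, independent of GLOBAL_RNG
rng = StableRNG(123)
env = BitFlippingEnv(; N = 7, rng = rng)  # hypothetical keyword; keeps the test deterministic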

RLBase.DynamicStyle(::BitFlippingEnv) = SEQUENTIAL
RLBase.ActionStyle(::BitFlippingEnv) = MINIMAL_ACTION_SET
RLBase.InformationStyle(::BitFlippingEnv) = PERFECT_INFORMATION
RLBase.StateStyle(::BitFlippingEnv) = Observation{BitArray{1}}()
findmyway (Member):

Since you support two state styles in this environment, you can return both here.

Suggested change
RLBase.StateStyle(::BitFlippingEnv) = Observation{BitArray{1}}()
RLBase.StateStyle(::BitFlippingEnv) = (Observation{BitArray{1}}(), GoalState())
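With both styles declared, a caller can ask for either view. A minimal sketch of the matching state methods, assuming the state and goal_state fields used elsewhere in this diff:

RLBase.state(env::BitFlippingEnv, ::Observation) = env.state      # observed bit string
RLBase.state(env::BitFlippingEnv, ::GoalState) = env.goal_state   # target bit string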

if env.state == env.goal_state
1.0
else
0.0
findmyway (Member):

I think we should return -1 instead of 0 here, based on the description in the original paper:

For every episode we sample uniformly an initial state as well as a target state, and the policy gets a reward of −1 as long as it is not in the target state.
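One way to express that suggestion, as a minimal sketch built only on the fields already shown in this diff (0 at the goal, −1 everywhere else, per the paper):

RLBase.reward(env::BitFlippingEnv) = env.state == env.goal_state ? 0.0 : -1.0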

RLBase.ActionStyle(::BitFlippingEnv) = MINIMAL_ACTION_SET
RLBase.InformationStyle(::BitFlippingEnv) = PERFECT_INFORMATION
RLBase.StateStyle(::BitFlippingEnv) = Observation{BitArray{1}}()
RLBase.RewardStyle(::BitFlippingEnv) = TERMINAL_REWARD
findmyway (Member):

If we return a reward of -1 at each non-terminated step, then I think this environment is a STEP_REWARD env?
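If the reward is changed as suggested above, the trait on this line would presumably follow suit:

RLBase.RewardStyle(::BitFlippingEnv) = STEP_REWARD   # reward emitted at every step, not only at termination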

struct GoalState{T} <: RLBase.AbstractStateStyle end
GoalState() = GoalState{Any}()

mutable struct BitFlippingEnv <: AbstractEnv
findmyway (Member):

I think we can make this struct immutable.
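A minimal sketch of the immutable version, with field names and types inferred from the rest of this diff (AbstractRNG comes from Random; the BitArray fields can still be mutated in place even though the fields themselves cannot be rebound):

struct BitFlippingEnv <: AbstractEnv
    N::Int                   # number of bits
    rng::AbstractRNG         # independent random number generator
    state::BitArray{1}       # current bits, updated in place
    goal_state::BitArray{1}  # target bits, updated in place
end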

RLBase.is_terminated(env::BitFlippingEnv) = env.state == env.goal_state

function RLBase.reset!(env::BitFlippingEnv)
env.state = bitrand(env.rng,env.N)
findmyway (Member):

Suggested change
env.state = bitrand(env.rng,env.N)
env.state .= bitrand(env.rng,env.N)


function RLBase.reset!(env::BitFlippingEnv)
env.state = bitrand(env.rng,env.N)
env.goal_state = bitrand(env.rng,env.N)
findmyway (Member):

Suggested change
env.goal_state = bitrand(env.rng,env.N)
env.goal_state .= bitrand(env.rng,env.N)
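The broadcasted .= writes into the existing BitArray instead of rebinding the field, which is what makes the immutable struct suggested above workable. Putting the two suggestions together, reset! might end up as the following sketch:

function RLBase.reset!(env::BitFlippingEnv)
    env.state .= bitrand(env.rng, env.N)       # refill current bits in place
    env.goal_state .= bitrand(env.rng, env.N)  # refill goal bits in place
end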

Bug Fixes
@sriram13m sriram13m marked this pull request as ready for review December 31, 2020 16:58
@sriram13m sriram13m requested a review from findmyway January 1, 2021 05:10
@findmyway findmyway merged commit c481c06 into JuliaReinforcementLearning:master Jan 1, 2021