
DDPG w/ :RoboschoolHopper-v1 #24

Closed
CUN-bjy opened this issue Dec 31, 2020 · 2 comments
CUN-bjy commented Dec 31, 2020

No description provided.

@CUN-bjy CUN-bjy changed the title DDPG w/ RoboschoolHopper-v1 DDPG w/ :RoboschoolHopper-v1 Dec 31, 2020

CUN-bjy commented Dec 31, 2020

Experiment specifications

Hyperparameters:

  • buffer_size = 20000, batch_size = 128
  • prioritized replay buffer: False
  • learning rate: 1e-4 (actor), 1e-3 (critic)
  • tau (target update rate): 1e-3 for both actor and critic
  • network
    • actor:
        # requires (assumed TF2-style imports):
        #   from tensorflow.keras.layers import Input, Dense, BatchNormalization, Activation, Lambda
        #   from tensorflow.keras.initializers import GlorotNormal
        #   import numpy as np

        # input layer (observations)
        input_ = Input(shape=self.obs_dim)

        # hidden layer 1: 300 units, batch norm, ReLU
        h1_ = Dense(300, kernel_initializer=GlorotNormal())(input_)
        h1_b = BatchNormalization()(h1_)
        h1 = Activation('relu')(h1_b)

        # hidden layer 2: 400 units, batch norm, ReLU
        h2_ = Dense(400, kernel_initializer=GlorotNormal())(h1)
        h2_b = BatchNormalization()(h2_)
        h2 = Activation('relu')(h2_b)

        # output layer (actions): tanh squash, then scale to the action range
        output_ = Dense(self.act_dim, kernel_initializer=GlorotNormal())(h2)
        output_b = BatchNormalization()(output_)
        output = Activation('tanh')(output_b)
        scalar = self.act_range * np.ones(self.act_dim)
        out = Lambda(lambda i: i * scalar)(output)
    • critic
        # requires (assumed TF2-style imports):
        #   from tensorflow.keras.layers import Input, Dense, BatchNormalization, Activation, Concatenate
        #   from tensorflow.keras.initializers import GlorotNormal
        #   from tensorflow.keras.regularizers import l2

        # input layer (observations and actions)
        input_obs = Input(shape=self.obs_dim)
        input_act = Input(shape=(self.act_dim,))
        inputs = [input_obs, input_act]
        concat = Concatenate(axis=-1)(inputs)

        # hidden layer 1: 300 units, L2 weight decay, batch norm, ReLU
        h1_ = Dense(300, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(concat)
        h1_b = BatchNormalization()(h1_)
        h1 = Activation('relu')(h1_b)

        # hidden layer 2: 400 units, L2 weight decay, batch norm, ReLU
        h2_ = Dense(400, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h1)
        h2_b = BatchNormalization()(h2_)
        h2 = Activation('relu')(h2_b)

        # output layer: single linear unit (Q-value estimate)
        output_ = Dense(1, kernel_initializer=GlorotNormal(), kernel_regularizer=l2(0.01))(h2)
        output_b = BatchNormalization()(output_)
        output = Activation('linear')(output_b)
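The tau values above control how quickly the DDPG target networks track the online networks via Polyak averaging. A minimal sketch of that update (the function name `soft_update` and the weight-list representation are illustrative, not taken from this repo):

```python
import numpy as np

def soft_update(target_weights, source_weights, tau=1e-3):
    # Polyak averaging: theta_target <- tau * theta_source + (1 - tau) * theta_target
    # With tau = 1e-3 (as above), the target trails the online network slowly,
    # which stabilizes the bootstrapped critic targets.
    return [tau * s + (1.0 - tau) * t
            for t, s in zip(target_weights, source_weights)]

# toy check: tau = 1.0 copies the source weights outright
print(soft_update([np.zeros(3)], [np.ones(3)], tau=1.0)[0])
```

Applied once per training step to both the actor and critic target weights.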

Results

max timesteps per episode: 500

  • 3500 episodes
    (recording: hopper after 3500 episodes)
  • 5000 episodes
    (recording: hopper after 5000 episodes, captured 2020-12-31 23:53)

Performance

(plot: episode reward, captured 2021-01-01 00:13)

Critic Loss

(plot: critic loss, captured 2021-01-01 00:13)

@CUN-bjy CUN-bjy self-assigned this Jan 1, 2021
@CUN-bjy CUN-bjy closed this as completed Jan 1, 2021

CUN-bjy commented Jan 1, 2021

Training time: 5500 episodes (about 7 hours) on an Intel i7 CPU.
