- RL Landscape
- RL History
- RL Agents Implementation
- RL Environments
- RL Mechanisms
- RL Applications
- References
- Reference Implementations
- Review Papers
- RL Platforms
- Deep Reinforcement Learning Papers
Source: eleurent/phd-bibliography
- Value Optimization
- Policy Optimization
- [Policy Gradient]
- [Actor Critic]
- [DDPG] [Code]
- [HAC DDPG]
- [DDPG with HER]
- [Clipped PPO]
- [PPO]
- [DDPG] [Code]
- [DFP]
- Imitation
- [Behavioural cloning]
- [Inverse Reinforcement Learning] [Code] [irl-imitation-code]
- [Generative Adversarial Imitation Learning]
- Deep Q Network (DQN) (code)
- Double Deep Q Network (DDQN) (code)
- Dueling Q Network
- Mixed Monte Carlo (MMC) (code)
- Persistent Advantage Learning (PAL) (code)
- Categorical Deep Q Network (C51) (code)
- Quantile Regression Deep Q Network (QR-DQN) (code)
- N-Step Q Learning | Distributed (code)
- Neural Episodic Control (NEC) (code)
- Normalized Advantage Functions (NAF) | Distributed (code)
- Policy Gradients (PG) | Distributed (code)
- Asynchronous Advantage Actor-Critic (A3C) | Distributed (code)
- Deep Deterministic Policy Gradients (DDPG) | Distributed (code)
- Proximal Policy Optimization (PPO) (code)
- Clipped Proximal Policy Optimization (CPPO) | Distributed (code)
- Generalized Advantage Estimation (GAE) (code)
- Direct Future Prediction (DFP) | Distributed (code)
- Behavioral Cloning (BC) (code)
- E-Greedy (code)
- Boltzmann (code)
- Ornstein–Uhlenbeck process (code)
- Normal Noise (code)
- Truncated Normal Noise (code)
- Bootstrapped Deep Q Network (code)
- UCB Exploration via Q-Ensembles (UCB) (code)
- Noisy Networks for Exploration (code)
- Temporal difference(TD) learning (1988)
- Q‐learning (1998)
- BayesRL (2002)
- RMAX (2002)
- CBPI (2002)
- PEGASUS (2002)
- Least‐Squares Policy Iteration (2003)
- Fitted Q‐Iteration (2005)
- GTD (2009)
- UCRL (2010)
- REPS (2010)
- DQN (2014) - DeepMind
- [Acrobot]
- [Bike]
- [Blackjack]
- [Cartpole]
- [ContextBandit]
- [Continuous Chain]
- [Corridor]
- [Discrete Chain]
- [Discretiser (for continuous environments)]
- [Double Loop]
- [Environment]
- [Gridworld]
- [Inventory management]
- [Linear context bandit]
- [Linear dynamic quadratic]
- [Mountaincar (2d and 3d)]
- [POMDP Maze]
- [Optimistic Task]
- [Puddleworld]
- [Random MDPs]
- [Riverswim]
- [Attention and Memory]
- [Unsupervised learning ]
- [GANs]
- [GQN]
- [UNREAL]
- [Hierarchical RL]
- [FuNs]
- [Option-Critic]
- [STRAW]
- [h-DQN]
- [Stochastic Neural Networks]
- [Multi-agent RL]
- [Relational RL]
- [Learning to Learn, a.k.a. Meta-Learning]
- [Few/One/Zero-shot Learning]
- [MAML]
- [Transfer and Multi-Task Learning]
- [Learning to Optimize]
- [Learning to Re-inforcement Learn]
- [Learning Combinatorial Optimization]
- [AutoML]
- [Few/One/Zero-shot Learning]
- Chinook (1997;2007) for Checkers,
- Deep Blue (2002) for chess,
- Logistello (1999) for Othello,
- TD-Gammon (1994) for Backgammon,
- GIB (2001) for contract bridge,
- MoHex (2017) for Hex,
- DQN (2016)(2018) for Atari 2600 games,
- AlphaGo (2016a) and AlphaGo Zero (2017) for Go,
- Alpha Zero (2017) for chess, shogi, and Go,
- Cepheus (2015), DeepStack (2017), and Libratus (2017a;b) for heads-up Texas Hold’em Poker,
- Jaderberg et al. (2018) for Quake III Arena Capture the Flag,
- OpenAI Five, for Dota 2 at 5v5, https://openai.com/five/,
- Zambaldi et al. (2018), Sun et al. (2018), and Pang et al. (2018) for StarCraft II
- [Board Games]
- [Computer Go]
- [AlphaGo: Trainig pipeline with MCTS]
- [AlphaGo Zero]
- [Alpha Zero]
- [Card Games]
- [DeepStack]
- [Video Games]
- [Atari 2600 games]
- [StarCraft]
- [StarCraft II mini-games]
- [Quake III Arena]
- [Minecraft]
- [Super Smash Bros]
- [Doom]
- [ViZDoom]
- [Sim-to-Real]
- [MuJoCo]
- [Imitation Learning]
- [Value-based Learning]
- [Policy-based Learning]
- [Model-based Learning]
- [Autonomous Driving Vehicles]
- [Sequence Generation]
- [Machine Translation]
- [Dialogue Systems]
- [Recognition]
- [Motion Analysis]
- [Scene Understanding]
- [Vision + NLP]
- [Visual Control]
- [Interactive Perception]
- reference implementations
- review papers
- RL platforms
- deep-reinforcement-learning-papers
- ICLR2019-RL-Papers
- deepmind.com/blog
- blog.openai
- openai/spinningup
- DeepRLHacks-Deep RL Bootcamp (Aug 2017)
- Deep RL Bootcamp
- UC Berkeley: Deep Reinforcement Learning
- MIT 6.S094: Deep Learning for Self-Driving Cars
- Deep Reinforcement Learning and Control Spring 2017, CMU 10703
- Sutton & Barto's: Reinforcement Learning: An Introduction
- Algorithms for Reinforcement Learning
- reinforcejs
- Hands-On-Reinforcement-Learning-With-Python
- jetson-reinforcement
- DEEP REINFORCEMENT LEARNING
- Reinforcement Learning Applications
- Lessons Learned Reproducing a Deep Reinforcement Learning Paper
- Scalable Deep Reinforcement Learning for Robotic Manipulation
- Closing the Simulation-to-Reality Gap for Deep Robotic Learning
- Intel AI : demystifying-deep-reinforcement-learning
- Intel AI : deep-reinforcement-learning-with-neon
- Intel AI : reinforcement-learning-coach-intel
- eecs.berkeley.deeprlcourse
- Deep Learning for Video Game Playing
- Deep Reinforcement Learning that Matters
- Adversarial Examples: Attacks and Defenses for Deep Learning
- Reinforcement learning
- A Survey of Inverse Reinforcement Learning: Challenges, Methods and Progress
- A Survey of Knowledge Representation and Retrieval for Learning in Service Robotics
- Applications of Deep Reinforcement Learning in Communications and Networking: A Survey
- An Introduction to Deep Reinforcement Learning
- Challenges of Real-World Reinforcement Learning
- Modern Deep Reinforcement Learning Algorithms
Maintainer
Gopala KR / @gopala-kr