This project sets out to create an intelligent trading agent and a trading environment that provides an ideal learning ground. A real-world trading environment is complex: stocks, related instruments, macroeconomic data, news and possibly alternative data all come into consideration. An effective agent must derive efficient representations of the environment from high-dimensional input and generalize past experience to new situations. The project adopts a deep reinforcement learning algorithm, deep deterministic policy gradient (DDPG), to trade a portfolio of five stocks. Different reward systems and hyperparameters were tried. Its performance is compared to models created with a recurrent neural network, modern portfolio theory, simple buy-and-hold and the benchmark DJIA index. The agent and environment are then evaluated to consider possible improvements and the agent's potential to beat professional human traders, just like DeepMind’s Alpha series of intelligent game-playing agents.
The trading agent learns and trades in an OpenAI Gym environment. Two Gym environments are created for this purpose: one for training (StarTrader-v0) and another for testing (StarTraderTest-v0). Both versions of StarTrader use the OpenAI Baselines implementation of deep deterministic policy gradient (DDPG).
A portfolio of five stocks (out of 27 Dow Jones Industrial Average stocks) is selected based on a non-correlation factor. StarTrader trades these five non-correlated stocks, learning to maximize total asset (portfolio value + current account balance) as its goal. During the trading process, StarTrader-v0 also optimizes the portfolio by deciding how many stock units to trade for each of the five stocks.
Based on non-correlation factor, a portfolio optimization algorithm has chosen the following five stocks to trade:
- American Express
- Walmart
- UnitedHealth Group
- Apple
- Verizon Communications
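The selection algorithm itself is not shown in this README. As a rough illustration of what selecting on a "non-correlation factor" can mean, the sketch below picks the subset of k tickers whose average pairwise absolute correlation of daily returns is lowest. The function name, the brute-force search and the synthetic data are assumptions made for illustration, not the repository's implementation.

```python
from itertools import combinations

import numpy as np


def least_correlated_subset(returns, tickers, k):
    """Return the k tickers with the lowest mean absolute pairwise correlation.

    returns: 2-D array of shape (n_days, n_tickers) holding daily returns.
    Hypothetical helper: brute-force search over all subsets of size k.
    """
    corr = np.abs(np.corrcoef(returns, rowvar=False))  # |correlation| matrix
    best_subset, best_score = None, np.inf
    for subset in combinations(range(len(tickers)), k):
        # Average the off-diagonal correlations inside this candidate subset.
        pairs = [corr[i, j] for i, j in combinations(subset, 2)]
        score = float(np.mean(pairs))
        if score < best_score:
            best_subset, best_score = subset, score
    return [tickers[i] for i in best_subset], best_score


# Synthetic example: B is almost a copy of A, while C and D are independent.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
data = np.column_stack([
    a,
    a + 0.01 * rng.normal(size=500),
    rng.normal(size=500),
    rng.normal(size=500),
])
chosen, score = least_correlated_subset(data, ["A", "B", "C", "D"], k=3)
print(chosen, round(score, 3))
```

Choosing 5 out of 27 stocks means 80,730 candidate subsets, so an exhaustive search like this remains practical at that scale.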
The preprocessing function creates technical data derived from each stock’s OHLCV data. On average, roughly 6-8 derived time series are kept for each stock.
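To give a feel for what "technical data derived from OHLCV" looks like, here is a minimal sketch using plain pandas. The repository relies on TA-Lib for its indicators; the function below, its column names ('Close', 'Volume') and the chosen indicators are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def add_basic_indicators(ohlcv, window=14):
    """Append a few common technical series to an OHLCV DataFrame.

    Assumes columns named 'Close' and 'Volume' (hypothetical names).
    """
    out = ohlcv.copy()
    close = out["Close"]
    out["return_1d"] = close / close.shift(1) - 1          # daily simple return
    out["sma"] = close.rolling(window).mean()               # simple moving average
    out["volatility"] = out["return_1d"].rolling(window).std()
    # Relative Strength Index from rolling-average gains and losses.
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    out["rsi"] = 100 - 100 / (1 + gain / loss)
    out["volume_sma"] = out["Volume"].rolling(window).mean()
    # Drop the warm-up rows where the rolling windows are not yet filled.
    return out.dropna()


# Synthetic price path, used only to exercise the function.
rng = np.random.default_rng(1)
dates = pd.date_range("2018-01-01", periods=60, freq="B")
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=60)))
frame = pd.DataFrame(
    {"Close": prices, "Volume": rng.integers(1_000, 5_000, size=60)},
    index=dates,
)
features = add_basic_indicators(frame)
print(features.tail(3))
```

The first 14 rows are dropped because the 14-day windows over returns and price changes need that much history before they produce a value.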
Apart from stock data, context data is also used to aid learning:
- S&P 500 index
- Dow Jones Industrial Average index
- NASDAQ Composite index
- Russell 2000 index
- SPDR S&P 500 ETF
- Invesco QQQ Trust
- CBOE Volatility Index
- SPDR Gold Shares
- Treasury Yield 30 Years
- CBOE Interest Rate 10 Year T Note
- iShares 1-3 Year Treasury Bond ETF
- iShares Short Treasury Bond ETF
Similarly, technical data are derived from the OHLCV data of the context series above. All data preprocessing is handled by two modules:
- data_preprocessing.py
- feature_select.py
The preprocessed data are then fed directly to StarTrader’s trading environment: class StarTradingEnv.
The feature selection module (feature_select.py) selects about 6-8 features out of 41 OHLCV and derived technical data series. In total, there are 121 features (this may vary between machines because the algorithm is not seeded), of which about 36 are stock features and the rest are context features.
When trading is executed, the 121 features together with total asset, current asset holdings and unrealized profit and loss form the complete state space for the agent to trade and learn. The state space is designed to give the agent a sense of the instantaneous environment as well as of how its interactions with the environment affect future states. In other words, the trading agent bears the fruits and consequences of its own actions.
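The composition of the state can be sketched as a simple concatenation. This is an illustration under assumptions, not the code in StarTradingEnv: the function name, the argument layout, the cost-basis bookkeeping and the choice of one unrealized P&L value per stock are all hypothetical.

```python
import numpy as np


def build_state(features, balance, holdings, prices, cost_basis):
    """Concatenate market features with account information.

    features:   1-D array of stock and context features for the current day
    balance:    cash in the account
    holdings:   units held per stock
    prices:     current price per stock
    cost_basis: average purchase price per stock (hypothetical bookkeeping)
    """
    features = np.asarray(features, dtype=np.float64)
    holdings = np.asarray(holdings, dtype=np.float64)
    prices = np.asarray(prices, dtype=np.float64)
    cost_basis = np.asarray(cost_basis, dtype=np.float64)

    portfolio_value = float(holdings @ prices)
    total_asset = portfolio_value + balance        # the quantity the agent maximizes
    unrealized_pnl = (prices - cost_basis) * holdings
    return np.concatenate([features, [total_asset], holdings, unrealized_pnl])


state = build_state(
    features=np.zeros(121),                        # placeholder feature values
    balance=10_000.0,
    holdings=[10, 0, 5, 0, 2],
    prices=[100.0, 90.0, 250.0, 170.0, 55.0],
    cost_basis=[95.0, 0.0, 260.0, 0.0, 50.0],
)
print(state.shape)
```

With 121 features and five stocks, this layout yields 121 + 1 + 5 + 5 = 132 values per step; the actual dimension depends on how the environment encodes holdings and profit and loss.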
No learning or model refinement takes place during testing; the run purely tests the trained model.
The trading agent survived the major market correction in 2018 with a Sharpe ratio of 1.13.
DDPG is the best performer in terms of cumulative returns. However, with a much less volatile ride, the RNN-LSTM model has the better risk-adjusted return: the highest Sharpe ratio (1.88) and Sortino ratio (3.06). Both the RNN-LSTM and DRL-DDPG trading strategies have trading costs incorporated, namely commission (based on Interactive Brokers' fees) and slippage (modelled by Zipline and based on each stock's daily volume), since there are many transactions during the trading window. Trading costs are omitted for the buy-and-hold strategies, since their stocks are transacted only once. DDPG's reward system should be modified to yield a higher risk-adjusted return. For a fair comparison, the LSTM model uses the same training data and a similar backtester as the DDPG model.
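For reference, the two risk-adjusted measures quoted above can be computed from a series of daily returns as sketched below. This is a generic textbook formulation, assuming 252 trading days per year and a zero risk-free rate and target return; compare.py may use different conventions, and the sample returns are made up.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization factor


def sharpe_ratio(daily_returns, risk_free_daily=0.0):
    """Annualized mean excess return divided by its standard deviation."""
    excess = np.asarray(daily_returns, dtype=np.float64) - risk_free_daily
    return np.sqrt(TRADING_DAYS) * excess.mean() / excess.std(ddof=1)


def sortino_ratio(daily_returns, target_daily=0.0):
    """Like Sharpe, but penalizes only returns below the target."""
    excess = np.asarray(daily_returns, dtype=np.float64) - target_daily
    downside = np.minimum(excess, 0.0)              # keep only the shortfalls
    downside_dev = np.sqrt(np.mean(downside ** 2))  # downside deviation
    return np.sqrt(TRADING_DAYS) * excess.mean() / downside_dev


sample = np.array([0.010, -0.005, 0.007, -0.002, 0.004])  # made-up daily returns
print(sharpe_ratio(sample), sortino_ratio(sample))
```

Because the Sortino ratio ignores upside volatility, a strategy whose swings are mostly gains scores higher on Sortino than on Sharpe, which matches the pattern in the figures reported above.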
Python 3.6 or Anaconda with a Python 3.6 environment
Python packages: pandas, numpy, matplotlib, statsmodels, sklearn, tensorflow
The code was written on a Linux machine and has been tested on two operating systems: Linux Ubuntu 16.04 and Windows 10 Pro
- Install the system packages CMake and OpenMPI (on Mac):
brew install cmake openmpi
- Activate the environment and install gym in it:
pip install gym
- Download the official Baselines package. Clone the repo and install it:
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
- Install Tensorflow
There are several ways of installing Tensorflow; this page provides a good description of how it can be done, taking the operating system, Python version and GPU availability into consideration.
https://www.tensorflow.org/install/
In short, after environment activation, Tensorflow can be installed with these commands:
Tensorflow for CPU:
pip3 install --upgrade tensorflow
Tensorflow for GPU:
pip3 install --upgrade tensorflow-gpu
Installing Tensorflow GPU allows faster training if your machine has built-in NVIDIA GPU(s). However, the Tensorflow GPU version requires matching versions of cuDNN and CUDA; these pages provide instructions to ensure the right version is installed:
- Place the StarTrader and StarTraderTest folders from this repository in your machine's OpenAI Gym environment folder:
gym/envs/
- Replace the __init__.py file in the following folder with the __init__.py provided in this repository:
gym/envs/__init__.py
- Copy run.py from the baselines folder to the folder where you want to execute it, for example:
From the Baselines installation:
baselines/baselines/run.py
To:
run.py
- Place the 'data' folder in the folder where run.py resides:
/data/
- Replace ddpg.py in the Baselines installation with the ddpg.py in this repository:
In your machine's Baselines installation:
baselines/baselines/ddpg/ddpg.py
is replaced by the ddpg.py in this repository:
baselines/baselines/ddpg/ddpg.py
- Replace ddpg_learner.py in the Baselines installation with the ddpg_learner.py in this repository:
In your machine's Baselines installation:
baselines/baselines/ddpg/ddpg_learner.py
is replaced by the ddpg_learner.py in this repository:
baselines/baselines/ddpg/ddpg_learner.py
- Place feature_select.py and data_preprocessing.py from this repository in the same folder as run.py
- Place the following folders from this repository in the folder where your run.py resides:
/test_result/
/train_result/
/model/
You do not need to include the folders' contents; they will be generated when the program executes. Any contents that are included will be replaced once the program executes.
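Before running anything, it can help to confirm that the working folder matches the layout described in the steps above. The small script below is a hypothetical convenience check, not part of this repository; the lists of expected names are taken from the steps above.

```python
from pathlib import Path

# Names taken from the setup steps; adjust if your layout differs.
EXPECTED_FILES = ["run.py", "data_preprocessing.py", "feature_select.py"]
EXPECTED_DIRS = ["data", "test_result", "train_result", "model"]


def missing_items(root="."):
    """Return the expected files and folders that are absent under root."""
    root = Path(root)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing


if __name__ == "__main__":
    problems = missing_items()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("Layout looks complete.")
```

Run it from the folder that contains run.py; an empty result means every file and folder named in the steps is present.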
- In the folder where run.py resides, enter the following commands:
To train the agent:
python -m run --alg=ddpg --env=StarTrader-v0 --network=mlp --num_timesteps=2e4
To test the agent:
python -m run --alg=ddpg --env=StarTraderTest-v0 --network=mlp --num_timesteps=2e3 --load_path='./model/DDPG_trained_model_8'
If you have trained a better model, replace DDPG_trained_model_8 with your new model.
After training and testing the agent successfully, pick the first DDPG trading book from the test run, which is saved as ./test_result/trading_book_test_1.csv, or modify the filename in compare.py.
Compare the agent's performance with the benchmark index and other trading strategies:
python compare.py
- Depending on machine configuration, the following installations may be necessary:
pip3 install -U numpy
pip3 install opencv-python
pip3 install mujoco-py==0.5.7
pip3 install lockfile
- The technical analysis library TA-Lib may be tricky to install on some machines. The following page is a handy guide: https://goldenjumper.wordpress.com/tag/ta-lib/
Graphviz, which is required to plot the XGBoost tree diagram, can be installed with the following command:
Windows:
conda install python-graphviz
Mac/Linux:
conda install graphviz