Merge branch 'main' into issue-804-thread-multisim
calina-c committed Apr 24, 2024
2 parents 8cbb6a9 + eb083e8 commit 848a8d7
Showing 41 changed files with 1,260 additions and 948 deletions.
23 changes: 13 additions & 10 deletions READMEs/predictoor.md
@@ -59,9 +59,6 @@ codesign --force --deep --sign - venv/sapphirepy_bin/sapphirewrapper-arm64.dylib

## 2. Simulate Modeling and Trading

> [!WARNING]
> Simulation has been temporarily disabled as of version v0.3.3
Simulation lets us quickly build intuition and assess the performance of the data / predicting / trading strategy (backtest).

Copy [`ppss.yaml`](../ppss.yaml) into your own file `my_ppss.yaml` and change parameters as you see fit.
@@ -70,22 +67,29 @@ Copy [`ppss.yaml`](../ppss.yaml) into your own file `my_ppss.yaml` and change pa
cp ppss.yaml my_ppss.yaml
```

Let's simulate! In console:

Let's run the simulation engine. In console:
```console
pdr sim my_ppss.yaml
```

What it does:

What the engine does:
1. Set simulation parameters.
1. Grab historical price data from exchanges and store it in the `parquet_data/` dir. It re-uses any previously saved data.
1. Run through many 5min epochs. At each epoch:
- Build a model
- Predict
- Trade
- Plot profit versus time, more
- Log to console and `logs/out_<time>.txt`
- For plots, output state to `sim_state/`
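The per-epoch loop above can be sketched in a few lines of Python. This is an illustrative toy only, not the actual pdr-backend engine: the "model" is naive momentum, the "trade" scores one unit of profit per correct call, and all names (`EpochResult`, `run_sim`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EpochResult:
    epoch: int
    pred_up: bool   # did we predict the next price move up?
    profit: float   # toy profit from acting on that prediction


def run_sim(prices: List[float]) -> List[EpochResult]:
    """Loop over epochs: 'build a model', predict, 'trade', record profit."""
    results = []
    for t in range(1, len(prices) - 1):
        # "model" + predict: naive momentum - expect the last move to continue
        pred_up = prices[t] > prices[t - 1]
        # "trade": unit profit if the call was right, unit loss otherwise
        actual_up = prices[t + 1] > prices[t]
        profit = 1.0 if pred_up == actual_up else -1.0
        results.append(EpochResult(t, pred_up, profit))
    return results
```

In the real engine each epoch also persists state to `sim_state/` for the plotting app; this sketch only shows the control flow.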

Let's visualize results. Open a separate console, and:
```console
cd ~/code/pdr-backend # or wherever your pdr-backend dir is
source venv/bin/activate

# display real-time plots of the simulation
streamlit run sim_plots.py
```

"Predict" actions are _two-sided_: it does one "up" prediction tx, and one "down" tx, with more stake to the higher-confidence direction. Two-sided is more profitable than one-sided prediction.

@@ -97,10 +101,9 @@ To see simulation CLI options: `pdr sim -h`.

Simulation uses Python [logging](https://docs.python.org/3/howto/logging.html) framework. Configure it via [`logging.yaml`](../logging.yaml). [Here's](https://medium.com/@cyberdud3/a-step-by-step-guide-to-configuring-python-logging-with-yaml-files-914baea5a0e5) a tutorial on yaml settings.

To plot profit versus time and more, use `streamlit run sim_plots.py` to display real-time plots while the simulation is running. After the final iteration, the app settles into an overview of the final state.

By default, streamlit plots the latest sim (even if it is still running). The sim engine also assigns a unique id to each run, so you can plot a specific run instead, e.g. if you used multisim or manually triggered different simulations: find the run's unique id in the `sim_state` folder, then run `streamlit run sim_plots.py <unique_id>`, e.g. `streamlit run sim_plots.py 97f9633c-a78c-4865-9cc6-b5152c9500a3`.

You can run many instances of streamlit at once, with different URLs.

## 3. Run Predictoor Bot on Sapphire Testnet
19 changes: 13 additions & 6 deletions READMEs/trader.md
@@ -59,21 +59,29 @@ Copy [`ppss.yaml`](../ppss.yaml) into your own file `my_ppss.yaml` and change pa
cp ppss.yaml my_ppss.yaml
```

Let's simulate! In console:

Let's run the simulation engine. In console:
```console
pdr sim my_ppss.yaml
```

What it does:

What the engine does:
1. Set simulation parameters.
1. Grab historical price data from exchanges and store it in the `parquet_data/` dir. It re-uses any previously saved data.
1. Run through many 5min epochs. At each epoch:
- Build a model
- Predict
- Trade
- Log to console and `logs/out_<time>.txt`
- For plots, output state to `sim_state/`

Let's visualize results. Open a separate console, and:
```console
cd ~/code/pdr-backend # or wherever your pdr-backend dir is
source venv/bin/activate

# display real-time plots of the simulation
streamlit run sim_plots.py
```

"Predict" actions are _two-sided_: it does one "up" prediction tx, and one "down" tx, with more stake to the higher-confidence direction. Two-sided is more profitable than one-sided prediction.

@@ -85,10 +93,9 @@ To see simulation CLI options: `pdr sim -h`.

Simulation uses Python [logging](https://docs.python.org/3/howto/logging.html) framework. Configure it via [`logging.yaml`](../logging.yaml). [Here's](https://medium.com/@cyberdud3/a-step-by-step-guide-to-configuring-python-logging-with-yaml-files-914baea5a0e5) a tutorial on yaml settings.

To plot profit versus time and more, use `streamlit run sim_plots.py` to display real-time plots while the simulation is running. After the final iteration, the app settles into an overview of the final state.

By default, streamlit plots the latest sim (even if it is still running). The sim engine also assigns a unique id to each run, so you can plot a specific run instead, e.g. if you used multisim or manually triggered different simulations: find the run's unique id in the `sim_state` folder, then run `streamlit run sim_plots.py <unique_id>`, e.g. `streamlit run sim_plots.py 97f9633c-a78c-4865-9cc6-b5152c9500a3`.

You can run many instances of streamlit at once, with different URLs.

## Run Trader Bot on Sapphire Testnet
13 changes: 11 additions & 2 deletions READMEs/vps.md
@@ -242,10 +242,19 @@ In `my_ppss.yaml` file, in `web3_pp` -> `development` section:

### Run pdr bot

Then, run a bot with modeling-on-the-fly (approach 3). In console:
Then, run a bot with modeling-on-the-fly (approach 2). In console:

```console
pdr predictoor 3 my_ppss.yaml development
pdr predictoor 2 my_ppss.yaml development
```

Or, to be fancier: (a) add `nohup` so the run keeps going if the ssh session closes, (b) redirect output to `out.txt`, and (c) observe the output:
```console
# start bot
nohup pdr predictoor 2 my_ppss.yaml development 1>out.txt 2>&1 &

# observe output
tail -f out.txt
```

Your bot is running, congrats! Sit back and watch it in action. It will loop continuously.
35 changes: 20 additions & 15 deletions pdr_backend/aimodel/aimodel_data_factory.py
@@ -1,6 +1,6 @@
import logging
import sys
from typing import Optional, Tuple
from typing import List, Optional, Tuple

import numpy as np
import pandas as pd
@@ -68,8 +68,8 @@ def create_xy(
self,
mergedohlcv_df: pl.DataFrame,
testshift: int,
feed: ArgFeed,
feeds: Optional[ArgFeeds] = None,
predict_feed: ArgFeed,
train_feeds: Optional[ArgFeeds] = None,
do_fill_nans: bool = True,
) -> Tuple[np.ndarray, np.ndarray, pd.DataFrame, np.ndarray]:
"""
@@ -80,6 +80,8 @@ def create_xy(
@arguments
mergedohlcv_df -- *polars* DataFrame. See class docstring
testshift -- to simulate across historical test data
predict_feed -- feed to predict
train_feeds -- feeds to use for model inputs. If None, use predict_feed
do_fill_nans -- if any values are nan, fill them? (Via interpolation)
If you turn this off and mergedohlcv_df has nans, then X/y/etc gets nans
@@ -94,27 +96,30 @@ def create_xy(
assert "timestamp" in mergedohlcv_df.columns
assert "datetime" not in mergedohlcv_df.columns

# every column should be ordered with oldest first, youngest last.
# let's verify! The timestamps should be in ascending order
# condition mergedohlcv_df
# - every column should be ordered with oldest first, youngest last.
# let's verify! The timestamps should be in ascending order
uts = mergedohlcv_df["timestamp"].to_list()
assert uts == sorted(uts, reverse=False)

# condition inputs
if do_fill_nans and has_nan(mergedohlcv_df):
mergedohlcv_df = fill_nans(mergedohlcv_df)
ss = self.ss.aimodel_ss
x_dim_len = 0
if not feeds:
x_dim_len = ss.n
feeds = ss.feeds

# condition other inputs
train_feeds_list: List[ArgFeed]
if train_feeds:
train_feeds_list = train_feeds
else:
x_dim_len = len(feeds) * ss.autoregressive_n
train_feeds_list = [predict_feed]
ss = self.ss.aimodel_ss
x_dim_len = len(train_feeds_list) * ss.autoregressive_n

# main work
x_df = pd.DataFrame() # build this up
xrecent_df = pd.DataFrame() # ""

target_hist_cols = [
f"{feed.exchange}:{feed.pair}:{feed.signal}" for feed in feeds
f"{train_feed.exchange}:{train_feed.pair}:{train_feed.signal}"
for train_feed in train_feeds_list
]
for hist_col in target_hist_cols:
assert hist_col in mergedohlcv_df.columns, f"missing data col: {hist_col}"
@@ -146,7 +151,7 @@ def create_xy(

# y is set from yval_{exch_str, signal_str, pair_str}
# eg y = [BinEthC_-1, BinEthC_-2, ..., BinEthC_-450, BinEthC_-451]
hist_col = f"{feed.exchange}:{feed.pair}:{feed.signal}"
hist_col = f"{predict_feed.exchange}:{predict_feed.pair}:{predict_feed.signal}"
z = mergedohlcv_df[hist_col].to_list()
y = np.array(_slice(z, -testshift - N_train - 1, -testshift))
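The core change in this hunk, renaming `feeds` to `train_feeds` and defaulting the training feeds to the predict feed, can be summarized in a standalone sketch. The `Feed` dataclass and `resolve_train_feeds` helper here are illustrative stand-ins, not the actual pdr-backend API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Feed:
    exchange: str
    pair: str
    signal: str


def resolve_train_feeds(
    predict_feed: Feed,
    train_feeds: Optional[List[Feed]],
    autoregressive_n: int,
) -> Tuple[List[Feed], int, List[str]]:
    """If no train_feeds are given, the predict feed doubles as the single
    training feed. Model input width = n_feeds * autoregressive_n, and each
    feed maps to an 'exchange:pair:signal' history column."""
    train_feeds_list = train_feeds if train_feeds else [predict_feed]
    x_dim_len = len(train_feeds_list) * autoregressive_n
    cols = [f"{f.exchange}:{f.pair}:{f.signal}" for f in train_feeds_list]
    return train_feeds_list, x_dim_len, cols
```

Note that `y` is always built from `predict_feed`'s column, even when `train_feeds` lists other feeds for the model inputs.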

