Commit d3527b2

Update docs

cjdsellers committed Oct 15, 2023
1 parent 619733a commit d3527b2

Showing 10 changed files with 149 additions and 28 deletions.
6 changes: 3 additions & 3 deletions docs/concepts/advanced/advanced_orders.md
@@ -19,19 +19,19 @@ specific exchange they are being routed to.
These contingency types relate to the FIX ContingencyType tag <1385>: https://www.onixs.biz/fix-dictionary/5.0.sp2/tagnum_1385.html
```

-### One Triggers the Other (OTO)
+### *'One Triggers the Other'* (OTO)
An OTO order involves two orders: a parent order and a child order. The parent order is a live
marketplace order. The child order, held in a separate order file, is not. If the parent order
executes in full, the child order is released to the marketplace and becomes live.
An OTO order can be made up of stock orders, option orders, or a combination of both.
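
These contingency types are commonly combined in a bracket order, where an entry order triggers a linked stop-loss and take-profit pair on fill (OTO), and that pair then cancel or update each other (see the OCO and OUO sections below). A minimal sketch from within a `Strategy`, assuming the `order_factory.bracket(...)` API of this era (parameter names such as `sl_trigger_price` and `tp_price` may differ between versions, and `self.instrument` is assumed to have been cached by the strategy):

```python
from nautilus_trader.model.enums import OrderSide

# Hypothetical bracket: the entry order is the OTO parent; the stop-loss
# and take-profit child orders are released when the entry fills.
bracket = self.order_factory.bracket(
    instrument_id=self.instrument_id,
    order_side=OrderSide.BUY,
    quantity=self.instrument.make_qty(1),
    sl_trigger_price=self.instrument.make_price(49_000),
    tp_price=self.instrument.make_price(51_000),
)
self.submit_order_list(bracket)
```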

-### One Cancels the Other (OCO)
+### *'One Cancels the Other'* (OCO)
An OCO order is an order whose execution results in the immediate cancellation of another order
linked to it. Cancellation of the contingent order happens on a best-efforts basis.
In an OCO order, both orders are live in the marketplace at the same time. The execution of either
order triggers an attempt to cancel the other unexecuted order. Partial executions will also trigger an attempt to cancel the other order.

-### One Updates the Other (OUO)
+### *'One Updates the Other'* (OUO)
An OUO order is an order whose execution results in the immediate reduction of quantity in another
order linked to it. The quantity reduction happens on a best-efforts basis. In an OUO order, both
orders are live in the marketplace at the same time. The execution of either order triggers an
8 changes: 4 additions & 4 deletions docs/concepts/advanced/custom_data.md
@@ -6,6 +6,10 @@ guide covers some possible use cases for this functionality.
It's possible to create custom data types within the Nautilus system. First you
will need to define your data by subclassing from `Data`.

+```{note}
+As `Data` holds no state, it is not strictly necessary to call `super().__init__()`.
+```

```python
from nautilus_trader.core.data import Data

@@ -67,10 +71,6 @@ The recommended approach to satisfy the contract is to assign `ts_event` and `ts_init`
to backing fields, and then implement the `@property` for each as shown above
(for completeness, the docstrings are copied from the `Data` base class).
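
A minimal sketch of this pattern, where the `MyDataPoint` type and its `value` field are purely illustrative:

```python
from nautilus_trader.core.data import Data


class MyDataPoint(Data):
    """A hypothetical user-defined data type following the `Data` contract."""

    def __init__(self, value: float, ts_event: int, ts_init: int) -> None:
        self.value = value
        self._ts_event = ts_event
        self._ts_init = ts_init

    @property
    def ts_event(self) -> int:
        """The UNIX timestamp (nanoseconds) when the data event occurred."""
        return self._ts_event

    @property
    def ts_init(self) -> int:
        """The UNIX timestamp (nanoseconds) when the object was initialized."""
        return self._ts_init
```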

-```{note}
-As `Data` holds no state, it is not strictly necessary to call `super().__init__()`.
-```

```{note}
These timestamps are what allow Nautilus to correctly order data streams for backtests
by monotonically increasing `ts_init` UNIX nanoseconds.
2 changes: 1 addition & 1 deletion docs/concepts/advanced/emulated_orders.md
@@ -2,7 +2,7 @@

The platform makes it possible to emulate most order types locally, regardless
of whether the type is supported on a trading venue. The logic and code paths for
-order emulation are exactly the same for all environment contexts (backtest, sandbox, live),
+order emulation are exactly the same for all environment contexts (`backtest`, `sandbox`, `live`),
and utilize a common `OrderEmulator` component.
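
For example, emulation is typically requested per order at creation time. A sketch from within a `Strategy`, assuming the `emulation_trigger` parameter of this era's `OrderFactory` API (names and defaults may vary between versions):

```python
from nautilus_trader.model.enums import OrderSide, TriggerType

# Hypothetical sketch: this limit order is held locally by the OrderEmulator
# and released to the venue when triggered by last-trade prices, rather than
# being sent to the venue immediately.
order = self.order_factory.limit(
    instrument_id=self.instrument_id,  # assumed cached on the strategy
    order_side=OrderSide.BUY,
    quantity=self.instrument.make_qty(10),
    price=self.instrument.make_price(100.00),
    emulation_trigger=TriggerType.LAST_TRADE,
)
self.submit_order(order)
```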

```{note}
4 changes: 2 additions & 2 deletions docs/concepts/advanced/synthetic_instruments.md
@@ -3,7 +3,7 @@
The platform supports the definition of customized synthetic instruments.
These instruments can generate synthetic quote and trade ticks, which are beneficial for:

-- Allowing actors (and strategies) to subscribe to quote or trade feeds (for any purpose)
+- Allowing `Actor` (and `Strategy`) components to subscribe to quote or trade feeds (for any purpose)
- Facilitating the triggering of emulated orders
- Constructing bars from synthetic quotes or trades
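
As a sketch of what a definition can look like, a 50/50 spread between two Binance pairs might be defined as follows (class and argument names assumed from the `SyntheticInstrument` API of this era):

```python
from nautilus_trader.model.identifiers import InstrumentId, Symbol
from nautilus_trader.model.instruments import SyntheticInstrument

# Hypothetical synthetic spread between two Binance pairs
synthetic = SyntheticInstrument(
    symbol=Symbol("BTC-ETH:BINANCE"),
    price_precision=8,
    components=[
        InstrumentId.from_str("BTCUSDT.BINANCE"),
        InstrumentId.from_str("ETHUSDT.BINANCE"),
    ],
    formula="(BTCUSDT.BINANCE + ETHUSDT.BINANCE) / 2",
    ts_event=0,  # UNIX nanoseconds when the definition event occurred
    ts_init=0,   # UNIX nanoseconds when the object was initialized
)
```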

@@ -67,7 +67,7 @@ self.subscribe_quote_ticks(self._synthetic_id)
```

```{note}
-The `instrument_id` for the synthetic instrument in the above example will be structured as `{symbol}.{SYNTH}`, resulting in 'BTC-ETH:BINANCE.SYNTH'.
+The `instrument_id` for the synthetic instrument in the above example will be structured as `{symbol}.{SYNTH}`, resulting in `'BTC-ETH:BINANCE.SYNTH'`.
```

## Updating formulas
2 changes: 1 addition & 1 deletion docs/concepts/architecture.md
@@ -107,7 +107,7 @@ for each of these subpackages from the left nav menu.
### System implementations
- `backtest` - backtesting componentry as well as a backtest engine and node implementations
- `live` - live engine and client implementations as well as a node for live trading
-- `system` - the core system kernel common between backtest, sandbox and live contexts
+- `system` - the core system kernel common between the `backtest`, `sandbox`, and `live` contexts

## Code structure
The foundation of the codebase is the `nautilus_core` directory, containing a collection of core Rust libraries including a C API interface generated by `cbindgen`.
2 changes: 1 addition & 1 deletion docs/concepts/backtesting.md
@@ -2,7 +2,7 @@

Backtesting with NautilusTrader is a methodical simulation process that replicates trading
activities using a specific system implementation. This system is composed of various components
-including [Actors](), [Strategies](/docs/concepts/strategies.md), [Execution Algorithms](/docs/concepts/execution.md),
+including [Actors](advanced/actors.md), [Strategies](strategies.md), [Execution Algorithms](execution.md),
and other user-defined modules. The entire trading simulation is predicated on a stream of historical data processed by a
`BacktestEngine`. Once this data stream is exhausted, the engine concludes its operation, producing
detailed results and performance metrics for in-depth analysis.
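
A minimal outline of this lifecycle follows; venue, instrument, data, and strategy configuration are elided, and method names are assumed from the `BacktestEngine` API of this era:

```python
from nautilus_trader.backtest.engine import BacktestEngine

engine = BacktestEngine()  # default engine configuration

# The typical setup sequence (details elided):
# engine.add_venue(...)       - a simulated venue with account and starting balances
# engine.add_instrument(...)  - definitions for each instrument in the data
# engine.add_data([...])      - the historical data stream, e.g. list[QuoteTick]
# engine.add_strategy(...)    - the user-defined trading strategy
# engine.run()                - replays the stream until exhausted, producing results
```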
120 changes: 107 additions & 13 deletions docs/concepts/data.md
@@ -7,7 +7,7 @@ a trading domain:
- `OrderBookDeltas` (L1/L2/L3) - Bundles multiple order book deltas
- `QuoteTick` - Top-of-book best bid and ask prices and sizes
- `TradeTick` - A single trade/match event between counterparties
-- `Bar` - OHLCV data aggregated using a specific method
+- `Bar` - OHLCV 'bar' data, aggregated using a specific *method*
- `Ticker` - General base class for a symbol ticker
- `Instrument` - General base class for a tradable instrument
- `VenueStatus` - A venue level status event
@@ -18,28 +18,71 @@ Each of these data types inherits from `Data`, which defines two fields:
- `ts_event` - The UNIX timestamp (nanoseconds) when the data event occurred
- `ts_init` - The UNIX timestamp (nanoseconds) when the object was initialized

-This inheritance ensures chronological data ordering, vital for backtesting, while also enhancing analytics.
+This inheritance ensures chronological data ordering (vital for backtesting), while also enhancing analytics.

-Consistency is key; data flows through the platform in exactly the same way between all system contexts (backtest, sandbox and live),
+Consistency is key; data flows through the platform in exactly the same way for all system contexts (`backtest`, `sandbox`, `live`),
primarily through the `MessageBus` to the `DataEngine` and onto subscribed or registered handlers.

-For those seeking customization, the platform supports user-defined data types. Refer to the [advanced custom guide](/docs/concepts/advanced/custom_data.md) for more details.
+For those seeking customization, the platform supports user-defined data types. Refer to the advanced [Custom/Generic data guide](advanced/custom_data.md) for more details.

## Loading data

NautilusTrader facilitates data loading and conversion for three main use cases:
-- Populating the `BacktestEngine` directly
-- Persisting the Nautilus-specific Parquet format via `ParquetDataCatalog.write_data(...)` to be used with a `BacktestNode`
-- Research purposes
+- Populating the `BacktestEngine` directly to run backtests
+- Persisting the Nautilus-specific Parquet format for the data catalog via `ParquetDataCatalog.write_data(...)` to be later used with a `BacktestNode`
+- For research purposes (to ensure data is consistent between research and backtesting)

Regardless of the destination, the process remains the same: converting diverse external data formats into Nautilus data structures.
-To achieve this two components are necessary:
-- A data loader which can read the data and return a `pd.DataFrame` with the correct schema for the desired Nautilus object
-- A data wrangler which takes this `pd.DataFrame` and returns a `list[Data]` of Nautilus objects
-
-`raw data (e.g. CSV)` -> `*DataLoader` -> `pd.DataFrame` -> `*DataWrangler` -> Nautilus `list[Data]`
-Conceretely, this would involve for example:
+To achieve this, two main components are necessary:
+- A type of DataLoader (normally specific per raw source/format) which can read the data and return a `pd.DataFrame` with the correct schema for the desired Nautilus object
+- A type of DataWrangler (specific per data type) which takes this `pd.DataFrame` and returns a `list[Data]` of Nautilus objects

### Data loaders

Data loader components are typically specific to each raw source/format and integration. For instance, Binance order book data is stored in its raw CSV file form with
an entirely different format to [Databento Binary Encoding (DBN)](https://docs.databento.com/knowledge-base/new-users/dbn-encoding/getting-started-with-dbn) files.

### Data wranglers

Data wranglers are implemented per specific Nautilus data type, and can be found in the `nautilus_trader.persistence.wranglers` modules.
The following wranglers currently exist:
- `OrderBookDeltaDataWrangler`
- `QuoteTickDataWrangler`
- `TradeTickDataWrangler`
- `BarDataWrangler`

```{warning}
At the risk of causing confusion, there is also a growing number of DataWrangler v2 components, which typically take a `pd.DataFrame`
with a different, fixed-width Nautilus Arrow v2 schema, and output pyo3 Nautilus objects which are only compatible with the new version
of the Nautilus core (currently in development).
**These pyo3 provided data objects are not compatible where the legacy Cython objects are currently used (e.g., when adding directly to a `BacktestEngine`).**
```

### Transformation pipeline

**Process flow:**
1. Raw data (e.g., CSV) is input into the pipeline
2. DataLoader processes the raw data and converts it into a `pd.DataFrame`
3. DataWrangler further processes the `pd.DataFrame` to generate a list of Nautilus objects
4. The Nautilus `list[Data]` is the output of the data loading process

```
┌──────────┐ ┌──────────────────────┐ ┌──────────────────────┐
│ │ │ │ │ │
│ │ │ │ │ │
│ Raw data │ │ │ `pd.DataFrame` │ │
│ (CSV) ├───►│ DataLoader ├─────────────────►│ DataWrangler ├───► Nautilus `list[Data]`
│ │ │ │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
└──────────┘ └──────────────────────┘ └──────────────────────┘
- This diagram illustrates how raw data is transformed into Nautilus data structures.
```

Concretely, this would involve (combined in the sketch below):
- `BinanceOrderBookDeltaDataLoader.load(...)` which reads CSV files provided by Binance from disk, and returns a `pd.DataFrame`
- `OrderBookDeltaDataWrangler.process(...)` which takes the `pd.DataFrame` and returns `list[OrderBookDelta]`
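
Putting those two steps together, a sketch (the CSV path is hypothetical, `instrument` is assumed to be the corresponding `Instrument` object, and import locations are assumed from this era's package layout):

```python
from nautilus_trader.adapters.binance.loaders import BinanceOrderBookDeltaDataLoader
from nautilus_trader.persistence.wranglers import OrderBookDeltaDataWrangler

# Load raw Binance order book CSV data into a pd.DataFrame with the correct schema
df = BinanceOrderBookDeltaDataLoader.load("binance-btcusdt-depth-snap.csv")  # hypothetical path

# Process the DataFrame into a list[OrderBookDelta]
wrangler = OrderBookDeltaDataWrangler(instrument)
deltas = wrangler.process(df)
```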

@@ -81,4 +124,55 @@ from the `/serialization/arrow/schema.py` module.
2023-10-14: The current plan is to eventually phase out the Python schemas module, so that all schemas are single-sourced in the Rust core.
```

-**This doc is an evolving work in progress and will continue to describe the data catalog more fully...**
### Initializing
The data catalog can be initialized from a `NAUTILUS_PATH` environment variable, or by explicitly passing in a path-like object.

The following example shows how to initialize a data catalog where there is pre-existing data already written to disk at the given path.

```python
import os

from nautilus_trader.persistence.catalog import ParquetDataCatalog

CATALOG_PATH = os.getcwd() + "/catalog"

# Create a new catalog instance
catalog = ParquetDataCatalog(CATALOG_PATH)
```

### Writing data
New data can be stored in the catalog, which is effectively writing the given data to disk in the Nautilus-specific Parquet format.
All Nautilus built-in `Data` objects are supported, and any data which inherits from `Data` can be written.

The following example shows the above list of Binance `OrderBookDelta` objects being written.
```python
catalog.write_data(deltas)
```

Rust Arrow schema implementations are available for the following data types (for enhanced performance):
- `OrderBookDelta`
- `QuoteTick`
- `TradeTick`
- `Bar`

### Reading data
Any stored data can then be read back into memory:
```python
import pandas as pd
import pytz

from nautilus_trader.core.datetime import dt_to_unix_nanos

start = dt_to_unix_nanos(pd.Timestamp("2020-01-03", tz=pytz.utc))
end = dt_to_unix_nanos(pd.Timestamp("2020-01-04", tz=pytz.utc))

deltas = catalog.order_book_deltas(instrument_ids=[instrument.id.value], start=start, end=end)
```

### Streaming data
When running backtests in streaming mode with a `BacktestNode`, the data catalog can be used to stream the data in batches.

The following example shows how to achieve this by initializing a `BacktestDataConfig` configuration object:
```python
from nautilus_trader.config import BacktestDataConfig

data_config = BacktestDataConfig(
catalog_path=str(catalog.path),
data_cls=OrderBookDelta,
instrument_id=instrument.id.value,
start_time=start,
end_time=end,
)
```

This configuration object can then be passed into a `BacktestRunConfig`, and in turn passed into a `BacktestNode` as part of a run.
See the [Backtest (high-level API)](../tutorials/backtest_high_level.md) tutorial for more details.
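
As a sketch of that composition (the venue configuration values are hypothetical, and class locations are assumed from this era's API):

```python
from nautilus_trader.backtest.node import BacktestNode
from nautilus_trader.config import BacktestRunConfig, BacktestVenueConfig

run_config = BacktestRunConfig(
    data=[data_config],
    venues=[
        BacktestVenueConfig(
            name="BINANCE",
            oms_type="NETTING",
            account_type="CASH",
            starting_balances=["10_000 USDT"],
        ),
    ],
)

node = BacktestNode(configs=[run_config])
results = node.run()  # streams the catalog data in batches through the engine
```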
29 changes: 28 additions & 1 deletion docs/concepts/execution.md
@@ -37,6 +37,33 @@ The general execution flow looks like the following (each arrow indicates movement):
The `OrderEmulator` and `ExecAlgorithm`(s) components are optional in the flow, depending on
individual order parameters (as explained below).

```
┌───────────────────┐
│ │
│ │
┌───────► ├────────────┐
│ │ OrderEmulator │ │
│ │ │ │
┌─────────┴──┐ │ │ │
│ │ │ │ ┌───────▼────────┐ ┌─────────────────────┐ ┌─────────────────────┐
│ │ └───────┬───▲───────┘ │ │ │ │ │ │
│ │ │ │ │ ├───► ├───► │
│ Strategy ◄────────────┼───┼────────────┤ │ │ │ │ │
│ │ │ │ │ RiskEngine │ │ ExecutionEngine │ │ ExecutionClient │
│ │ │ │ │ ◄───┤ ◄───┤ │
│ │ ┌───────▼───┴───────┐ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
└─────────┬──┘ │ │ └────────▲───────┘ └─────────────────────┘ └─────────────────────┘
│ │ │ │
│ │ ExecAlgorithm ├─────────────┘
│ │ │
└───────► │
│ │
└───────────────────┘
- This diagram illustrates message flow (commands and events) across the Nautilus execution components.
```

## Execution algorithms

The platform supports customized execution algorithm components and provides some built-in
@@ -190,7 +217,7 @@ or confusion with the "parent" and "child" contingency orders terminology (an ex
The `Cache` provides several methods to aid in managing (keeping track of) the activity of
an execution algorithm:

-```python
+```cython
cpdef list orders_for_exec_algorithm(
self,
2 changes: 1 addition & 1 deletion docs/concepts/overview.md
@@ -74,7 +74,7 @@ The platform is designed to be easily integrated into a larger distributed system.
To facilitate this, nearly all configuration and domain objects can be serialized using JSON, MessagePack or Apache Arrow (Feather) for communication over the network.

## Common core
-The common system core is utilized by both the backtest, sandbox, and live trading nodes.
+The common system core is utilized by all node contexts: `backtest`, `sandbox`, and `live`.
User-defined Actor, Strategy and ExecAlgorithm components are managed consistently across these environment contexts.

## Backtesting
2 changes: 1 addition & 1 deletion nautilus_trader/system/kernel.py
@@ -93,7 +93,7 @@ class NautilusKernel:
"""
Provides the core Nautilus system kernel.
-The kernel is common between backtest, sandbox and live environment context types.
+The kernel is common between ``backtest``, ``sandbox`` and ``live`` environment context types.
Parameters
----------
