Deprecate/classic v4 py39 #177

Merged
merged 12 commits into from
Sep 27, 2024
2 changes: 1 addition & 1 deletion .github/workflows/poetry.yml
Original file line number Diff line number Diff line change
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
-python-version: ['3.10', '3.11']
+python-version: ['3.10', '3.11', '3.12']
steps:
- name: Checkout repository
uses: actions/checkout@v2
18 changes: 6 additions & 12 deletions docs/download.md
@@ -4,7 +4,8 @@

`NumeraiClassicDownloader` simplifies downloading of datasets from Numerai's API. It allows you to easily download data with a few lines and the data is automatically organized in directories.

-NOTE: Only int8 versions are available in this downloader. From v4.2 onwards, Numerai only provides int8 versions of the data.
+More information: [https://numer.ai/data](https://numer.ai/data)


```py
from numerblox.download import NumeraiClassicDownloader
@@ -35,9 +36,9 @@ meta_model_preds = pd.read_parquet("my_numerai_data_folder/meta_model.parquet")

## Numerai Signals

-Numerai provides a basic dataset for Numerai Signals. This is a good starting point for new users.
+Numerai provides a dataset for Numerai Signals. This is a good starting point for new users.

-More information: [https://signals.numer.ai/data/v1.0](https://signals.numer.ai/data/v1.0)
+More information: [https://signals.numer.ai/data](https://signals.numer.ai/data)

```py
from numerblox.download import NumeraiSignalsDownloader
@@ -53,9 +54,9 @@ dl.download_live_data()

## Numerai Crypto

-For Numerai Crypto there are some basic files to download.
+For Numerai Crypto there are files to download.

-More information: [https://crypto.numer.ai/data/v1.0](https://crypto.numer.ai/data/v1.0)
+More information: [https://crypto.numer.ai/data](https://crypto.numer.ai/data)

```py
from numerblox.download import NumeraiCryptoDownloader
@@ -156,10 +157,3 @@ class AwesomeCustomDownloader(BaseDownloader):
...

```







20 changes: 19 additions & 1 deletion docs/submission.md
@@ -77,7 +77,25 @@ dataf = pd.DataFrame(columns=['bloomberg_ticker', 'signal'])
submitter.full_submission(dataf=dataf,
cols=["bloomberg_ticker", "signal"],
file_name="submission.csv",
-model_name="my_model")
+model_name="my_signals_model")
```

## Numerai Crypto

`NumeraiCryptoSubmitter` has checks specific to Crypto. Mainly, it checks if the data contains a valid symbol column (`"symbol"`) and a `'signal'` column.

`NumeraiCryptoSubmitter.full_submission` handles checks, saving of CSV and uploading with `numerapi`.

```py
from numerblox.submission import NumeraiCryptoSubmitter
submitter = NumeraiCryptoSubmitter(directory_path="sub_current_round", key=key)
# Your prediction file with 'id' as index, a valid symbol column and signal column below.
dataf = pd.DataFrame(columns=['symbol', 'signal'])
# Only works with valid key credentials and model_name
submitter.full_submission(dataf=dataf,
cols=["symbol", "signal"],
file_name="submission.csv",
model_name="my_crypto_model")
```
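
The column checks described above can be sketched in plain pandas. This is an illustrative pre-submission check, not the actual implementation inside `NumeraiCryptoSubmitter`; the helper name `check_crypto_submission` and the exact rules (no null or duplicate symbols, signals strictly between 0 and 1) are assumptions made for the example.

```py
import pandas as pd


def check_crypto_submission(dataf: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError if the frame is not submission-shaped."""
    missing = {"symbol", "signal"} - set(dataf.columns)
    if missing:
        raise ValueError(f"Missing required columns: {sorted(missing)}")
    if dataf["symbol"].isna().any():
        raise ValueError("Column 'symbol' contains null values.")
    if dataf["symbol"].duplicated().any():
        raise ValueError("Column 'symbol' contains duplicate symbols.")
    # Assumed rule for this sketch: signals lie strictly between 0 and 1.
    if not dataf["signal"].between(0, 1, inclusive="neither").all():
        raise ValueError("Column 'signal' must lie strictly between 0 and 1.")


dataf = pd.DataFrame({"symbol": ["BTC", "ETH"], "signal": [0.4, 0.6]})
check_crypto_submission(dataf)  # a well-formed frame passes silently
```

Running a check like this before `full_submission` surfaces malformed frames locally, without needing valid API credentials.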

## NumerBay
2 changes: 1 addition & 1 deletion examples/end_to_end.ipynb
@@ -55,7 +55,7 @@
"source": [
"from numerblox.numerframe import create_numerframe\n",
"\n",
-"df = create_numerframe(\"../tests/test_assets/train_int8_5_eras.parquet\")"
+"df = create_numerframe(\"../tests/test_assets/val_3_eras.parquet\")"
]
},
{
8 changes: 4 additions & 4 deletions examples/numerai_pipeline.ipynb
@@ -41,7 +41,7 @@
"from sklearn.compose import make_column_transformer\n",
"from sklego.preprocessing import ColumnSelector\n",
"\n",
-"from numerblox.preprocessing import GroupStatsPreProcessor, V4_2_FEATURE_GROUP_MAPPING\n",
+"from numerblox.preprocessing import GroupStatsPreProcessor, V5_FEATURE_GROUP_MAPPING\n",
"from numerblox.meta import MetaEstimator\n",
"from numerblox.neutralizers import FeatureNeutralizer\n",
"\n",
@@ -55,7 +55,7 @@
"metadata": {},
"outputs": [],
"source": [
-"df = pd.read_parquet(\"../tests/test_assets/train_int8_5_eras.parquet\")"
+"df = pd.read_parquet(\"../tests/test_assets/val_3_eras.parquet\")"
]
},
{
@@ -389,8 +389,8 @@
"metadata": {},
"outputs": [],
"source": [
-"random_int_features = np.random.choice(V4_2_FEATURE_GROUP_MAPPING['intelligence'], size=10, replace=False).tolist()\n",
-"random_char_features = np.random.choice(V4_2_FEATURE_GROUP_MAPPING['charisma'], size=10, replace=False).tolist()"
+"random_int_features = np.random.choice(V5_FEATURE_GROUP_MAPPING['intelligence'], size=10, replace=False).tolist()\n",
+"random_char_features = np.random.choice(V5_FEATURE_GROUP_MAPPING['charisma'], size=10, replace=False).tolist()"
]
},
{
59 changes: 16 additions & 43 deletions numerblox/download.py
@@ -168,25 +168,22 @@ def __call__(self, *args, **kwargs):

class NumeraiClassicDownloader(BaseDownloader):
"""
-WARNING: Versions 1-3 (legacy data) are deprecated. Only supporting version 4+.
+Download from NumerAPI for Numerai Classic data.
+More information: https://numer.ai/data

-Downloading from NumerAPI for Numerai Classic data. \n
-:param directory_path: Base folder to download files to. \n
+:param directory_path: Base folder to download files to.
All kwargs will be passed to NumerAPI initialization.
"""
-TRAIN_DATASET_NAME = "train_int8.parquet"
-TRAIN_DATASET_NAME_5 = "train.parquet"
-VALIDATION_DATASET_NAME = "validation_int8.parquet"
-VALIDATION_DATASET_NAME_5 = "validation.parquet"
-LIVE_DATASET_NAME = "live_int8.parquet"
-LIVE_DATASET_NAME_5 = "live.parquet"
+TRAIN_DATASET_NAME = "train.parquet"
+VALIDATION_DATASET_NAME = "validation.parquet"
+LIVE_DATASET_NAME = "live.parquet"
LIVE_EXAMPLE_PREDS_NAME = "live_example_preds.parquet"
VALIDATION_EXAMPLE_PREDS_NAME = "validation_example_preds.parquet"

def __init__(self, directory_path: str, **kwargs):
super().__init__(directory_path=directory_path)
self.napi = NumerAPI(**kwargs)
-# Get all available versions available for Numerai Classic.
+# Get all available versions for Numerai Classic.
self.dataset_versions = set(s.split("/")[0] for s in self.napi.list_datasets())
self.dataset_versions.discard("signals")

@@ -198,20 +195,11 @@ def download_training_data(
:param subfolder: Specify folder to create folder within base directory root.
Saves in base directory root by default.
:param version: Numerai dataset version.
-4 = April 2022 dataset
-4.1 = Sunshine
-4.2 = Rain
-4.3 = Midnight
5.0 = Atlas (default)
"""
self._check_dataset_version(version)
-if float(version) >= 5.0:
-train_val_files = [f"v{version}/{self.TRAIN_DATASET_NAME_5}",
-f"v{version}/{self.VALIDATION_DATASET_NAME_5}"]
-else:
-print("WARNING: v4 data will only be supported until Sept. 27, 2024!!!")
-train_val_files = [f"v{version}/{self.TRAIN_DATASET_NAME}",
-f"v{version}/{self.VALIDATION_DATASET_NAME}"]
+train_val_files = [f"v{version}/{self.TRAIN_DATASET_NAME}",
+f"v{version}/{self.VALIDATION_DATASET_NAME}"]
for file in train_val_files:
dest_path = self._get_dest_path(subfolder, file)
self.download_single_dataset(
@@ -250,19 +238,11 @@ def download_live_data(
:param subfolder: Specify folder to create folder within directory root.
Saves in directory root by default.
:param version: Numerai dataset version.
-4 = April 2022
-4.1 = Sunshine
-4.2 = Rain
-4.3 = Midnight
5.0 = Atlas (default)
:param round_num: Numerai tournament round number. Downloads latest round by default.
"""
self._check_dataset_version(version)
-if float(version) >= 5.0:
-live_files = [f"v{version}/{self.LIVE_DATASET_NAME_5}"]
-else:
-print("WARNING: v4 data will only be supported until Sept. 27, 2024!!!")
-live_files = [f"v{version}/{self.LIVE_DATASET_NAME}"]
+live_files = [f"v{version}/{self.LIVE_DATASET_NAME}"]
for file in live_files:
dest_path = self._get_dest_path(subfolder, file)
self.download_single_dataset(
@@ -280,10 +260,6 @@ def download_example_data(
:param subfolder: Specify folder to create folder within base directory root.
Saves in base directory root by default.
:param version: Numerai dataset version.
-4 = April 2022 dataset
-4.1 = Sunshine
-4.2 = Rain
-4.3 = Midnight
5.0 = Atlas (default)
:param round_num: Numerai tournament round number. Downloads latest round by default.
"""
@@ -337,7 +313,8 @@ def _check_dataset_version(self, version: str):

class NumeraiSignalsDownloader(BaseDownloader):
"""
-Support for Numerai Signals v1 parquet data.
+Support for Numerai Signals data.
+More information: https://signals.numer.ai/data
Downloading from SignalsAPI for Numerai Signals data. \n
:param directory_path: Base folder to download files to. \n
All kwargs will be passed to SignalsAPI initialization.
@@ -355,14 +332,13 @@ def __init__(self, directory_path: str, **kwargs):
self.dataset_versions = set(s.replace("signals/", "").split("/")[0] for s in self.sapi.list_datasets() if s.startswith("signals/v"))

def download_training_data(
-self, subfolder: str = "", version: str = "1.0"
+self, subfolder: str = "", version: str = "2.0"
):
"""
Get Numerai Signals training and validation data.
:param subfolder: Specify folder to create folder within base directory root.
Saves in base directory root by default.
:param version: Numerai Signals dataset version.
-Currently only v1.0 is supported.
"""
self._check_dataset_version(version)
train_val_files = [f"signals/v{version}/{self.TRAIN_DATASET_NAME}",
@@ -394,15 +370,14 @@ def download_single_dataset(
def download_live_data(
self,
subfolder: str = "",
-version: str = "1.0",
+version: str = "2.0",
):
"""
Download all live data in specified folder (i.e. minimal data needed for inference).

:param subfolder: Specify folder to create folder within directory root.
Saves in directory root by default.
:param version: Numerai dataset version.
-Currently only v1.0 is supported.
"""
self._check_dataset_version(version)
live_files = [f"signals/v{version}/{self.LIVE_DATASET_NAME}"]
@@ -414,15 +389,14 @@ def download_live_data(
)

def download_example_data(
-self, subfolder: str = "", version: str = "1.0"
+self, subfolder: str = "", version: str = "2.0"
):
"""
Download all example prediction data in specified folder for given version.

:param subfolder: Specify folder to create folder within base directory root.
Saves in base directory root by default.
:param version: Numerai dataset version.
-Currently only v1.0 is supported.
"""
self._check_dataset_version(version)
example_files = [f"signals/v{version}/{self.LIVE_EXAMPLE_PREDS_NAME}",
@@ -440,6 +414,7 @@ def _check_dataset_version(self, version: str):
class NumeraiCryptoDownloader(BaseDownloader):
"""
Download Numerai Crypto data.
+More information: https://crypto.numer.ai/data

:param directory_path: Base folder to download files to.
"""
@@ -462,7 +437,6 @@ def download_training_data(
:param subfolder: Specify folder to create folder within directory root.
Saves in directory root by default.
:param version: Numerai dataset version.
-Currently only v1.0 is supported.
"""
self._check_dataset_version(version)
training_files = [f"crypto/v{version}/{self.TRAIN_TARGETS_NAME}"]
@@ -484,7 +458,6 @@ def download_live_data(
:param subfolder: Specify folder to create folder within directory root.
Saves in directory root by default.
:param version: Numerai dataset version.
-Currently only v1.0 is supported.
"""
self._check_dataset_version(version)
live_files = [f"crypto/v{version}/{self.LIVE_DATASET_NAME}"]
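
The version discovery and version check that the downloaders in this file rely on can be sketched without a network call. The dataset paths below are made up for illustration, and `available_versions` and `check_version` are hypothetical stand-ins: the bodies of the real `_check_dataset_version` methods are not shown in this diff. Unlike the Classic downloader, which splits every path and then discards `"signals"`, this sketch filters by prefix.

```py
def available_versions(dataset_paths: list[str], prefix: str = "") -> set[str]:
    """Collect the leading version folder (e.g. 'v5.0') from each dataset path."""
    return {
        path[len(prefix):].split("/")[0]
        for path in dataset_paths
        if path.startswith(prefix + "v")
    }


def check_version(version: str, versions: set[str]) -> None:
    """Fail early with a readable message when a version is not offered."""
    if f"v{version}" not in versions:
        raise ValueError(
            f"Version 'v{version}' not found. Available: {sorted(versions)}"
        )


# Made-up listing in the '<version>/<file>' shape that the parsing above implies.
paths = ["v5.0/train.parquet", "v5.0/live.parquet", "signals/v2.0/live.parquet"]
classic = available_versions(paths)
signals = available_versions(paths, prefix="signals/")
check_version("5.0", classic)  # passes silently
check_version("2.0", signals)  # passes silently
```

With the v4 branches removed in this change, a check of this kind is what stops a request for an unlisted version before any file path is built.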
3 changes: 1 addition & 2 deletions numerblox/evaluation.py
@@ -1,5 +1,4 @@
import time
-import sklearn
import numpy as np
import pandas as pd
from numerai_tools import scoring as nt_scoring, signals as nt_signals
@@ -1000,7 +999,7 @@ def full_evaluation(
feature_set = set(dataf.columns)
if set(self.fncv3_features).issubset(feature_set):
print(
-"Using 'v4.2/features.json/fncv3_features' feature set to calculate FNC metrics."
+"Using 'v5/features.json/fncv3_features' feature set to calculate FNC metrics."
)
valid_features = self.fncv3_features
else:
5,951 changes: 3,612 additions & 2,339 deletions numerblox/feature_groups.py

Large diffs are not rendered by default.

34 changes: 11 additions & 23 deletions numerblox/numerframe.py
@@ -1,14 +1,12 @@
import pandas as pd
from pathlib import Path
-from typing import Union, Tuple, Any, List
+from typing import Union, Any
from numerai_era_data.date_utils import (ERA_ONE_START, get_current_era,
get_current_date, get_era_for_date,
get_date_for_era)

from .misc import AttrDict
-from .feature_groups import (V4_2_FEATURE_GROUP_MAPPING, FNCV3_FEATURES,
-SMALL_FEATURES, MEDIUM_FEATURES, V2_EQUIVALENT_FEATURES,
-V3_EQUIVALENT_FEATURES)
+from .feature_groups import FNCV3_FEATURES, SMALL_FEATURES, MEDIUM_FEATURES, V5_FEATURE_GROUP_MAPPING


ERA1_TIMESTAMP = pd.Timestamp(ERA_ONE_START)
@@ -98,26 +96,16 @@ def get_fncv3_feature_data(self) -> "NumerFrame":

@property
def get_small_feature_data(self) -> "NumerFrame":
-""" Small subset of the Numerai dataset for v4.2 data. """
+""" Small subset of the Numerai dataset for v5 data. """
return self.get_column_selection(cols=SMALL_FEATURES)

@property
def get_medium_feature_data(self) -> "NumerFrame":
-""" Medium subset of the Numerai dataset for v4.2 data. """
+""" Medium subset of the Numerai dataset for v5 data. """
return self.get_column_selection(cols=MEDIUM_FEATURES)

@property
-def get_v2_equivalent_feature_data(self) -> "NumerFrame":
-""" Features equivalent to the deprecated v2 Numerai data. For v4.2 data. """
-return self.get_column_selection(cols=V2_EQUIVALENT_FEATURES)
-
-@property
-def get_v3_equivalent_feature_data(self) -> "NumerFrame":
-""" Features equivalent to the deprecated v3 Numerai data. For v4.2 data. """
-return self.get_column_selection(cols=V3_EQUIVALENT_FEATURES)
-
-@property
-def get_unique_eras(self) -> List[str]:
+def get_unique_eras(self) -> list[str]:
""" Get all unique eras in the data. """
return self[self.meta.era_col].unique().tolist()

@@ -133,9 +121,9 @@ def get_last_n_eras(self, n: int) -> "NumerFrame":

def get_feature_group(self, group: str) -> "NumerFrame":
""" Get feature group based on name or list of names. """
-assert group in V4_2_FEATURE_GROUP_MAPPING.keys(), \
-f"Group '{group}' not found in {V4_2_FEATURE_GROUP_MAPPING.keys()}"
-return self.get_column_selection(cols=V4_2_FEATURE_GROUP_MAPPING[group])
+assert group in V5_FEATURE_GROUP_MAPPING.keys(), \
+f"Group '{group}' not found in {V5_FEATURE_GROUP_MAPPING.keys()}"
+return self.get_column_selection(cols=V5_FEATURE_GROUP_MAPPING[group])

def get_pattern_data(self, pattern: str) -> "NumerFrame":
"""
@@ -144,7 +132,7 @@ def get_pattern_data(self, pattern: str) -> "NumerFrame":
"""
return self.filter(like=pattern)

-def get_feature_target_pair(self, multi_target=False) -> Tuple["NumerFrame", "NumerFrame"]:
+def get_feature_target_pair(self, multi_target=False) -> tuple["NumerFrame", "NumerFrame"]:
"""
Get split of feature and target columns.
:param multi_target: Returns only 'target' column by default.
@@ -154,12 +142,12 @@ def get_feature_target_pair(self, multi_target=False) -> Tuple["NumerFrame", "Nu
y = self.get_target_data if multi_target else self.get_single_target_data
return X, y

-def get_era_batch(self, eras: List[Any],
+def get_era_batch(self, eras: list[Any],
convert_to_tf = False,
aemlp_batch = False,
features: list = None,
targets: list = None,
-*args, **kwargs) -> Tuple["NumerFrame", "NumerFrame"]:
+*args, **kwargs) -> tuple["NumerFrame", "NumerFrame"]:
"""
Get feature target pair batch of 1 or multiple eras. \n
:param eras: Selection of era names that should be present in era_col. \n