A framework to evaluate various models for tabular regression and classification tasks. The package integrates 25 machine learning (including deep learning) models for tabular prediction tasks from the following well-established model bases:
- `autogluon`: "LightGBM", "CatBoost", "XGBoost", "Random Forest", "Extremely Randomized Trees", "K-Nearest Neighbors", "Linear Regression", "Neural Network with MXNet", "Neural Network with PyTorch", "Neural Network with FastAI".
- `pytorch_widedeep`: "TabMlp", "TabResnet", "TabTransformer", "TabNet", "SAINT", "ContextAttentionMLP", "SelfAttentionMLP", "FTTransformer", "TabPerceiver", "TabFastFormer".
- `pytorch_tabular`: "Category Embedding", "NODE", "TabNet", "TabTransformer", "AutoInt", "FTTransformer".
You can implement your own models, data processing pipelines, and datasets within this flexible, well-tested framework and compare them consistently against baseline models. This is even easier when your own model is based on `pytorch`.
Supported features for all model bases:
- Data processing
  - Data splitting (training/validation/testing sets)
  - Data imputation
  - Data filtering
  - Data scaling
  - Data augmentation
  - Feature augmentation
  - Feature selection
  - etc.
- Multi-modal data
- Loading UCI datasets
- Data/result analysis
  - Leaderboard
  - Box plot
  - Pair plot
  - Pearson correlation
  - Partial dependency plot (with bootstrapping)
  - Feature importance (Permutation and SHAP)
  - etc.
- Building models upon other trained models
- `pytorch_lightning`-based training for `pytorch` models
- Gaussian-process-based Bayesian hyperparameter optimization
- Cross-validation (including continuing from a cross-validation checkpoint)
- Saving, loading, and migrating models
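As an illustration of the permutation feature importance listed above, here is a minimal standalone sketch using only the Python standard library. It is not the package's implementation: the function `permutation_importance`, the toy model `toy_predict`, and the synthetic data are all assumptions made for this example. The idea is to shuffle one feature column at a time and measure how much the prediction error grows.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving all other columns untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy model that depends only on the first feature.
def toy_predict(row):
    return 2.0 * row[0]


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [2.0 * row[0] for row in X]

scores = permutation_importance(toy_predict, X, y)
print(scores)  # first score is positive, second is zero for this toy model
```

Because the toy model ignores the second feature, shuffling it leaves the error unchanged, so its importance is zero, while shuffling the first feature degrades the predictions.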
The package stands on the shoulders of giants:
- scikit-learn
- PyTorch
- PyTorch Lightning
- etc. (see `requirements.txt`)
Full documentation is available here. For a quick start:
`tabular_ensemble` can be installed from PyPI by running the following command:
pip install tabensemb[torch]
Use `pip install tabensemb` instead if you already have `torch>=1.12.0` installed, or `pip install tabensemb[test]` if you want to run the unit tests.
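If you are unsure which of the two commands applies to your environment, the following sketch checks the installed `torch` version and suggests one. It assumes the distribution is registered under the name `torch`; the helpers `version_tuple` and `suggested_install_command` are hypothetical and not part of the package.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a string such as '1.13.1+cu117' into (1, 13, 1), ignoring suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def suggested_install_command(torch_version):
    """Pick the install command based on the torch>=1.12.0 rule stated above."""
    if torch_version is not None and version_tuple(torch_version) >= (1, 12, 0):
        return "pip install tabensemb"
    return "pip install tabensemb[torch]"


try:
    found = metadata.version("torch")
except metadata.PackageNotFoundError:
    found = None  # torch is not installed in this environment

print(suggested_install_command(found))
```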
To install from source, run the following from the repository root:
pip install -e .[torch]
- (Optional) After installing `tabensemb[test]`, run the unit tests:
cd test
pytest .
- Place your `.csv` or `.xlsx` file in a `data` subfolder (e.g., `data/sample.csv`), and generate a configuration file in a `configs` subfolder (e.g., `configs/sample.py`) with the following content:
cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}
- Run the experiment with the configuration and the data:
python main.py --base sample --epoch 10
where `--base` refers to the configuration file, and additional arguments (such as `--epoch` here) refer to those in `config/default.py`.
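Before running the experiment, it can help to confirm that every column named in the configuration exists in the data file. The sketch below does this with the standard library only. The helper `missing_columns` is hypothetical and not part of the package, and the in-memory CSV is a stand-in for `data/sample.csv` with made-up contents.

```python
import csv
import io

cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}


def missing_columns(cfg, header):
    """Return configured column names that do not appear in the CSV header."""
    wanted = (
        cfg["continuous_feature_names"]
        + cfg["categorical_feature_names"]
        + cfg["label_name"]
    )
    return [name for name in wanted if name not in header]


# Stand-in for data/sample.csv (assumed contents, for illustration only).
sample_csv = io.StringIO(
    "cont_0,cont_1,cont_2,cont_3,cont_4,cat_0,cat_1,cat_2,target\n"
    "0.1,0.2,0.3,0.4,0.5,a,b,c,1.0\n"
)
header = next(csv.reader(sample_csv))

print(missing_columns(cfg, header))
```

To check a real file, replace the `io.StringIO` object with `open("data/sample.csv", newline="")`. An empty list means all configured columns were found.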
See the documentation pages for details.
If you use this repository, please cite us as:
(To be updated once the work is released on arXiv or published.)