tabular_ensemble

A framework to evaluate various models for tabular regression and classification tasks. The package integrates 25 machine learning (including deep learning) models for tabular prediction tasks from the following well-established model bases:

autogluon
- "LightGBM", "CatBoost", "XGBoost", "Random Forest", "Extremely Randomized Trees", "K-Nearest Neighbors", "Linear Regression", "Neural Network with MXNet", "Neural Network with PyTorch", "Neural Network with FastAI".
pytorch_widedeep
- "TabMlp", "TabResnet", "TabTransformer", "TabNet", "SAINT", "ContextAttentionMLP", "SelfAttentionMLP", "FTTransformer", "TabPerceiver", "TabFastFormer".
pytorch_tabular
- "Category Embedding", "NODE", "TabNet", "TabTransformer", "AutoInt", "FTTransformer".

You can implement your own models, data processing pipelines, and datasets within this flexible, well-tested framework and compare them consistently against the baseline models. This is especially easy when your model is based on PyTorch.

Supported features for all model bases:

Data processing
- Data splitting (training/validation/testing sets)
- Data imputation
- Data filtering
- Data scaling
- Data augmentation
- Feature augmentation
- Feature selection
- etc.
Multi-modal data
Loading UCI datasets
Data/result analysis
- Leaderboard
- Box plot
- Pair plot
- Pearson correlation
- Partial dependency plot (with bootstrapping)
- Feature importance (Permutation and SHAP)
- etc.
Building models upon other trained models
pytorch_lightning-based training for pytorch models
Gaussian-process-based Bayesian hyperparameter optimization
Cross-validation (including continuing from a cross-validation checkpoint)
Saving, loading, and migrating models

The package stands on the shoulders of giants:

scikit-learn
PyTorch
PyTorch Lightning
etc. (See requirements.txt)

Installation/Usage

Full documentation is available here. For a quick start:

tabular_ensemble can be installed from PyPI by running the following command:

pip install tabensemb[torch]

Please use pip install tabensemb instead if you already have torch>=1.12.0 installed. Use pip install tabensemb[test] if you want to run unit tests.
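The choice between the two commands above depends only on whether torch>=1.12.0 is already present. A minimal sketch of that check, using only the Python standard library, is shown below. The helper names (`parse_version`, `suggest_install_command`, `installed_torch_version`) are hypothetical and are not part of tabensemb.

```python
import re
from importlib import metadata

# Minimum torch version named in the installation note above.
MIN_TORCH = (1, 12, 0)


def parse_version(version):
    """Return the leading numeric release of a version string as a 3-tuple.

    Local suffixes such as "+cu118" are ignored; missing parts count as 0.
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def suggest_install_command(installed_version):
    """Pick one of the two pip commands for a torch version string (or None)."""
    if installed_version is not None and parse_version(installed_version) >= MIN_TORCH:
        return "pip install tabensemb"
    return "pip install tabensemb[torch]"


def installed_torch_version():
    """Return the installed torch version, or None if torch is not installed."""
    try:
        return metadata.version("torch")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(suggest_install_command(installed_torch_version()))
```

Comparing versions as integer tuples avoids the string-comparison pitfall in which "1.9.0" would sort after "1.12.0".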

To install from source:

pip install -e .[torch]

(Optional) Run unit tests after installing tabensemb[test]:

cd test
pytest .

Place your .csv or .xlsx file in a data subfolder (e.g., data/sample.csv), and create a configuration file in a configs subfolder (e.g., configs/sample.py) with the following content:

cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}
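To try the configuration without real data, a synthetic file with matching column names can be generated. The sketch below is not part of tabensemb; it uses only the standard library, and the helpers `expected_columns`, `write_synthetic_csv`, and `missing_columns` are hypothetical names. It assumes that every name listed in the configuration should appear as a column header in the data file.

```python
import csv
import random
import tempfile
from pathlib import Path

# Same content as configs/sample.py above.
cfg = {
    "database": "sample",
    "continuous_feature_names": ["cont_0", "cont_1", "cont_2", "cont_3", "cont_4"],
    "categorical_feature_names": ["cat_0", "cat_1", "cat_2"],
    "label_name": ["target"],
}


def expected_columns(cfg):
    """List every column name the configuration refers to."""
    return (
        cfg["continuous_feature_names"]
        + cfg["categorical_feature_names"]
        + cfg["label_name"]
    )


def write_synthetic_csv(path, cfg, n_rows=100, seed=0):
    """Write random rows with one column per configured feature and label."""
    rng = random.Random(seed)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=expected_columns(cfg))
        writer.writeheader()
        for _ in range(n_rows):
            row = {name: rng.gauss(0.0, 1.0) for name in cfg["continuous_feature_names"]}
            row.update({name: rng.choice(["a", "b", "c"]) for name in cfg["categorical_feature_names"]})
            row.update({name: rng.gauss(0.0, 1.0) for name in cfg["label_name"]})
            writer.writerow(row)


def missing_columns(path, cfg):
    """Return configured column names that are absent from the CSV header."""
    with Path(path).open(newline="") as f:
        header = next(csv.reader(f))
    return [name for name in expected_columns(cfg) if name not in header]


if __name__ == "__main__":
    # Written to a temporary folder so an existing data/sample.csv is not overwritten.
    target = Path(tempfile.mkdtemp()) / "data" / "sample.csv"
    write_synthetic_csv(target, cfg, n_rows=20)
    print(target, "missing columns:", missing_columns(target, cfg))
```

Running `missing_columns` against your own file before starting an experiment is a cheap way to catch a misspelled feature name in the configuration.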

Run the experiment with this configuration and data using

python main.py --base sample --epoch 10

where --base refers to the configuration file, and additional arguments (such as --epoch here) refer to those in config/default.py.
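One way to picture how these arguments could interact is a three-layer merge: built-in defaults, then the file named by --base, then any explicit command-line values. The sketch below illustrates that idea with argparse only; it is not the actual main.py, the precedence order is an assumption, and the keys and values in `DEFAULTS` are made up for illustration rather than taken from config/default.py.

```python
import argparse

# Illustrative defaults only; the real ones are defined in config/default.py.
DEFAULTS = {"database": "sample", "epoch": 100}


def build_parser(defaults):
    """Create a parser with --base plus one optional flag per default key."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--base", required=True, help="name of the configuration file")
    for key, value in defaults.items():
        # default=None lets us tell "not given" apart from an explicit value.
        parser.add_argument(f"--{key}", type=type(value), default=None)
    return parser


def resolve_config(argv, defaults, base_configs):
    """Merge defaults, the selected base configuration, and CLI overrides."""
    args = vars(build_parser(defaults).parse_args(argv))
    base_name = args.pop("base")
    merged = dict(defaults)
    merged.update(base_configs[base_name])  # file named by --base
    merged.update({k: v for k, v in args.items() if v is not None})  # explicit flags win
    return merged


if __name__ == "__main__":
    # Stand-in for loading configs/sample.py; a real loader would import the file.
    base_configs = {"sample": {"database": "sample", "label_name": ["target"]}}
    print(resolve_config(["--base", "sample", "--epoch", "10"], DEFAULTS, base_configs))
```

Under this assumed precedence, `--epoch 10` changes only the epoch count and leaves every other setting at the value given by the base file or the defaults.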

See the documentation pages for details.

Citation

If you use this repository, please cite us as:

(Will be updated after release on arXiv or publication.)
