diff --git a/docs/examples.rst b/docs/examples.rst
index 205eab3..fa47fd8 100644
--- a/docs/examples.rst
+++ b/docs/examples.rst
@@ -9,36 +9,36 @@ Oversampling can be carried out by importing any oversampler from the ``smote_va
 .. code-block:: Python
 
     import smote_variants as sv
-    
+
     oversampler= sv.SMOTE_ENN()
-    
+
     # supposing that X and y contain the feature and target data of some dataset
     X_samp, y_samp= oversampler.sample(X, y)
-    
+
 Using the ``datasets`` package of ``sklearn`` to import some data:
 
 .. code-block:: Python
 
     import smote_variants as sv
     import sklearn.datasets as datasets
-    
+
     dataset= datasets.load_breast_cancer()
-    
+
     oversampler= sv.KernelADASYN()
-    
+
     X_samp, y_samp= oversampler.sample(dataset['data'], dataset['target'])
-    
+
 Using the imbalanced datasets available in the ``imbalanced_datasets`` package:
 
 .. code-block:: Python
 
     import smote_variants as sv
    import imbalanced_datasets as imbd
-    
+
     dataset= imbd.load_iris0()
-    
+
     oversampler= sv.SMOTE_OUT()
-    
+
     X_samp, y_samp= oversampler.sample(dataset['data'], dataset['target'])
 
 Oversampling with random, reasonable parameters
@@ -52,13 +52,13 @@ In order to facilitate model selection, each oversampler class is able to genera
     import smote_variants as sv
     import imbalanced_datasets as imbd
     import numpy as np
-    
+
     dataset= imbd.load_yeast1()
-    
+
     par_combs= sv.SMOTE_Cosine.parameter_combinations()
-    
+
     oversampler= sv.SMOTE_Cosine(**np.random.choice(par_combs))
-    
+
     X_samp, y_samp= oversampler.sample(dataset['data'], dataset['target'])
 
 Multiclass oversampling
 =======================
@@ -70,11 +70,12 @@ Multiclass oversampling is a highly ambiguous task, as balancing various classes m
     import smote_variants as sv
     import sklearn.datasets as datasets
-    
+
     dataset= datasets.load_wine()
-    
-    oversampler= sv.MulticlassOversampling(sv.distance_SMOTE)
-    
+
+    oversampler= sv.MulticlassOversampling(oversampler='distance_SMOTE',
+                                           oversampler_params={})
+
     X_samp, y_samp= oversampler.sample(dataset['data'], dataset['target'])
 
 Model selection
 ===============
@@ -83,10 +84,10 @@ Model selection
 When facing an imbalanced dataset, model selection is crucial to find the right oversampling approach and the right classifier. The best performing oversampling technique depends on the subsequent classification; thus, the model selection of oversampler and classifier needs to be carried out hand in hand. This is facilitated by the ``model_selection`` function of the package. One must specify a set of oversamplers, a set of classifiers, and a score function (in this case 'AUC') to optimize in cross-validation, and the ``model_selection`` function does the rest:
 
 .. code-block:: Python
-    
+
     import smote_variants as sv
     import imbalanced_datasets as imbd
-    
+
     datasets = [imbd.load_glass2]
     oversamplers = sv.get_all_oversamplers(n_quickest=5)
     oversamplers = sv.generate_parameter_combinations(oversamplers,
@@ -94,13 +95,13 @@ When facing an imbalanced dataset, model selection is crucial to find the right
     classifiers = [('sklearn.neighbors', 'KNeighborsClassifier', {'n_neighbors': 3}),
                    ('sklearn.neighbors', 'KNeighborsClassifier', {'n_neighbors': 5}),
                    ('sklearn.tree', 'DecisionTreeClassifier', {})]
-    
+
     sampler, classifier= sv.model_selection(datasets=datasets,
                                             oversamplers=oversamplers,
                                             classifiers=classifiers)
 
 The function call returns the best performing oversampling object and the corresponding, best performing classifier object, with respect to the 'glass2' dataset.
-    
+
 Thorough evaluation involving multiple datasets
 ===============================================
@@ -110,18 +111,18 @@ Another scenario is the comparison and evaluation of a new oversampler to conven
     import smote_variants as sv
     import imbalanced_datasets as imbd
-    
+
     datasets= [imbd.load_glass2, imbd.load_ecoli4]
-    
+
     oversamplers = sv.get_all_oversamplers(n_quickest=5)
-    
+
     oversamplers = sv.generate_parameter_combinations(oversamplers,
                                                       n_max_comb=5)
-    
+
     classifiers = [('sklearn.neighbors', 'KNeighborsClassifier', {'n_neighbors': 3}),
                    ('sklearn.neighbors', 'KNeighborsClassifier', {'n_neighbors': 5}),
                    ('sklearn.tree', 'DecisionTreeClassifier', {})]
-    
+
     results= sv.evaluate_oversamplers(datasets=datasets,
                                       oversamplers=oversamplers,
                                       classifiers=classifiers,
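The examples in this file treat the oversamplers as black boxes. As background, the interpolation idea at the heart of SMOTE and most of its variants can be sketched in a few lines of plain Python. This is a toy sketch only, not smote_variants' implementation; the function name ``smote_sketch`` and its parameters are invented for this illustration.

```python
import math
import random

def smote_sketch(X_min, n_samples, k=5, seed=0):
    """Toy sketch of the SMOTE idea: synthesize a new minority sample by
    interpolating between a minority point and one of its k nearest
    minority neighbours.  Illustration only, not smote_variants' code."""
    rnd = random.Random(seed)
    n = len(X_min)
    k = min(k, n - 1)

    # k nearest minority neighbours of each minority point (excluding itself)
    neighbours = [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(X_min[i], X_min[j]))[:k]
        for i in range(n)
    ]

    samples = []
    for _ in range(n_samples):
        i = rnd.randrange(n)             # pick a minority point at random
        j = rnd.choice(neighbours[i])    # pick one of its nearest neighbours
        lam = rnd.random()               # interpolation factor in [0, 1)
        # new sample lies on the segment between the two minority points
        samples.append([a + lam * (b - a) for a, b in zip(X_min[i], X_min[j])])
    return samples

X_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
X_new = smote_sketch(X_min, n_samples=10)
print(len(X_new), len(X_new[0]))  # 10 2
```

The variants covered by the package differ mainly in how the points, neighbours, and interpolation weights are chosen (for instance, favouring samples near the class boundary), not in this basic interpolation step.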