analysis

This module contains all the core functions/classes that can help in the analysis of data that come from quantitative phenotyping experiments. This includes exploratory analysis of the data, statistical comparisons and inference tasks such as classification and clustering.

significant_features

These functions are meant for a quick initial inspection of the data, not for thorough statistical analysis. Because they serve quick data inspection, some of them offer the option of plotting a number of features automatically (an exception to the general rule that plotting functions should not be included in the analysis module). Because they are not meant for statistical analysis, the statistics they produce are not corrected for multiple comparisons.

Functions available to the user:

k_significant_feat
mRMR_feature_selection
top_feat_in_PCs
top_feat_in_LDA
k_significant_from_classifier
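
Since the statistics reported by these functions are uncorrected, anyone who wants to draw conclusions from the p-values would need to apply a correction separately. The sketch below shows one common choice, the Benjamini-Hochberg procedure, in plain Python. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions and are not part of this module.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the original order.

    Hypothetical helper for illustration; not a tierpsytools function.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, one per feature.
pvals = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(pvals)
```

Features whose adjusted p-value falls below the chosen false discovery rate would then be reported as significant.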

statistical_tests

Functions to perform statistical tests with multidimensional data. Contains functions that compare between groups using univariate tests per feature. Also contains a wrapper function to estimate confidence intervals for statistical measures using bootstrapping.

Functions available to the user:

univariate_tests
get_effect_sizes
bootstrapped_ci
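
To illustrate the idea behind bootstrapped confidence intervals, here is a minimal percentile-bootstrap sketch that uses only the Python standard library. It is not the implementation of `bootstrapped_ci`; the function name `percentile_bootstrap_ci`, its parameters and the example values are assumptions made for this example.

```python
import random
import statistics


def percentile_bootstrap_ci(values, statistic=statistics.mean,
                            n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `statistic`.

    Hypothetical helper for illustration; not a tierpsytools function.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Made-up measurements of a single feature.
speeds = [4.1, 3.8, 5.0, 4.4, 4.9, 3.6, 4.7, 4.2, 4.0, 4.5]
low, high = percentile_bootstrap_ci(speeds)
```

Any function that maps a list of values to a number (median, effect size, and so on) could be passed as `statistic`.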

fingerprints

Classes used to estimate tierpsy fingerprints, a way to summarize information from the multidimensional feature space and plot it in an interpretable way.

clustering_tools

Functions for clustering analysis. At the moment, contains functions that estimate the cluster purity for different distances of hierarchical clustering.

Functions available to the user:

purity_score
hierarchical_purity
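
As a reminder of what cluster purity measures, the sketch below computes it in plain Python: each cluster is credited with the count of its most frequent true label, and the credits are summed and divided by the number of samples. The function name `cluster_purity` and the example labels are assumptions for illustration; the signature of `purity_score` may differ.

```python
from collections import Counter, defaultdict


def cluster_purity(true_labels, cluster_ids):
    """Fraction of samples that carry the majority label of their cluster.

    Hypothetical helper for illustration; not a tierpsytools function.
    """
    members = defaultdict(list)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster].append(label)
    # Size of the majority class within each cluster, summed over clusters.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in members.values()
    )
    return majority_total / len(true_labels)


# Made-up example: six samples, three true classes, three clusters.
true_labels = ["a", "a", "a", "b", "b", "c"]
cluster_ids = [0, 0, 1, 1, 1, 2]
purity = cluster_purity(true_labels, cluster_ids)
```

A purity of 1 means every cluster contains a single class. Putting all samples into one cluster gives the frequency of the largest class.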

classification_tools

Functions for classification analysis. At the moment, it mainly includes wrapper functions for cross-validation. Similar CV wrappers exist in sklearn. The ones developed here:

allow easier scaling (without using pipeline)
can use custom splitters for grouped data and custom scorers
produce customized output that is convenient for the analysis of drug screening data (for example, in the majority vote functions).

Functions available to the user:

cv_predict
cv_predict_single
cv_score
get_fscore
rearrange_confusion_matrix
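
For readers unfamiliar with how an F-score relates to a confusion matrix, the following plain-Python sketch derives the per-class F1 score from one. It assumes the sklearn convention that rows are true classes and columns are predicted classes. The function name `f1_per_class` and the example matrix are illustrative assumptions and do not reflect the signature of `get_fscore`.

```python
def f1_per_class(confusion):
    """Per-class F1 from a square confusion matrix.

    Assumes confusion[i][j] counts samples of true class i predicted as
    class j. Hypothetical helper; not a tierpsytools function.
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # rest of row k
        fp = sum(confusion[i][k] for i in range(n)) - tp  # rest of column k
        denom = 2 * tp + fp + fn
        # Report 0.0 for a class with no true or predicted samples.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Made-up two-class confusion matrix.
confusion = [[8, 2],
             [1, 9]]
scores = f1_per_class(confusion)
```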

cv_splitters

Customized splitter classes for training and tuning of machine learning models. They follow the structure of the splitter classes in sklearn and inherit from the base sklearn splitters.

Class StratifiedGroupKFold
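
The idea behind a stratified, group-aware split can be sketched in plain Python: whole groups are assigned to folds so that no group appears in both the training and the test set, while class counts stay roughly balanced across folds. The greedy heuristic below is an assumption made for illustration and is not the algorithm used by this class; the function name `greedy_stratified_group_folds` and the example data are hypothetical.

```python
from collections import Counter, defaultdict


def greedy_stratified_group_folds(labels, groups, n_splits):
    """Yield (train_indices, test_indices), keeping each group in one fold.

    Hypothetical greedy sketch; not the tierpsytools implementation.
    """
    # Class counts within each group.
    per_group = defaultdict(Counter)
    for y, g in zip(labels, groups):
        per_group[g][y] += 1
    classes = sorted(set(labels))
    fold_counts = [Counter() for _ in range(n_splits)]
    fold_of_group = {}

    def imbalance(fold, group):
        # Spread of per-class counts across folds if `group` joined `fold`.
        total = 0
        for c in classes:
            counts = [
                fold_counts[f][c] + (per_group[group][c] if f == fold else 0)
                for f in range(n_splits)
            ]
            total += max(counts) - min(counts)
        return total

    # Place the largest groups first; they are the hardest to balance.
    ordered = sorted(per_group,
                     key=lambda g: (-sum(per_group[g].values()), str(g)))
    for g in ordered:
        best = min(range(n_splits), key=lambda f: imbalance(f, g))
        fold_counts[best].update(per_group[g])
        fold_of_group[g] = best

    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train, test


# Made-up data: four groups of two samples each, two classes.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
groups = ["a", "a", "b", "b", "c", "c", "d", "d"]
splits = list(greedy_stratified_group_folds(labels, groups, n_splits=2))
```

Recent versions of sklearn also ship a splitter named `StratifiedGroupKFold` in `sklearn.model_selection`, which may be worth comparing against.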

scorers

Custom scorer classes for machine learning algorithms. At the moment, a scorer for classification tasks is implemented. The scorer can estimate different types of performance metrics, including metrics after majority vote.

Class ClassifScorer - methods available to the user:

score
score_maj
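
A metric "after majority vote" can be illustrated with a short plain-Python sketch. It assumes that per-sample predictions sharing a group label (for instance, replicates of the same compound) are pooled into a single vote before scoring. The function names, arguments and example data below are hypothetical and do not reflect the interface of `ClassifScorer`.

```python
from collections import Counter, defaultdict


def majority_vote(predictions, groups):
    """Most frequent prediction per group.

    Ties go to the label seen first within the group.
    Hypothetical helper; not a tierpsytools function.
    """
    votes = defaultdict(list)
    for pred, group in zip(predictions, groups):
        votes[group].append(pred)
    return {g: Counter(p).most_common(1)[0][0] for g, p in votes.items()}


def majority_accuracy(true_by_group, predictions, groups):
    """Accuracy computed per group after majority voting."""
    voted = majority_vote(predictions, groups)
    correct = sum(voted[g] == true_by_group[g] for g in voted)
    return correct / len(voted)


# Made-up per-sample predictions for three groups.
predictions = ["x", "x", "y", "y", "y", "x", "z"]
groups = ["d1", "d1", "d1", "d2", "d2", "d2", "d3"]
true_by_group = {"d1": "x", "d2": "y", "d3": "y"}

voted = majority_vote(predictions, groups)
accuracy = majority_accuracy(true_by_group, predictions, groups)
```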
