analysis

em812 edited this page May 11, 2021 · 6 revisions

This module contains the core functions and classes for analysing data from quantitative phenotyping experiments. This includes exploratory analysis of the data, statistical comparisons, and inference tasks such as classification and clustering.

Contents

significant_features

Functions that return a number of features of interest based on a specific comparison between groups or based on the variance of the data along different phenotypic directions.

These functions are meant for an initial quick inspection of the data and not for thorough statistical analysis. Because they are used for quick data inspection, some of these functions offer the option of plotting a number of features automatically (as an exception to the general rule that plotting functions should not be included in the analysis module). Because they are not meant for statistical analysis, any statistics produced by these functions are not corrected for multiple comparisons.

Functions available to the user:

  • k_significant_feat
  • mRMR_feature_selection
  • top_feat_in_PCs
  • top_feat_in_LDA
  • k_significant_from_classifier
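As an illustration of selecting features "based on the variance of the data along different phenotypic directions", the sketch below ranks features by the absolute size of their loadings on each principal component. It uses scikit-learn's PCA directly; the helper name `top_loading_features` and its parameters are hypothetical and are not the signature of `top_feat_in_PCs`.

```python
import numpy as np
from sklearn.decomposition import PCA


def top_loading_features(X, feature_names, n_components=2, k=3):
    """Hypothetical helper: the k features with the largest absolute
    loading on each of the first n_components principal components."""
    pca = PCA(n_components=n_components).fit(X)
    top = {}
    for i, component in enumerate(pca.components_):
        # Sort features by absolute loading, largest first
        order = np.argsort(np.abs(component))[::-1][:k]
        top[f"PC{i + 1}"] = [feature_names[j] for j in order]
    return top


# Synthetic data: feature "f2" is given a much larger variance than the rest
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 2] *= 10.0
names = ["f0", "f1", "f2", "f3", "f4"]

top_features = top_loading_features(X, names, n_components=2, k=2)
```

In practice the features would normally be scaled before PCA; the unscaled data here only serves to make one feature dominate the first component.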

statistical_tests

Functions to perform statistical tests with multidimensional data. Contains functions that compare between groups using univariate tests per feature. Also contains a wrapper function to estimate confidence intervals for statistical measures using bootstrapping.

Functions available to the user:

  • univariate_tests
  • get_effect_sizes
  • bootstrapped_ci
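The two ideas in this module, a univariate test per feature and a bootstrapped confidence interval, can be sketched with SciPy and NumPy as follows. This is a minimal sketch under stated assumptions: it uses Welch's t-test and a percentile bootstrap, and the helper `bootstrap_ci` is hypothetical. It does not reproduce the signatures or defaults of `univariate_tests` or `bootstrapped_ci`.

```python
import numpy as np
from scipy import stats


def bootstrap_ci(values, statistic=np.mean, n_boot=1000, alpha=0.05, seed=0):
    """Hypothetical helper: percentile bootstrap confidence interval
    for a statistic of a 1-D sample."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    boot = np.array([
        statistic(rng.choice(values, size=values.size, replace=True))
        for _ in range(n_boot)
    ])
    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Synthetic data: two groups, three features; only feature 0 differs
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, size=(50, 3))
treated = rng.normal(0.0, 1.0, size=(50, 3))
treated[:, 0] += 3.0

# One Welch t-test per feature (column); p-values are NOT corrected
# for multiple comparisons here
t_stats, p_values = stats.ttest_ind(treated, control, axis=0, equal_var=False)

# Bootstrapped 95% CI for the mean of feature 0 in the treated group
ci_low, ci_high = bootstrap_ci(treated[:, 0])
```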

fingerprints

Classes used to estimate tierpsy fingerprints, a way to summarize information from the multidimensional feature space and plot it in an interpretable way.

clustering_tools

Functions for clustering analysis. At the moment, contains functions that estimate the cluster purity for different distances of hierarchical clustering.

Functions available to the user:

  • purity_score
  • hierarchical_purity
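For reference, cluster purity is commonly defined as the fraction of samples that belong to the majority class of their assigned cluster. The sketch below computes it for flat clusters cut from a SciPy hierarchical clustering at two distance thresholds. The helper `purity`, the average linkage and the thresholds are assumptions made for illustration and may differ from what `purity_score` and `hierarchical_purity` do.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def purity(labels_true, labels_pred):
    """Fraction of samples that carry the majority true label of
    the cluster they were assigned to."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    total = 0
    for cluster in np.unique(labels_pred):
        members = labels_true[labels_pred == cluster]
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()
    return total / labels_true.size


# Synthetic data: two tight, well-separated blobs with known labels
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 2)),
    rng.normal(5.0, 0.1, size=(10, 2)),
])
y = np.array([0] * 10 + [1] * 10)

Z = linkage(X, method="average")

# Purity of the flat clusters obtained by cutting the tree at each distance
purity_by_distance = {
    t: purity(y, fcluster(Z, t=t, criterion="distance"))
    for t in (1.0, 100.0)
}
```

Cutting below the distance between the blobs keeps the two classes apart, while a very large threshold merges everything into one cluster and purity falls to the share of the largest class.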

classification_tools

Functions for classification analysis. At the moment, it includes mainly wrapper functions for cross-validation. Similar CV wrappers exist in sklearn. The ones developed here:

  • allow easier scaling (without using pipeline)
  • can use custom splitters for grouped data and custom scorers
  • produce customized output that is convenient for the analysis of the drug screenings (for example in majority vote functions).

Functions available to the user:

  • cv_predict
  • cv_predict_single
  • cv_score
  • get_fscore
  • rearrange_confusion_matrix
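The kind of wrapper described above, cross-validated prediction with scaling applied inside each fold and a group-aware splitter, might look roughly like the following. This is a sketch built only on scikit-learn; `cv_predict_scaled` is a hypothetical name and its arguments are assumptions, not the signature of `cv_predict`.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import StandardScaler


def cv_predict_scaled(X, y, groups, estimator, splitter):
    """Hypothetical helper: out-of-fold predictions, with the scaler
    fitted on the training part of each fold only (no pipeline needed)."""
    y_pred = np.empty_like(y)
    for train_idx, test_idx in splitter.split(X, y, groups):
        scaler = StandardScaler().fit(X[train_idx])
        model = clone(estimator).fit(scaler.transform(X[train_idx]), y[train_idx])
        y_pred[test_idx] = model.predict(scaler.transform(X[test_idx]))
    return y_pred


# Synthetic data: 6 groups of 5 replicates; groups 0-2 are class 0, 3-5 class 1
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(6), 5)
y = (groups >= 3).astype(int)
X = rng.normal(0.0, 0.5, size=(groups.size, 4)) + 5.0 * y[:, None]

# GroupKFold keeps all replicates of a group in the same fold
y_pred = cv_predict_scaled(
    X, y, groups, LogisticRegression(), GroupKFold(n_splits=3)
)
```

Fitting the scaler inside the loop avoids leaking information from the test fold into the training data, which is the usual reason for wrapping scaling and cross-validation together.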

cv_splitters

Customized splitter classes for training and tuning of machine learning models. They are based on the structure of the splitter classes in sklearn, inheriting from the base sklearn splitters.

Class StratifiedGroupKFold

scorers

Custom scorer classes for machine learning algorithms. At the moment, a scorer for classification tasks is implemented. The scorer can estimate different types of performance metrics, including metrics after majority vote.

Class ClassifScorer - methods available to the user:

  • score
  • score_maj
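A metric "after majority vote" can be read as follows: predictions for all samples of a group (for example, replicates of one compound) are collapsed into the most frequent predicted label, and the metric is computed once per group. The sketch below does this for accuracy. The function `majority_vote_accuracy` is hypothetical and assumes each group has a single true label; it is not the implementation of `score_maj`.

```python
import numpy as np
from sklearn.metrics import accuracy_score


def majority_vote_accuracy(y_true, y_pred, groups):
    """Hypothetical helper: accuracy computed on one label per group,
    where the predicted label is the most frequent prediction in the group."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    true_by_group, pred_by_group = [], []
    for g in np.unique(groups):
        mask = groups == g
        labels, counts = np.unique(y_pred[mask], return_counts=True)
        pred_by_group.append(labels[np.argmax(counts)])
        true_by_group.append(y_true[mask][0])  # assumes one true label per group
    return accuracy_score(true_by_group, pred_by_group)


y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0]
groups = ["a", "a", "a", "b", "b", "b"]

per_sample = accuracy_score(y_true, y_pred)                  # 4 of 6 samples correct
per_group = majority_vote_accuracy(y_true, y_pred, groups)   # both groups correct
```

Ties within a group are resolved here by taking the first label in sorted order, which is an arbitrary choice of this sketch.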