
Manual phenotype extraction

Martin Zackrisson edited this page Dec 21, 2016 · 10 revisions

There should really be only one reason to do this: to reuse quality-control work done previously while obtaining new phenotypes that were released after feature extraction was run on a project. It can also be used to access developmental and experimental phenotypes, BUT YOU USE THOSE AT YOUR OWN RISK!

First, navigate to the analysis folder of interest and start IPython.

Second, import the required dependencies:

from scanomatic.data_processing import phenotyper

(In old versions of Scan-o-Matic, data_processing may be named dataProcessing.)

Third, load the previous state of the analysis and feature extraction:

P = phenotyper.Phenotyper.LoadFromState(".")

Optionally, if you intend to obtain phenotypes that are under development, or even experimental phenotypes, you need to set the inclusion level:

P.set_phenotype_inclusion_level(phenotyper.PhenotypeDataType.UnderDevelopment)

BUT YOU REALLY SHOULDN'T.

The relevant levels are PhenotypeDataType.Trusted (default), PhenotypeDataType.UnderDevelopment and PhenotypeDataType.All.

Fourth, remove the previous phenotypes and extract new ones from the data object (nothing is written to the directory at this stage):

P.wipe_extracted_phenotypes()  # Optional in most scenarios
P.extract_phenotypes()  # See the options below before running this

This takes a little while (5-10 minutes on most computers).

If you want to retain previous QC work while extracting phenotypes, pass keep_filter=True:

P.extract_phenotypes(keep_filter=True)

If you want to invoke a specific type of smoothing, or simply redo it:

P.extract_phenotypes(smoothing=phenotyper.Smoothing.PolynomialWeightedMulti)

This will (re)do a weighted multi-polynomial smoothing.

Options are:

  • Smoothing.Keep: do not redo smoothing if it already exists; use the default smoothing if none exists.
  • Smoothing.MedianGauss (DEFAULT): median kernel filter followed by Gaussian smoothing.
  • Smoothing.Polynomial: median kernel filter followed by a single polynomial smoothing.
  • Smoothing.PolynomialWeightedMulti: median kernel filter followed by a weighted combination of multiple polynomial estimates of the true value.
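The idea behind the default MedianGauss option can be pictured with the pure-Python sketch below. It is not Scan-o-Matic's code: the kernel size (5), the Gaussian sigma (1.5), the edge handling (windows truncated at the ends of the curve) and all function names are assumptions chosen for illustration.

```python
import math
from statistics import median


def median_filter(values, kernel_size=5):
    """Replace each point by the median of a window centred on it.

    The window is truncated at the ends of the curve (assumed edge handling).
    """
    half = kernel_size // 2
    return [median(values[max(0, i - half):i + half + 1]) for i in range(len(values))]


def gaussian_smooth(values, sigma=1.5):
    """Gaussian-weighted average; weights are renormalised near the edges."""
    radius = int(math.ceil(3 * sigma))  # ignore weights beyond 3 sigma
    out = []
    for i in range(len(values)):
        total = 0.0
        weight_sum = 0.0
        for j in range(max(0, i - radius), min(len(values), i + radius + 1)):
            w = math.exp(-((j - i) ** 2) / (2 * sigma ** 2))
            total += w * values[j]
            weight_sum += w
        out.append(total / weight_sum)
    return out


def median_gauss(values, kernel_size=5, sigma=1.5):
    # Hypothetical helper: median filter first to remove single-point spikes,
    # then Gaussian smoothing to even out the remaining noise.
    return gaussian_smooth(median_filter(values, kernel_size), sigma)


curve = [1, 1, 1, 50, 1, 1, 1]  # a flat curve with one spurious spike
print(median_gauss(curve, kernel_size=3))
```

Running the median step first means a single-point spike is removed outright rather than smeared across its neighbours by the Gaussian.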

For the two polynomial smoothings, non-default parameters can be set by supplying a dictionary as smoothing_coeffs. You shouldn't do that, though, so if you really intend to, you will have to read the code to figure out how.

The smoothing option can be combined with keep_filter=True or used without it.

Finally, save the data. In this example we overwrite the present state by giving the current directory (.) as the target, but you can give another directory if you wish to keep both versions.

P.save_state('.')

There is currently no fast and easy way to redo spatial normalization here, but this will come soon. For now, use the quality-control user interface instead.

To obtain the non-normalized data for a specific phenotype, do e.g.:

from scanomatic.data_processing.growth_phenotypes import Phenotypes
GT = P.get_phenotype(Phenotypes.GenerationTime)

(In old versions of Scan-o-Matic, data_processing may be named dataProcessing.)

Getting some specific results from an analysis

Load the results and, if you haven't added meta-data before, load the meta-data too. Files containing meta-data must conform to the Meta Data format specification:

from scanomatic.data_processing import phenotyper
P = phenotyper.Phenotyper.LoadFromState(".")
P.load_meta_data("/home/martin/Data2/scan-o-matic/qc_plate_layout.ods")

(In old versions of Scan-o-Matic, data_processing may be named dataProcessing.)

To get only the results whose meta-data match a criterion:

S = P.find_in_meta_data(u'201-37')

To limit the search to a specific column:

S = P.find_in_meta_data(u'201-37', column=0)

Or you can give the column by header name if the meta-data has headers. The resulting selection object can be used on its own or combined with other selection objects (e.g. S1 + S2).
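To make the selection semantics concrete, here is a hypothetical sketch of such a search. It assumes the meta-data are held as a dict mapping each (plate, row, column) position to a tuple of spreadsheet cells and that a match means exact equality; neither assumption is taken from Scan-o-Matic, and find_positions is a made-up name. The sketch returns plain sets, so two selections are combined with | rather than the + that Scan-o-Matic's selection objects support.

```python
def find_positions(meta_data, query, column=None, headers=None):
    """Hypothetical search: return the positions whose meta-data match `query`.

    meta_data: dict mapping (plate, row, col) -> tuple of meta-data cells
    column:    None to search all meta-data columns, an int index,
               or a header name (then `headers` must be given)
    """
    if isinstance(column, str):
        column = headers.index(column)  # translate header name to index
    hits = set()
    for position, cells in meta_data.items():
        candidates = cells if column is None else (cells[column],)
        if query in candidates:  # exact match (assumption)
            hits.add(position)
    return hits


meta = {
    (0, 0, 0): ("201-37", "YPD"),
    (0, 0, 1): ("wt", "YPD"),
    (1, 2, 3): ("YPD", "201-37"),
}
s1 = find_positions(meta, "201-37", column="Strain", headers=["Strain", "Medium"])
s2 = find_positions(meta, "wt")
print(sorted(s1 | s2))  # union of two selections
```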

Example plotting a selected group of curves:

from matplotlib import pyplot as plt
plt.ion()
plt.semilogy(P.times, S.smooth_growth_data[1].T, basey=2)  # matplotlib >= 3.3: use base=2 instead (basey was removed in 3.5)
plt.legend(S.meta_data[1], loc="lower right")
plt.xlabel("Time [h]")
plt.ylabel("Population size [cells]")

This plots the matching curves from the plate with index 1 (i.e. plate 2).

Manual QC

First, load the phenotype state:

from scanomatic.data_processing import phenotyper
P = phenotyper.Phenotyper.LoadFromState(".")

(In old versions of Scan-o-Matic, data_processing may be named dataProcessing.)

Adding a filter setting for a specific position is done as follows:

P.add_position_mark(0, (1, 2), phenotype=phenotyper.Phenotypes.GenerationTime, position_mark=phenotyper.Filter.BadData)

This marks the Generation Time phenotype at plate index 0, coordinate (1, 2) as BadData. All indices count from 0.

If no phenotype is valid for a position, use phenotype=None (the default value):

P.add_position_mark(2, (10, 12), phenotype=None, position_mark=phenotyper.Filter.Empty)

Remember to save the state when you are done, but note that the resulting curve filter is incompatible with the old QC interface!
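The difference between marking one phenotype and passing phenotype=None can be sketched as follows. This is a hypothetical model, not Scan-o-Matic's internal storage: marks are kept per phenotype as plain strings, unmarked positions count as OK, PHENOTYPES is an illustrative subset of the phenotype names used on this page, and mark_position / get_mark are made-up names.

```python
# Illustrative subset of phenotype names (assumption, not the full list)
PHENOTYPES = ("GenerationTime", "ExperimentGrowthYield", "InitialValue")


def new_filter_store():
    # One mapping per phenotype: (plate, (row, col)) -> mark
    return {phenotype: {} for phenotype in PHENOTYPES}


def mark_position(store, plate, position, phenotype=None, mark="BadData"):
    """Mark one phenotype, or every phenotype when phenotype is None."""
    targets = PHENOTYPES if phenotype is None else (phenotype,)
    for name in targets:
        store[name][(plate, position)] = mark


def get_mark(store, plate, position, phenotype):
    # Positions without an explicit mark are treated as OK
    return store[phenotype].get((plate, position), "OK")


store = new_filter_store()
mark_position(store, 0, (1, 2), phenotype="GenerationTime", mark="BadData")
mark_position(store, 2, (10, 12), phenotype=None, mark="Empty")
print(get_mark(store, 0, (1, 2), "InitialValue"))  # only GenerationTime was marked
```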

The possible filters are:

Filter.OK
Filter.NoGrowth
Filter.Empty
Filter.BadData
Filter.UndecidedProblem

Filter.UndecidedProblem shouldn't be set by the user; Scan-o-Matic assigns it automatically when a phenotype value is not finite.
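A minimal sketch of that automatic assignment, assuming phenotype values arrive as a flat list with one mark per position. The Filter enum here only mirrors the member names listed above (its numeric values are arbitrary), auto_flag is a hypothetical name, and leaving existing user marks untouched is an assumption rather than documented Scan-o-Matic behaviour.

```python
import math
from enum import Enum


class Filter(Enum):
    # Member names mirror the filters listed above; values are arbitrary
    OK = 0
    NoGrowth = 1
    Empty = 2
    BadData = 3
    UndecidedProblem = 4


def auto_flag(values, marks):
    """Return a copy of `marks` with UndecidedProblem on non-finite values.

    Only positions currently marked OK are changed (assumption).
    """
    result = list(marks)
    for i, value in enumerate(values):
        if not math.isfinite(value) and result[i] is Filter.OK:
            result[i] = Filter.UndecidedProblem
    return result


values = [1.5, float("nan"), float("inf"), 2.0]
marks = [Filter.OK, Filter.OK, Filter.Empty, Filter.OK]
print(auto_flag(values, marks))
```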

To undo position marks use:

P.undo(plate=0)

This undoes the most recent change to the plate with index 0.
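Since undo is requested per plate, changes on other plates should be unaffected. The hypothetical class below illustrates one way to get that behaviour: each plate has its own stack recording the mark that each change replaced. MarkHistory, add_mark and the string marks are made up for illustration and do not reflect Scan-o-Matic's implementation.

```python
from collections import defaultdict


class MarkHistory:
    """Hypothetical per-plate undo log for position marks."""

    def __init__(self):
        self.marks = {}                    # (plate, position, phenotype) -> mark
        self._history = defaultdict(list)  # plate -> stack of (key, previous mark)

    def add_mark(self, plate, position, phenotype, mark):
        key = (plate, position, phenotype)
        # Remember what was there before so the change can be reverted
        self._history[plate].append((key, self.marks.get(key)))
        self.marks[key] = mark

    def undo(self, plate):
        """Revert the most recent change on `plate`; return False if none."""
        if not self._history[plate]:
            return False
        key, previous = self._history[plate].pop()
        if previous is None:
            del self.marks[key]  # the position had no mark before
        else:
            self.marks[key] = previous
        return True


history = MarkHistory()
history.add_mark(0, (1, 2), "GenerationTime", "BadData")
history.add_mark(1, (0, 0), "GenerationTime", "Empty")
history.add_mark(0, (1, 2), "GenerationTime", "NoGrowth")
history.undo(0)  # reverts NoGrowth to BadData; plate 1 is untouched
```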

Manual normalizing

By default only some of the phenotypes are normalized; you can list them with:

P.phenotypes_that_normalize

To add phenotypes to, or remove them from, normalization:

P.remove_phenotype_from_normalization(phenotyper.Phenotypes.ExperimentGrowthYield)
P.add_phenotype_to_normalization(phenotyper.Phenotypes.InitialValue)

Then normalize the phenotypes with:

P.normalize_phenotypes()

If you save the state, the normalized phenotypes are saved too. To access an individual normalized phenotype:

P.get_phenotype(phenotyper.Phenotypes.GenerationTime, normalized=True)

See above on getting specific results.

BioScreen C data

You can use BioScreen C data. This has not been fully validated, so, again, it is entirely your responsibility to ensure that the data are processed and analysed correctly.

To get a Phenotyper object that can be used as in the rest of this page's instructions, simply do:

from scanomatic.util import bioscreen
P = bioscreen.load("somefile.csv")

If you need a different preprocessing of the data than the default Precog 2016 Saccharomyces cerevisiae preprocessing, you can use any of the bioscreen.Preprocessing values, e.g.:

P = bioscreen.load("somefile.csv", preprocess=bioscreen.Preprocessing.AsLoaded)

which retains the values as they were in the data file.