Generate MIAPA checklist-compliant nexml #46

cboettig · 2013-11-30T00:28:33Z

RNeXML should optionally be able to include all the basic metadata listed on the MIAPA checklist, guiding users who are unfamiliar with the process and providing reasonable automated suggestions when possible (e.g. suggesting external identifiers based on OTU labels, #24). A function might also be provided to check (and perhaps summarize or return) MIAPA compliance.

I've reproduced the checklist below with notes added on how we're doing in RNeXML.

For each item, I've either made a note on if/how we handle it in NeXML, or a question when I'm unsure how to handle it. For instance, I can sometimes find a corresponding block in the example files in the miapa repo, but they are in OWL and the translation to NeXML's meta/RDFa isn't clear to me. An example nexml file that satisfies all these requirements would be super helpful to me.
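A minimal sketch of what such a compliance check could look like, written in Python rather than R and not part of RNeXML. The function name `miapa_summary` and the three checks it performs are assumptions chosen for illustration; they cover only a small part of the checklist and rely solely on element names from the NeXML schema.

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/2009}"  # NeXML namespace in ElementTree notation

def miapa_summary(nexml_text):
    """Report which of a few checklist items can be detected (hypothetical helper)."""
    root = ET.fromstring(nexml_text)
    topologies = list(root.iter(NEX + "tree")) + list(root.iter(NEX + "network"))
    otus = list(root.iter(NEX + "otu"))
    return {
        # at least one <tree> or <network> is present
        "topology": len(topologies) > 0,
        # every <otu> carries a non-empty label attribute
        "otu_labels": len(otus) > 0 and all(o.get("label") for o in otus),
        # a <matrix> element exists somewhere in the file
        "character_matrix": root.find(".//" + NEX + "matrix") is not None,
    }

example = """<nexml xmlns="http://www.nexml.org/2009" version="0.9">
  <otus id="os1">
    <otu id="a" label="Homo sapiens"/>
    <otu id="b"/>
  </otus>
  <trees id="ts1" otus="os1">
    <tree id="t1"><node id="n1" otu="a"/></tree>
  </trees>
</nexml>"""

print(miapa_summary(example))
```

For this example the topology check passes, while the OTU label check fails (one `otu` has no label) and no matrix is found.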

Topology

The topology itself, possibly as an identifier of a database (such as a TreeBASE) record. In nexml, this is included in the tree node.

Is this a gene tree or species tree? Do we use the treebase namespace to define this, or is there a better alternative?

<meta content="Species Tree" datatype="xsd:string" id="meta24059" property="tb:kind.tree" xsi:type="nex:LiteralMeta"/>
<meta content="21" datatype="xsd:integer" id="meta24062" property="tb:ntax.tree" xsi:type="nex:LiteralMeta"/>
<meta content="Unrated" datatype="xsd:string" id="meta24061" property="tb:quality.tree" xsi:type="nex:LiteralMeta"/>

Is it a tree or a network? nexml defines this by using <tree> or <network>.
Is the topology rooted or not? In nexml, this is defined by an attribute root="true" on a member node. Should we consider declaring this in metadata too?
The type of consensus, if this is a consensus topology (one that summarizes the topology inference in some way, rather than being directly provided by the inference method).

Do we use the treebase namespace for this as well? e.g.
```
<meta content="Consensus" datatype="xsd:string" id="meta24060" property="tb:type.tree" xsi:type="nex:LiteralMeta"/>
```
The topology should be "well described", as applicable to the inference method being used. For example, a likelihood for a maximum likelihood analysis. For Bayesian analyses this should also include the burn-in period excluded and the convergence of the chain(s). This may also include more than one topology, for example a sample from the posterior probability distribution for a Bayesian analysis, or equally scoring topologies for a maximum parsimony analysis. Examples?
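The tree-versus-network and rooting items above can be read directly from the NeXML structure. The following is a sketch in Python under that assumption; `describe_topologies` is a hypothetical helper name, and a topology is treated as rooted only when some member node has root="true".

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/2009}"  # NeXML namespace in ElementTree notation

def describe_topologies(nexml_text):
    """List each topology with its kind and whether any node is marked as root."""
    root = ET.fromstring(nexml_text)
    found = []
    for kind in ("tree", "network"):
        for topo in root.iter(NEX + kind):
            rooted = any(n.get("root") == "true" for n in topo.iter(NEX + "node"))
            found.append({"id": topo.get("id"), "kind": kind, "rooted": rooted})
    return found

SAMPLE = """<nexml xmlns="http://www.nexml.org/2009" version="0.9">
  <trees id="ts1" otus="os1">
    <tree id="t1">
      <node id="n1" root="true"/>
      <node id="n2" otu="a"/>
      <edge id="e1" source="n1" target="n2" length="0.5"/>
    </tree>
    <network id="net1">
      <node id="m1"/>
      <node id="m2"/>
      <edge id="f1" source="m1" target="m2"/>
    </network>
  </trees>
</nexml>"""

for entry in describe_topologies(SAMPLE):
    print(entry)
```

Consensus type and gene-tree/species-tree status are not covered here, since they depend on which metadata vocabulary ends up being used.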

OTUs:

All terminal nodes should be appropriately labelled and referenced in one of the following ways. Internal nodes need not be.

A meaningful external identifier (a combination of database or resource and identifier/accession within that database).
We generate these with taxize; see "add TSNs from species names using taxize", #24.
For specimens, museum, collection (if applicable), and specimen identifier. Alternatively, if a specimen is not in a museum collection, use the laboratory, laboratory collection, and accession within that collection.
Precise (GPS) georeferences for specimens are highly desirable (but not always available).
Branch lengths: Some measure of branch length is required unless it is not applicable to the analysis method. Further semantics of the measure should be implied by the tree inference method. The length attribute in nexml is sufficient.
Branch support: Some value of branch support should be provided, for example posterior probability or bootstrap value, unless it is not applicable to the analysis method. In nexml, this would be a meta annotation of the edge node. Example?
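A sketch of how the branch length and branch support items might be checked, in Python. The helper name `edge_report` is hypothetical, and so is the `ex:support` property used in the sample: the thread does not settle which vocabulary term should carry support values, so the check only flags edges that have no meta child at all.

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/2009}"  # NeXML namespace in ElementTree notation

def edge_report(nexml_text):
    """Return (ids of edges without a length, ids of edges without any meta child)."""
    root = ET.fromstring(nexml_text)
    missing_length, unannotated = [], []
    for edge in root.iter(NEX + "edge"):
        if edge.get("length") is None:
            missing_length.append(edge.get("id"))
        if edge.find(NEX + "meta") is None:
            unannotated.append(edge.get("id"))
    return missing_length, unannotated

# "ex:support" is a placeholder property, not a term from any real vocabulary.
SAMPLE = """<nexml xmlns="http://www.nexml.org/2009"
       xmlns:nex="http://www.nexml.org/2009"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       version="0.9">
  <trees id="ts1" otus="os1">
    <tree id="t1">
      <node id="n1" root="true"/>
      <node id="n2"/>
      <node id="n3"/>
      <edge id="e1" source="n1" target="n2" length="0.12">
        <meta id="m1" xsi:type="nex:LiteralMeta" property="ex:support"
              content="0.98" datatype="xsd:double"/>
      </edge>
      <edge id="e2" source="n1" target="n3"/>
    </tree>
  </trees>
</nexml>"""

print(edge_report(SAMPLE))
```

In the sample, edge e2 is reported in both lists because it has neither a length nor an annotation.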

Character matrix:

I note that this description refers entirely to the character matrix as the data from which the tree was derived. It appears that the MIAPA standard doesn't refer to comparative trait data. Further, it may not always be desirable to include a copy of the character matrix in the data file; perhaps a pointer to where the alignment can be found in a separate file might suffice?

aligned data matrix that is the basis for the tree (by having been the input for the tree inference method)

MIAPA shows an example of how to state that the tree wasDerivedFrom the alignment; I'm not sure what the corresponding RDFa in the nexml would look like.

    <owl:NamedIndividual rdf:about="&Peters2011hymenoptera;tree0000001">
        <rdf:type rdf:resource="&obo;CDAO_0000012"/>
        <rdf:type rdf:resource="&obo;CDAO_0000073"/>
        <prov:wasGeneratedBy rdf:resource="&annot;InferenceOfPetersTree"/>
        <prov:wasDerivedFrom rdf:resource="&annot;PetersAlignment"/>
    </owl:NamedIndividual>
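One possible translation, offered as an assumption rather than an agreed mapping, is a NeXML ResourceMeta annotation on the tree whose rel is prov:wasDerivedFrom and whose href points at the alignment. The Python sketch below builds such an element; `derived_from_meta` and the ids passed to it are hypothetical, and the prov prefix would still need to be declared on the enclosing nexml element.

```python
import xml.etree.ElementTree as ET

NEX_NS = "http://www.nexml.org/2009"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Use readable prefixes when serializing instead of ns0, ns1.
ET.register_namespace("nex", NEX_NS)
ET.register_namespace("xsi", XSI_NS)

def derived_from_meta(meta_id, alignment_ref):
    """Build a <meta> element stating that its parent was derived from alignment_ref."""
    meta = ET.Element("{%s}meta" % NEX_NS)
    meta.set("{%s}type" % XSI_NS, "nex:ResourceMeta")
    meta.set("id", meta_id)
    meta.set("rel", "prov:wasDerivedFrom")
    meta.set("href", alignment_ref)
    return meta

# Hypothetical ids: "meta1" for the annotation, "#alignment1" for a characters block.
meta = derived_from_meta("meta1", "#alignment1")
print(ET.tostring(meta, encoding="unicode"))
```

The resulting element would be appended as a child of the relevant tree element. Whether MIAPA tooling would interpret it the same way as the OWL statement is an open question.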

Data type must be provided, for example DNA, RNA, protein, morphology, etc.
For molecular matrices, the accession numbers (and respective database(s) if different from Genbank) of the sequences used for each row must be provided.
a mapping that relates each row identifier to a tip of the topology: the otu attribute present on each row
a mapping that relates each accession number or specimen identifier to a row label: the inverse of the above map
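The two mappings in the last items can be derived from the otu attribute on each row. A minimal Python sketch, assuming rows sit inside a matrix element as in the NeXML schema; `row_to_otu` and `otu_to_rows` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

NEX = "{http://www.nexml.org/2009}"  # NeXML namespace in ElementTree notation

def row_to_otu(nexml_text):
    """Map each matrix row id to the otu id it references."""
    root = ET.fromstring(nexml_text)
    return {row.get("id"): row.get("otu") for row in root.iter(NEX + "row")}

def otu_to_rows(mapping):
    """Invert a row -> otu map; one otu may be referenced by several rows."""
    inverse = {}
    for row_id, otu_id in mapping.items():
        inverse.setdefault(otu_id, []).append(row_id)
    return inverse

SAMPLE = """<nexml xmlns="http://www.nexml.org/2009" version="0.9">
  <characters id="c1" otus="os1">
    <matrix>
      <row id="r1" otu="a"><seq>ACGT</seq></row>
      <row id="r2" otu="b"><seq>ACGA</seq></row>
    </matrix>
  </characters>
</nexml>"""

forward = row_to_otu(SAMPLE)
print(forward)
print(otu_to_rows(forward))
```

Linking rows to sequence accession numbers would need additional metadata on each row, which this sketch does not attempt.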

Alignment method

name of software used, version of program

MIAPA defines that the alignment wasGeneratedBy some software.

    <owl:NamedIndividual rdf:about="&annot;PetersMUSCLEAlignmentActivity">
        <rdf:type rdf:resource="&edamontology;operation_2928"/>
        <rdf:type rdf:resource="&obo;MIAPA_0000003"/>
        <prov:wasAssociatedWith rdf:resource="&annot;Muscle"/>
        <prov:used rdf:resource="&obo;MIAPA_0000013"/>
    </owl:NamedIndividual>

parameters used (or default if default values were used).
whether alignment was manually corrected or edited

Character trait data

This is not part of the draft MIAPA standard, merely my own brainstormed suggestions, based on the metadata that EML requires when describing character traits.

character trait name (Or trait label/definition pair)
possible states a discrete trait can have
units (for continuous traits)
methodological description of how the trait was measured

Tree inference method

name of software used, version of program

    <owl:NamedIndividual rdf:about="&annot;RaXML_7.2.8">
        <rdf:type rdf:resource="&obo;MIAPA_0000016"/>
        <rdfs:label>RAxML_7.2.8</rdfs:label>
        <swo2:SWO_0000740 rdf:resource="&annot;UseMaximumLikelihood"/>
        <swo:SWO_0004000 rdf:resource="&obo;MIAPA_0000017"/>
    </owl:NamedIndividual>

parameters used, including model of evolution, and optimality criterion

    <owl:NamedIndividual rdf:about="&annot;UseMaximumLikelihood">
        <rdf:type rdf:resource="&obo;MIAPA_0000015"/>
        <rdfs:label>Maximum Likelihood algorithm</rdfs:label>
        <dc:description>The inference algorithm uses maximum likelihood as an optimality criterion. </dc:description>
    </owl:NamedIndividual>

character weights, if characters (normally morphological ones) were weighted.


cboettig · 2014-03-25T22:09:54Z

Also see

How to express that a tree was generated by simulating under a specified process? evoinfo/miapa#21
how to express that a character matrix provides comparative trait data for the phylogeny? evoinfo/miapa#20
How to express that a tree is ultrametric, and how it was made ultrametric? evoinfo/miapa#19

This was referenced Mar 25, 2014

Interpreting comparative data in nexml files #44

Closed

Navigating ontologies #26

Closed

cboettig added metadata labels Mar 25, 2014

cboettig mentioned this issue Jul 9, 2014

Add citation to Cranston et al #74

Closed

cboettig added this to the Long term objectives milestone Jul 17, 2014


cboettig commented Nov 30, 2013 • edited by hlapp
