CV terms in record spec #361

Open
meier-rene opened this issue Sep 27, 2022 · 4 comments


@meier-rene
Contributor

Hi all,
we have a request from @michaelwitting to support EAD as a value for AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE. I would like to use this opportunity to promote the use of controlled vocabulary (CV) terms in MassBank. We did some brainstorming about the best way to introduce them, and here is our proposal for an extension of the format spec:
Ontology terms will be specified in the same way as in the mzTab format. A quote from the mzTab spec:

Parameters are always reported as [CV label, accession, name, value]. Any field that is not available MUST be left empty.

[MS, MS:1001477, SpectraST,]

Should the name of the param contain commas, quotes MUST be added to avoid problems with the parsing: [label, accession, “first part of the param name, second part of the name”, value].

[MOD, MOD:00648, "N,O-diacetylated L-serine",]

It is most probably backward compatible with other software packages.
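A minimal Python sketch of how a parser for this notation might look, assuming straight ASCII double quotes in actual records (the curly quotes in the quoted spec text look like typesetting). `CvParam` and `parse_cv_param` are hypothetical names for illustration, not part of any existing MassBank library. The standard `csv` module already implements the "quote the name if it contains commas" rule.

```python
import csv
from typing import NamedTuple, Optional


class CvParam(NamedTuple):
    """One mzTab-style parameter: [CV label, accession, name, value]."""
    label: str
    accession: str
    name: str
    value: Optional[str]  # None when the value field is left empty


def parse_cv_param(text: str) -> CvParam:
    """Parse '[label, accession, name, value]' into a CvParam (hypothetical helper)."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a bracketed parameter: {text!r}")
    inner = text[1:-1]
    # csv handles the quoting rule: a double-quoted name may contain commas.
    # skipinitialspace lets a quote directly after ", " start a quoted field.
    fields = [f.strip() for f in next(csv.reader([inner], skipinitialspace=True))]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {text!r}")
    label, accession, name, value = fields
    return CvParam(label, accession, name, value or None)
```

For example, `parse_cv_param('[MOD, MOD:00648, "N,O-diacetylated L-serine",]')` keeps the comma inside the name and reports an empty value as `None`.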

In the case of Michael's request #359,
AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE EAD
would become
AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE [MS, MS:1003294, electron activated dissociation,]
or if we also allow the official synonym from the ontology
AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE [MS, MS:1003294, EAD,]

This does not mean that we need to show this "not so human-friendly" representation on the final web page. It can easily be transformed into something like:
AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE EAD
for HTML (please note that EAD is new in the ontology and has not yet propagated to all ontology search engines; that's why I used a different term for the link).
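A rough sketch of that display transformation in Python, assuming the record line has the form `TAG: SUBTAG value`. Legacy free-text values such as `EAD` pass through unchanged, while a bracketed CV term becomes a link showing only its name. The link target `TERM_URL` is a made-up placeholder, since the proposal does not say which ontology browser to link to, and `render_subtag` is a hypothetical name.

```python
import csv
import html

# Hypothetical placeholder; the real ontology browser URL is not specified here.
TERM_URL = "https://ontology-browser.example.org/term/{accession}"


def render_subtag(line: str) -> str:
    """Render a 'TAG: SUBTAG value' record line as HTML (hypothetical helper)."""
    tag, _, rest = line.partition(": ")
    subtag, _, value = rest.partition(" ")
    value = value.strip()
    if value.startswith("[") and value.endswith("]"):
        # CV term: [label, accession, name, value]; names with commas are quoted.
        fields = [f.strip() for f in next(csv.reader([value[1:-1]], skipinitialspace=True))]
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields: {value!r}")
        accession, name = fields[1], fields[2]
        url = TERM_URL.format(accession=accession)
        shown = (f'<a href="{html.escape(url)}" title="{html.escape(accession)}">'
                 f"{html.escape(name)}</a>")
    else:
        # Legacy free-text value: keep it as plain (escaped) text.
        shown = html.escape(value)
    return f"{html.escape(tag)}: {html.escape(subtag)} {shown}"
```

Because the plain-text branch leaves old records untouched, both representations can coexist during a transition period.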

The benefits of CV terms are a unified use of terms, a clear meaning and, if properly implemented, automatic extension of the allowed values to all terms in the CV.

Any objections or comments? If not, I will start implementing this soon. I expect it to be a smooth change without any breakage, because it is purely an addition, not a real change. Hopefully I can convince Michael to be our guinea pig for this addition with his contribution 😉

Regards, Rene

@sneumann
Member

For the validation, there is also some previous work to map which "corners" of the ontology are valid for specific fields.
Such a mapping file is https://github.com/HUPO-PSI/mzML/blob/master/validator/src/main/resources/ms-mapping.xml
and a more human-friendly version can be obtained with OpenMS tools, which produce an HTML file like https://msbi.ipb-halle.de/~sneumann/mzML_mapping_and_cv.html (this goes further and also specifies whether a term can or must be present).
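As a sketch of what such a check could look like on the MassBank side: given the accessions that a field allows and the ontology's is_a links (for example loaded from the OBO file), a term is accepted if it is one of those roots or, optionally, a descendant of one. The function name and the `allow_children` parameter are our own; the mapping file linked above expresses a similar "children allowed" idea, but this code does not read that file. The term IDs used to exercise it are made up.

```python
from collections import deque
from typing import Dict, Iterable, Set


def is_allowed(accession: str, allowed_roots: Iterable[str],
               is_a: Dict[str, Set[str]], allow_children: bool = True) -> bool:
    """Return True if `accession` is an allowed root or (optionally) below one.

    `is_a` maps each term to its direct parent terms (hypothetical input format).
    """
    roots = set(allowed_roots)
    if accession in roots:
        return True
    if not allow_children:
        return False
    # Breadth-first walk up the is_a graph; `seen` guards against cycles
    # and against revisiting terms reachable through multiple parents.
    seen = {accession}
    queue = deque([accession])
    while queue:
        term = queue.popleft()
        for parent in is_a.get(term, ()):
            if parent in roots:
                return True
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False
```

A per-field mapping (e.g. FRAGMENTATION_MODE to the accession of the appropriate parent term) would then supply `allowed_roots`.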

@sneumann
Member

What was not clearly mentioned above is that mzTab also puts values into the fourth field:
[MS, MS:1001582, XCMS, 2.99.6] (<- version number)
In mzML, such CV parameters can also include a unit, as in:
<cvParam cvRef="MS" accession="MS:1000927" name="ion injection time" value="200.0" unitCvRef="UO" unitAccession="UO:0000028" unitName="millisecond"/>
However, units are not supported in the square-bracket [] notation. Instead, mzTab-M specifies units elsewhere, e.g. [UO, UO:0000010, second, ].

@meier-rene
Contributor Author

I think a possible solution could be [MS, MS:1000927, ion injection time, 200.0],[UO, UO:0000028, millisecond,]
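A possible reading of this proposal, sketched in Python: split the field into its bracketed groups (ignoring brackets and commas inside quoted names), parse each one, and treat an optional trailing `UO` term as the unit of the preceding value. The pairing rule is an assumption inferred from the single example above, and all names are hypothetical.

```python
import csv
from typing import List, NamedTuple, Optional, Tuple


class CvParam(NamedTuple):
    label: str
    accession: str
    name: str
    value: Optional[str]  # None when the value field is empty


def split_params(text: str) -> List[str]:
    """Split '[...],[...]' into bracketed groups, ignoring [ ] inside quotes."""
    groups: List[str] = []
    start: Optional[int] = None
    in_quotes = False
    for i, ch in enumerate(text):
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes and ch == "[":
            start = i
        elif not in_quotes and ch == "]" and start is not None:
            groups.append(text[start:i + 1])
            start = None
    return groups


def parse_group(group: str) -> CvParam:
    fields = [f.strip() for f in next(csv.reader([group[1:-1]], skipinitialspace=True))]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields: {group!r}")
    return CvParam(fields[0], fields[1], fields[2], fields[3] or None)


def parse_with_unit(text: str) -> Tuple[CvParam, Optional[CvParam]]:
    """Return (parameter, unit); the unit is an optional trailing UO term."""
    params = [parse_group(g) for g in split_params(text)]
    if len(params) == 1:
        return params[0], None
    if len(params) == 2 and params[1].label == "UO":
        return params[0], params[1]
    raise ValueError(f"unexpected parameter list: {text!r}")
```

With this rule, `[MS, MS:1001582, XCMS, 2.99.6]` still parses as a single parameter without a unit, so the value-only case from the mzTab example keeps working.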

@meier-rene
Contributor Author
