Add constraints to labels #98

Open
mmaiers-nmdp opened this issue Jun 17, 2019 · 4 comments

Comments

@mmaiers-nmdp
Contributor

  1. We want to constrain label types.
    For example, a labelType of "ICCBBA ION" could refer to the data here.
    We don't want misspelled variants such as "ICCCBBAAA ION".

  2. We want an explicit way to create new label types (and to show which label types are in the database).
    Have a REST endpoint: GET/POST LabelType.
    Other label types:
    DOI - reference to a manuscript
    PMID - PubMed ID
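
To make the idea concrete, here is a minimal sketch of the constraint behind such an endpoint. It is not code from this repository: the class `LabelTypeRegistry` and its methods are hypothetical, and storing the types in memory and normalizing them to upper case are assumptions made only for illustration.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory registry of permitted label types. */
final class LabelTypeRegistry {
    private final Set<String> types = new TreeSet<>();

    /** Roughly what a POST to LabelType might do; returns false if already known. */
    boolean register(String labelType) {
        if (labelType == null || labelType.trim().isEmpty()) {
            throw new IllegalArgumentException("label type must not be blank");
        }
        return types.add(normalize(labelType));
    }

    /** Roughly what a GET on LabelType might return. */
    Set<String> list() {
        return Collections.unmodifiableSet(types);
    }

    /** A label is accepted only if its type was registered beforehand. */
    boolean isAllowed(String labelType) {
        return labelType != null && types.contains(normalize(labelType));
    }

    // Assumption: label types are compared case-insensitively, ignoring outer whitespace.
    private static String normalize(String labelType) {
        return labelType.trim().toUpperCase(Locale.ROOT);
    }
}
```

With "ICCBBA ION", "DOI" and "PMID" registered, `isAllowed("ICCCBBAAA ION")` returns false, so a client could reject the label before it reaches the database.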

@sjmack

sjmack commented Jun 17, 2019

For 1, EBMT provides a PDF that lists many IONs, but is there a more accessible (and potentially more comprehensive) list, perhaps in a text format, or a repository that could be queried for IONs? Or is EBMT the place to go, or is it the www.iccbba.org/ document above?

We would want to validate the ION against something before sending it to the database.

For 2. above, I suggest we use the NCBI's PMCID - PMID - Manuscript ID - DOI converter to convert all provided PubMed IDs, PMC IDs, NIHMS IDs, UK IDs, etc. into DOIs. This can also be used to validate a provided ID, so that the client can reject the label if the ID is invalid.
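
Before calling the converter, the client could do a cheap offline check of what kind of ID it was handed. The sketch below is only an illustration: the class name `ReferenceIdClassifier` is hypothetical, and the patterns are assumed shapes (digits for a PMID, a "PMC" or "NIHMS" prefix followed by digits, a "10." prefix for a DOI). A match says nothing about whether the ID exists; that still has to come from the converter.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-check that guesses which kind of reference ID was supplied. */
final class ReferenceIdClassifier {
    enum IdType { PMID, PMCID, NIHMSID, DOI, UNKNOWN }

    // Assumed shapes only; the converter service remains the authority on validity.
    private static final Pattern PMID_SHAPE = Pattern.compile("[1-9][0-9]*");
    private static final Pattern PMCID_SHAPE =
            Pattern.compile("PMC[0-9]+", Pattern.CASE_INSENSITIVE);
    private static final Pattern NIHMSID_SHAPE =
            Pattern.compile("NIHMS[0-9]+", Pattern.CASE_INSENSITIVE);
    private static final Pattern DOI_SHAPE = Pattern.compile("10\\.[0-9]{4,9}/\\S+");

    static IdType classify(String raw) {
        if (raw == null) {
            return IdType.UNKNOWN;
        }
        String id = raw.trim();
        if (PMCID_SHAPE.matcher(id).matches()) {
            return IdType.PMCID;
        }
        if (NIHMSID_SHAPE.matcher(id).matches()) {
            return IdType.NIHMSID;
        }
        if (DOI_SHAPE.matcher(id).matches()) {
            return IdType.DOI;
        }
        if (PMID_SHAPE.matcher(id).matches()) {
            return IdType.PMID;
        }
        return IdType.UNKNOWN;
    }
}
```

Anything classified as `UNKNOWN` could be rejected immediately; everything else would still be sent to the converter for confirmation and conversion to a DOI.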

However, what should be done when the group that generated the haplotypes has neither an ION nor a published reference for the data? This is not an uncommon issue with AFND, where unpublished data are loaded with no external citation.

@mmaiers-nmdp
Contributor Author

  1. The list of IONs is the XML in the link above, based on this XSD: https://www.iccbba.org/docs/tech-library/database/grid-issuing-organizations-xml-schema.xsd
    ION should be optional.

@kaeaton
Collaborator

kaeaton commented Aug 18, 2019

The ION database exists only as either an XML document or an Excel document (which could be exported to CSV). The main difference between the two is that the Excel version includes inactive facilities. There are a couple of issues here:

A. ICCBBA doesn't provide any notification when the database is updated. They keep a PDF log of the changes (they've added one facility a year since 2017), but as far as I know there's no way to check whether the files have changed short of re-downloading them. Problems:

  1. We have no way of knowing when they add new data.
  2. We have no easy way of getting the data to check because it's file based, not a true database.
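
One partial workaround, sketched below, is to keep a digest of the last file we parsed and compare it against each fresh download. This does not avoid the download itself; it only tells us whether a re-parse is needed. The class `IonFileChangeDetector` is hypothetical, and where the previous digest would be stored is left open.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper that detects whether a re-downloaded ION file differs from the last one. */
final class IonFileChangeDetector {

    /** Returns the SHA-256 digest of the content as lowercase hex. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] hash = sha256.digest(content);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is expected to be available on standard Java runtimes.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** True when there is no stored digest yet or the new download hashes differently. */
    static boolean hasChanged(String previousDigest, byte[] downloaded) {
        return previousDigest == null || !previousDigest.equals(sha256Hex(downloaded));
    }
}
```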

The second problem leads to:

B. I can either download the XML and have the program parse it or, if we want the inactive facility IONs as well, download the Excel file and convert it into a CSV file that is bundled with the program. (There's a resource folder that lets you add non-Java files and access them from within a compiled jar. It's how I added the help documentation.) In either case there are issues:

  1. If a facility has deactivated an ION, but still has data they (or someone else) want to upload using that old ION, we need the contents of the Excel file.
  2. If the facility is a new one, short of manually updating the program with a new CSV file, the only option is using the xml file and forcing it to re-download and reread the file. With this option we could conceivably put a button in options to force a re-download and parsing of the XML file. But that only works if iccbba has added the new facility to the db files.
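
For the CSV route, a lookup that distinguishes active, inactive and unknown IONs might look like the sketch below. Everything about the file layout is an assumption: the Excel export is imagined as a CSV with a header row followed by `ion,active` rows, and the class `IonCsvIndex` is hypothetical. The real column layout would have to be taken from the ICCBBA file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical index over a CSV export of the ION list (assumed columns: ion,active). */
final class IonCsvIndex {
    enum Status { ACTIVE, INACTIVE, UNKNOWN }

    private final Map<String, Boolean> activeByIon = new HashMap<>();

    static IonCsvIndex load(Reader source) {
        IonCsvIndex index = new IonCsvIndex();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine(); // skip the assumed header row
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split(",", -1);
                if (cols.length < 2 || cols[0].trim().isEmpty()) {
                    continue; // ignore blank or malformed rows
                }
                index.activeByIon.put(cols[0].trim(), Boolean.parseBoolean(cols[1].trim()));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return index;
    }

    /** Inactive IONs stay resolvable so that older data can still be uploaded. */
    Status statusOf(String ion) {
        Boolean active = (ion == null) ? null : activeByIon.get(ion.trim());
        if (active == null) {
            return Status.UNKNOWN;
        }
        return active ? Status.ACTIVE : Status.INACTIVE;
    }
}
```

When the CSV ships inside the jar, the `Reader` could wrap the stream returned by `Class.getResourceAsStream`; an `UNKNOWN` result is the case where a forced re-download of the XML would be worth trying.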

The Phycus GUI currently does neither, but it does check that the ION is valid per the ICCBBA naming conventions (a four-digit number that cannot start with 0, i.e. 1000 to 9999).
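
For reference, that format rule can be expressed in a few lines. This is a sketch of the rule as described above, not the actual Phycus GUI code; the class name `IonFormat` is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical format check: four digits, first digit not 0 (1000 to 9999). */
final class IonFormat {
    private static final Pattern FOUR_DIGITS_NO_LEADING_ZERO = Pattern.compile("[1-9][0-9]{3}");

    static boolean isWellFormed(String ion) {
        return ion != null && FOUR_DIGITS_NO_LEADING_ZERO.matcher(ion.trim()).matches();
    }
}
```

A well-formed ION is not necessarily an issued one, which is why checking against the ICCBBA list is still worth considering.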

@kaeaton
Collaborator

kaeaton commented Aug 18, 2019

Regarding the other labels: right now we have haplotyping entity (this used to include the ION, now they're separate labels), genotyping entity, and ION. I was planning on adding DOIs next using the generator Steve found for converting assorted other IDs into DOIs.

Haplotyping and genotyping entities are inherited fields and can be changed or dropped. (The Java CLI included them by default but didn't actually provide a way to specify them; they were hard-coded in the function.)

Martin, when you say you want an explicit way to create new label types, were you thinking of something like populations where they have to manually add them before using them? What about the values of these labels?

Also, we still need a place to put some sort of attribution data. Would putting that in a label be a valid option? If so, how do we want to do it? Name? Phone/email/address? If all of the above, separate labels for each? According to curation-swagger-spec.yaml there's a way to pull this information back out of the database, so having a tab that shows it all is an option. Maybe a dropdown of the available labels that, when one is selected, shows the values in the database associated with that label?
