Specific markers on general classes #16

Open
dosumis opened this issue Sep 18, 2022 · 8 comments

Comments

@dosumis
Collaborator

dosumis commented Sep 18, 2022

Specific markers are linked to very general cell classes, e.g. these are recorded as generic markers of epithelial cells:

image

Checking the CCF relationships indicates that these are for epithelial cells of the iris epithelium:

image

As a result, any query for cells expressing AQP1 will find all epithelial cells - although that was clearly not the expert's intent. This is true of a standard OWL query, but queries using CCF properties would suffer from the same issue unless we generate separate graphs for each table and don't allow cross-table queries.
Here's another example that shows up an additional problem:

image

image

The same generic cell type is linked to two locations but has only one marker set - presumably coming from one location (not sure which). But as there's only one cell type term (IRI), the markers are associated with both locations.
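The effect can be sketched in a few lines of Python. This is only an illustration: the IRI `EX:generic_cell`, the location names and `MARKER1` are hypothetical placeholders, not values from the tables.

```python
from collections import defaultdict

# Hypothetical rows: (cell type IRI, location, markers recorded for that row).
rows = [
    ("EX:generic_cell", "location A", {"MARKER1"}),
    ("EX:generic_cell", "location B", set()),  # same term, no markers recorded here
]

markers_by_term = defaultdict(set)
locations_by_term = defaultdict(set)
for iri, location, markers in rows:
    markers_by_term[iri] |= markers
    locations_by_term[iri].add(location)

# Joining on the IRI alone pairs every marker with every location of that term.
implied = {
    (location, marker)
    for iri in markers_by_term
    for location in locations_by_term[iri]
    for marker in markers_by_term[iri]
}
```

Here `implied` contains `("location B", "MARKER1")` even though no row recorded that marker for location B.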

The obvious fix is to make location-specific terms for each of these using a standard ROBOT template and add these to CL.

e.g. iris epithelial cell
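A minimal sketch of why a location-specific term helps, assuming a toy subclass hierarchy. The labels and hierarchy below are illustrative and are not real CL axioms; the query simply treats markers as inherited by subclasses, which is how a standard OWL query would behave.

```python
# Toy subclass hierarchy (child -> parent); illustrative only.
SUBCLASS_OF = {
    "iris epithelial cell": "epithelial cell",
    "kidney epithelial cell": "epithelial cell",
    "epithelial cell": "cell",
}


def ancestors(term):
    """Yield the term itself and every superclass up to the root."""
    while term is not None:
        yield term
        term = SUBCLASS_OF.get(term)


def cells_expressing(marker, markers_by_term):
    """Return all terms carrying the marker directly or by inheritance."""
    all_terms = set(SUBCLASS_OF) | set(SUBCLASS_OF.values())
    return sorted(
        t for t in all_terms
        if any(marker in markers_by_term.get(a, set()) for a in ancestors(t))
    )


# Marker asserted on the general class: every epithelial cell matches.
before = cells_expressing("AQP1", {"epithelial cell": {"AQP1"}})

# Marker asserted on a location-specific term: only that term matches.
after = cells_expressing("AQP1", {"iris epithelial cell": {"AQP1"}})
```

With the marker on the general class, `before` lists all three epithelial terms; with the marker on the location-specific term, `after` contains only `iris epithelial cell`.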

@anitacaron
Collaborator

Here's a complete list of cell types with markers and their locations. https://yasgui.triply.cc/#

As Josef noticed in #8, a few cell types share the same marker in different locations, and a few have no marker in specific locations.

I'll create a report listing all cell types in specific locations and markers; then, I can make a template to add them to CL.

@dosumis
Collaborator Author

dosumis commented Oct 6, 2022

Generic Yasgui link. #8 is basically the same issue.

@anitacaron
Collaborator

Oops, here it is again: https://api.triplydb.com/s/jJ4B6VyBt

@anitacaron
Collaborator

Here are the reports: https://github.com/hubmapconsortium/ccf-validation-tools/tree/anitacaron/issue183/logs/ct_loc_markers

@dosumis
Collaborator Author

dosumis commented Oct 11, 2022

Some thoughts:

  • Interesting that Bruce's API appears to compute the transitive closure across ccf_located_in. This yields 10,000 results, whereas I suspect the actual figure for CT -> most specific AS term is probably a quarter of that. I'm also wondering whether this affects our validation/build pipeline. This is the kind of thing we really need to coordinate on.

  • We could reduce the numbers by discarding CL terms with no SubClasses - we want to target grouping terms. I guess this would need to be done in Python though as we'd need a separate SPARQL query of full CL (e.g. in UberGraph).

  • Seeing so many immune cells there makes me think we need a quick fix for these. Rather than making terms in CL for, say, B-cell in all these specific locations, we should be making compound terms in CCF.owl, I think using located_in. This avoids the thorny question of whether immune cell types are resident (and so should be in CL with part_of to loc) or transient (CL won't take the terms).
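To illustrate the first point, here is a rough sketch of collapsing a transitively closed set of locations down to the most specific AS term(s). The `PART_OF` hierarchy is a hypothetical stand-in for the real anatomical structure partonomy, and this is not how Bruce's API or the validation pipeline is implemented.

```python
# Hypothetical part_of hierarchy among anatomical structures (child -> parent).
PART_OF = {
    "iris epithelium": "iris",
    "iris": "eye",
}


def strict_ancestors(structure):
    """Return every structure that the given one is (transitively) part of."""
    found = set()
    parent = PART_OF.get(structure)
    while parent is not None:
        found.add(parent)
        parent = PART_OF.get(parent)
    return found


def most_specific(structures):
    """Drop every structure that is an ancestor of another one in the set."""
    implied = set()
    for structure in structures:
        implied |= strict_ancestors(structure)
    return set(structures) - implied


# A closure over located_in reports the cell type at every level.
closure = {"iris epithelium", "iris", "eye"}
specific = most_specific(closure)
```

In this toy case three CT -> AS results collapse to one, which is the kind of reduction suspected above.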

Suggested decision tree:

Does the cell have markers?
-Y-> Does it have subclasses?
  -Y-> Is it a subClassOf 'leukocyte' or 'myeloid cell'?
    -Y-> Add the term to the CCF OWL ROBOT template (EquivalentTo CL located_in some AS*)
    -N-> Review for addition to the CL compound term template (EquivalentTo CL part_of some AS*)

*AS = most specific AS
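The decision tree could be sketched in Python roughly as follows. This is an assumption-laden illustration: the `parents` and `markers` dictionaries are hypothetical inputs, the outcome labels are made up, and the tree does not say what happens on the "N" branches of the first two questions, so the sketch simply returns `"skip"` there. It also treats 'leukocyte' and 'myeloid cell' themselves as matching, which is an assumption.

```python
IMMUNE_ROOTS = {"leukocyte", "myeloid cell"}


def superclasses(term, parents):
    """Return all strict superclasses of a term (parents maps term -> set of parents)."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def classify(term, parents, markers):
    """Apply the suggested decision tree to one cell type term."""
    has_markers = bool(markers.get(term))
    has_subclasses = any(term in ps for ps in parents.values())
    if not (has_markers and has_subclasses):
        return "skip"  # assumption: the tree leaves these branches unspecified
    if (superclasses(term, parents) | {term}) & IMMUNE_ROOTS:
        return "ccf_template"  # EquivalentTo CL located_in some AS
    return "review_for_cl_template"  # EquivalentTo CL part_of some AS


# Illustrative hierarchy and markers, not taken from the reports.
parents = {
    "naive B cell": {"B cell"},
    "B cell": {"lymphocyte"},
    "lymphocyte": {"leukocyte"},
    "skin fibroblast": {"fibroblast"},
    "fibroblast": {"connective tissue cell"},
}
markers = {"B cell": {"CD19"}, "naive B cell": {"CD19"}, "fibroblast": {"VIM"}}

outcomes = {term: classify(term, parents, markers) for term in markers}
```

The subclass check is what implements "target grouping terms": a leaf term such as the hypothetical `naive B cell` is skipped even though it has markers.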

@emquardokus

Thanks @anitacaron! I had not realized that the work on markers connected to cell types had moved forward.
Perhaps the best way to resolve this is to discuss how these biomarker sets get connected to cell types in the first place.
What I think needs to happen, which Josef and I discussed and which I've also discussed with Katy, is to compare all the "generic" cell types like "fibroblast" across organs with their associated biomarker sets and see whether there are different terms for them in those locations. Sometimes there are, but because of the activation energy of requesting a new term, the broader cell type that already exists gets used instead. Josef and I had discussed this whole topic quite a bit.
I agree, coordination is super important.

@anitacaron
Collaborator

I've removed the Blood Vasculature from the merged report, which I'm using as input to generate the ROBOT template. This greatly reduces the size of the templates.

I'm aware of the duplications in the templates.

Here's the first draft of the ROBOT template for CCF https://github.com/hubmapconsortium/ccf-validation-tools/blob/2bf64dee8ea0246137db16c2128240d1908c88fb/templates/ccf_compound.tsv
Here's the one for CL: https://github.com/hubmapconsortium/ccf-validation-tools/blob/2bf64dee8ea0246137db16c2128240d1908c88fb/templates/cl_compound.tsv

@anitacaron
Collaborator

@emquardokus I looked carefully into the template to bulk-create the compound CL terms and found different markers for the same cell type and location. The case I found is in the Skin and Large intestine tables. I'll try to find other instances. Is this intended?

On the Skin table (row 25):

| AS/3 | AS/3/LABEL | AS/3/ID | CT/1 | CT/1/LABEL | CT/1/ID | BProtein/1 | BProtein/1/LABEL | BProtein/1/ID | BProtein/2 | BProtein/2/LABEL | BProtein/2/ID |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Eccrine (sweat) glands | eccrine sweat gland | UBERON:0000423 | myoepithelial cell | myoepithelial cell of sweat gland | CL:1000417 | keratin 19 | KRT19 | HGNC:6436 | S100 calcium binding protein A2 | S100A2 | HGNC:10492 |

On the Large intestine (row 1324):

| AS/4 | AS/4/LABEL | AS/4/ID | CT/1 | CT/1/LABEL | CT/1/ID | BProtein/1 | BProtein/1/LABEL | BProtein/1/ID |
|---|---|---|---|---|---|---|---|---|
| eccrine (sweat) gland | eccrine (sweat) gland | UBERON:0000423 | myoepithelial | myoepithelial cell of sweat gland | CL:1000417 | aSMA | | HGNC:130 |
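A possible way to find other instances is to group rows by (cell type ID, location ID) and flag pairs whose marker sets differ between tables. The sketch below uses only the two rows quoted above; the tuple layout is an assumption and does not reflect the actual report format.

```python
from collections import defaultdict

# (table, CT ID, AS ID, marker IDs) for the two rows quoted above.
rows = [
    ("Skin", "CL:1000417", "UBERON:0000423", frozenset({"HGNC:6436", "HGNC:10492"})),
    ("Large intestine", "CL:1000417", "UBERON:0000423", frozenset({"HGNC:130"})),
]

# Collect the marker set each table records for every (cell type, location) pair.
marker_sets = defaultdict(dict)
for table, ct_id, as_id, marker_ids in rows:
    marker_sets[(ct_id, as_id)][table] = marker_ids

# A pair is in conflict when the tables disagree on its marker set.
conflicts = {
    pair: by_table
    for pair, by_table in marker_sets.items()
    if len(set(by_table.values())) > 1
}
```

For these two rows, `conflicts` has a single entry, keyed by `("CL:1000417", "UBERON:0000423")`.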
