Visualization of gene expression on the UMAP plots in Vitessce #696

Closed
pecan88 opened this issue Jul 9, 2020 · 10 comments
Labels: enhancement

@pecan88
Contributor

pecan88 commented Jul 9, 2020

Rahul notes that visualization of gene expression on the UMAP plots in Vitessce is supposed to be enabled, but he is not currently seeing it. I'm not entirely sure whether this is a bug, but I'm filing it as an FYI.

Source email:

On Jul 9, 2020, at 3:47 PM, Kant, Peter M wrote:

Thank you, Rahul. This has been a grand effort.

@nils: I can log this, but I wanted to check with you first on the plans for gene expression visualization on UMAP plots.

Peter

From: Rahul Satija
Sent: Thursday, July 9, 2020 3:27 PM
To: Kant, Peter M; Gehlenborg, Nils; Ziv Bar-Joseph
Subject: Re: HuBMAP Portal & Data phased release approach, ALPHA feedback by Wednesday July 15 CoB
Importance: High

Hi Peter,

First of all, congrats on the Alpha, which looks great!

I can deliver feedback through more formal channels, but as an important start: is the plan to enable visualization of gene expression on the UMAP plots in Vitessce? I think this would be quite an important feature for me, and for Ziv too, in order to interpret his clusters and UMAP.

I remember Nils saying it was supported, but I didn’t see it in the portal.

Best,
Rahul

pecan88 added the bug and UI labels on Jul 9, 2020
ngehlenborg assigned keller-mark and unassigned mccalluc on Jul 9, 2020
ngehlenborg added the enhancement label and removed the Alpha and bug labels on Jul 9, 2020
@ngehlenborg
Member

@keller-mark: can you check if there is gene expression data for the datasets that we are currently hosting in our showcases?

@keller-mark
Member

Yes, I will check whether it is available and, if so, implement a processing script.

@keller-mark
Member

keller-mark commented Jul 10, 2020

The current showcase datasets have the salmon_rnaseq_10x data type. There is gene expression data for these datasets in the files cluster_marker_genes.h5ad and out.h5ad.

It looks like the expression values in cluster_marker_genes.h5ad have been normalized and that many genes (and some cells) have been filtered out. For example, for dataset 2dca1bf5832a4102ba780e9e54f6c350, cluster_marker_genes.h5ad is 6010 cells × 9006 genes, while out.h5ad is 6287 cells × 38032 genes.
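To put the size of that filtering in numbers, here is a quick back-of-the-envelope comparison of the two shapes quoted above. It is only a sketch: the helper name `retention` is hypothetical, and the shapes are hard-coded from this comment rather than read from the files.

```python
# Shapes (cells, genes) quoted above for dataset 2dca1bf5832a4102ba780e9e54f6c350.
OUT_SHAPE = (6287, 38032)     # out.h5ad
MARKER_SHAPE = (6010, 9006)   # cluster_marker_genes.h5ad


def retention(full: tuple[int, int], filtered: tuple[int, int]) -> dict[str, float]:
    """Hypothetical helper: how many cells/genes were dropped and what fraction was kept."""
    cells_full, genes_full = full
    cells_kept, genes_kept = filtered
    return {
        "cells_dropped": cells_full - cells_kept,
        "genes_dropped": genes_full - genes_kept,
        "cell_fraction_kept": cells_kept / cells_full,
        "gene_fraction_kept": genes_kept / genes_full,
    }


stats = retention(OUT_SHAPE, MARKER_SHAPE)
print(f"cells dropped: {stats['cells_dropped']}")        # 277
print(f"genes dropped: {stats['genes_dropped']}")        # 29026
print(f"cells kept: {stats['cell_fraction_kept']:.1%}")  # 95.6%
print(f"genes kept: {stats['gene_fraction_kept']:.1%}")  # 23.7%
```

In practice the shapes could be read with `anndata.read_h5ad(path, backed="r").shape`, which avoids loading the full matrix into memory. Because the cell counts differ, the two files are not row-aligned, so any comparison of values between them would first need to intersect the cell (obs) names.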

Some questions:

  • Which of the two files should be used for gene expression visualization?
  • The gene names/IDs in these files are in the ENSG format. Do these names need to be converted to a different format?
  • The gene names have suffixes. What do these represent, and should I keep them?
  • With thousands of genes, do we need to update the gene selection component in Vitessce to support searching?
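On the suffix question: Ensembl stable IDs can carry a trailing `.N` version number. Below is a minimal sketch of stripping it, assuming the version (and anything after it) always follows the first `.`. It also reports base IDs that collide once the version is removed, since an unversioned ID that maps to two rows would be ambiguous as a lookup key. The function name `strip_versions` and the example IDs are hypothetical and not taken from the showcase files.

```python
from collections import Counter


def strip_versions(gene_ids: list[str]) -> tuple[list[str], set[str]]:
    """Drop the trailing ".N" version from Ensembl-style gene IDs.

    Assumes the version follows the first "." in each ID. Returns the
    unversioned IDs plus the set of base IDs that occur more than once
    after stripping (these would need resolving before use as keys).
    """
    base_ids = [gid.split(".", 1)[0] for gid in gene_ids]
    counts = Counter(base_ids)
    collisions = {gid for gid, n in counts.items() if n > 1}
    return base_ids, collisions


# Hypothetical IDs for illustration; the last two collide once unversioned.
ids = ["ENSG00000141510.17", "ENSG00000157764.14", "ENSG00000000001.1", "ENSG00000000001.2"]
base, dupes = strip_versions(ids)
print(base)
print(dupes)  # {'ENSG00000000001'}
```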

(Screenshot attached: 2020-07-09, 9:47:52 PM)

@ngehlenborg
Copy link
Member

ngehlenborg commented Jul 10, 2020

Good summary, @keller-mark!

  1. @mruffalo: How are you planning to do the mapping from Ensembl IDs to gene symbols? Since this is just for the showcase, we don't need to have a final solution, but if you have done this for other HuBMAP data, we would follow your lead. @keller-mark: the suffixes are just version numbers; you can ignore them.

  2. @mruffalo: Can you explain what filtering is done to select the cluster marker genes?

  3. @mruffalo: Is there any way to identify a smaller core subset of genes of interest? @keller-mark: In the long run, we will definitely need to add support for long gene lists to Vitessce. Could you add an issue in the Vitessce repo? I hope that we can get around it for the showcases.
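Until the gene selector handles long lists, one stopgap is to filter the list before it is rendered. Here is a minimal sketch of a case-insensitive search that ranks prefix matches ahead of substring matches and caps the result count. The function `search_genes`, the ranking rule, and the `limit` default are assumptions for illustration, not how Vitessce implements gene search.

```python
def search_genes(query: str, genes: list[str], limit: int = 20) -> list[str]:
    """Case-insensitive gene search: prefix matches first, then substring matches.

    Original list order is kept within each group, and the result is capped
    at `limit` so the UI never has to render thousands of rows at once.
    """
    q = query.strip().lower()
    if not q:
        return genes[:limit]
    prefix: list[str] = []
    substring: list[str] = []
    for gene in genes:
        name = gene.lower()
        if name.startswith(q):
            prefix.append(gene)
        elif q in name:
            substring.append(gene)
    return (prefix + substring)[:limit]


genes = ["CD8A", "CD4", "ACD", "PTPRC", "CD14"]
print(search_genes("cd", genes))  # ['CD8A', 'CD4', 'CD14', 'ACD']
```

Ranking prefix matches first is a design choice: users typing a gene symbol usually know how it starts, so those hits are the most likely intended ones.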

@mccalluc
Contributor

@ilan-gold: hubmapconsortium/portal-containers#55 hasn't moved since September. Is something blocking it? Can you assess whether there is still work to do here, file new, small issues if so, and then close this?

@ilan-gold
Member

ilan-gold commented Mar 19, 2021

My sense is that there is no longer a technical blocker here, now that we have the "chunking" from vitessce/vitessce#876 and the virtualized scrolling from vitessce/vitessce#791. @keller-mark, I believe there was a reason we didn't want to show the current genes in the heatmap example, but I can't remember what it was.
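Virtualized scrolling avoids mounting every row of a long list: only the rows that intersect the viewport, plus a small overscan buffer, are rendered. Below is a minimal sketch of that window calculation, assuming fixed-height rows. The function name, its parameters, and the overscan value are illustrative and are not taken from vitessce/vitessce#791.

```python
import math


def visible_range(scroll_top: float, viewport_height: float, row_height: float,
                  n_rows: int, overscan: int = 3) -> range:
    """Row indices to render for a fixed-row-height virtualized list.

    The window covers every row that intersects the viewport, padded by
    `overscan` rows on each side and clamped to [0, n_rows).
    """
    first = math.floor(scroll_top / row_height) - overscan
    last = math.ceil((scroll_top + viewport_height) / row_height) + overscan
    return range(max(first, 0), min(last, n_rows))


# A 38032-gene list with 20 px rows in a 400 px viewport, scrolled to 1000 px:
print(len(visible_range(1000, 400, 20, 38032)))  # 26
```

However long the gene list is, the number of rendered rows stays roughly `viewport_height / row_height + 2 * overscan`.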

@keller-mark
Member

keller-mark commented Mar 19, 2021

Now that we have support for AnnData-as-Zarr, it may make sense to update the implementation in hubmapconsortium/portal-containers#55 (which was using the custom expression-matrix.zarr format). Also, @ilan-gold may want to look at how that pipeline chunks the matrix, to make sure it is compatible with loading genes on demand.
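Whether a single gene can be loaded on demand depends mostly on the chunk shape along the gene axis. Below is a minimal sketch that lists the chunk keys a client would have to fetch to read one gene column. It assumes a dense cells × genes matrix stored as a Zarr v2 array with the default `.` chunk-key separator; a sparse AnnData `X` is laid out differently. The function name and the two chunk shapes are hypothetical, and the matrix size is the out.h5ad shape quoted earlier in this thread.

```python
import math


def chunk_keys_for_gene(gene_index: int, shape: tuple[int, int],
                        chunks: tuple[int, int]) -> list[str]:
    """Zarr v2 chunk keys ("row.col") covering one column of a dense 2D array."""
    n_cells, n_genes = shape
    chunk_rows, chunk_cols = chunks
    if not 0 <= gene_index < n_genes:
        raise IndexError(gene_index)
    col_chunk = gene_index // chunk_cols           # which column of chunks holds this gene
    n_row_chunks = math.ceil(n_cells / chunk_rows)  # every row-chunk in that column is needed
    return [f"{r}.{col_chunk}" for r in range(n_row_chunks)]


out_shape = (6287, 38032)  # cells x genes, out.h5ad

# Tall, narrow chunks: one request per gene (plus 9 neighbouring genes).
print(chunk_keys_for_gene(42, out_shape, (6287, 10)))        # ['0.4']
# Square chunks: 7 requests, each mostly carrying other genes' data.
print(len(chunk_keys_for_gene(42, out_shape, (1000, 1000))))  # 7
```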

Another thing is that I don't think there is a need for the .arrow intermediary file (we can produce all of the Vitessce outputs without converting the input .h5ad file to .arrow first), but please correct me if I am wrong about that.

@keller-mark
Member

keller-mark commented Mar 19, 2021

Also, does HuBMAP have a standard for gene naming? In hubmapconsortium/portal-containers#55 I added a step to convert the Ensembl gene IDs to gene names, but since it has been a while (and I believe features like search-by-expression-level have started to be implemented), would it be better for one or the other to be the "primary" gene identifier?

@mccalluc
Contributor

mccalluc commented Apr 2, 2021

> Also, does HuBMAP have a standard for gene naming? In hubmapconsortium/portal-containers#55 I added a step to convert the Ensembl gene IDs to gene names, but since it has been a while (and I believe features like search-by-expression-level have started to be implemented), would it be better for one or the other to be the "primary" gene identifier?

My sense is that ENSEMBL is good for internal IDs, but it's not the best for use in the UI.
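One way to follow that split is to keep the Ensembl ID as the key everywhere and derive a display label only at the UI layer, falling back to the ID when no symbol is known and appending the ID when several IDs share a symbol. This is a minimal sketch; `ID_TO_SYMBOL` is a hypothetical stand-in for whatever mapping HuBMAP adopts, and `GENEX` is a made-up symbol.

```python
from collections import Counter

# Hypothetical ID-to-symbol table; a real one would come from an annotation source.
ID_TO_SYMBOL = {
    "ENSG00000141510": "TP53",
    "ENSG00000000001": "GENEX",
    "ENSG00000000002": "GENEX",
}


def display_labels(gene_ids: list[str], id_to_symbol: dict[str, str]) -> dict[str, str]:
    """Map internal Ensembl IDs to UI labels while keeping labels unique."""
    symbols = {gid: id_to_symbol.get(gid) for gid in gene_ids}
    counts = Counter(sym for sym in symbols.values() if sym)
    labels: dict[str, str] = {}
    for gid, sym in symbols.items():
        if not sym:
            labels[gid] = gid               # no symbol known: show the ID itself
        elif counts[sym] > 1:
            labels[gid] = f"{sym} ({gid})"  # symbol shared by several IDs: disambiguate
        else:
            labels[gid] = sym
    return labels


labels = display_labels(
    ["ENSG00000141510", "ENSG00000000001", "ENSG00000000002", "ENSG00000999999"],
    ID_TO_SYMBOL,
)
print(labels["ENSG00000141510"])  # TP53
print(labels["ENSG00000999999"])  # ENSG00000999999
```

Because lookups, URLs, and stored selections keep using the Ensembl ID, the symbol table can change later without breaking anything that references a gene.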

@ilan-gold: I'm going to put this in the next sprint. If it's definitely not a priority, feel free to move it back out, but I'd like to know whether it could be done if we wanted to put the time into it.

mccalluc added this to the v0.16 milestone on Apr 2, 2021
@mccalluc
Contributor

mccalluc commented Apr 8, 2021

Ilan Gold:
I think the discussion in the issue meandered, but the original goal is complete.

Ilan Gold:
I’d say we can close.


5 participants