SCINA error message #8

Closed
maxlarosa opened this issue Jan 20, 2020 · 12 comments

@maxlarosa

Hi, I'm working with SCINA but unfortunately when running the SCINA function I obtain the following error message:

Error in if (any(keep)) { : missing value where TRUE/FALSE needed

Can you help me please?

Best

Massimo

@jcao89757 jcao89757 self-assigned this Jan 21, 2020
@jcao89757
Owner

Hi Massimo,

Thanks for raising your question!

This error most often indicates that your signature lists share many genes: after the overlapping genes are removed automatically, one or more signature lists may end up as NULL lists without any genes. Could you set the parameter 'rm_overlap=FALSE' and let me know whether that solves the problem?
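To illustrate how overlap removal can empty a list, here is a minimal base-R sketch. It assumes that overlap removal drops every gene that occurs in more than one signature list; the gene sets and the names `signatures`, `filtered`, and `emptied` are hypothetical and are not taken from SCINA's own code.

```r
# Hypothetical signature lists, for illustration only.
signatures <- list(
  T_cell = c("CD3D", "CD3E", "CD2"),
  CD8_T  = c("CD3D", "CD3E", "CD8A"),
  NK     = c("NKG7", "GNLY", "CD2")
)

# Count in how many lists each gene occurs (duplicates within a list count once).
gene_counts <- table(unlist(lapply(signatures, unique)))

# Assumption: overlap removal keeps only genes that occur in exactly one list.
unique_genes <- names(gene_counts)[as.vector(gene_counts) == 1]
filtered <- lapply(signatures, function(genes) genes[genes %in% unique_genes])

# Signatures left without any gene once the overlaps are removed.
emptied <- names(filtered)[lengths(filtered) == 0]
print(emptied)
```

If `emptied` is not empty, those lists are the likely trigger; either set 'rm_overlap=FALSE' or revise the lists before running SCINA.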

Thanks again for letting me know!

Best regards, Ze

@maxlarosa
Author

maxlarosa commented Jan 21, 2020 via email

@jcao89757
Owner

Hi Massimo,

Sorry to hear that. Could you please provide an example of your gene expression matrix and signature lists that reproduces the bug? I'll be glad to take a look and find a solution for you.

Best regards, Ze

@moldach
Contributor

moldach commented Jan 21, 2020

I'm running into the same issue. When I first came across @maxlarosa's issue, I thought it was because I had duplicates (I did have duplicates); however, after taking care of them I still get the error:

[screenshot: weird]

I tried subsetting to find where the issue occurs. But with some more slicing it looks like it's not row 509 ("LYPLAL1"). I'm not sure where the error is stemming from here:

[screenshot: weird02]

I was going to attach the vector so you can troubleshoot, but there seem to be errors when the length of the vector exceeds n=481?

I will try to share it saved as a .csv instead.

@maxlarosa
Author

Hi Ze, I've attached two datasets.

With GSE72056_melanoma.txt, SCINA runs without issues when rm_overlap=FALSE is set.
With GSE103322_Head_and_neck, SCINA does not work and fails with the error: Error in if (any(keep)) { : missing value where TRUE/FALSE needed

Both datasets contain normalized expression values computed as log2(TPM/10+1), where TPM is transcripts per million.
I verified that neither file contains duplicated rows or NA values.
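For anyone repeating these checks, a minimal base-R sketch follows. The tiny `tpm` matrix and its gene and cell names are made up for illustration; only the log2(TPM/10+1) transform comes from the description above.

```r
# Hypothetical TPM values: genes in rows, cells in columns.
tpm <- matrix(c(0, 10, 30, 70), nrow = 2,
              dimnames = list(c("GENE1", "GENE2"), c("cell1", "cell2")))

# Normalisation described above.
expr <- log2(tpm / 10 + 1)

# Sanity checks on the expression matrix.
has_duplicated_genes <- anyDuplicated(rownames(expr)) > 0
has_missing_values   <- anyNA(expr)               # TRUE for NA or NaN entries
all_values_finite    <- all(is.finite(expr))      # also rules out Inf / -Inf

print(c(has_duplicated_genes, has_missing_values, all_values_finite))
```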

thanks for your help

Best

Massimo
GSE103322_Head_and_neck.txt
GSE72056_melanoma.txt

@maxlarosa
Author

Here is the marker list:
marker_genes_immunity_filtered.txt

@jcao89757
Owner

Hi Massimo and Matthew,

Thank you both for providing the data! I checked on my end and was able to replicate your bug.

To me, the problem seems to be that you have too many genes in one signature list. Matthew was right. One step in our algorithm calculates det(A), where A is an n*n diagonal matrix and n is the number of genes in one list. With a large n, det(A) becomes so large that it overflows (shown as Inf in R) and crashes the whole algorithm. I can try to bypass that function in the next updated version.
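The overflow can be reproduced in base R. This is a sketch under assumptions: the size `n` and the diagonal value 10 are made up, and working on the log scale is one common way to sidestep the overflow, not necessarily the approach SCINA will take.

```r
n <- 500                  # hypothetical number of genes in one signature
A <- diag(rep(10, n))     # hypothetical n*n diagonal matrix

# For a diagonal matrix, det(A) is the product of the diagonal: 10^500 here,
# which exceeds the largest double (about 1.8e308) and becomes Inf.
det_direct <- det(A)
print(det_direct)

# The log-determinant stays finite: 500 * log(10).
log_det <- as.numeric(determinant(A, logarithm = TRUE)$modulus)
print(log_det)

# Once Inf enters a ratio, NaN follows.
print(det_direct / det_direct)
```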

For now, I would suggest reducing the number of genes in your signature lists, for example by removing genes with a low level of variation across all your cells. 20 to 50 genes per signature are enough for SCINA to perform well.
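One way to trim the lists is sketched below in base R. It assumes a genes-by-cells expression matrix and ranks genes by variance across cells; the helper `top_variable`, the simulated matrix `expr`, and the cutoff `k = 20` (the lower end of the range above) are illustrative choices, not part of SCINA.

```r
set.seed(1)

# Simulated expression matrix: 60 hypothetical genes in rows, 40 cells in columns.
expr <- matrix(rexp(60 * 40), nrow = 60,
               dimnames = list(paste0("G", 1:60), paste0("cell", 1:40)))

# Hypothetical signature lists; type_A is larger than the suggested range.
signatures <- list(type_A = paste0("G", 1:40), type_B = paste0("G", 41:60))

# Keep the k genes of a signature with the highest variance across cells.
top_variable <- function(genes, expr, k = 20) {
  genes <- intersect(genes, rownames(expr))       # ignore genes not measured
  gene_var <- apply(expr[genes, , drop = FALSE], 1, var)
  head(names(sort(gene_var, decreasing = TRUE)), k)
}

trimmed <- lapply(signatures, top_variable, expr = expr, k = 20)
print(lengths(trimmed))
```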

I will let you know if I find a good way to bypass the det calculation. Please let me know whether this solves your problem. Thanks again for helping to improve the tool!

Best regards, Ze

@maxlarosa
Author

Thank you Ze, I'll try to reduce the number of genes per signature.

Best
Massimo

@xiebb123456

Hi Ze,

I've run into the same error: 'Error in if (any(keep)) { : missing value where TRUE/FALSE needed'.

While troubleshooting, I found that 'prob_mat=t(t(prob_mat)/(1-sum(tao)+colSums(prob_mat)))' produces NaN. This means the dataset being predicted does not contain a specific cell type from the marker file, so the denominator is 0.

After adding the command 'prob_mat[prob_mat<1e-200] = 1e-200', taken from the 'density_ratio' method, it works!
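A small base-R sketch of this failure and the workaround follows. The normalisation and the clamp are the two expressions quoted above; the toy `prob_mat`, the `tao` values, and the `keep` filter with its 0.9 threshold are hypothetical stand-ins, not SCINA internals.

```r
# Normalisation step quoted above, wrapped in a function.
normalise <- function(prob_mat, tao) {
  t(t(prob_mat) / (1 - sum(tao) + colSums(prob_mat)))
}

tao <- c(0.5, 0.5)                                # hypothetical proportions summing to 1
prob_mat <- matrix(c(0.2, 0.1, 0, 0), nrow = 2)   # second cell has zero probability everywhere

bad <- normalise(prob_mat, tao)                   # second column is 0 / 0 = NaN

# Hypothetical downstream filter: comparing NaN yields NA, and if (NA)
# raises the "missing value where TRUE/FALSE needed" error.
keep <- bad[1, ] > 0.9
outcome <- tryCatch({
  if (any(keep)) "ran" else "ran"
}, error = function(e) "error")
print(outcome)

# Clamp tiny probabilities before normalising, as in the workaround.
clamped <- prob_mat
clamped[clamped < 1e-200] <- 1e-200
fixed <- normalise(clamped, tao)
print(anyNA(fixed))
```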

Best
Bingbing

@jcao89757 jcao89757 reopened this Dec 25, 2020
@jcao89757
Owner

Hi Bingbing,

Thank you for letting us know! Your comment is very helpful. I have tried hard to handle the blowups caused by very small or very large numbers; however, they can occur almost anywhere in the computation and are hard to predict. I'll keep working on it and will update you once I find a universal solution.

However, the situation you ran into is very rare. My concern is that your data quality may not be high enough to make any plausible predictions. For example, the data may contain too many zeros. One possible solution is to include as many signature genes as possible under each cell type. I hope this helps to solve your problem.

Please let me know if you have any other questions or comments. Thanks a lot for your contribution. Happy holidays!

Best regards, Ze

@xiebb123456

Hi Ze,

You are right. I tested 5 and 15 markers for each cell type, and both gave very good predictions. So I think a few markers (assuming they are good markers) are enough to get plausible predictions.

Thanks for your timely reply and for such a great tool!

Bingbing

@jcao89757
Owner

It's good to know that certain gene marker subsets actually work for you!
Thank you as well, Bingbing! Hope your work goes well and wish you all the best!

Best, Ze
