Information on logarithm base value #2

minor7b5 · 2019-01-11T16:35:51Z

Could I please ask for a bit more information on the function of the base value - what it does and what impact higher or lower values will have, and correspondingly how I should decide a setting based on the structure of my dataset?

For extra information, my RNA-seq dataset comprises multiple tissues from various heterozygous individuals. I intend to use a multiple-k approach. Should I run ORNA several times on the dataset, once per value of k, to generate a separate set of normalised reads for each assembly run, or would a single normalisation using the smallest k suffice?

Best wishes,
Reza

SchulzLab · 2019-01-12T05:12:04Z

Dear Reza,
The base parameter sets the logarithm base that is used to determine the minimum number of times a k-mer must be retained in the reduced dataset. For example, consider a k-mer that occurs in 8 reads. With base=2, log_2(8)=3, so at least 3 reads must be kept. With base=10, log_10(8)≈0.9, so at least 1 read must be kept, because every value lower than one is set to 1.
Thus, the higher the base, the stronger the reduction.

If you have a very large dataset, as seems to be the case for you, you can easily go higher with the base parameter. For example, with base=3, a k-mer that occurs in 1000 reads would be retained in at least 7 of them (log_3(1000)≈6.3, rounded up).
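
To get a feeling for how the base affects this lower bound, here is a minimal Python sketch of the arithmetic in the examples above. It is not ORNA's code: the function name `min_reads_retained` is hypothetical, and rounding the logarithm up to the next integer is an assumption that happens to match the three examples given here (8 reads with base 2 and 10, 1000 reads with base 3).

```python
def min_reads_retained(kmer_abundance: int, base: float) -> int:
    """Illustrative lower bound on reads kept for a k-mer seen in `kmer_abundance` reads.

    Assumption (not taken from ORNA's source): the bound is
    log_base(kmer_abundance) rounded up, and never lower than 1.
    """
    if kmer_abundance < 1:
        raise ValueError("kmer_abundance must be at least 1")
    if base <= 1:
        raise ValueError("base must be greater than 1")

    # Find the smallest n with base**n >= kmer_abundance, i.e. the logarithm
    # rounded up. Repeated multiplication avoids floating-point surprises
    # from math.log for integer bases.
    n = 0
    threshold = 1
    while threshold < kmer_abundance:
        threshold *= base
        n += 1

    # Every value lower than one is set to 1.
    return max(1, n)


if __name__ == "__main__":
    for abundance, base in [(8, 2), (8, 10), (1000, 3)]:
        kept = min_reads_retained(abundance, base)
        print(f"abundance={abundance}, base={base}: at least {kept} read(s) kept")
```

Trying a few abundances that are typical for your data with different bases should show quickly how much more aggressive a higher base is for highly abundant k-mers, while rare k-mers are barely affected.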
Concerning multi-k assemblies: I would recommend using the same normalised data for all of them. I guess your thinking is that redoing the normalisation with the k-mer size used for each individual assembly would ensure that the k-mer connectivity is preserved for each of them. But it is also true that larger values of k lead to less reduction if you use the same base parameter. The higher the value of k, the more unique k-mers there are in the dataset, and thus the more reads get preserved. To speed things up, I would stick with the smallest k-mer value used in the multi-k assembly, assuming of course that this is a reasonable value for your data.

Hope that helps,
Marcel

SchulzLab closed this as completed Feb 5, 2019
