How to deal with classification bias when using a single database and its own split version? #289

Open
airmicrobiome opened this issue Oct 18, 2024 · 0 comments

Comments

@airmicrobiome

To reduce the memory requirement, I split the database into two equal parts and ran Kaiju against each part. As expected based on earlier conversations, the results differ from those obtained with the full database. Could you please comment on this issue and suggest a possible alternative solution?

The kaiju merge was done in lca mode with the -s option. Two cases are given below, each listing the read ID, taxon ID and score, followed by the rank and name of the assigned taxon.

CASE 1 ------------------------------------------------------------------------------------------------------------------
Kaiju with full database (nr): READID 3486 147 | species: Humulus lupulus

Kaiju merge output: READID 360336 148 | subspecies: Corymbia citriodora subsp. variegata
Individual outputs used for the kaiju merge:
Kaiju with split-DB1: READID 3486 147 | species: Humulus lupulus
Kaiju with split-DB2: READID 360336 148 | subspecies: Corymbia citriodora subsp. variegata

CASE 2 ------------------------------------------------------------------------------------------------------------------
Kaiju with full database (nr): READID 80864 241 | family: Comamonadaceae

Kaiju merge output: READID 1862385 242 | species: Rubrivivax rivuli
Individual outputs used for the kaiju merge:
Kaiju with split-DB1: READID 1862385 242 | species: Rubrivivax rivuli
Kaiju with split-DB2: READID 80864 241 | family: Comamonadaceae
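
To make the merge behaviour in the two cases above concrete, here is a minimal Python sketch of a score-based merge. It assumes that the hit with the higher score takes precedence and that the LCA is only used when the scores are equal. This is my reading of the outputs, not Kaiju's actual implementation. The names `merge_by_score`, `lca` and `lineage`, and the toy parent map in the last example, are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[int, int]  # (taxon ID, score), as listed in the two cases


def lineage(taxid: int, parent: Dict[int, int]) -> List[int]:
    """Return the path from taxid up to the root of a child -> parent map."""
    path = [taxid]
    while taxid in parent and parent[taxid] != taxid:
        taxid = parent[taxid]
        path.append(taxid)
    return path


def lca(a: int, b: int, parent: Dict[int, int]) -> int:
    """Lowest common ancestor of two taxa in the parent map."""
    ancestors_a = set(lineage(a, parent))
    for node in lineage(b, parent):
        if node in ancestors_a:
            return node
    raise ValueError("no common ancestor in the given parent map")


def merge_by_score(hit1: Hit, hit2: Hit, parent: Dict[int, int]) -> Hit:
    """Assumed rule: higher score wins; equal scores fall back to the LCA."""
    (tax1, score1), (tax2, score2) = hit1, hit2
    if score1 > score2:
        return hit1
    if score2 > score1:
        return hit2
    return (lca(tax1, tax2, parent), score1)


# Case 1: split-DB1 vs split-DB2 (no taxonomy needed, the scores differ)
print(merge_by_score((3486, 147), (360336, 148), {}))   # (360336, 148)
# Case 2
print(merge_by_score((1862385, 242), (80864, 241), {}))  # (1862385, 242)
# Tie on a made-up taxonomy: taxa 10 and 20 share the parent 1
print(merge_by_score((10, 5), (20, 5), {10: 1, 20: 1, 1: 1}))  # (1, 5)
```

Under this assumed rule, a one-point score difference is enough to select one split's hit outright, which matches both merged outputs above, while the full-database run reports the hit with the lower score in both cases.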

