How to deal with classification bias when using a single database and its own split version? #289

airmicrobiome · 2024-10-18T06:42:34Z

To reduce the memory requirement, I split the database into two equal parts and ran Kaiju against each part. As expected from earlier conversations, the results differ from those obtained with the full database. Could you please comment on this issue and suggest a possible alternative solution?

The merge of the two outputs was done in lca mode with the -s option. Below are two cases, each listing the read ID, taxon ID, and score.

CASE 1 ------------------------------------------------------------
Kaiju with full database (nr): READID 3486 147 | species: Humulus lupulus

Kaiju merge output: READID 360336 148 | subspecies: Corymbia citriodora subsp. variegata
Individual outputs used for the merge:
kaiju with split-DB1: READID 3486 147 | species: Humulus lupulus
kaiju with split-DB2: READID 360336 148 | subspecies: Corymbia citriodora subsp. variegata

CASE 2 ------------------------------------------------------------
Kaiju with full database (nr): READID 80864 241 | family: Comamonadaceae

Kaiju merge output: READID 1862385 242 | species: Rubrivivax rivuli
Individual outputs used for the merge:
kaiju with split-DB1: READID 1862385 242 | species: Rubrivivax rivuli
kaiju with split-DB2: READID 80864 241 | family: Comamonadaceae
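
In both cases the merged output equals the split-database hit with the higher score (148 vs. 147 and 242 vs. 241). The snippet below is only a minimal sketch of that selection rule, assuming that with -s the classification with the higher score takes precedence. The names `Hit` and `merge_by_score` are hypothetical and not part of Kaiju, and a tie is simply reported as unresolved here rather than reproducing any LCA fallback.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """One classification of a read: taxon ID and match score."""
    taxid: int
    score: int


def merge_by_score(a: Hit, b: Hit) -> Optional[Hit]:
    """Return the hit with the higher score (assumed -s behaviour).

    On a tie, return the hit if both agree on the taxon, otherwise None
    to signal that another strategy (e.g. LCA) would have to decide.
    """
    if a.score > b.score:
        return a
    if b.score > a.score:
        return b
    return a if a.taxid == b.taxid else None


# Values taken from the two cases above (split-DB1 hit, split-DB2 hit).
case1 = merge_by_score(Hit(3486, 147), Hit(360336, 148))
case2 = merge_by_score(Hit(1862385, 242), Hit(80864, 241))

print(case1.taxid)  # 360336 (Corymbia citriodora subsp. variegata)
print(case2.taxid)  # 1862385 (Rubrivivax rivuli)
```

Under this assumption, a one-point score difference between the two halves is enough to replace the full-database assignment in the merged output.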
