Folder mode missed results for some bins #43

Closed
zhenjiaofenjie opened this issue Oct 13, 2023 · 5 comments

Comments

@zhenjiaofenjie

Hi,

I have several bins (say 1.faa 2.faa ... 100.faa) in a bin/ folder. In single mode, EukCC produced results for about 70% of them. In folder mode, however, some of the bins were missing from the eukcc.csv file. Is there a reason for that?

Also, folder mode merged two bins together even though I did not provide a link table. Is this a bug or expected behavior?

Thanks!
Jing

@rrohwer

rrohwer commented Sep 30, 2024

Hi, I am having this same issue. I ran eukcc on 73 bins, but only 6 appear in the output file. However, in the log I can see that all bins were recognized, and several even had fairly high completeness but were not reported.

This seems like a bug to me.

Thanks,
Robin

# the command I used:
eukcc folder --out eukcc_output --threads 20  --db ../eukccdb/eukcc2_db_ver_1.1 --suffix .fna folder_with_73_MAGs

# output file has practically nothing in it
cat eukcc.csv
bin           completeness    contamination
bin102.fna    0.0             0.0
bin92.fna     7.32            0.0
bin96.fna     0.0             0.0
bin261.fna    10.6            1.0
bin115.fna    1.4             0.0
bin57.fna     3.17            0.0

# but in the log you can see that all 73 bins were found, and some of them were unreported even though they had decent completeness
cat eukcc.log
29-09-2024 14:46:29:  EukCC version 2.1.0
29-09-2024 14:46:29:  Found 73 bins
29-09-2024 14:56:30:  Searching for marker genes in base database
29-09-2024 14:56:31:  No placement marker genes found.
29-09-2024 14:56:31:  Searching for marker genes in base database
29-09-2024 14:56:31:  No placement marker genes found.
29-09-2024 14:56:31:  Searching for marker genes in base database
29-09-2024 14:56:31:  No placement marker genes found.
29-09-2024 14:56:31:  Searching for marker genes in base database
29-09-2024 14:56:31:  No placement marker genes found.
29-09-2024 14:56:32:  Searching for marker genes in base database
29-09-2024 14:56:33:  Found 7 marker genes, placing them in the tree using epa-ng
29-09-2024 14:56:46:  Genome belongs to clade: protozoa (Best TaxID: 33846)
29-09-2024 14:56:46:  Searching for marker genes in protozoa database
29-09-2024 14:56:47:  Found 4 marker genes, placing them in the tree using epa-ng
29-09-2024 14:56:50:  Automatically locating best SCMG set
29-09-2024 14:57:17:  ❗The choosen marker gene set is supported by only half (1/4) of the alignments. This generally is an unstable estimate.
29-09-2024 14:57:20:  Searching fasta for selected markers
29-09-2024 14:57:29:  Completeness: 32.74
29-09-2024 14:57:29:  Contamination: 3.54
29-09-2024 14:57:29:  Max silent contamination: 100.0
29-09-2024 14:57:29:  Searching for marker genes in base database
29-09-2024 14:57:29:  No placement marker genes found.
29-09-2024 14:57:29:  Searching for marker genes in base database
29-09-2024 14:57:29:  No placement marker genes found.
29-09-2024 14:57:29:  Searching for marker genes in base database
29-09-2024 14:57:31:  Found 15 marker genes, placing them in the tree using epa-ng
29-09-2024 14:58:04:  Genome belongs to clade: protozoa (Best TaxID: protist_common)
29-09-2024 14:58:04:  Searching for marker genes in protozoa database
29-09-2024 14:58:06:  Found 22 marker genes, placing them in the tree using epa-ng
29-09-2024 14:58:30:  Automatically locating best SCMG set
29-09-2024 14:59:02:  ❗The choosen marker gene set is supported by only half (9/22) of the alignments. This generally is an unstable estimate.
29-09-2024 14:59:02:  Searching fasta for selected markers
29-09-2024 14:59:05:  Completeness: 41.18
29-09-2024 14:59:05:  Contamination: 11.76
29-09-2024 14:59:05:  Max silent contamination: 100.0
29-09-2024 14:59:05:  Searching for marker genes in base database
29-09-2024 14:59:06:  No placement marker genes found.
29-09-2024 14:59:06:  Searching for marker genes in base database
29-09-2024 14:59:06:  Found 8 marker genes, placing them in the tree using epa-ng
29-09-2024 14:59:17:  Genome belongs to clade: protozoa (Best TaxID: 6020)
29-09-2024 14:59:17:  Searching for marker genes in protozoa database
29-09-2024 14:59:18:  Found 6 marker genes, placing them in the tree using epa-ng
29-09-2024 14:59:21:  Automatically locating best SCMG set
29-09-2024 14:59:42:  ❗The choosen marker gene set is supported by only half (2/6) of the alignments. This generally is an unstable estimate.
29-09-2024 14:59:43:  Searching fasta for selected markers
29-09-2024 14:59:44:  Completeness: 31.17
29-09-2024 14:59:44:  Contamination: 0.0
29-09-2024 14:59:44:  Max silent contamination: 100.0
29-09-2024 14:59:44:  Searching for marker genes in base database
29-09-2024 14:59:45:  No placement marker genes found.
29-09-2024 14:59:45:  Searching for marker genes in base database
29-09-2024 14:59:45:  No placement marker genes found.
29-09-2024 14:59:45:  Searching for marker genes in base database
29-09-2024 14:59:46:  Found 2 marker genes, placing them in the tree using epa-ng
29-09-2024 14:59:50:  Genome belongs to clade: protozoa (Best TaxID: protist_common)
29-09-2024 14:59:50:  Searching for marker genes in protozoa database
29-09-2024 14:59:50:  Found 2 marker genes, placing them in the tree using epa-ng
29-09-2024 14:59:53:  Automatically locating best SCMG set
29-09-2024 15:00:21:  Searching fasta for selected markers
29-09-2024 15:00:21:  Completeness: 0.0
29-09-2024 15:00:21:  Contamination: 0.0
29-09-2024 15:00:21:  Max silent contamination: 100.0
29-09-2024 15:00:21:  Searching for marker genes in base database
29-09-2024 15:00:22:  Found 2 marker genes, placing them in the tree using epa-ng
29-09-2024 15:00:26:  Genome belongs to clade: protozoa (Best TaxID: protist_common)
29-09-2024 15:00:26:  Searching for marker genes in protozoa database
29-09-2024 15:00:27:  No placement marker genes found.
29-09-2024 15:00:27:  Searching for marker genes in base database
29-09-2024 15:00:27:  Found 1 marker genes, placing them in the tree using epa-ng
29-09-2024 15:00:30:  Genome belongs to clade: protozoa (Best TaxID: 174815)
29-09-2024 15:00:30:  Searching for marker genes in protozoa database
29-09-2024 15:00:31:  Found 1 marker genes, placing them in the tree using epa-ng
29-09-2024 15:00:33:  Automatically locating best SCMG set
29-09-2024 15:00:56:  Searching fasta for selected markers
29-09-2024 15:00:56:  Completeness: 7.32
29-09-2024 15:00:56:  Contamination: 0.0
29-09-2024 15:00:56:  Max silent contamination: 100.0
29-09-2024 15:00:56:  Searching for marker genes in base database
29-09-2024 15:00:56:  No placement marker genes found.
29-09-2024 15:00:56:  Searching for marker genes in base database
29-09-2024 15:00:57:  No placement marker genes found.
29-09-2024 15:00:57:  Searching for marker genes in base database
29-09-2024 15:00:57:  Found 3 marker genes, placing them in the tree using epa-ng
29-09-2024 15:01:02:  Genome belongs to clade: protozoa (Best TaxID: protist_common)
29-09-2024 15:01:02:  Searching for marker genes in protozoa database
29-09-2024 15:01:02:  No placement marker genes found.
29-09-2024 15:01:02:  Searching for marker genes in base database
29-09-2024 15:01:03:  No placement marker genes found.
29-09-2024 15:01:03:  Searching for marker genes in base database
29-09-2024 15:01:03:  No placement marker genes found.
29-09-2024 15:01:03:  Searching for marker genes in base database
29-09-2024 15:01:03:  Found 10 marker genes, placing them in the tree using epa-ng
29-09-2024 15:01:16:  Genome belongs to clade: protozoa (Best TaxID: 5878)
29-09-2024 15:01:16:  Searching for marker genes in protozoa database
29-09-2024 15:01:17:  Found 6 marker genes, placing them in the tree using epa-ng
29-09-2024 15:01:21:  Automatically locating best SCMG set
29-09-2024 15:01:42:  ❗The choosen marker gene set is supported by only half (2/6) of the alignments. This generally is an unstable estimate.
29-09-2024 15:01:43:  Searching fasta for selected markers
29-09-2024 15:01:45:  Completeness: 31.17
29-09-2024 15:01:45:  Contamination: 6.49
29-09-2024 15:01:45:  Max silent contamination: 100.0
29-09-2024 15:01:45:  Searching for marker genes in base database
29-09-2024 15:01:45:  No placement marker genes found.
29-09-2024 15:01:45:  Searching for marker genes in base database
29-09-2024 15:01:45:  Found 1 marker genes, placing them in the tree using epa-ng
29-09-2024 15:01:49:  Genome belongs to clade: protozoa (Best TaxID: 5654)
29-09-2024 15:01:49:  Searching for marker genes in protozoa database
29-09-2024 15:01:50:  No placement marker genes found.
29-09-2024 15:01:50:  Searching for marker genes in base database
29-09-2024 15:01:50:  Found 1 marker genes, placing them in the tree using epa-ng
29-09-2024 15:01:53:  Genome belongs to clade: protozoa (Best TaxID: 2699528)
29-09-2024 15:01:53:  Searching for marker genes in protozoa database
29-09-2024 15:01:53:  No placement marker genes found.
29-09-2024 15:01:53:  Searching for marker genes in base database
29-09-2024 15:01:54:  No placement marker genes found.
29-09-2024 15:01:54:  Searching for marker genes in base database
29-09-2024 15:01:54:  No placement marker genes found.
29-09-2024 15:01:54:  Searching for marker genes in base database
29-09-2024 15:01:54:  No placement marker genes found.
29-09-2024 15:01:54:  Searching for marker genes in base database
29-09-2024 15:01:55:  No placement marker genes found.
29-09-2024 15:01:55:  Searching for marker genes in base database
29-09-2024 15:01:55:  No placement marker genes found.
29-09-2024 15:01:55:  Searching for marker genes in base database
29-09-2024 15:01:55:  No placement marker genes found.
29-09-2024 15:01:55:  Searching for marker genes in base database
29-09-2024 15:01:55:  No placement marker genes found.
29-09-2024 15:01:55:  Searching for marker genes in base database
29-09-2024 15:01:56:  Found 2 marker genes, placing them in the tree using epa-ng
29-09-2024 15:02:00:  Genome belongs to clade: protozoa (Best TaxID: protist_common)
29-09-2024 15:02:00:  Searching for marker genes in protozoa database
29-09-2024 15:02:01:  Found 2 marker genes, placing them in the tree using epa-ng
29-09-2024 15:02:04:  Automatically locating best SCMG set
29-09-2024 15:02:31:  Searching fasta for selected markers
29-09-2024 15:02:32:  Completeness: 0.0
29-09-2024 15:02:32:  Contamination: 0.0
29-09-2024 15:02:32:  Max silent contamination: 100.0
29-09-2024 15:02:32:  Searching for marker genes in base database
29-09-2024 15:02:32:  No placement marker genes found.
29-09-2024 15:02:32:  Searching for marker genes in base database
29-09-2024 15:02:32:  No placement marker genes found.
29-09-2024 15:02:32:  Searching for marker genes in base database
29-09-2024 15:02:33:  Found 3 marker genes, placing them in the tree using epa-ng
29-09-2024 15:02:37:  Genome belongs to clade: protozoa (Best TaxID: protist_common)
29-09-2024 15:02:37:  Searching for marker genes in protozoa database
29-09-2024 15:02:38:  Found 2 marker genes, placing them in the tree using epa-ng
29-09-2024 15:02:41:  Automatically locating best SCMG set
29-09-2024 15:03:17:  Searching fasta for selected markers
29-09-2024 15:03:25:  Completeness: 10.6
29-09-2024 15:03:25:  Contamination: 1.0
29-09-2024 15:03:25:  Max silent contamination: 100.0
29-09-2024 15:03:25:  Searching for marker genes in base database
29-09-2024 15:03:26:  Found 4 marker genes, placing them in the tree using epa-ng
29-09-2024 15:03:32:  Genome belongs to clade: protozoa (Best TaxID: 33634)
29-09-2024 15:03:32:  Searching for marker genes in protozoa database
29-09-2024 15:03:32:  Found 5 marker genes, placing them in the tree using epa-ng
29-09-2024 15:03:35:  Automatically locating best SCMG set
29-09-2024 15:04:02:  ❗The choosen marker gene set is supported by only half (1/5) of the alignments. This generally is an unstable estimate.
29-09-2024 15:04:06:  Searching fasta for selected markers
29-09-2024 15:04:09:  Completeness: 25.58
29-09-2024 15:04:09:  Contamination: 10.23
29-09-2024 15:04:09:  Max silent contamination: 100.0
29-09-2024 15:04:09:  Searching for marker genes in base database
29-09-2024 15:04:09:  No placement marker genes found.
29-09-2024 15:04:09:  Searching for marker genes in base database
29-09-2024 15:04:10:  No placement marker genes found.
29-09-2024 15:04:10:  Searching for marker genes in base database
29-09-2024 15:04:10:  No placement marker genes found.
29-09-2024 15:04:10:  Searching for marker genes in base database
29-09-2024 15:04:10:  No placement marker genes found.
29-09-2024 15:04:10:  Searching for marker genes in base database
29-09-2024 15:04:11:  Found 11 marker genes, placing them in the tree using epa-ng
29-09-2024 15:04:33:  Genome belongs to clade: protozoa (Best TaxID: 33846)
29-09-2024 15:04:33:  Searching for marker genes in protozoa database
29-09-2024 15:04:34:  Found 8 marker genes, placing them in the tree using epa-ng
29-09-2024 15:04:42:  Automatically locating best SCMG set
29-09-2024 15:05:08:  ❗The choosen marker gene set is supported by only half (1/8) of the alignments. This generally is an unstable estimate.
29-09-2024 15:05:12:  Searching fasta for selected markers
29-09-2024 15:05:20:  Completeness: 29.2
29-09-2024 15:05:20:  Contamination: 0.88
29-09-2024 15:05:20:  Max silent contamination: 100.0
29-09-2024 15:05:20:  Searching for marker genes in base database
29-09-2024 15:05:21:  Found 14 marker genes, placing them in the tree using epa-ng
29-09-2024 15:05:50:  Genome belongs to clade: protozoa (Best TaxID: 33846)
29-09-2024 15:05:50:  Searching for marker genes in protozoa database
29-09-2024 15:05:52:  Found 11 marker genes, placing them in the tree using epa-ng
29-09-2024 15:06:01:  Automatically locating best SCMG set
29-09-2024 15:06:28:  ❗The choosen marker gene set is supported by only half (1/11) of the alignments. This generally is an unstable estimate.
29-09-2024 15:06:32:  Searching fasta for selected markers
29-09-2024 15:06:42:  Completeness: 48.67
29-09-2024 15:06:42:  Contamination: 2.65
29-09-2024 15:06:42:  Max silent contamination: 100.0
29-09-2024 15:06:42:  Searching for marker genes in base database
29-09-2024 15:06:42:  No placement marker genes found.
29-09-2024 15:06:42:  Searching for marker genes in base database
29-09-2024 15:06:43:  No placement marker genes found.
29-09-2024 15:06:43:  Searching for marker genes in base database
29-09-2024 15:06:43:  No placement marker genes found.
29-09-2024 15:06:43:  Searching for marker genes in base database
29-09-2024 15:06:43:  No placement marker genes found.
29-09-2024 15:06:43:  Searching for marker genes in base database
29-09-2024 15:06:44:  Found 1 marker genes, placing them in the tree using epa-ng
29-09-2024 15:06:47:  Genome belongs to clade: protozoa (Best TaxID: 37358)
29-09-2024 15:06:47:  Searching for marker genes in protozoa database
29-09-2024 15:06:48:  Found 1 marker genes, placing them in the tree using epa-ng
29-09-2024 15:06:50:  Automatically locating best SCMG set
29-09-2024 15:07:17:  Could not identify a single suitable marker gene set
29-09-2024 15:07:17:  No marker gene set could be found with these settings 🙁
Change your parameters and try again?
29-09-2024 15:07:17:  Searching for marker genes in base database
29-09-2024 15:07:17:  No placement marker genes found.
29-09-2024 15:07:17:  Searching for marker genes in base database
29-09-2024 15:07:17:  No placement marker genes found.
29-09-2024 15:07:17:  Searching for marker genes in base database
29-09-2024 15:07:18:  No placement marker genes found.
29-09-2024 15:07:18:  Searching for marker genes in base database
29-09-2024 15:07:18:  No placement marker genes found.
29-09-2024 15:07:18:  Searching for marker genes in base database
29-09-2024 15:07:19:  Found 2 marker genes, placing them in the tree using epa-ng
29-09-2024 15:07:24:  Genome belongs to clade: protozoa (Best TaxID: protist_common)
29-09-2024 15:07:24:  Searching for marker genes in protozoa database
29-09-2024 15:07:25:  Found 2 marker genes, placing them in the tree using epa-ng
29-09-2024 15:07:28:  Automatically locating best SCMG set
29-09-2024 15:07:55:  Could not identify a single suitable marker gene set
29-09-2024 15:07:55:  No marker gene set could be found with these settings 🙁
Change your parameters and try again?
29-09-2024 15:07:55:  Searching for marker genes in base database
29-09-2024 15:07:55:  No placement marker genes found.
29-09-2024 15:07:55:  Searching for marker genes in base database
29-09-2024 15:07:55:  No placement marker genes found.
29-09-2024 15:07:55:  Searching for marker genes in base database
29-09-2024 15:07:55:  No placement marker genes found.
29-09-2024 15:07:55:  Searching for marker genes in base database
29-09-2024 15:07:56:  Found 5 marker genes, placing them in the tree using epa-ng
29-09-2024 15:08:04:  Genome belongs to clade: protozoa (Best TaxID: 33634)
29-09-2024 15:08:04:  Searching for marker genes in protozoa database
29-09-2024 15:08:05:  Found 3 marker genes, placing them in the tree using epa-ng
29-09-2024 15:08:08:  Automatically locating best SCMG set
29-09-2024 15:08:46:  ❗The choosen marker gene set is supported by only half (1/3) of the alignments. This generally is an unstable estimate.
29-09-2024 15:08:57:  Searching fasta for selected markers
29-09-2024 15:09:10:  Completeness: 2.0
29-09-2024 15:09:10:  Contamination: 0.4
29-09-2024 15:09:10:  Max silent contamination: 100.0
29-09-2024 15:09:10:  Searching for marker genes in base database
29-09-2024 15:09:10:  No placement marker genes found.
29-09-2024 15:09:10:  Searching for marker genes in base database
29-09-2024 15:09:11:  Found 4 marker genes, placing them in the tree using epa-ng
29-09-2024 15:09:18:  Genome belongs to clade: protozoa (Best TaxID: protist_common)
29-09-2024 15:09:18:  Searching for marker genes in protozoa database
29-09-2024 15:09:18:  Found 5 marker genes, placing them in the tree using epa-ng
29-09-2024 15:09:22:  Automatically locating best SCMG set
29-09-2024 15:09:43:  ❗The choosen marker gene set is supported by only half (2/5) of the alignments. This generally is an unstable estimate.
29-09-2024 15:09:44:  Searching fasta for selected markers
29-09-2024 15:09:45:  Completeness: 25.97
29-09-2024 15:09:45:  Contamination: 1.3
29-09-2024 15:09:45:  Max silent contamination: 100.0
29-09-2024 15:09:45:  Searching for marker genes in base database
29-09-2024 15:09:46:  No placement marker genes found.
29-09-2024 15:09:46:  Searching for marker genes in base database
29-09-2024 15:09:46:  No placement marker genes found.
29-09-2024 15:09:46:  Searching for marker genes in base database
29-09-2024 15:09:47:  Found 7 marker genes, placing them in the tree using epa-ng
29-09-2024 15:10:00:  Genome belongs to clade: protozoa (Best TaxID: 33630)
29-09-2024 15:10:00:  Searching for marker genes in protozoa database
29-09-2024 15:10:01:  Found 11 marker genes, placing them in the tree using epa-ng
29-09-2024 15:10:12:  Automatically locating best SCMG set
29-09-2024 15:10:48:  ❗The choosen marker gene set is supported by only half (4/11) of the alignments. This generally is an unstable estimate.
29-09-2024 15:10:55:  Searching fasta for selected markers
29-09-2024 15:11:11:  Completeness: 48.38
29-09-2024 15:11:11:  Contamination: 4.86
29-09-2024 15:11:11:  Max silent contamination: 100.0
29-09-2024 15:11:11:  Searching for marker genes in base database
29-09-2024 15:11:12:  No placement marker genes found.
29-09-2024 15:11:12:  Searching for marker genes in base database
29-09-2024 15:11:12:  Found 1 marker genes, placing them in the tree using epa-ng
29-09-2024 15:11:15:  Genome belongs to clade: protozoa (Best TaxID: 31345)
29-09-2024 15:11:15:  Searching for marker genes in protozoa database
29-09-2024 15:11:15:  Found 2 marker genes, placing them in the tree using epa-ng
29-09-2024 15:11:17:  Automatically locating best SCMG set
29-09-2024 15:11:53:  Searching fasta for selected markers
29-09-2024 15:11:57:  Completeness: 1.4
29-09-2024 15:11:57:  Contamination: 0.0
29-09-2024 15:11:57:  Max silent contamination: 100.0
29-09-2024 15:11:57:  Searching for marker genes in base database
29-09-2024 15:11:58:  Found 1 marker genes, placing them in the tree using epa-ng
29-09-2024 15:12:01:  Genome belongs to clade: protozoa (Best TaxID: 1242273)
29-09-2024 15:12:01:  Searching for marker genes in protozoa database
29-09-2024 15:12:01:  Found 1 marker genes, placing them in the tree using epa-ng
29-09-2024 15:12:03:  Automatically locating best SCMG set
29-09-2024 15:12:30:  Could not identify a single suitable marker gene set
29-09-2024 15:12:30:  No marker gene set could be found with these settings 🙁
Change your parameters and try again?
29-09-2024 15:12:30:  Searching for marker genes in base database
29-09-2024 15:12:30:  No placement marker genes found.
29-09-2024 15:12:30:  Searching for marker genes in base database
29-09-2024 15:12:31:  Found 9 marker genes, placing them in the tree using epa-ng
29-09-2024 15:12:42:  Genome belongs to clade: protozoa (Best TaxID: 5878)
29-09-2024 15:12:42:  Searching for marker genes in protozoa database
29-09-2024 15:12:43:  Found 10 marker genes, placing them in the tree using epa-ng
29-09-2024 15:12:47:  Automatically locating best SCMG set
29-09-2024 15:13:16:  ❗The choosen marker gene set is supported by only half (4/10) of the alignments. This generally is an unstable estimate.
29-09-2024 15:13:19:  Searching fasta for selected markers
29-09-2024 15:13:23:  Completeness: 33.02
29-09-2024 15:13:23:  Contamination: 11.16
29-09-2024 15:13:23:  Max silent contamination: 100.0
29-09-2024 15:13:23:  Searching for marker genes in base database
29-09-2024 15:13:24:  Found 1 marker genes, placing them in the tree using epa-ng
29-09-2024 15:13:28:  Genome belongs to clade: protozoa (Best TaxID: 5738)
29-09-2024 15:13:28:  Searching for marker genes in protozoa database
29-09-2024 15:13:28:  Found 2 marker genes, placing them in the tree using epa-ng
29-09-2024 15:13:31:  Automatically locating best SCMG set
29-09-2024 15:13:47:  Searching fasta for selected markers
29-09-2024 15:13:47:  Completeness: 3.17
29-09-2024 15:13:47:  Contamination: 0.0
29-09-2024 15:13:47:  Max silent contamination: 100.0
29-09-2024 15:13:47:  Searching for marker genes in base database
29-09-2024 15:13:48:  Found 9 marker genes, placing them in the tree using epa-ng
29-09-2024 15:14:07:  Genome belongs to clade: protozoa (Best TaxID: 33846)
29-09-2024 15:14:07:  Searching for marker genes in protozoa database
29-09-2024 15:14:08:  Found 6 marker genes, placing them in the tree using epa-ng
29-09-2024 15:14:14:  Automatically locating best SCMG set
29-09-2024 15:14:41:  Could not identify a single suitable marker gene set
29-09-2024 15:14:41:  No marker gene set could be found with these settings 🙁
Change your parameters and try again?
29-09-2024 15:14:41:  Searching for marker genes in base database
29-09-2024 15:14:41:  No placement marker genes found.
29-09-2024 15:14:41:  Searching for marker genes in base database
29-09-2024 15:14:41:  No placement marker genes found.
29-09-2024 15:14:41:  Searching for marker genes in base database
29-09-2024 15:14:41:  Found 1 marker genes, placing them in the tree using epa-ng
29-09-2024 15:14:45:  Genome belongs to clade: protozoa (Best TaxID: 2699528)
29-09-2024 15:14:45:  Searching for marker genes in protozoa database
29-09-2024 15:14:45:  No placement marker genes found.
29-09-2024 15:14:45:  Searching for marker genes in base database
29-09-2024 15:14:45:  No placement marker genes found.
29-09-2024 15:14:45:  Searching for marker genes in base database
29-09-2024 15:14:46:  Found 3 marker genes, placing them in the tree using epa-ng
29-09-2024 15:14:51:  Genome belongs to clade: protozoa (Best TaxID: protist_common)
29-09-2024 15:14:51:  Searching for marker genes in protozoa database
29-09-2024 15:14:52:  Found 4 marker genes, placing them in the tree using epa-ng
29-09-2024 15:14:55:  Automatically locating best SCMG set
29-09-2024 15:15:22:  ❗The choosen marker gene set is supported by only half (1/4) of the alignments. This generally is an unstable estimate.
29-09-2024 15:15:26:  Searching fasta for selected markers
29-09-2024 15:15:29:  Completeness: 6.64
29-09-2024 15:15:29:  Contamination: 0.44
29-09-2024 15:15:29:  Max silent contamination: 100.0
29-09-2024 15:15:29:  Found 0 large bins to merge with
29-09-2024 15:15:29:  Created 0 merged bins
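
For anyone comparing a log like the one above against eukcc.csv, here is a minimal Python sketch that tallies what the log records. The matched substrings are copied from this log (EukCC 2.1.0); treating them as stable across versions is an assumption, and `tally_log` and `SAMPLE` are hypothetical names used only for illustration.

```python
import re
from collections import Counter

# A few lines copied from the eukcc.log excerpt above, used as sample input.
SAMPLE = """\
29-09-2024 14:46:29:  Found 73 bins
29-09-2024 14:56:30:  Searching for marker genes in base database
29-09-2024 14:56:31:  No placement marker genes found.
29-09-2024 15:00:56:  Searching fasta for selected markers
29-09-2024 15:00:56:  Completeness: 7.32
29-09-2024 15:00:56:  Contamination: 0.0
29-09-2024 15:10:48:  ❗The choosen marker gene set is supported by only half (4/11) of the alignments. This generally is an unstable estimate.
29-09-2024 15:11:11:  Completeness: 48.38
29-09-2024 15:11:11:  Contamination: 4.86
29-09-2024 15:12:30:  Could not identify a single suitable marker gene set
"""


def tally_log(log_text: str) -> Counter:
    """Count bins found, estimates logged, warnings, and failed marker-set searches."""
    counts = Counter()
    for line in log_text.splitlines():
        found = re.search(r"Found (\d+) bins", line)
        if found:
            # Matches "Found 73 bins" but not "Found 7 marker genes".
            counts["bins_found"] = int(found.group(1))
        elif "Completeness:" in line:
            counts["estimates_logged"] += 1
        elif "supported by only half" in line:
            counts["unstable_warnings"] += 1
        elif "Could not identify a single suitable marker gene set" in line:
            counts["no_marker_set"] += 1
    return counts


if __name__ == "__main__":
    for key, value in tally_log(SAMPLE).items():
        print(f"{key}: {value}")
```

Comparing `estimates_logged` with the number of rows in eukcc.csv gives a quick check of how many estimates were computed but not reported.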

@KateSakharova
Contributor

Hi @rrohwer, could you share a couple of bin fasta-files that are missing from eukcc.csv, please?

@rrohwer

rrohwer commented Dec 9, 2024

Here is an example. When run individually, this bin was estimated by EukCC as 48.38% complete and 4.86% contaminated, but when run as part of the folder it was not included in the output.
bin66.fna.zip

@KateSakharova
Contributor

Hi @rrohwer and @zhenjiaofenjie,
EukCC checks whether a genome is of "good" or "bad" quality here. When a genome is of "bad" quality, you get a message starting with a red exclamation mark (❗The choosen marker gene set is supported by only half...). You should see that message for your genomes in both single mode and folder mode.
In folder mode, EukCC leaves "bad" quality genomes out of the output eukcc.csv, which is written here.
In single mode, however, it still prints the values to the output.

I know this is not described anywhere, sorry for that! Maybe it makes sense to consider adding a "bad_quality.csv" as output...
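
Until something like that exists, the values for the omitted genomes can be recovered from eukcc.log. The following is a rough Python sketch, not part of EukCC: it assumes each bin's section of the log starts with "Searching for marker genes in base database" and that the warning line precedes the "Completeness:" line, as in the log posted above. The log does not name the bins, so estimates come back in log order only. `Estimate` and `parse_estimates` are hypothetical names.

```python
from dataclasses import dataclass

# Two per-bin sections copied from the log posted earlier in this thread.
SAMPLE = """\
29-09-2024 15:00:27:  Searching for marker genes in base database
29-09-2024 15:00:27:  Found 1 marker genes, placing them in the tree using epa-ng
29-09-2024 15:00:56:  Completeness: 7.32
29-09-2024 15:00:56:  Contamination: 0.0
29-09-2024 15:00:56:  Max silent contamination: 100.0
29-09-2024 15:09:46:  Searching for marker genes in base database
29-09-2024 15:10:48:  ❗The choosen marker gene set is supported by only half (4/11) of the alignments. This generally is an unstable estimate.
29-09-2024 15:11:11:  Completeness: 48.38
29-09-2024 15:11:11:  Contamination: 4.86
29-09-2024 15:11:11:  Max silent contamination: 100.0
"""


@dataclass
class Estimate:
    completeness: float
    contamination: float
    unstable: bool  # True if the section carried the "supported by only half" warning


def parse_estimates(log_text: str) -> list[Estimate]:
    """Collect every logged estimate and whether its section had the warning."""
    estimates = []
    unstable = False
    completeness = None
    for line in log_text.splitlines():
        if "Searching for marker genes in base database" in line:
            # Assumed start of a new bin: reset the per-bin state.
            unstable = False
            completeness = None
        elif "supported by only half" in line:
            unstable = True
        elif "Completeness:" in line:
            completeness = float(line.rsplit(":", 1)[1])
        elif "Contamination:" in line and completeness is not None:
            # Case-sensitive match, so "Max silent contamination:" is skipped.
            contamination = float(line.rsplit(":", 1)[1])
            estimates.append(Estimate(completeness, contamination, unstable))
            completeness = None
    return estimates


if __name__ == "__main__":
    for estimate in parse_estimates(SAMPLE):
        print(estimate)
```

In the log above, the six estimates without the warning have the same values as the six rows in eukcc.csv, which fits this explanation.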

@rrohwer

rrohwer commented Dec 16, 2024

Thanks for explaining! Another option, instead of a separate csv file, would be to report everything in one file, with an added column for quality warnings or alignment support of the marker set. I think this might be more intuitive for users, since bad_quality.csv is a little confusing (low completeness or high contamination is also considered low quality, but those bins are still reported in the primary output). Thanks again for your help with this!
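
To make the single-file idea concrete, here is a small Python sketch of what such a table could look like. It is only an illustration, not EukCC's output code: the `marker_set_warning` column name, the `write_report` helper, and the tab delimiter are all assumptions. The two sample rows use values already mentioned in this thread (bin92.fna from eukcc.csv, bin66.fna from the single-mode run).

```python
import csv
import io

# (bin name, completeness, contamination, warning raised?) taken from this thread.
ROWS = [
    ("bin92.fna", 7.32, 0.0, False),
    ("bin66.fna", 48.38, 4.86, True),
]


def write_report(rows, handle) -> None:
    """Write every bin to one table, flagging unstable estimates in an extra column."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["bin", "completeness", "contamination", "marker_set_warning"])
    for name, completeness, contamination, unstable in rows:
        # Hypothetical flag value; an empty cell means no warning was raised.
        flag = "unstable" if unstable else ""
        writer.writerow([name, completeness, contamination, flag])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_report(ROWS, buffer)
    print(buffer.getvalue(), end="")
```

With a layout like this, users can filter on the extra column themselves instead of having to notice that rows are missing.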
