CAT: Database was built with a different version of Diamond and is incompatible. #90

Closed
ropolomx opened this issue Aug 15, 2020 · 9 comments

Comments

@ropolomx

ropolomx commented Aug 15, 2020

I get the following error using CAT with the CAT_prepare_20200618.tar.gz database:

Error: Database was built with a different version of Diamond and is incompatible.
[2020-08-14 20:06:10.946813] ERROR: DIAMOND finished abnormally.

Not really an error of the mag pipeline, but it might be worth checking out mainly for purposes of updating the documentation.

@ropolomx
Author

ropolomx commented Aug 15, 2020

I am using the conda profile and this revision: a57d2f49ad [dev]

nf-core/mag -r dev -profile conda --reads './*_{1,2}.fastq.gz' --max_memory 512.GB -c myconfig.conf --kraken2_db /isilon/lethbridge-rdc/users/ortegapoloro/kraken2db/minikraken2_v1_8GB_201904_UPDATE.tgz --cat_db /isilon/lethbridge-rdc/users/ortegapoloro/cat_db/CAT_prepare_20200618.tar.gz --trimming-quality 20 --mean-quality 20 --email [email protected] --plaintext_email --monochrome_logs -resume

@ropolomx
Author

ropolomx commented Aug 15, 2020

The diamond version in the conda environment is v0.9.24.125. From the CAT repo it seems that it has been tested only with 0.9.14. However, in the CAT Prepare download site, they have diamond versions 0.9.21 and 0.9.34. All are available in Bioconda, and I might try them. I also opened an issue in the CAT repository: MGXlab/CAT_pack#45. It would be interesting to know which version exactly they used to create that database.

@bastiaanvonmeijenfeldt

This has been answered in dutilh/CAT#45 but you can find the DIAMOND database version in the CAT prepare log file:

$ grep "diamond version" 2020-06-18.CAT_prepare.fresh.log
[2020-06-18 14:53:07.163156] DIAMOND found: diamond version 0.9.34.

We're thinking of a more robust way to prevent DIAMOND incompatibility issues in a future release of CAT.

Hope this helps for now!
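
The check described above (read the DIAMOND version from the CAT prepare log, then compare it with the local DIAMOND) can be sketched as a small shell helper. This is a minimal sketch, not part of CAT or nf-core/mag: the function names are hypothetical, it assumes `diamond --version` prints a line of the form `diamond version X.Y.Z` (the same form as in the log line above), and it treats an exact version match as the safe case, which may be stricter than DIAMOND itself requires.

```shell
#!/bin/sh
# Sketch: compare the DIAMOND version recorded in a CAT_prepare log
# with the DIAMOND version available locally.
# All function names below are hypothetical helpers, not CAT commands.

# Read text on stdin and print the first version number that follows
# the words "diamond version", e.g. from a log line such as
#   [2020-06-18 14:53:07.163156] DIAMOND found: diamond version 0.9.34.
extract_diamond_version() {
    sed -n 's/.*diamond version \([0-9][0-9.]*\).*/\1/p' \
        | sed 's/\.$//' \
        | head -n 1
}

# Succeed only when both version strings are non-empty and identical.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# $1 = CAT_prepare log file
# $2 = DIAMOND binary to test (optional, defaults to "diamond" on $PATH)
# Assumes "<binary> --version" prints "diamond version X.Y.Z".
check_cat_db() {
    log_version=$(grep "diamond version" "$1" | extract_diamond_version)
    local_version=$("${2:-diamond}" --version 2>/dev/null | extract_diamond_version)
    if versions_match "$log_version" "$local_version"; then
        echo "OK: database and local DIAMOND are both $log_version"
    else
        echo "MISMATCH: database built with '$log_version', local DIAMOND is '$local_version'"
        return 1
    fi
}

# Example call (file name taken from the grep command above):
#   check_cat_db 2020-06-18.CAT_prepare.fresh.log
```

Running such a check before starting a long pipeline run would surface the mismatch reported in this issue up front rather than at the CAT step.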

@d4straub
Collaborator

Thanks for reporting @ropolomx and also @bastiaanvonmeijenfeldt !

For reference, when using the docker image of mag v1.0.0 (currently the same diamond version as dev), CAT_prepare_20190108.tar.gz and CAT_prepare_20200304.tar.gz worked for me in the past.
I see that the docs recommend CAT_prepare_20200618.tar.gz; that of course needs to be updated. Again, thanks for bringing this up.

@ropolomx
Author

ropolomx commented Aug 17, 2020

Thank you @d4straub! I am working on a cluster where it is not possible to use docker or singularity. I will explore installing diamond 0.9.34 in the conda environment for my runs with the dev workflow. If I try to do this in the project I have already started running mag on, I think it might mess with the hashes of the conda directory in work when one wants to resume. I might skip the CAT searches for now and let the rest of the workflow finish. This is how the conda environment might change when adding diamond 0.9.34:

The following NEW packages will be INSTALLED:

  cdbtools           bioconda/linux-64::cdbtools-0.99-he513fc3_5
  libgcc             conda-forge/linux-64::libgcc-7.2.0-h69d50b8_2
  mysql-connector-c  bioconda/linux-64::mysql-connector-c-6.1.6-2
  ucsc-fatotwobit    bioconda/linux-64::ucsc-fatotwobit-357-1
  ucsc-twobitinfo    bioconda/linux-64::ucsc-twobitinfo-357-1

The following packages will be UPDATED:

  augustus                            3.3.2-pl526h985c5e9_2 --> 3.3.3-pl526hce533f5_2
  boost                            1.68.0-py36h8619c78_1001 --> 1.70.0-py36h9de70de_1
  boost-cpp                            1.68.0-h11c811c_1000 --> 1.70.0-ha2d47e9_1
  diamond                                 0.9.24-ha888412_1 --> 0.9.34-h56fc30b_0
  gsl                                     2.4-h294904e_1006 --> 2.5-h294904e_1
  openblas                              0.3.3-h9ac9557_1001 --> 0.3.6-h6e990d7_6
  openssl                                 1.1.1g-h516909a_0 --> 1.1.1g-h516909a_1

The following packages will be SUPERSEDED by a higher-priority channel:

  libopenblas        pkgs/main::libopenblas-0.3.10-h5a2b25~ --> conda-forge::libopenblas-0.3.6-h6e990d7_6
  sqlite              conda-forge::sqlite-3.32.3-hcee41ef_1 --> pkgs/main::sqlite-3.31.1-h7b6447c_0

The following packages will be DOWNGRADED:

  blast                              2.2.31-pl526he19e7b1_4 --> 2.2.31-pl526h3066fca_3
  curl                                    7.71.1-he644dc0_1 --> 7.68.0-hf8cf82a_0
  krb5                                    1.17.1-hfafb76e_1 --> 1.16.4-h2fd8d38_0
  libblas                                 3.8.0-17_openblas --> 3.8.0-11_openblas
  libcblas                                3.8.0-17_openblas --> 3.8.0-11_openblas
  libcurl                                 7.71.1-hcdd3856_1 --> 7.68.0-hda55be3_0
  liblapack                               3.8.0-17_openblas --> 3.8.0-11_openblas
  pysam                             0.16.0.1-py36h4c34d4e_1 --> 0.16.0-py36h873a209_0
  python                        3.6.7-h357f687_1008_cpython --> 3.6.7-h381d211_1004
  r-base                                   3.6.1-haffb61f_2 --> 3.6.0-hce969dd_0
  readline                                   8.0-h46ee950_1 --> 7.0-hf8c457e_1001

ropolomx added a commit to ropolomx/mag that referenced this issue Aug 18, 2020
Updated diamond version to 0.9.34 to address issue of incompatibility with CAT database: nf-core#90
@d4straub
Collaborator

The updated boost and BUSCO might clash. And bin completeness (BUSCO) information is more important than running CAT with the newest database, in my opinion. But I might be wrong and it may work fine; let me know.

@ropolomx
Author

I agree @d4straub: bin completeness with BUSCO would take priority in my opinion too. I only had access to the newest CAT database, but I can always run CAT separately for now. I also saw the conflict of that diamond version with boost, so I will see if there is another way to make it work. Thanks!

@bastiaanvonmeijenfeldt

As a workaround, CAT allows for runs with DIAMOND versions that are not in your $PATH variable, via the --path_to_diamond flag. So there could be two different versions of DIAMOND distributed, one in your $PATH for BUSCO and another for CAT.

We're thinking of making this the default option for CAT, where we supply the correct DIAMOND binary with the database files...
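
The two-version setup described above could look roughly like the following. This is a sketch under assumptions, not a tested recipe: the environment name, file names, and directory names are placeholders, and the helper functions are hypothetical. Only the `--path_to_diamond` flag comes from this thread; `-c`, `-d`, and `-t` are the usual `CAT contigs` arguments for contigs, database directory, and taxonomy directory. The helpers only print the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch: keep a second DIAMOND for CAT in its own conda environment,
# leaving the DIAMOND on $PATH untouched for the rest of the workflow.
# Function names, env names, and paths are placeholders for illustration.

# Print the conda command that creates an environment holding only DIAMOND.
# $1 = environment name, $2 = DIAMOND version (e.g. 0.9.34)
diamond_env_cmd() {
    printf 'conda create -y -n %s -c conda-forge -c bioconda diamond=%s\n' "$1" "$2"
}

# Print the CAT command that uses an explicit DIAMOND binary.
# $1 = contigs FASTA, $2 = CAT database dir, $3 = CAT taxonomy dir,
# $4 = path to the DIAMOND binary matching the database
# Note: arguments are printed unquoted, so use paths without spaces.
cat_contigs_cmd() {
    printf 'CAT contigs -c %s -d %s -t %s --path_to_diamond %s\n' "$1" "$2" "$3" "$4"
}

# Example (all names are placeholders):
#   diamond_env_cmd cat-diamond 0.9.34
#   cat_contigs_cmd contigs.fasta CAT_database CAT_taxonomy /path/to/envs/cat-diamond/bin/diamond
```

Because the second DIAMOND lives in a separate environment, the pipeline's own conda environment and its other dependencies stay unchanged.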

@skrakau
Member

skrakau commented May 25, 2021

One solution for such problems in the future is to create the CAT database yourself within the mag pipeline (--cat_db_generate), ensuring the same DIAMOND version for building the database and running the classification. Added with #196.

@skrakau skrakau closed this as completed May 25, 2021
4 participants