Bracken fails #51
Comments
Could you share an input file for Bracken? Maybe we should discuss if we want a shared "Tools debugging" Galaxy account, to make debugging issues faster? Direct access to a history with a failed run would lower the mental barrier to start digging into this... |
Not obvious to me what is wrong, but I'm far from a Galaxy data manager/library expert. Would be great if someone with more experience would also take a look. But that name you see in the error logs corresponds to the "Standard" Kraken database, which you based the Bracken database on, I assume? |
I built the Bracken database using Bracken Database Builder (Admin -> Local Data), with the same database that was built/downloaded for Kraken2 (Standard). I named the Bracken database "Bracken_standard_75mers_distrib_read_length100". This database is available when I select the Bracken tool, but from the error report, the tool seems to want another database: "2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_minimizer-spaces_6_load-factor_0.7/database100mers.kmer_distrib". Could this be hard-coded in "est_abundance.py"? |
"Bracken_standard_75mers_distrib_read_length100" is the display name you chose for the database, which is shown to the users. But the actual file name will be "databaseNmers.kmer_distrib", where N is the read length specified when creating the database. This file is placed in a subdirectory named after the chosen Kraken database (the full name of the Standard database here is "2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_minimizer-spaces_6_load-factor_0.7"), so the error report does indeed reference the database file you created. The problem, I believe, is that the path to this file is relative (as seen in the last column of the loc-file: "/srv/galaxy/server/tool-data/toolshed.g2.bx.psu.edu/repos/iuc/data_manager_build_bracken_database/fd5830f88314/bracken_databases.loc"). The files created by data managers should normally be placed somewhere beneath the path specified with the "galaxy_data_manager_data_path" setting in "galaxy.yml" (defaults to the same value as "tool_data_path"), but I have not been able to find the actual location of the database file yet. I remember that we briefly discussed some time ago where we should place such files (or whether to add them to CVMFS), but we did not reach a conclusion, which may be why these settings have not been explicitly configured. |
I also suspect that the Bracken data manager tool by IUC may be to blame, since it does not actually move the generated files into a subdirectory of the data manager data path.
Here is the "data_manager_conf.xml" file for Bracken. It will output a line with 3 columns to the loc-file: a unique ID (value) for the reference dataset, a name displayed to the users and the path to the file(s). The path here is just the relative location of the file within the job working directory, so the file is probably just deleted when the data manager job is finished.
<data_managers>
<data_manager tool_file="data_manager/bracken_build_database.xml" id="bracken_build_database" version="2.5+galaxy0">
<data_table name="bracken_databases">
<output>
<column name="value"/>
<column name="name"/>
<column name="path" output_ref="out_file"/>
</output>
</data_table>
</data_manager>
</data_managers>
Below is an example of a typical "data_manager_conf.xml" file (here HISAT2). The files are moved to a different location outside of the job working directory, and the path is translated to point to this new location (which is an absolute path).
<?xml version="1.0"?>
<data_managers>
<data_manager tool_file="data_manager/hisat2_index_builder.xml" id="hisat2_index_builder" version="0.0.1">
<data_table name="hisat2_indexes">
<output>
<column name="value" />
<column name="dbkey" />
<column name="name" />
<column name="path" output_ref="out_file" >
<move type="directory" relativize_symlinks="True">
<!-- <source>${path}</source>--> <!-- out_file.extra_files_path is used as base by default --> <!-- if no source, eg for type=directory, then refers to base -->
<target base="${GALAXY_DATA_MANAGER_DATA_PATH}">${dbkey}/hisat2_index/${value}</target>
</move>
<value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/${dbkey}/hisat2_index/${value}/${path}</value_translation>
<value_translation type="function">abspath</value_translation>
</column>
</output>
</data_table>
</data_manager>
</data_managers> |
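For comparison, here is a rough, untested sketch of how the Bracken "data_manager_conf.xml" might look if it followed the same pattern as the HISAT2 example. It rests on assumptions that are not confirmed anywhere in this thread: that the data manager tool writes its output beneath "out_file.extra_files_path" (the default source for a directory move), and that a target subdirectory called "bracken_databases/${value}" is acceptable. That subdirectory name is made up for illustration.

```xml
<?xml version="1.0"?>
<data_managers>
    <data_manager tool_file="data_manager/bracken_build_database.xml" id="bracken_build_database" version="2.5+galaxy0">
        <data_table name="bracken_databases">
            <output>
                <column name="value"/>
                <column name="name"/>
                <column name="path" output_ref="out_file">
                    <!-- Assumption: the tool writes its files under out_file.extra_files_path,
                         which is the default source for a directory move. -->
                    <move type="directory">
                        <!-- Hypothetical target layout; the directory name is illustrative only. -->
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">bracken_databases/${value}</target>
                    </move>
                    <!-- Rewrite the relative path so the loc-file entry points at the moved file. -->
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/bracken_databases/${value}/${path}</value_translation>
                    <value_translation type="function">abspath</value_translation>
                </column>
            </output>
        </data_table>
    </data_manager>
</data_managers>
```

If the tool instead writes the ".kmer_distrib" file somewhere else (for example inside the Kraken database directory), the move element would need an explicit source, and the tool wrapper itself might have to change as well.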
IUC would probably love a PR on this. |
The tool Bracken fails with the following error:
Bracken (https://usegalaxy.no/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/bracken/est_abundance/2.6.1+galaxy0) and Bracken database builder (https://usegalaxy.no/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/data_manager_build_bracken_database/bracken_build_database/2.5+galaxy0) were installed individually from the toolshed.
Building a database with Bracken database builder went well. This was named "Bracken_standard_75mer_distrib_read_length100", and it is possible to select this database when using Bracken. However, from the error log it seems that Bracken is searching for a different database: "2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_minimizer-spaces_6_load-factor_0.7/database100mers.kmer_distrib"
Any suggestions as to what could be causing this?