Bracken fails #51

Open
ehj000 opened this issue Aug 12, 2021 · 6 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed), tool

Comments

@ehj000

ehj000 commented Aug 12, 2021

The tool Bracken fails with the following error:

Checking report file: /data/part0/000/855/dataset_855987.dat
Traceback (most recent call last):
  File "/usr/local/bin/est_abundance.py", line 529, in <module>
    main()
  File "/usr/local/bin/est_abundance.py", line 315, in main
    k_file = open(args.kmer_distr,'r')
FileNotFoundError: [Errno 2] No such file or directory: '2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_minimizer-spaces_6_load-factor_0.7/database100mers.kmer_distrib'

Bracken (https://usegalaxy.no/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/bracken/est_abundance/2.6.1+galaxy0) and the Bracken database builder (https://usegalaxy.no/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/data_manager_build_bracken_database/bracken_build_database/2.5+galaxy0) were installed individually from the toolshed.

Building a database with the Bracken database builder went well. I named it "Bracken_standard_75mer_distrib_read_length100", and it can be selected when running Bracken. However, the error log suggests that Bracken is searching for a different database: "2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_minimizer-spaces_6_load-factor_0.7/database100mers.kmer_distrib".

Any suggestions to what can be causing this?

ehj000 added the bug and tool labels on Aug 12, 2021
@torfinnnome

Could you share an input file for Bracken?

Maybe we should discuss whether we want a shared "Tools debugging" Galaxy account to make debugging issues faster? Direct access to a history with a failed run would lower the mental barrier to digging into this...

torfinnnome added the help wanted label on Aug 17, 2021
@torfinnnome

It's not obvious to me what is wrong, but I'm far from a Galaxy data manager/library expert. It would be great if someone with more experience could also take a look.

But that name you see in the error logs corresponds to the "Standard" Kraken database, which you based the Bracken database on, I assume?

@ehj000
Author

ehj000 commented Aug 18, 2021

I built the Bracken database using the Bracken Database Builder (Admin -> Local Data), based on the same database that was built/downloaded for Kraken2 (Standard). I named the Bracken database "Bracken_standard_75mers_distrib_read_length100".

This database is available when I select the Bracken tool, but according to the error report, the tool seems to want another database: "2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_minimizer-spaces_6_load-factor_0.7/database100mers.kmer_distrib". Could this be hard-coded in "est_abundance.py"?

kjetilkl self-assigned this on Aug 18, 2021
@kjetilkl
Contributor

"Bracken_standard_75mers_distrib_read_length100" is the display name you chose for the database, which is what users see. The actual file name, however, will be "databaseNmers.kmer_distrib", where N is the read length specified when creating the database. This file is placed in a subdirectory named after the chosen Kraken database (here, the full name of the Standard database is "2020-11-26T021706Z_standard_kmer-len_35_minimizer-len_31_minimizer-spaces_6_load-factor_0.7"), so the error report does indeed reference the database file you created.

The problem, I believe, is that the path to this file is relative, as seen in the last column of the loc-file "/srv/galaxy/server/tool-data/toolshed.g2.bx.psu.edu/repos/iuc/data_manager_build_bracken_database/fd5830f88314/bracken_databases.loc".

Files created by data managers should normally be placed somewhere beneath the path specified by the "galaxy_data_manager_data_path" setting in "galaxy.yml" (which defaults to the same value as "tool_data_path"), but I have not yet been able to find the actual location of the database file. I remember briefly discussing some time ago where we should place such files (or maybe add them to CVMFS), but we did not reach a conclusion, which may be why these settings have not been explicitly configured.

@kjetilkl
Contributor

I also suspect that the Bracken data manager tool by IUC may be to blame, since it does not actually move the generated files into a subdirectory of ${GALAXY_DATA_MANAGER_DATA_PATH} (which is something all the other data managers I have looked at do).

Here is the "data_manager_conf.xml" file for Bracken. It outputs a line with three columns to the loc-file: a unique ID (value) for the reference dataset, a name displayed to users, and the path to the file(s). The path here is just the relative location of the file within the job working directory, so the file is probably deleted when the data manager job finishes.

<data_managers>
    <data_manager tool_file="data_manager/bracken_build_database.xml" id="bracken_build_database" version="2.5+galaxy0">
        <data_table name="bracken_databases">
            <output>
                <column name="value"/>
                <column name="name"/>
                <column name="path" output_ref="out_file"/>
            </output>
        </data_table>
    </data_manager>
</data_managers>
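For context, the `bracken_databases` name above refers to a Galaxy tool data table, which maps these three columns onto the tab-separated loc-file. The sketch below shows roughly what such a declaration looks like in Galaxy's `tool_data_table_conf.xml` format; the `tool-data/bracken_databases.loc` location is an assumption and may differ from the installed copy mentioned earlier. Because the data manager writes the `path` column as-is, a relative value there stays relative when Bracken later reads it.

```xml
<!-- Sketch of a tool data table declaration; the loc-file path is an assumption -->
<tables>
    <!-- Each non-comment line of the loc-file is split on tabs into these columns -->
    <table name="bracken_databases" comment_char="#">
        <columns>value, name, path</columns>
        <file path="tool-data/bracken_databases.loc" />
    </table>
</tables>
```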

Below is an example of a typical "data_manager_conf.xml" file (here HISAT2). The files are moved to a different location outside of the job working directory, and the path is translated to point to this new location (which is an absolute path).

<?xml version="1.0"?>
<data_managers>
    <data_manager tool_file="data_manager/hisat2_index_builder.xml" id="hisat2_index_builder" version="0.0.1">
        <data_table name="hisat2_indexes">
            <output>
                <column name="value" />
                <column name="dbkey" />
                <column name="name" />
                <column name="path" output_ref="out_file" >
                    <move type="directory" relativize_symlinks="True">
                        <!-- <source>${path}</source>--> <!-- out_file.extra_files_path is used as base by default --> <!-- if no source, eg for type=directory, then refers to base -->
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">${dbkey}/hisat2_index/${value}</target>
                    </move>
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/${dbkey}/hisat2_index/${value}/${path}</value_translation>
                    <value_translation type="function">abspath</value_translation>
                </column>
            </output>
        </data_table>
    </data_manager>
</data_managers>

@torfinnnome

IUC would probably love a PR on this.
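A PR could adapt the HISAT2 pattern to the Bracken data table. The following is an untested sketch, not an actual IUC fix: the `bracken_database/${value}` target subdirectory is a hypothetical choice, and it assumes the data manager tool is also changed to write its `.kmer_distrib` output beneath the output dataset's `extra_files_path` (the default `move` source) instead of leaving it in the job working directory.

```xml
<data_managers>
    <data_manager tool_file="data_manager/bracken_build_database.xml" id="bracken_build_database" version="2.5+galaxy0">
        <data_table name="bracken_databases">
            <output>
                <column name="value"/>
                <column name="name"/>
                <column name="path" output_ref="out_file">
                    <!-- Move the generated files out of the job working directory.
                         No <source> is given, so out_file.extra_files_path is used as the base.
                         The "bracken_database" subdirectory name is hypothetical. -->
                    <move type="directory" relativize_symlinks="True">
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">bracken_database/${value}</target>
                    </move>
                    <!-- Rewrite the relative path written by the tool to point at the new location -->
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/bracken_database/${value}/${path}</value_translation>
                    <!-- Store an absolute path in the loc-file -->
                    <value_translation type="function">abspath</value_translation>
                </column>
            </output>
        </data_table>
    </data_manager>
</data_managers>
```

As in the HISAT2 example, the two `value_translation` steps are what turn the relative path into an absolute one that still resolves after the data manager's working directory is cleaned up.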
