Please add DADA2-formatted reference databases to test/main #273

Open
gregvonkuster opened this issue Nov 27, 2019 · 12 comments
Labels
reference data CVMFS / IDC / Refgenie

Comments

@gregvonkuster
Contributor

The dada2 tools are currently installed on Galaxy test and will soon be installed on Galaxy main. Please add the dada2 reference datasets (https://benjjneb.github.io/dada2/training.html) so that the tools that require them are functional. I believe that the General FASTA release will be sufficient, but others may be requested. Here is the download link for the General FASTA release: https://doi.org/10.15156/BIO/786343.

@bernt-matthias

I guess Silva is quite popular. Sometimes users prefer RDP because it comes with copy number variation data (if I remember correctly), but it's older.

@bernt-matthias

There is also quite a bit of extra info in the data manager's help.

@jennaj added the reference data CVMFS / IDC / Refgenie label on Dec 4, 2019
@gregvonkuster
Contributor Author

@jennaj I have confirmed with the lab testing this pipeline that the General FASTA release (https://doi.org/10.15156/BIO/786343) is the reference dataset they need for their testing.

@bernt-matthias

Is this already in the data manager (aka dada manager)?

@gregvonkuster
Contributor Author

I also just asked @martenson how to get the fixes in galaxyproject/tools-iuc#2705 applied to the tools on Galaxy test. I'm working with a lab doing some critical work with this pipeline. ;)

@natefoo
Member

natefoo commented Jan 15, 2020

I ran the data manager and it appeared to succeed, but I couldn't find the data on Test. It looks like all the data managers (DMs) we've installed lately are going to be messed up, e.g.:

    <table comment_char="#" name="dada2_species">
        <columns>value, name, path</columns>
        <file path="/tmp/tool-data/toolshed.g2.bx.psu.edu/repos/iuc/dada2_filterandtrim/cc41546adf56/dada2_species.loc"/>
        <tool_shed_repository>
            <tool_shed>toolshed.g2.bx.psu.edu</tool_shed>
            <repository_name>dada2_filterandtrim</repository_name>
            <repository_owner>iuc</repository_owner>
            <installed_changeset_revision>cc41546adf56</installed_changeset_revision>
        </tool_shed_repository>
    </table>

This is discussed in #31. Unlike before, though, this is even more of a problem: we don't have the tool-data files in CVMFS to copy as described in step 3, because they were discarded after installation.
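
For anyone checking their own config for entries like the one above, scanning the data table config for `<file>` paths under a temporary directory is one quick way to find them. This is only a minimal sketch, not part of Galaxy: the helper name `find_suspect_tables` and the `/tmp/` prefix check are assumptions for illustration, and the `<tables>` wrapper in the sample is assumed rather than taken from the excerpt.

```python
import xml.etree.ElementTree as ET


def find_suspect_tables(xml_text, bad_prefixes=("/tmp/",)):
    """Return (table_name, file_path) pairs whose .loc path starts with a bad prefix.

    Hypothetical helper for illustration; not part of Galaxy.
    """
    root = ET.fromstring(xml_text)
    suspects = []
    # iter() also matches the root element, so a bare <table> root works too.
    for table in root.iter("table"):
        for file_elem in table.findall("file"):
            path = file_elem.get("path", "")
            if path.startswith(tuple(bad_prefixes)):
                suspects.append((table.get("name"), path))
    return suspects


# Sample based on the entry above; the <tables> root is an assumption.
EXAMPLE = """
<tables>
    <table comment_char="#" name="dada2_species">
        <columns>value, name, path</columns>
        <file path="/tmp/tool-data/toolshed.g2.bx.psu.edu/repos/iuc/dada2_filterandtrim/cc41546adf56/dada2_species.loc"/>
    </table>
</tables>
"""

if __name__ == "__main__":
    for name, path in find_suspect_tables(EXAMPLE):
        print(f"{name}: {path}")
```

The check is purely textual: it does not verify that the `.loc` file exists, only that its configured location looks temporary.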

@gregvonkuster
Contributor Author

@natefoo @davebx thanks for everything you've done on this. Sorry this has created some issues.

@natefoo
Member

natefoo commented Jan 15, 2020

I fixed all the paths and whatnot, but the DM fails. The handler logs:

    galaxy.tools.data_manager.manager WARNING 2020-01-15 14:30:16,408 No values for data table "dada2_taxonomy" were returned by the data manager "toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/data_manager/dada2_fetcher/0.0.1".

However, the DM's primary output appears to return a data table entry:

    {"data_tables": {"dada2_taxonomy": {"name": "UNITE: General Fasta release 8.0 for Fungi", "path": "unite_8.0_fungi.taxonomy", "taxlevels": "Kingdom,Phylum,Class,Order,Family,Genus,Species", "value": "unite_8.0_fungi"}}}

Does anyone with a better understanding of DMs know what's going on here?
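
One way to rule out the JSON itself is to compare the emitted entry against the columns the `dada2_taxonomy` table expects. The following is a minimal sketch under stated assumptions: the column set (`value`, `name`, `path`, `taxlevels`) is taken from the output above, a table's entries are assumed to be either a single object or a list of objects, and `check_dm_output` is a hypothetical helper rather than a Galaxy function.

```python
import json

# Assumed column names, taken from the data manager output quoted above.
EXPECTED_COLUMNS = {"value", "name", "path", "taxlevels"}


def check_dm_output(json_text, table_name, expected_columns=EXPECTED_COLUMNS):
    """Return a list of (entry, missing_columns) for one data table.

    Hypothetical helper; an empty list means the table has no entries at all.
    """
    payload = json.loads(json_text)
    entries = payload.get("data_tables", {}).get(table_name, [])
    # Assumption: a single entry may be a bare object instead of a list.
    if isinstance(entries, dict):
        entries = [entries]
    return [(entry, sorted(set(expected_columns) - set(entry))) for entry in entries]


DM_OUTPUT = (
    '{"data_tables": {"dada2_taxonomy": {"name": "UNITE: General Fasta release 8.0 for Fungi", '
    '"path": "unite_8.0_fungi.taxonomy", '
    '"taxlevels": "Kingdom,Phylum,Class,Order,Family,Genus,Species", '
    '"value": "unite_8.0_fungi"}}}'
)

if __name__ == "__main__":
    for entry, missing in check_dm_output(DM_OUTPUT, "dada2_taxonomy"):
        print(entry["value"], "missing columns:", missing)
```

For the output quoted above, this finds one `dada2_taxonomy` entry with no missing columns, which suggests (but does not prove) that the problem lies in how the result is matched to the data manager rather than in the JSON.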

@natefoo
Member

natefoo commented Jan 15, 2020

Interestingly... the log message references an old version of the DM (0.0.1) which I don't believe is even installed (both 0.0.7 and 0.0.8 appear to be installed, and 0.0.8 is the one that ran). It appears to come from the entry in shed_data_manager_conf.xml:

    <data_manager guid="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/data_manager/dada2_fetcher/0.0.1" id="dada2_fetcher" shed_conf_file="/cvmfs/test.galaxyproject.org/config/shed_tool_conf.xml">
        <tool file="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/f57c13f5878b/data_manager_dada2/data_manager/dada2_fetcher.xml" guid="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/dada2_fetcher/0.0.7">
            <tool_shed>toolshed.g2.bx.psu.edu</tool_shed>
            <repository_name>data_manager_dada2</repository_name>
            <repository_owner>iuc</repository_owner>
            <installed_changeset_revision>f57c13f5878b</installed_changeset_revision>
            <id>toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/dada2_fetcher/0.0.7</id>
            <version>0.0.7</version>
        </tool>
        <data_table name="dada2_taxonomy">
            <output>
                <column name="value" />
                <column name="name" />
                <column name="path" output_ref="out_file">
                    <move relativize_symlinks="True" type="file">
                        <source>${path}</source>
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">dada2/${path}</target>
                    </move>
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/dada2/${path}</value_translation>
                    <value_translation type="function">abspath</value_translation>
                </column>
                <column name="taxlevels" />
            </output>
        </data_table>
        <data_table name="dada2_species">
            <output>
                <column name="value" />
                <column name="name" />
                <column name="path" output_ref="out_file">
                    <move relativize_symlinks="True" type="file">
                        <source>${path}</source>
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">dada2/${path}</target>
                    </move>
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/dada2/${path}</value_translation>
                    <value_translation type="function">abspath</value_translation>
                </column>
            </output>
        </data_table>
    </data_manager>
    <data_manager guid="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/data_manager/dada2_fetcher/0.0.1" id="dada2_fetcher" shed_conf_file="/cvmfs/test.galaxyproject.org/config/shed_tool_conf.xml">
        <tool file="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/bf7b2c14cabc/data_manager_dada2/data_manager/dada2_fetcher.xml" guid="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/dada2_fetcher/0.0.8">
            <tool_shed>toolshed.g2.bx.psu.edu</tool_shed>
            <repository_name>data_manager_dada2</repository_name>
            <repository_owner>iuc</repository_owner>
            <installed_changeset_revision>bf7b2c14cabc</installed_changeset_revision>
            <id>toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/dada2_fetcher/0.0.8</id>
            <version>0.0.8</version>
        </tool>
        <data_table name="dada2_taxonomy">
            <output>
                <column name="value" />
                <column name="name" />
                <column name="path" output_ref="out_file">
                    <move relativize_symlinks="True" type="file">
                        <source>${path}</source>
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">dada2/${path}</target>
                    </move>
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/dada2/${path}</value_translation>
                    <value_translation type="function">abspath</value_translation>
                </column>
                <column name="taxlevels" />
            </output>
        </data_table>
        <data_table name="dada2_species">
            <output>
                <column name="value" />
                <column name="name" />
                <column name="path" output_ref="out_file">
                    <move relativize_symlinks="True" type="file">
                        <source>${path}</source>
                        <target base="${GALAXY_DATA_MANAGER_DATA_PATH}">dada2/${path}</target>
                    </move>
                    <value_translation>${GALAXY_DATA_MANAGER_DATA_PATH}/dada2/${path}</value_translation>
                    <value_translation type="function">abspath</value_translation>
                </column>
            </output>
        </data_table>
    </data_manager>

The correct version appears in the tool tag but not the data_manager tag. No idea if this is the problem, though.
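
This kind of mismatch can be checked mechanically by comparing the last path segment of each `data_manager` guid with that of its nested `tool` guid. A minimal sketch using only the standard library: `guid_version` and `find_version_mismatches` are hypothetical helpers, and the `<data_managers>` root element in the sample is an assumption, since the excerpt above shows the entries without a root.

```python
import xml.etree.ElementTree as ET


def guid_version(guid):
    """Return the trailing version segment of a Tool Shed guid."""
    return guid.rstrip("/").rsplit("/", 1)[-1]


def find_version_mismatches(xml_text):
    """Return (dm_guid, tool_guid) pairs whose trailing versions differ.

    Hypothetical helper for illustration; not part of Galaxy.
    """
    root = ET.fromstring(xml_text)
    mismatches = []
    for dm in root.iter("data_manager"):
        tool = dm.find("tool")
        if tool is None:
            continue  # nothing to compare against
        dm_guid, tool_guid = dm.get("guid", ""), tool.get("guid", "")
        if guid_version(dm_guid) != guid_version(tool_guid):
            mismatches.append((dm_guid, tool_guid))
    return mismatches


# Trimmed-down version of the config above; the root tag name is an assumption.
EXAMPLE = """
<data_managers>
    <data_manager guid="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/data_manager/dada2_fetcher/0.0.1" id="dada2_fetcher">
        <tool guid="toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/dada2_fetcher/0.0.8"/>
    </data_manager>
</data_managers>
"""

if __name__ == "__main__":
    for dm_guid, tool_guid in find_version_mismatches(EXAMPLE):
        print(f"{guid_version(dm_guid)} != {guid_version(tool_guid)}: {dm_guid}")
```

The comparison only looks at the trailing segment, so it flags version drift but says nothing about whether the drift is what causes the data manager to fail.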

@natefoo
Member

natefoo commented Jan 15, 2020

I fixed the version and it's the same thing:

    galaxy.tools.data_manager.manager WARNING 2020-01-15 15:25:18,524 No values for data table "dada2_taxonomy" were returned by the data manager "toolshed.g2.bx.psu.edu/repos/iuc/data_manager_dada2/data_manager/dada2_fetcher/0.0.8".

@gregvonkuster
Contributor Author

Hmm... strange. Thanks @natefoo for your help!

@bernt-matthias

By the way, a new data_manager with Silva 138 is available.
