Unaccepted characters after Rfam-database-check #116

Open
Tim15-tech opened this issue Jul 16, 2024 · 4 comments

Comments

@Tim15-tech

I sometimes hit the error "assert (np.all(msa<=31))" when running the program. It seems that after the Rfam database check, some sequences contain characters such as R, which is not part of the RoseTTAFold2NA RNA alphabet. I worked around this by replacing such characters with N. The code is in my forked repository in case someone else encounters the same problem.
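A minimal sketch of the kind of workaround described above, not the code from the fork: every character outside an assumed alphabet is replaced with N before the sequence is encoded. The `ALLOWED` set and the name `sanitize_rna` are assumptions made for illustration; the alphabet RoseTTAFold2NA really accepts should be taken from its source. R is the IUPAC ambiguity code for a purine (A or G), which is why it can appear in database sequences.

```python
# Hypothetical helper. ALLOWED is an assumed alphabet, not read from RoseTTAFold2NA.
ALLOWED = frozenset("ACGUTN-")


def sanitize_rna(seq: str, allowed: frozenset = ALLOWED, fill: str = "N") -> str:
    """Replace characters outside `allowed` with `fill`, preserving letter case."""
    out = []
    for c in seq:
        if c.upper() in allowed:
            out.append(c)  # valid residue or gap: keep unchanged
        else:
            # keep lowercase for lowercase input (e.g. insertion columns in an MSA)
            out.append(fill.lower() if c.islower() else fill)
    return "".join(out)


if __name__ == "__main__":
    print(sanitize_rna("ACGRUYN-acr"))
```

Case is preserved on purpose, since some alignment formats use lowercase letters to mark insertions; whether that matters here depends on the MSA files the pipeline produces.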

Thanks for the model!

@anar-rzayev

First of all, thank you for the commits in your forked repo; they were quite helpful, at least for getting rid of the bad characters. Is there any way to resolve the warning "FASTA-Reader: Ignoring invalid residues at position(s)"? I still get it when running your scripts.

Meanwhile, have you by any chance had problems downloading the database with update_blastdb.pl --decompress n? The connection to NCBI drops so often that, even with a timeout of 3600, passive FTP, and verbose output, I am having a hard time downloading this 151 GB dataset. Do you have any suggestions for solving this?

@Tim15-tech
Author

Sadly, I don't know. I'm not familiar with this kind of issue, but it might simply be due to the FASTA files. If the invalid residues are already in the source data, then I assume it's fine for running the program, though I can't say for sure. If I remember correctly, I saw this message too and simply ignored it.

Regarding the second question: sadly, I think the best fix is a fast internet connection. For example, with poor internet even git clone can be a nightmare. If the disconnections come from the NCBI side, maybe wait a few days or a week. In the end, though, a stable and fast connection is required. I downloaded it at my university on a server over several days using screen, which let me leave the command running, detach my terminal, and shut down my local PC.
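One way to check whether the invalid residues come from the FASTA files themselves is to scan them and list the offending positions per record. The following is only a sketch: the `ALLOWED` alphabet and the function name `find_invalid_residues` are assumptions for illustration, and the positions it reports are not guaranteed to match how BLAST counts them in its warning.

```python
from typing import Dict, Iterable, List

# Assumed alphabet for illustration; adjust to what the downstream tool accepts.
ALLOWED = frozenset("ACGUTN-")


def find_invalid_residues(lines: Iterable[str],
                          allowed: frozenset = ALLOWED) -> Dict[str, List[int]]:
    """Map each FASTA header to the 1-based positions of characters outside `allowed`."""
    report: Dict[str, List[int]] = {}
    header = None
    pos = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]  # new record: reset the position counter
            pos = 0
            continue
        if header is None:
            continue  # sequence data before any header is skipped
        for c in line:
            pos += 1
            if c.upper() not in allowed:
                report.setdefault(header, []).append(pos)
    return report
```

Since the function accepts any iterable of lines, an open file handle can be passed directly, for example `find_invalid_residues(open("hits.fa"))`, where `hits.fa` is a placeholder file name.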

@anar-rzayev

Actually, you are right. As long as the bad characters are gone, those FASTA invalid-residue warnings shouldn't be a big deal. On the second point, I am in fact also running the download in a tmux session, since running it with nohup in the background didn't help initially:

/home/intern/protein/.conda/envs/RF2NA/bin/perl /home/intern/protein/.conda/envs/RF2NA/bin/update_blastdb.pl --decompress --passive --timeout 3600 --force --verbose --verbose nt > nt_download_log.txt 2>&1

Strangely, even with a timeout of 1 hour, the attached session disconnects after 1-2 hours. The output keeps showing:

Downloading nt.000.tar.gz...Net::FTP=GLOB(0x55b3acd21cf8)>>> PASV
Net::FTP=GLOB(0x55b3acd21cf8)<<< 227 Entering Passive Mode (130,14,250,12,196,235).
Net::FTP=GLOB(0x55b3acd21cf8)>>> RETR nt.000.tar.gz
Net::FTP=GLOB(0x55b3acd21cf8)<<< 150 Opening BINARY mode data connection for nt.000.tar.gz (4721879742 bytes)
Net::FTP: Net::Cmd::getline(): unexpected EOF on command channel:  at /home/intern/protein/.conda/envs/RF2NA/lib/perl5/core_perl/Net/FTP/dataconn.pm line 82.
Unable to close datastream at /home/intern/protein/.conda/envs/RF2NA/bin/update_blastdb.pl line 202.
Net::FTP: Net::Cmd::_is_closed(): unexpected EOF on command channel:  at /home/intern/protein/.conda/envs/RF2NA/bin/update_blastdb.pl line 203.
Net::FTP: Net::Cmd::_is_closed(): unexpected EOF on command channel:  at /home/intern/protein/.conda/envs/RF2NA/bin/update_blastdb.pl line 203.
Failed to download nt.000.tar.gz.md5!
Net::FTP: Net::Cmd::_is_closed(): unexpected EOF on command channel:  at /home/intern/protein/.conda/envs/RF2NA/bin/update_blastdb.pl line 101.
Net::FTP: Net::Cmd::_is_closed(): unexpected EOF on command channel:  at /home/intern/protein/.conda/envs/RF2NA/bin/update_blastdb.pl line 101.

Did you need any specific installations, or was it sufficient to run this Perl script after installing MoreUtils or another library?
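One possible way to cope with the dropped connections in the log above is to wrap the download command in a retry loop. This is a sketch under assumptions, not a confirmed fix: `run_with_retries` is a hypothetical helper, and it assumes the wrapped command exits with a non-zero status when a download fails, which has not been verified for update_blastdb.pl.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(cmd: Sequence[str], attempts: int = 5, delay: float = 60.0,
                     runner: Callable = subprocess.run,
                     sleep: Callable = time.sleep) -> int:
    """Run `cmd` until it exits with status 0; return the number of attempts used.

    Raises RuntimeError if every attempt fails. `runner` and `sleep` are
    injectable so the loop can be tested without network access.
    """
    for attempt in range(1, attempts + 1):
        result = runner(list(cmd))
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            sleep(delay * attempt)  # wait longer after each failure
    raise RuntimeError(f"command failed after {attempts} attempts: {' '.join(cmd)}")


# Example call using flags quoted in this thread (not executed here):
# run_with_retries(["update_blastdb.pl", "--decompress", "--passive",
#                   "--timeout", "3600", "nt"])
```

`--force` is left out of the example on the assumption that it would re-download volumes that were already fetched on every retry; the script's help output should confirm whether that is the case.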

@anar-rzayev

Also, regarding installation, did you have any issues downloading all the packages needed to run RF2NA? For example, was a specific BLAST version required, or anything else in particular?
