-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Big changes in sftp_handle.py and minor in utils.py and configuration.json #247
Conversation
and error handling, linting
restransmition_data = self.get_files_from_sftp_folder( | ||
folder, req_retransmition | ||
remote_md5sum = self.find_remote_md5sum(folder) | ||
if remote_md5sum: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now it only works for a unique md5sum file, right? Is this described in the upload manual?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it only works for a single md5sum file, I could add the option to use one md5file for each file but I thought it would make the code even more messy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this described in the manual? I don't mind only one file, but we need to make this as clear as possible, and indicate this in the docs (when we have some of those)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it is not, there is a strong need for documentation in these tools
There are a lot of issues fixed! Congrats! The most concerning though I have is that we urgently need a summary for this process output (for all the modules really). We need to now when this finishes how many samples have been downloaded, which ones which errors and which errors. It's very dificut to automatize the communication with the labs if we can't easyly send all errors and warnings, and keep track of everything. |
That is something that I already posted about in issues , I'll try to implement it as soon as possible |
…d fixed some bugs I found along the way
I now introduced some changes that should resolve the things that were pointed out in the conversations. Nonetheless, I found some bugs during the implementation of these changes that I fixed, e.g. a better handling of only one missing file in the sftp and a custom error type to raise certain errors related with metadata validation. There are a few temporal solutions that I will add to issues in order to remove them in later commits Also I introduced a basic validation process for single-end/paired-end reads without the proper number of files, so this now also closes #249 |
…ome bugs I found along the way
sftp_handle.py was not verifying the md5 values of the remote files correctly so it had to be fixed. Moreover, I included several testings to extract data from three possible md5checksum configurations: The one from linux
md5sum * > md5checksum.csv
another one from windows which includes headers ("Hash" and "Path") and windows paths (meaning "" instead of "/") and a last one that has asterisks before the paths coming from third-party softwares.I also included handling of uncompressed files in order to reduce the space occupied by these downloaded files, and made the code more robust on a general basis by including better error handling and retries for corrupted files (which were previously skipped somehow) and removed the verification or R1/R2 delimiters as it was redundant and obscure.
In order to do so I had to include a few methods in utils.py and some fields into configuration.json such as the header and sheet for the metadata template in excel format.
These changes should close #242 and close #246 and close #249