fgrep functionality (available with grep -F) allows searching for m multiple fixed strings among n sequences in O(n) time rather than O(n*m) by leveraging the Aho-Corasick algorithm. For a concrete example, I have a fasta_to_tabular result (20,000 lines) that I want to search for many accession IDs (8,000); or, I might just as easily wish to search for a large number of arbitrary peptide sequences.
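As a minimal sketch of the use case above, `grep -F -f` reads one fixed string per line from a pattern file and prints every input line containing any of them. The file names and the three-row table here are hypothetical stand-ins for the real fasta_to_tabular output and accession list.

```shell
#!/bin/sh
set -eu

# Scratch directory for the hypothetical example files.
workdir=$(mktemp -d)

# Hypothetical tabular file: header <TAB> sequence, one record per line.
printf 'sp|P12345|A\tMKTAYIAK\nsp|Q67890|B\tGAVLIPFM\nsp|O11111|C\tMSTNPKPQ\n' > "$workdir/proteins.tsv"

# Hypothetical list of accession IDs, one fixed string per line.
printf 'P12345\nO11111\n' > "$workdir/ids.txt"

# -F treats each pattern as a literal string; -f reads the patterns from a file.
hits=$(grep -F -f "$workdir/ids.txt" "$workdir/proteins.tsv")
printf '%s\n' "$hits"

rm -rf "$workdir"
```

Note that `-F` matches a pattern anywhere on the line, so an ID would also match if it happened to occur inside the sequence column; whether that matters depends on the data.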
So, my issue (or question) is which approach to take:

- If it's not good to modify the "Search in textfiles (grep)" tool, is there another tool that is a good fit?
  - Historically, fgrep functionality was merged into grep; this may make sense to the standards developers, but bioinformaticians may not immediately assume that a tool labeled "grep" can be used with fixed strings, even though fixed strings are technically regular expressions that each match one sequence.
- If it's good to modify the "Search in textfiles (grep)" tool, the change that seems logical to me is:
  - Add a fourth option to Type of regex, e.g., "list of fixed strings (fgrep)"; and, when that option is chosen, enable an input field for a file of fixed strings, e.g., "File of fixed strings (one per line)".
  - When a dataset collection or multiple datasets are specified, they would be concatenated into a single file of substrings before invoking grep -F.
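The concatenation step could look roughly like the sketch below. It is not the tool's actual wrapper code, and all file names are hypothetical. It uses `awk 'NF'` rather than plain `cat` for two reasons: a pattern file that lacks a trailing newline would otherwise be fused with the first pattern of the next file, and an empty line in a pattern file is an empty pattern, which matches every input line.

```shell
#!/bin/sh
set -eu

# Scratch directory for the hypothetical example files.
workdir=$(mktemp -d)

# Two hypothetical pattern datasets: the first has no trailing newline,
# the second ends with a blank line.
printf 'P12345' > "$workdir/patterns_1.txt"
printf 'O11111\n\n' > "$workdir/patterns_2.txt"

# Hypothetical tabular input to search.
printf 'sp|P12345|A\tMKTAYIAK\nsp|Q67890|B\tGAVLIPFM\nsp|O11111|C\tMSTNPKPQ\n' > "$workdir/proteins.tsv"

# Merge the pattern files: awk reads each file separately, emits every
# record with a newline, and 'NF' drops empty or whitespace-only lines.
awk 'NF' "$workdir/patterns_1.txt" "$workdir/patterns_2.txt" > "$workdir/all_patterns.txt"

# Search once with the merged list of fixed strings.
merged_hits=$(grep -F -f "$workdir/all_patterns.txt" "$workdir/proteins.tsv")
printf '%s\n' "$merged_hits"

rm -rf "$workdir"
```

Dropping whitespace-only lines is an assumption that suits accession IDs and peptide sequences; a wrapper that must support patterns consisting only of spaces would need a different filter.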
@bgruening Would you suggest that I submit a PR for the "Search in textfiles (grep)" tool?