transliterate_vals
Martin Asser Hansen edited this page Oct 2, 2015 · 6 revisions
Transliteration is a very fast search and replace (or search and delete) of individual characters in values. It is useful for tasks such as converting sequences from RNA to DNA or removing indels from patterns.
... | transliterate_vals [options]
[-? | --help] # Print full usage description.
[-k <list> | --keys=<list>] # List of values to transliterate
[-s <string> | --search=<string>] # String of chars to locate and replace
[-r <string> | --replace=<string>] # String of chars for replacing
[-d <string> | --delete=<string>] # String of chars to delete
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
To convert RNA sequence to DNA:
transliterate_vals -k SEQ -s Uu -r Tt
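For readers without Biopieces at hand, the same character-for-character substitution can be sketched on a plain string with the standard tr utility. This is only an analogy to illustrate the mapping, not how transliterate_vals is implemented; rna_to_dna is a hypothetical helper name.

```shell
# Sketch: per-character U->T and u->t substitution with standard tr.
# rna_to_dna is a hypothetical helper, not part of Biopieces.
rna_to_dna() {
    printf '%s\n' "$1" | tr 'Uu' 'Tt'
}

rna_to_dna 'AUGGCUuacg'
```

Each character in the first set is replaced by the character at the same position in the second set, so case is preserved.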
To remove indels from patterns:
transliterate_vals -k PATTERN -d '._-~'
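As a rough illustration of character deletion, the standard tr utility offers a comparable -d mode. In plain tr a hyphen between two characters denotes a range, so this sketch places the hyphen last to keep it literal. strip_indels is a hypothetical helper name and the input string is made up for the example.

```shell
# Sketch: delete the indel characters '.', '_', '~' and '-' from a string.
# The hyphen is placed last so that tr treats it as a literal character.
# strip_indels is a hypothetical helper, not part of Biopieces.
strip_indels() {
    printf '%s\n' "$1" | tr -d '._~-'
}

strip_indels 'AC-GT..A~C_G'
```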
To visualize FASTQ quality scores, consider this FASTQ entry in the file test.fq:
@ILLUMINA-52179E_0004:2:1:1045:16499#TTAGGC/1
CTTGGTGCCCGTCACGCGCACTGCGTCGCCCTGAATGCTCGCCTGNNCCT
+ILLUMINA-52179E_0004:2:1:1045:16499#TTAGGC/1
ceceeee\e``cd^^Yb`b`cc``c\accccZT`YTbYb`Y\VZYBBa\Y
Using transliterate_vals we can do:
read_fastq -i test.fq |
transliterate_vals -k SCORES -s "[@-h]" -r "          ..........ooooooooooOOOOOOOOOO" |
write_fastq -x
Thus:
- Q30-Q40 is replaced with O
- Q20-Q29 is replaced with o
- Q10-Q19 is replaced with .
- Q0-Q9 is replaced with blanks
And this outputs:
@ILLUMINA-52179E_0004:2:1:1045:16499#TTAGGC/1
CTTGGTGCCCGTCACGCGCACTGCGTCGCCCTGAATGCTCGCCTGNNCCT
+
OOOOOOOoOOOOOOOoOOOOOOOOOoOOOOOooOooOoOOooooo  Ooo
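The score mapping can also be sketched with the standard tr utility, which makes the correspondence between quality characters and display symbols explicit. The sketch assumes Phred+64 encoding, where '@' encodes Q0 and 'h' encodes Q40, and spells out all 41 replacement characters instead of relying on padding behavior. visualize_scores is a hypothetical helper name.

```shell
# Sketch: map Phred+64 quality characters to a four-level text display.
# visualize_scores is a hypothetical helper, not part of Biopieces.
visualize_scores() {
    blanks=$(printf '%10s' '')                       # ten spaces for Q0-Q9
    map="${blanks}..........ooooooooooOOOOOOOOOOO"   # 10 + 10 + 10 + 11 = 41 characters
    # '@-h' spans the 41 characters that encode Q0 to Q40 in Phred+64.
    printf '%s\n' "$1" | LC_ALL=C tr '@-h' "$map"
}

visualize_scores 'ceceeee\e``cd^^Yb`b`cc``c\accccZT`YTbYb`Y\VZYBBa\Y'
```

The two B characters near the end of the score string encode Q2, so they are rendered as two blanks.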
Martin Asser Hansen - Copyright (C) - All rights reserved.
August 2007
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
transliterate_vals is part of the Biopieces framework.