Graphite V2

Longest exact matches, LMEMS, in a collection of sequences

Graphite starts with two graph files (from Cuttlefish) and a set of query identifiers. It then builds a suffix array of the queries along with other datastructures to speed up matching. Then each sequence (i.e "reference") is read from the graph file and mapped onto the Suffix array. Each mapping is an identical sequence between the query and ref, also called Maximum Exact Matches (MEMs). Each time a MEM is found its length is compared to previously discovered MEMs. When the current MEM covers a region extending beyond a previous, or multiple previous MEMs Graphite replaces it.

Usage

Clone the repo git clone https://github.com/rickbeeloo/GraphiteV2 This already includes libasais but can then only be used from within the GraphiteV2 folder.

To call Graphite, use:

julia main.jl -g test_data/phage_graph.cf_seq -s test_data/phage_graph.cf_seg -k 31 -o test_data/output.txt -q test_data/query.txt

-g, the graph path file, when constructed using Cuttlefish the .cf_seq file.
-s, the node sequence file, when constructed using Cuttlefish the .cf_seq file
-k, the k-mer size used to build the graph (i.e. 31 nucleotides)
-q, a file with query identifiers. These should match those in -g
-o, the output file path

Output

The output file has four columns:

The query identifier (as in -q)
The start position in nucleotides on the query sequence
The end position in nucleotides on the query sequence
The original LMEM length*

*The LMEM length can differ from the size of the output match (as calculted from end-start+1). This is because the LMEM can have been replaced by another, longer, MEM, yet we keep track of the original length and output this to get an idea of how the matches were shortened

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
libsais-2.7.1		libsais-2.7.1
test_data		test_data
.gitignore		.gitignore
README.md		README.md
graph_io.jl		graph_io.jl
graphite.jl		graphite.jl
main.jl		main.jl
suffixArray.jl		suffixArray.jl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Graphite V2

Longest exact matches, LMEMS, in a collection of sequences

Usage

Output

About

Releases

Packages

Languages

rickbeeloo/GraphiteV2

Folders and files

Latest commit

History

Repository files navigation

Graphite V2

Longest exact matches, LMEMS, in a collection of sequences

Usage

Output

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages