Merge pull request #1192 from smehringer/alignmentIO_cigar_string
Add CIGAR string support to alignment IO
rrahn authored Oct 9, 2019
2 parents 19ab4cd + 1cf7756 commit 9f98962
Showing 15 changed files with 558 additions and 458 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -31,6 +31,8 @@ If possible, provide tooling that performs the changes, e.g. a shell-script.
#### Input/Output

* Asynchronous input (background file reading) supported via seqan3::view::async_input_buffer.
* Reading field::CIGAR into a vector over seqan3::cigar is supported via seqan3::alignment_file_input.
* Writing field::CIGAR from a vector over seqan3::cigar is supported via seqan3::alignment_file_output.

## API changes

44 changes: 44 additions & 0 deletions doc/tutorial/alignment_file/alignment_file_read_cigar.cpp
@@ -0,0 +1,44 @@
#include <fstream>

#include <seqan3/std/filesystem>

struct write_file_dummy_struct
{
    write_file_dummy_struct()
    {

auto file_raw = R"//![sam_file](
@HD VN:1.6 SO:coordinate
@SQ SN:ref LN:45
r001 99 ref 7 30 8M2I4M1D3M = 37 39 TTAGATAAAGGATACTG *
r003 0 ref 9 30 5S6M * 0 0 GCCTAAGCTAA * SA:Z:ref,29,-,6H5M,17,0;
r004 0 ref 16 30 6M14N5M * 0 0 ATAGCTTCAGC *
r003 2064 ref 29 17 5M * 0 0 TAGGC * SA:Z:ref,9,+,5S6M,30,1;
r001 147 ref 37 30 9M = 7 -39 CAGCGGCAT * NM:i:1
)//![sam_file]";

        std::ofstream file{std::filesystem::temp_directory_path()/"my.sam"};
        std::string str{file_raw};
        file << str.substr(1); // skip first newline
    }
};

write_file_dummy_struct go{};

//![code]
#include <seqan3/core/debug_stream.hpp>
#include <seqan3/io/alignment_file/all.hpp>
#include <seqan3/std/filesystem>

using namespace seqan3;

int main()
{
    std::filesystem::path tmp_dir = std::filesystem::temp_directory_path(); // get the temp directory

    alignment_file_input fin{tmp_dir/"my.sam", fields<field::CIGAR>{}};

    for (auto & [cigar] : fin)
        debug_stream << cigar << std::endl;
}
//![code]
33 changes: 20 additions & 13 deletions doc/tutorial/alignment_file/index.md
@@ -80,19 +80,19 @@ but you needn't deal with this field manually.
Note that some of the fields are specific to the SAM format, while some are specific to BLAST.
To make things clearer, here is the table of SAM columns connected to the corresponding alignment file field:

| # | SAM Column ID | FIELD name |
|:--:|:--------------|:--------------------------------------------------|
| 1 | QNAME | seqan3::field::ID |
| 2 | FLAG | seqan3::field::FLAG |
| 3 | RNAME | seqan3::field::REF_ID |
| 4 | POS | seqan3::field::REF_OFFSET |
| 5 | MAPQ | seqan3::field::MAPQ |
| 6 | CIGAR | implicitly stored in seqan3::field::ALIGNMENT |
| 7 | RNEXT | seqan3::field::MATE (tuple pos 0) |
| 8 | PNEXT | seqan3::field::MATE (tuple pos 1) |
| 9 | TLEN | seqan3::field::MATE (tuple pos 2) |
| 10 | SEQ | seqan3::field::SEQ |
| 11 | QUAL | seqan3::field::QUAL |
| # | SAM Column ID | FIELD name |
|:--:|:--------------|:----------------------------------------------------------------------------------|
| 1 | QNAME | seqan3::field::ID |
| 2 | FLAG | seqan3::field::FLAG |
| 3 | RNAME | seqan3::field::REF_ID |
| 4 | POS | seqan3::field::REF_OFFSET |
| 5 | MAPQ | seqan3::field::MAPQ |
| 6 | CIGAR | implicitly stored in seqan3::field::ALIGNMENT or directly in seqan3::field::CIGAR |
| 7 | RNEXT | seqan3::field::MATE (tuple pos 0) |
| 8 | PNEXT | seqan3::field::MATE (tuple pos 1) |
| 9 | TLEN | seqan3::field::MATE (tuple pos 2) |
| 10 | SEQ | seqan3::field::SEQ |
| 11 | QUAL | seqan3::field::QUAL |

## File extensions

@@ -265,6 +265,13 @@ r004 mapped against 1 with 0 gaps in the read sequence and 0 gaps in the referen

\endsolution

## Reading the CIGAR string

If you prefer to work with the raw CIGAR information, you can read it directly into a
`std::vector<seqan3::cigar>` by selecting `seqan3::field::CIGAR`.

\snippet doc/tutorial/alignment_file/alignment_file_read_cigar.cpp code
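
To illustrate what the count/operation pairs in a CIGAR string encode, the following standalone sketch parses the textual form by hand and derives two common quantities from it. It deliberately does not use the SeqAn3 API: `cigar_element`, `parse_cigar`, `query_length` and `reference_span` are hypothetical names introduced only for this example, and the set of accepted operations is assumed to be the one from the SAM specification (`MIDNSHP=X`).

```cpp
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type (not part of SeqAn3): one CIGAR element as (count, operation).
using cigar_element = std::pair<std::uint32_t, char>;

// Parses a textual CIGAR string such as "8M2I4M1D3M" into (count, operation) pairs.
// A lone "*" (CIGAR unavailable) yields an empty vector.
std::vector<cigar_element> parse_cigar(std::string const & text)
{
    std::vector<cigar_element> result;
    if (text == "*")
        return result;

    std::string const valid_operations{"MIDNSHP=X"}; // assumed: operations from the SAM specification
    std::uint32_t count = 0;
    bool has_count = false;

    for (char c : text)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + static_cast<std::uint32_t>(c - '0');
            has_count = true;
        }
        else
        {
            // Every operation must be known and preceded by a count.
            if (!has_count || valid_operations.find(c) == std::string::npos)
                throw std::invalid_argument{"malformed CIGAR string: " + text};

            result.emplace_back(count, c);
            count = 0;
            has_count = false;
        }
    }

    if (has_count) // trailing digits without an operation
        throw std::invalid_argument{"malformed CIGAR string: " + text};

    return result;
}

// Sums the counts of all elements whose operation is contained in `operations`.
std::uint32_t sum_counts(std::vector<cigar_element> const & cigar, std::string const & operations)
{
    std::uint32_t sum = 0;
    for (auto const & [count, operation] : cigar)
        if (operations.find(operation) != std::string::npos)
            sum += count;
    return sum;
}

// Number of read bases described by the CIGAR (operations that consume the query).
std::uint32_t query_length(std::vector<cigar_element> const & cigar)
{
    return sum_counts(cigar, "MIS=X");
}

// Number of reference positions covered by the alignment (operations that consume the reference).
std::uint32_t reference_span(std::vector<cigar_element> const & cigar)
{
    return sum_counts(cigar, "MDN=X");
}
```

For the first record of the example file, `parse_cigar("8M2I4M1D3M")` yields five elements; `query_length` gives 17, which matches the length of the read `TTAGATAAAGGATACTG`, and `reference_span` gives 16.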

# Writing alignment files

## Writing records
1 change: 1 addition & 0 deletions include/seqan3/io/alignment_file/all.hpp
@@ -17,6 +17,7 @@
* \brief Provides files and formats for handling alignment data.
*/

#include <seqan3/io/alignment_file/format_bam.hpp>
#include <seqan3/io/alignment_file/format_sam.hpp>
#include <seqan3/io/alignment_file/header.hpp>
#include <seqan3/io/alignment_file/input.hpp>