Replies: 4 comments
-
What you want will look something like this: let runtime = tokio::runtime::Builder::new_multi_thread()
.worker_threads(6)
.build()
.expect("Unable to create runtime.");
struct DataIter<'a> {
data: &'a BedData,
current_bin: usize
}
impl Iterator for DataIter<'a> {
impl Item = (String, bigtools::Value);
fn next(&mut Self) -> Option<Self::Item> {
let bin = data.coverage_data[self.current_bin];
// Figure out the chromosome and data
let chrom: String = ...;
let val = bigtools::Value {
start: ...,
end: ...,
value: ...,
};
self.current_bin += 1;
(chrom, val)
}
}
let data: BedData = ...;
let chrom_map: HashMap<String, u32> = data.genome_info.iter().map(|(chrom, len, _)| (chrom.clone(), len as u32)).collect();
let iter = DataIter { data: &data, current_bin: 0 };
let data = BedParserStreamingIterator::wrap_infallible_iter(iter);
let outfile: Path = ...;
let outb = BigWigWrite::create_file(outfile, chrom_map).unwrap();
outb.write(data, runtime).unwrap(); Any place with a I'm looking at https://github.com/stela2502/bam_tide/blob/main/src/bed_data.rs right now. First thing that pops out to me, is that it would probably be good if you stored your coverage data as a I'll also note that if your input data in sorted within chromosomes by start, then you can likely even make the entire coverage generation + writing completely lazy, but I'll leave that as an exercise for the reader. |
Beta Was this translation helpful? Give feedback.
-
Thank you! That looks like the thing I was lacking! I‘ll test that first thing on Monday!
…________________________________
Von: Jack Huey ***@***.***>
Gesendet: Freitag, 17. Januar 2025 15:32
An: jackh726/bigtools ***@***.***>
Cc: Stefan Lang ***@***.***>; Author ***@***.***>
Betreff: Re: [jackh726/bigtools] How do I write my own data using your lib? (Issue #73)
What you want will look something like this:
let runtime = tokio::runtime::Builder::new_multi_thread()
.worker_threads(6)
.build()
.expect("Unable to create runtime.");
struct DataIter<'a> {
data: &'a BedData,
current_bin: usize
}
impl Iterator for DataIter<'a> {
impl Item = (String, bigtools::Value);
fn next(&mut Self) -> Option<Self::Item> {
let bin = data.coverage_data[self.current_bin];
// Figure out the chromosome and data
let chrom: String = ...;
let val = bigtools::Value {
start: ...,
end: ...,
value: ...,
};
self.current_bin += 1;
(chrom, val)
}
}
let data: BedData = ...;
let chrom_map: HashMap<String, u32> = data.genome_info.iter().map(|(chrom, len, _)| (chrom.clone(), len as u32)).collect();
let iter = DataIter { data: &data, current_bin: 0 };
let data = BedParserStreamingIterator::wrap_infallible_iter(iter);
let outfile: Path = ...;
let outb = BigWigWrite::create_file(outfile, chrom_map).unwrap();
outb.write(data, runtime).unwrap();
Any place with a ... will be what you need to fill in.
________________________________
I'm looking at https://github.com/stela2502/bam_tide/blob/main/src/bed_data.rs right now. First thing that pops out to me, is that it would probably be good if you stored your coverage data as a Vec<(String, Vec<u32>)>, keyed by chromosome, or at the very least maintained some separate state with the offsets for each chromosome into the coverage vec. Otherwise, you're going to end up calculating the current chromosome for each value (though there are various ways to precompute this, but you already have that info when you create the BedData, so I suggest you just keep it around). As a bonus, storing the String in the Vec also allows you to return a &'a str for each value instead of a String, which will heavily reduce your allocation costs.
I'll also note that if your input data in sorted within chromosomes by start, then you can likely even make the entire coverage generation + writing completely lazy, but I'll leave that as an exercise for the reader.
—
Reply to this email directly, view it on GitHub<#73 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AC5TO2X3JSLKHDOPM5X2CGD2LEH7HAVCNFSM6AAAAABVLSRPMCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKOJYGQ4TOMBVGY>.
You are receiving this because you authored the thread.Message ID: ***@***.***>
|
Beta Was this translation helpful? Give feedback.
-
My problem could be fixed with the additional info you gave me here! |
Beta Was this translation helpful? Give feedback.
-
Just in case this might become useful for somebody else, too: I have a class that defined the data as
And I have implemented the Value like that:
I put the iterator into a new file:
And the function that saves the data is this in BedData:
Hope that helps! |
Beta Was this translation helpful? Give feedback.
-
Hi Jack,
I want to convert bam to bigwig as the tools we use are clocking our server.
https://github.com/stela2502/bam_tide
My main problem is to get the data out as biwigs. And here is where your library comes in.
But I do not understand how I could create a writer from your library and feed it data from mine.
I would greatly appreciate if you could give me the basics of how to make my my class fit for being exported by you lib.
And I get the feeling that if I just would know what the different functions of yours would return I would get it.
My class looks like this:
pub struct BedData {
genome_info: Vec<(String, usize, usize)>, // (chromosome name, length, bin offset)
coverage_data: Vec, // coverage data array for bins
bin_width: usize, // bin width for coverage
}
The genome_info contains the chr name, the chr length and the start id in the coverage array. The coverage array is just all the data for the whole genome in one array and bin_width is exactly that. The genome_info can easily be converted to the chrom_sizes you need to initialize your Writer https://docs.rs/bigtools/latest/bigtools/struct.BigWigWrite.html#method.create_file, but the Traits I need to implement to get my data into yours are to complicated for me. But I get the impression it should not be too complicated - or?
Please help me!
Thank you!
Beta Was this translation helpful? Give feedback.
All reactions