Add ability for rust uniwig to create output files from input .bam files #30

Open
9 tasks
donaldcampbelljr opened this issue Oct 7, 2024 · 1 comment

donaldcampbelljr commented Oct 7, 2024

Some work accomplished with PR #40.

We would like to use this code as a drop-in replacement for bamSitesToWig.py from PEPATAC: https://github.com/databio/pepatac/blob/master/tools/bamSitesToWig.py

bamSitesToWig.py creates three files as output:

    • exact.bw
    • smoothed.bw
    • shift.bed

Currently, uniwig can take an input file in any of these formats:

    • bed
    • narrowPeak
    • bam

and create output in any of these formats:

    • wig
    • npy
    • bedGraph
    • bw (via an intermediate bedGraph conversion)

Some items to accomplish for this task:

  • Add bed output support when the input is a non-bed-like file (i.e. bam).
  • The original bamSitesToWig.py adjusts positions based on flags. Noodles does support flags, but at first glance they are named differently and must be reviewed.
  • I believe the PEPATAC pipeline only needs start counts? If so, add the ability to output only the counts the user wants instead of always outputting all three (starts, ends, core).
  • Scaling is used in bamSitesToWig.py. Currently, uniwig does not scale counts. Should it?
  • Smooth vs. exact: currently uniwig only offers the smooth option, though the user can set smoothing = 0. Is this the same output as the exact code? If so, why were two Perl scripts used? For the original counting algorithms, see:
    https://github.com/databio/pepatac/blob/master/tools/cutsToWig.pl
    https://github.com/databio/pepatac/blob/master/tools/smoothWig.pl
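
To make two of the open items above concrete, here is a minimal sketch in plain Rust (no noodles or uniwig APIs). It is not the logic of bamSitesToWig.py, cutsToWig.pl, or smoothWig.pl, which still need to be reviewed. The function names `cut_site` and `count_sites` are hypothetical, the shift values are left as parameters because the ones used by the original script must be checked, and the symmetric smoothing window is an assumption. The only fixed fact used is that bit 0x10 of the SAM flag marks a reverse-strand read.

```rust
/// SAM flag bit meaning "read is on the reverse strand" (0x10 in the SAM spec).
const FLAG_REVERSE: u16 = 0x10;

/// Hypothetical helper: choose the 5' end of an alignment from its flag.
/// `start` and `end` are 0-based, half-open reference coordinates.
/// `fwd_shift` and `rev_shift` are placeholders; the values that
/// bamSitesToWig.py applies must be taken from that script.
fn cut_site(flag: u16, start: i64, end: i64, fwd_shift: i64, rev_shift: i64) -> i64 {
    if flag & FLAG_REVERSE != 0 {
        // Reverse-strand read: the 5' end is the last aligned base.
        end - 1 + rev_shift
    } else {
        // Forward-strand read: the 5' end is the first aligned base.
        start + fwd_shift
    }
}

/// Hypothetical counter: each site adds 1 to every position in
/// [site - smooth, site + smooth], clipped to the chromosome.
/// With `smooth == 0` only the site itself is counted.
fn count_sites(sites: &[i64], chrom_len: usize, smooth: i64) -> Vec<u32> {
    let mut counts = vec![0u32; chrom_len];
    for &site in sites {
        let lo = (site - smooth).max(0);
        let hi = (site + smooth).min(chrom_len as i64 - 1);
        // The range is empty when the window lies entirely off the chromosome.
        for pos in lo..=hi {
            counts[pos as usize] += 1;
        }
    }
    counts
}
```

In this sketch, `count_sites(&[2, 2, 5], 8, 0)` yields `[0, 0, 2, 0, 0, 1, 0, 0]`, so a window of 0 reduces to exact per-position counts. Whether uniwig with smoothing = 0 matches the output of cutsToWig.pl in the same way can only be settled by comparing against the original scripts.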

Nice to have:

  • If the associated .bai does not exist, have uniwig create it (using the noodles crate).
donaldcampbelljr changed the title from "Add ability for rust uniwig to create wiggle files from .bam files" to "Add ability for rust uniwig to create output files from input .bam files" on Oct 24, 2024
@donaldcampbelljr
Member Author

There is a working proof of concept in #40; however, it uses an intermediate bedGraph file written to disk.

Therefore, we are exploring an alternative method in #47, which streams values directly to the bigtools bw writer.

However, some challenges remain, namely:

  • When spawning separate threads to handle the producer/consumer workflow, I see issues with error handling: an error does not seem to close the producer, causing the consumer to "hang".
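
For reference, here is a minimal sketch of the shutdown behaviour one would expect from a channel-based producer/consumer pair, using only `std::sync::mpsc` and `std::thread`. It is not the code in #47 and does not use bigtools; `run_pipeline` and the "negative value is an error" rule are hypothetical stand-ins. The point it illustrates is that a standard-library channel closes once every sender has been dropped, so a producer that returns early on error lets the consumer finish. In designs like this, a consumer that never finishes often means some sender (or clone of it) is still alive; whether that is what happens in #47 is not established here.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical pipeline: the producer sends values until it meets a
/// negative one, which stands in for any error while reading input.
fn run_pipeline(inputs: Vec<i32>) -> (Vec<i32>, Result<(), String>) {
    let (tx, rx) = mpsc::channel::<i32>();

    let producer = thread::spawn(move || -> Result<(), String> {
        for v in inputs {
            if v < 0 {
                // Returning early drops `tx`, which closes the channel.
                return Err(format!("bad value: {}", v));
            }
            tx.send(v).map_err(|e| e.to_string())?;
        }
        Ok(())
    });

    let consumer = thread::spawn(move || {
        // `rx.iter()` ends once all senders are dropped, so the consumer
        // finishes on both the success path and the error path.
        rx.iter().collect::<Vec<i32>>()
    });

    let received = consumer.join().expect("consumer panicked");
    let status = producer.join().expect("producer panicked");
    (received, status)
}
```

Calling `run_pipeline(vec![1, 2, -1, 4])` returns the values received before the failure together with the producer's error, rather than blocking. The error is surfaced by joining the producer thread, so the caller can decide whether the partial output should be discarded.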
