Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --binning, --zoomify and --balancing arguments to pipeline #76

Merged
merged 17 commits into from
Aug 24, 2023
Merged

Add --binning, --zoomify and --balancing arguments to pipeline #76

merged 17 commits into from
Aug 24, 2023

Conversation

js2264
Copy link

@js2264 js2264 commented Aug 21, 2023

  • --binning INT: when --mat_fmt cool, a binned cool matrix can also be generated from pairs file;
  • --zoomify: if added (and --binning INT is present), the binned matrix will be zoomified at binning * [1, 2, 5, 10, 20, 50, 100] and balanced
  • --balancing STR if zoomify, the balancing options for the .mcool file can be specified (cooler default used by default)

So the suggested pipeline call could be something like:

hicstuff pipeline \
    --enzyme "DpnII,HinfI" \
    --matfmt cool  --binning=1000 --zoomify --balancing "--cis-only" \
    --genome <REF> \
    <R1> <R2>

I'm very tempted to change the default matfmt to "cool"...

@js2264
Copy link
Author

js2264 commented Aug 21, 2023

Note: I added cooler to the requirements.txt... I hope this is ok?

@cmdoret
Copy link
Member

cmdoret commented Aug 23, 2023

Note: I added cooler to the requirements.txt... I hope this is ok?

It's currently listed in the extras_require (i.e. optional dependency) since at the time we did not use it as the main format.

I think it would make sense to make it a mandatory dependency, but then it should be removed from the extras_require.

Similarly, changing the default matfmt to cool sounds reasonable to me.

Copy link
Member

@cmdoret cmdoret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome improvements!! I spotted a mistake and left a few suggestions :)

@@ -743,8 +743,9 @@ class Pipeline(AbstractCommand):
pipeline [--aligner=bowtie2] [--centromeres=FILE] [--circular] [--distance-law]
[--duplicates] [--enzyme=5000] [--filter] [--force] [--mapping=normal]
[--matfmt=graal] [--no-cleanup] [--outdir=DIR] [--plot] [--prefix=PREFIX]
[--quality-min=30] [--read-len=INT] [--remove-centromeres=0] [--size=0]
[--start-stage=fastq] [--threads=1] [--tmpdir=DIR] --genome=FILE <input1> [<input2>]
[--binning=INT] [--zoomify] [--balancing=STR] [--quality-min=30]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're adding a "binning" argument, should we disallow integers in enzyme?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe even remove enzyme, and only use it in cutsite. What do you think @ABignaud ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe even remove enzyme, and only use it in cutsite. What do you think @ABignaud ?

We most likely should keep enzyme, AFAIU the filtering depends on it.

If we're adding a "binning" argument, should we disallow integers in enzyme?

Also, I believe enzyme=100 is used for mnase/dnaseC, no? So probably best to still allow integers

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok we'll keep allowing integers for enzyme for now

hicstuff/commands.py Outdated Show resolved Hide resolved
hicstuff/commands.py Outdated Show resolved Hide resolved
hicstuff/pipeline.py Outdated Show resolved Hide resolved
hicstuff/pipeline.py Outdated Show resolved Hide resolved
@js2264
Copy link
Author

js2264 commented Aug 23, 2023

@cmdoret thanks for your review, I've replied to your comments. Let's wait for @ABignaud input on enzyme

@js2264 js2264 merged commit 00f37df into koszullab:master Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants