Simplify Alignment Job and Resolve SAM Flag Issue in CRAM to FASTQ Conversion #802

Draft: wants to merge 90 commits into base: main
90 commits
794651e
Creating separate branch for simplification of Align job as well as r…
michael-harper Jun 24, 2024
acc179d
Implementing Picard extraction to compare times
michael-harper Jun 27, 2024
415ab15
Fixing picard command
michael-harper Jun 27, 2024
fbf19fe
Fixing picard command
michael-harper Jun 27, 2024
51258a6
Fixing picard command
michael-harper Jun 27, 2024
a57e896
Saving to dataset tmp path so collate and picard extract jobs use the…
michael-harper Jun 27, 2024
24deaae
Attempting to fix reading collated bam
michael-harper Jun 27, 2024
bbf55c3
Attempting to fix reading collated bam
michael-harper Jun 27, 2024
e73a6a7
Fixing collate command
michael-harper Jun 27, 2024
0f65275
Attempting to fix intermediate collate.bam filenotfound error
michael-harper Jun 27, 2024
4783d55
Attempting to fix intermediate collate.bam filenotfound error
michael-harper Jun 27, 2024
18a85dd
Attempting to fix intermediate collate.bam filenotfound error
michael-harper Jun 27, 2024
1545a5f
Fixed bash command. Tidying up unused code
michael-harper Jun 27, 2024
7b1890f
For testing purposes, changing where realigned crams are saved to dif…
michael-harper Jul 2, 2024
1cea2b8
Fixing config retrieval
michael-harper Jul 2, 2024
74201da
Fixing paths
michael-harper Jul 3, 2024
7981d48
Fixing paths
michael-harper Jul 3, 2024
8da0cff
Removing forced path output for fewgenomes testing
michael-harper Jul 4, 2024
2406cde
adding back path
michael-harper Jul 4, 2024
b0cb4e9
Adding some logging
michael-harper Jul 5, 2024
c245452
Subsetting cram to chr21 for speed
michael-harper Jul 5, 2024
2a218b4
Fixing input group reading in subset_cram
michael-harper Jul 5, 2024
9cc7b81
Fixing input group reading in subset_cram
michael-harper Jul 5, 2024
c51f2e7
AttributeError
michael-harper Jul 5, 2024
cdc3c96
AttributeError. Returning the job
michael-harper Jul 5, 2024
006cef5
Fixing job attribute extension
michael-harper Jul 5, 2024
b178bce
Typing
michael-harper Jul 5, 2024
3af9d65
Fixing resource grouping
michael-harper Jul 5, 2024
83b2f91
Fixing resource grouping
michael-harper Jul 5, 2024
8d8fbc8
Fixing resource grouping
michael-harper Jul 5, 2024
b807272
Fixing resource grouping
michael-harper Jul 5, 2024
7476d18
Fixing resource grouping
michael-harper Jul 5, 2024
4e31e02
Fixing resource grouping
michael-harper Jul 5, 2024
ef89f12
Fixing resource grouping
michael-harper Jul 5, 2024
6eb6647
Fixing resource grouping
michael-harper Jul 5, 2024
176b2ee
Fixing resource grouping
michael-harper Jul 7, 2024
4022ec6
Fixing resource grouping
michael-harper Jul 8, 2024
fbc69b7
Fixing resource grouping
michael-harper Jul 8, 2024
6f3a34a
Fixing resource grouping
michael-harper Jul 8, 2024
98b101b
Fixing resource grouping
michael-harper Jul 8, 2024
bb374ed
Fixing subset_cram command
michael-harper Jul 8, 2024
8dceb52
Fixing subset_cram command
michael-harper Jul 8, 2024
6b61ef7
Fixing alignment input command
michael-harper Jul 8, 2024
ebd1b27
Fixing subset_cram command
michael-harper Jul 8, 2024
b4dcfed
Fixing subset_cram command
michael-harper Jul 8, 2024
3378f30
Fixing subset_cram command
michael-harper Jul 8, 2024
d4f20ac
Fixing subset_cram command
michael-harper Jul 8, 2024
889b09e
Fixing subset_cram command
michael-harper Jul 8, 2024
81f025e
Fixing subset_cram command
michael-harper Jul 8, 2024
0db2427
Fixing subset_cram command
michael-harper Jul 8, 2024
3142741
adding extra logging
michael-harper Jul 8, 2024
971fe72
for testing fewgenomes, removing the subset_cram_j job
michael-harper Jul 8, 2024
1d61ea3
for testing fewgenomes, removing the output destination change
michael-harper Jul 8, 2024
3a1c163
Changing samtools fastq command to outputting read1 and read2 issues a…
michael-harper Jul 9, 2024
7ea39f1
Adding back subsetting ability
michael-harper Jul 9, 2024
4e1480c
changing output to nagim title for clarity
michael-harper Jul 9, 2024
24998a0
Changing back to discarding reads where the READ1 and READ2 FLAG bits …
michael-harper Jul 9, 2024
62d7a60
minor change
michael-harper Jul 9, 2024
4d0e058
attempting to align from nagim cram
michael-harper Jul 9, 2024
898b376
nagim cram edits
michael-harper Jul 9, 2024
cd12408
Ensuring correct reference assembly during subset
michael-harper Jul 9, 2024
0e2909a
Ensuring correct reference assembly during fastq extraction
michael-harper Jul 9, 2024
7185a0e
logging
michael-harper Jul 9, 2024
18abb2d
pointing to nagim index path
michael-harper Jul 9, 2024
687c4c9
pointing to sequencing_group_id
michael-harper Jul 9, 2024
4cc18b7
Changing to not discarding singletons and instead producing an interl…
michael-harper Jul 10, 2024
018661b
ensuring singletons are pushed to interleaved fastq
michael-harper Jul 10, 2024
a2ec786
Adding some logging
michael-harper Jul 10, 2024
9d2d50d
Fixing file referencing
michael-harper Jul 10, 2024
bfd926a
Fixing file referencing
michael-harper Jul 10, 2024
e1093da
Fixing dragen-os command to accept interleaved
michael-harper Jul 10, 2024
7ddf351
Fixing dragen-os command to accept interleaved
michael-harper Jul 10, 2024
3b02efe
Gzipping samtools fastq output
michael-harper Jul 10, 2024
1dc99d3
Checking for possibly -0 output reads
michael-harper Jul 16, 2024
a4f23d5
Checking for possibly -0 output reads
michael-harper Jul 16, 2024
e5512af
Not overwriting discarded.fq
michael-harper Jul 16, 2024
02515c9
Reverting back to discarding reads that meet -0 flag criteria in samt…
michael-harper Jul 16, 2024
7e8b9c3
Tidying up script: removing code used for testing
michael-harper Jul 17, 2024
bc0d25e
Fixing use_interleaved variable assignment
michael-harper Jul 17, 2024
ce5832d
More tidying up
michael-harper Jul 17, 2024
b5ba502
Removing reference to no longer supported aligners
michael-harper Jul 17, 2024
44d1223
Ensuring we get alignment input correctly when realigning from cram. …
michael-harper Jul 18, 2024
6fb5137
Improving error messaging for new realignment version
michael-harper Jul 18, 2024
d114f14
Tidying up
michael-harper Jul 18, 2024
80567f8
Fixing referencing to config parameters regarding realigning from cra…
michael-harper Jul 18, 2024
9f4715c
Refactoring approach to realigning from cram. Bubbling up realignment …
michael-harper Jul 21, 2024
175092f
Changes to retrieving realignment input if cram
michael-harper Jul 23, 2024
cb6872d
Adding function to retrieve realignment options at the align stage th…
michael-harper Jul 23, 2024
b6c2d61
Adding 'align' as config key in the WorkflowConfig dataclass test fac…
michael-harper Jul 23, 2024
59a025c
Casting the reference assembly to a string as b.read_input cannot par…
michael-harper Jul 23, 2024
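The PR title and several commits (discarding reads where the READ1 and READ2 FLAG bits …, the -0 flag criteria) depend on how `samtools fastq` routes each record by its SAM FLAG. The sketch below uses a hypothetical helper, `fastq_destination`, to mirror the documented rule. A record goes to the read-1 or read-2 output when exactly one of the READ1 (0x40) and READ2 (0x80) bits is set. When both or neither are set, it goes to the `-0` file. This illustrates only the flag logic. It is not code from this PR, and it ignores the other filtering `samtools fastq` applies, such as its default exclusion of secondary and supplementary alignments.

```python
# Hypothetical helper (not from this PR): mirrors how `samtools fastq` routes a
# record by the READ1 (0x40) and READ2 (0x80) bits of its SAM FLAG.
READ1 = 0x40
READ2 = 0x80


def fastq_destination(flag: int) -> str:
    """Return 'read1', 'read2', or 'other' (the -0 bucket) for a SAM FLAG value."""
    is_r1 = bool(flag & READ1)
    is_r2 = bool(flag & READ2)
    if is_r1 and not is_r2:
        return "read1"
    if is_r2 and not is_r1:
        return "read2"
    # Both bits set or both unset: pairing is ambiguous, so samtools fastq
    # writes these records to the -0 file (which the commits point at a discard).
    return "other"


if __name__ == "__main__":
    # 99 and 147 are a typical proper pair; 4 is an unmapped read with no
    # pairing bits; 0xC1 has both READ1 and READ2 set.
    for flag in (99, 147, 4, 0xC1):
        print(flag, fastq_destination(flag))
```

Records in the `other` bucket are the ones the later commits toggle between keeping (via an interleaved FASTQ) and discarding.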
configs/defaults/large_cohort.toml (22 changes: 18 additions, 4 deletions)
@@ -6,10 +6,24 @@ status_reporter = 'metamist'
 # define reference fasta. If not specified here, the reference file is pulled from [references.broad][ref_fasta]
 ref_fasta='gs://cpg-common-main/references/hg38/v0/dragen_reference/Homo_sapiens_assembly38_masked.fasta'
 highmem_workers = true
-# Realign CRAM when available, instead of using FASTQ.
-# The parameter value should correspond to CRAM version
-# (e.g. v0 in gs://cpg-fewgenomes-main/cram/v0/CPGaaa.cram
-#realign_from_cram_version = 'v0'
+# Realign CRAM instead of using FASTQ.
+# The 'version' value must match the version directory of the existing CRAM.
+# For example, to realign from CRAM version 'v0', set:
+# ["workflow"]["align"]["realign_from_cram"]["version"] = "v0"
+# which in TOML is written as:
+# [workflow.align.realign_from_cram]
+# version = "v0"
+# This reads the CRAM at gs://<dataset-prefix>/cram/v0/<SequencingGroupID>.cram
+# and realigns it against the reference in ["workflow"]["align"]["realign_from_cram"]["cram_version_map"][<new_version_id>].
+# After realignment, a new CRAM is written to gs://<dataset-prefix>/cram/<new_version_id>/<SequencingGroupID>.cram.
+# In ["workflow"]["align"]["realign_from_cram"]["cram_version_map"], configure the reference for both the current CRAM version
+# and the new version. For example:
+# [workflow.align.realign_from_cram]
+# version = 'v1'
+# new_version = 'v2'
+# [workflow.align.realign_from_cram.cram_version_map]
+# v1 = 'gs://cpg-common-main/references/hg38/v0/dragen_reference/Homo_sapiens_assembly38_masked.fasta'
+# v2 = 'gs://<path-to-new/updated/alternative-reference>'
 
 # Calling intervals (defaults to whole genome intervals)
 #intervals_path =