Merge pull request nf-core#1127 from MatthiasZepper/FilterGTF_enlage_sample

Enlarge the sampling range for column determination in FilterGTF script.
pinin4fjords authored Nov 20, 2023
2 parents da99418 + 9250c4c commit 76464e4
Showing 2 changed files with 4 additions and 4 deletions.
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -7,8 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Enhancements and fixes

-- [PR #1126](https://github.com/nf-core/rnaseq/pull/1126) - Fixes error when transcript_fasta not provided and skip_gtf_filter set to true
-- [#1125](https://github.com/nf-core/rnaseq/issues/1125) - Pipeline fails if transcript_fasta not provided and skip_gtf_filter = true
+- [[PR #1126](https://github.com/nf-core/rnaseq/pull/1126)] [[#1125](https://github.com/nf-core/rnaseq/issues/1125)] - Pipeline fails if transcript_fasta not provided and `skip_gtf_filter = true`.
+- [[PR #1127](https://github.com/nf-core/rnaseq/pull/1127)] - Enlarge sampling to determine the number of columns in `filter_gtf.py` script.

 ## [[3.13.1](https://github.com/nf-core/rnaseq/releases/tag/3.13.1)] - 2023-11-17

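The changelog entry above says the sample used to count columns was enlarged. The commit does not say which input triggered the change, so the following is only an illustration under an assumption: a GTF whose `#!` header lines and long records fill most of the first 1024 characters. The sketch re-implements the median-tab heuristic of `tab_delimited` from `bin/filter_gtf.py` with a hypothetical `sample_size` parameter, plus a hypothetical `write_synthetic_gtf` helper that writes made-up data, and compares the old and new read sizes.

```python
import os
import statistics
import tempfile


def median_tabs(path: str, sample_size: int) -> float:
    """Median tab count per line within the first `sample_size` characters.

    Mirrors the heuristic in `tab_delimited`, but takes the read size as a
    parameter (hypothetical; the script hard-codes it). In text mode,
    read() counts characters, which equal bytes for ASCII input.
    """
    with open(path, "r") as f:
        data = f.read(sample_size)
    return statistics.median(line.count("\t") for line in data.split("\n"))


def write_synthetic_gtf(path: str, n_records: int = 500) -> None:
    """Write a made-up GTF: five Ensembl-style '#!' header lines, then long records."""
    header = [
        "#!genome-build GRCh38.p14",
        "#!genome-version GRCh38",
        "#!genome-date 2013-12",
        "#!genome-build-accession GCA_000001405.29",
        "#!genebuild-last-updated 2023-03",
    ]
    # Padding makes each record ~375 characters, so only a few fit in 1024.
    attributes = 'gene_id "G1"; transcript_id "T1"; note "' + "x" * 300 + '";'
    record = "\t".join(["1", "havana", "exon", "11869", "12227", ".", "+", ".", attributes])
    with open(path, "w") as f:
        f.write("\n".join(header) + "\n")
        for _ in range(n_records):
            f.write(record + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "synthetic.gtf")
        write_synthetic_gtf(path)
        print(median_tabs(path, 1024))    # 0.0: five 0-tab headers vs three 8-tab lines
        print(median_tabs(path, 102400))  # 8.0: records dominate the sample
```

With the old 1024-character window, this synthetic file yields a median of 0, so the `!= 8` check in `filter_gtf` would reject it; the 102400-character window lets data lines outnumber the header lines. A larger fixed window moves the threshold rather than removing it, which seems an acceptable trade-off for real GTF headers that are only a handful of lines long.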
4 changes: 2 additions & 2 deletions bin/filter_gtf.py
@@ -23,14 +23,14 @@ def extract_fasta_seq_names(fasta_name: str) -> Set[str]:
 def tab_delimited(file: str) -> float:
     """Check if file is tab-delimited and return median number of tabs."""
     with open(file, "r") as f:
-        data = f.read(1024)
+        data = f.read(102400)
         return statistics.median(line.count("\t") for line in data.split("\n"))


 def filter_gtf(fasta: str, gtf_in: str, filtered_gtf_out: str, skip_transcript_id_check: bool) -> None:
     """Filter GTF file based on FASTA sequence names."""
     if tab_delimited(gtf_in) != 8:
-        raise ValueError("Invalid GTF file: Expected 8 tab-separated columns.")
+        raise ValueError("Invalid GTF file: Expected nine tab-separated columns.")

     seq_names_in_genome = extract_fasta_seq_names(fasta)
     logger.info(f"Extracted chromosome sequence names from {fasta}")
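The corrected error message and the unchanged `!= 8` check describe the same rule from two sides: `tab_delimited` counts tab characters, and a GTF record has nine fields separated by eight tabs. A minimal sketch of that relationship, using a hypothetical `parse_gtf_record` helper and a made-up record (not code from the pipeline):

```python
# Column names as given in the Ensembl GFF/GTF format description.
GTF_FIELDS = (
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute",
)


def parse_gtf_record(line: str) -> dict:
    """Split one GTF data line into its nine named fields (hypothetical helper)."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(GTF_FIELDS):
        raise ValueError(f"Expected 9 tab-separated columns, got {len(values)}")
    return dict(zip(GTF_FIELDS, values))


if __name__ == "__main__":
    # Made-up example record: eight tabs delimit nine columns.
    record = "\t".join(
        ["1", "havana", "exon", "11869", "12227", ".", "+", ".",
         'gene_id "G1"; transcript_id "T1";']
    )
    print(record.count("\t"))                   # 8
    print(parse_gtf_record(record)["feature"])  # exon
```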