issue with preparing region features #1

Open
husamia opened this issue Feb 6, 2021 · 8 comments

Comments

husamia commented Feb 6, 2021

I am getting an error about the BED file. The instructions don't specify how to generate it.

python scotch/scotch.py prepare-region-features --beds_dir=beds/ --all_rfs_dir=scotch-data-grch37/ --output_trim_rfs_dir=trim_rfs/
Traceback (most recent call last):
  File "scotch/scotch.py", line 423, in <module>
    COMMANDScommand
  File "scotch/scotch.py", line 128, in prepare_region_features
    assert bed_file.is_file(), f"beds_dir must contain a file for {chrom}, {bed_file}"
AssertionError: beds_dir must contain a file for 1, beds/1.bed
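The assertion at line 128 of scotch.py expects beds_dir to hold one <chrom>.bed file per chromosome. A minimal sketch for listing the missing files before running prepare-region-features; the function name missing_bed_files and the chromosome list (1 to 22, X, Y, without a "chr" prefix) are assumptions rather than part of scotch, so check which chromosomes your scotch version actually iterates over.

```python
from pathlib import Path

# Assumption: autosomes 1-22 plus X and Y, named without a "chr" prefix
# (the error message shows beds/1.bed). Adjust to your scotch version.
DEFAULT_CHROMS = tuple(str(i) for i in range(1, 23)) + ("X", "Y")


def missing_bed_files(beds_dir, chroms=DEFAULT_CHROMS):
    """Return the expected <chrom>.bed paths that are absent from beds_dir.

    Mirrors the check behind
    'beds_dir must contain a file for 1, beds/1.bed'.
    """
    beds_dir = Path(beds_dir)
    expected = (beds_dir / f"{chrom}.bed" for chrom in chroms)
    return [path for path in expected if not path.is_file()]
```

An empty list from missing_bed_files("beds/") means every expected file is present.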

husamia (author) commented Feb 7, 2021

Also, the region feature data seem to be incorrect:
gzip: scotch-data-grch37/20.rfs.gz: not in gzip format

iamh2o commented Mar 7, 2021

I've gotten as far as needing the genome annotations you provide, but for both GRCh37 and GRCh38 the CHR.rfs.gz files are:
a) plain-text files, not gzipped;
b) empty of annotations. Every file, for both builds, contains the following:
---->more 6.rfs.gz

version https://git-lfs.github.com/spec/v1
oid sha256:8d512a5c2a7a9f11947f67c621720c1bba89349ff51f95450b06aa922a5a0339
size 763424410
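The three lines above are a Git LFS pointer file rather than the data itself: per the Git LFS spec, oid is the SHA-256 hash of the real object and size is its length in bytes (763424410 here, roughly 763 MB for chromosome 6 alone). A minimal sketch for recognising and parsing such a pointer; is_lfs_pointer and parse_lfs_pointer are hypothetical helper names.

```python
from pathlib import Path

# First line of every Git LFS pointer file (spec v1).
LFS_VERSION_LINE = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file starts with the Git LFS pointer version line."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_VERSION_LINE)) == LFS_VERSION_LINE


def parse_lfs_pointer(path):
    """Parse pointer lines of the form 'key value' into a dict.

    'size' is converted to int (the byte length of the real object).
    """
    fields = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition(" ")
        if sep:
            fields[key] = value
    if "size" in fields:
        fields["size"] = int(fields["size"])
    return fields
```

A file for which is_lfs_pointer returns True has not been downloaded yet, which is also why gzip reports "not in gzip format" for it.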

I'm blocked from using your caller at this point: the documentation doesn't describe the format these RFS files need to take, otherwise I'd try to compute them myself. Could you please post instructions on this file format and/or fix the data repos?

Thank you,
John Major

iamh2o commented Mar 7, 2021

For issue #1, it worked for me when I created a directory of BED files, one per chromosome, named 1.bed, 2.bed, ..., each with a single CHR START END entry spanning the entire chromosome.
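Because BED coordinates are 0-based and half-open, a whole-chromosome entry is CHR 0 LENGTH. A minimal sketch that writes one such file per contig from a FASTA index (.fai, e.g. produced by samtools faidx); the function name write_whole_chrom_beds is hypothetical, and it assumes the .fai contig names are the bare names the error message uses (1, 2, ...). A reference with chr-prefixed names would produce chr1.bed and so on instead.

```python
from pathlib import Path


def write_whole_chrom_beds(fai_path, beds_dir, chroms=None):
    """Write one <chrom>.bed per contig listed in a FASTA .fai index.

    Each file holds a single BED line 'CHR<TAB>0<TAB>LENGTH', i.e. the
    whole chromosome in 0-based, half-open coordinates. If chroms is
    given, only those contig names are written.
    """
    beds_dir = Path(beds_dir)
    beds_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(fai_path) as fai:
        for line in fai:
            if not line.strip():
                continue
            # .fai columns: name, length, offset, linebases, linewidth
            name, length = line.split("\t")[:2]
            if chroms is not None and name not in chroms:
                continue
            out = beds_dir / f"{name}.bed"
            # Exactly one line with one terminating newline, no blank line after.
            out.write_text(f"{name}\t0\t{int(length)}\n")
            written.append(out)
    return written
```

Python's csv reader returns an extra blank line as an empty row, which is the case the patch in this comment guards against; writing exactly one newline-terminated line avoids producing such a row.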
If a file ends with an extra newline, you'll hit another bug later on, which I fixed with this:

////getFeatures-getReadFeatures.py

for region in csv.reader(b, delimiter="\t"):
    # add 1 since BED is 0-based
    if len(region) == 0:
        print('WARNING', bed, region, 'has length of zero, could be <EOF> or a bug '
              'if there is more than 1 warning...')
    else:
        regions.append([region[0], int(region[1]) + 1, int(region[2]) + 1])

I believe the BED spec requires a newline at the end of the file; however, that was failing the assertion check, so I added the if/else block above.
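A self-contained version of the same idea, as a minimal sketch: read_bed_regions is a hypothetical helper, not scotch's function. It skips blank rows (which csv.reader yields as empty lists) and, as an addition not in the excerpt, UCSC-style "track", "browser" and "#" header lines. It keeps the excerpt's coordinate shift (start + 1, end + 1) unchanged; whether that shift is right depends on how downstream code uses the end coordinate.

```python
import csv


def read_bed_regions(bed_path):
    """Read BED regions, skipping blank lines and header/comment lines.

    Coordinates are shifted exactly as in the excerpt above
    (start + 1, end + 1).
    """
    regions = []
    with open(bed_path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            # Blank lines come back from csv.reader as empty lists.
            if not row or not row[0].strip():
                continue
            # UCSC-style header lines are not regions.
            if row[0].startswith(("#", "track", "browser")):
                continue
            regions.append([row[0], int(row[1]) + 1, int(row[2]) + 1])
    return regions
```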

iamh2o commented Mar 7, 2021

Another odd item: compileFeatures.py had "#!/ibin/bash" as its first line even though it is a Python script. It only caused problems when editing in Emacs, but it's probably a small bug.

iamh2o commented Mar 7, 2021

In closing: I'm blocked because there is no RFS data available, and I'm really eager to consider this caller for the clinical WGS product I'm developing. Time is short for the investigative phase I'm in now, so I hope this can be resolved soon.

Thank you,
John Major

iamh2o commented Mar 7, 2021

Ah! I think I sorted out the RFS file problem. You need to use git-lfs. This should be explained somewhere in the repo.
To get it to work, I installed git-lfs, moved to the directory containing the RFS .gz files, and ran git-lfs pull. It hasn't finished yet, but it seems to be pulling down the annotation files.
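After a complete pull, each pointer file should have been replaced by the real gzip data. A minimal sketch for listing the .rfs.gz files that are not yet real gzip files, relying on the fact that gzip data starts with the magic bytes 0x1f 0x8b while the pointer files quoted earlier start with plain text; the function name not_yet_downloaded and the *.rfs.gz pattern are assumptions.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip (or bgzip) stream


def not_yet_downloaded(rfs_dir, pattern="*.rfs.gz"):
    """Return files matching pattern that do not start with the gzip magic bytes."""
    pending = []
    for path in sorted(Path(rfs_dir).glob(pattern)):
        with open(path, "rb") as fh:
            if fh.read(2) != GZIP_MAGIC:
                pending.append(path)
    return pending
```

Running not_yet_downloaded("scotch-data-grch37/") before prepare-region-features should return an empty list once every file has been fetched.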

iamh2o commented Mar 7, 2021

Specifically, this command needs to be run AFTER you fetch the RFS files:
python /scotch/scotch.py prepare-region-features --beds_dir=/beds/ --all_rfs_dir=/scotch-data/ --output_trim_rfs_dir=/trim_rfs/

@elighlola

Hi iamh2o,
I'm struggling with this error. I ran this command (I preferred to run the code for chr 1 first; if everything works, I will use the loop):
python /home/elsh/scotch/scotch.py get-features-depth --project_dir=/home/elsh/ABC123/ --chrom=1 --beds_dir=/home/elsh/beds/ --fasta_ref=/home/elsh/refseq/hg19.fasta
and got this error:
awk: cmd. line:1: fatal: division by zero attempted
Done.
The output is 3 empty files: depth.feat.gz, depth.feat.log, and depth.feat.stats.
If you got to this point and it worked for you, it would be great if you could help.
Thank you
