branch for intron junctions work #56

Open
wants to merge 9 commits into
base: webdb_workflow

jbrestel
Member

isr and isrpm to results table

make geneintronjunction table outside of tuning job

load htseq raw counts

@kathryncrouch
Member

Several commits to alter the directory structure in order to avoid ReFlow writing files into symlinked directories.

I am assuming here that we will have a directory structure like experimentName/final with the counts files etc inside it. For EBI artefacts, we get this by symlinking the final dir from manualDelivery. For output of the Nextflow workflow, we can make this directory structure when we copy the artefact back from the cluster. In the first case, final will be symlinked, so I'm trying to avoid writing any output back into it.

  • makeTpmFromHtseqCounts.pl - minor change to this script so that it takes an outputDir as well as an inputDir. In the workflow context, I suggest that if inputDir is $experimentName/final, outputDir should be $experimentName/TPM.

  • RnaSeqAnalysisEbi - I had to jump through a few hoops with this one to make it work without altering the classes in CBIL. Part of this involved making a new subclass (RnaSeqCounts.tpm) to override a method in ProfileFromSeparateFiles.pm. Some additional minor changes in the script. Assuming a structure like $experimentName/final containing counts and $experimentName/TPM for tpm (see above), this will read the counts from the final dir and the tpm values from the tpm dir. The output and study config will be written in the parent $experimentName dir.

  • normalizeCoverageEbi.pl - minor change to get input from final.
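
The layout described above can be sketched in shell as follows. This is only an illustration under assumptions: the function name `prepare_experiment_dirs` and the example paths are hypothetical, and the actual workflow step may set things up differently. It symlinks `final` from a delivery directory, creates `TPM` as a real directory beside it, and refuses to use a `TPM` directory that is itself a symlink.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the repository): builds
#   $experimentDir/final  as a symlink (read-only input)
#   $experimentDir/TPM    as a real directory (receives output)
prepare_experiment_dirs() {
  local deliveryFinal="$1"   # assumed: the final dir of an artefact in manualDelivery
  local experimentDir="$2"

  mkdir -p "$experimentDir"

  # final is only read from, so a symlink is acceptable here
  ln -sfn "$deliveryFinal" "$experimentDir/final"

  # TPM is written to, so it must not be a symlink
  if [ -L "$experimentDir/TPM" ]; then
    echo "refusing to write into symlinked dir: $experimentDir/TPM" >&2
    return 1
  fi
  mkdir -p "$experimentDir/TPM"
}

# Example call (paths are placeholders):
# prepare_experiment_dirs /path/to/manualDelivery/someExperiment/final ./someExperiment
```

Keeping the writable directory as a sibling of `final` rather than a child means no output path ever passes through the symlink.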

To test:

  • Copy one of the EBI artefacts from manualDelivery
  • makeTpmFromHtseqCounts.pl --geneFootprintFile $geneFootprintFile --studyDir $experimentDir/final --outputDir $experimentDir/TPM --analysisConfig $experimentDir/final/analysisConfig.xml
  • doStudyAssayResults.pl --xml_file $experimentDir/final/analysisConfig.xml --mainDir $experimentDir --technologyType RNASeqEbi (NOTE: this assumes the directories are named final and TPM, but the old implementation also assumed directory names)
  • normalizeCoverageEbi.pl --inputDir $experimentDir --topLevelSeqSizeFile topLevelSeqSizes.txt --analysisConfig $experimentDir/final/analysisConfig.xml

If this is ok to merge, we should close this branch unless John has more intron junctions changes.

@kathryncrouch kathryncrouch requested a review from sybah2 November 7, 2024 17:53
@sybah2
Member

sybah2 commented Nov 8, 2024

I ran all the scripts successfully on the EBI results; they all work and produce the right output and folder structure.

The only issue that needs addressing is the usage output. This is what the usage says: rnaseqMerge.pl --dir=s --organism_abbrev=s --outdir=s --chromSize=s

while the GetOptions has these options: 'dir=s experimentName=s chromSize=s analysisConfig=s'.

This could be misleading. I had to check the script to find the arguments.

We also need to update the usage for makeTpmFromHtseqCounts.pl to add --outputDir.
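
One quick way to catch this kind of drift is to compare the option names in a usage string with those in the GetOptions spec. The sketch below does that in shell for the two strings quoted above. The function names `usage_opts` and `spec_opts` are hypothetical, and it assumes simple `name=s` style specs rather than the full Getopt::Long syntax (aliases such as `dir|d=s` are not handled).

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # keep sort and comm on the same collation order

# Option names from a usage line such as "--dir=s --outdir=s"
usage_opts() {
  grep -oE -- '--[A-Za-z_]+' <<<"$1" | sed 's/^--//' | sort -u
}

# Option names from a GetOptions-style spec such as "dir=s chromSize=s"
spec_opts() {
  tr ' ' '\n' <<<"$1" | sed -E 's/[=:!+].*$//' | grep -v '^$' | sort -u
}

# The two strings as quoted in the review comment
usage='rnaseqMerge.pl --dir=s --organism_abbrev=s --outdir=s --chromSize=s'
spec='dir=s experimentName=s chromSize=s analysisConfig=s'

echo "In usage but not in GetOptions:"
comm -23 <(usage_opts "$usage") <(spec_opts "$spec")

echo "In GetOptions but not in usage:"
comm -13 <(usage_opts "$usage") <(spec_opts "$spec")
```

For these inputs it reports `organism_abbrev` and `outdir` as present only in the usage text, and `analysisConfig` and `experimentName` as present only in the GetOptions spec.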

Besides that, unless John has other objections, the pull request can be merged.

3 participants