RNAEditing Analysis #107

Open
kokyriakidis opened this issue Jan 17, 2022 · 17 comments

@kokyriakidis

kokyriakidis commented Jan 17, 2022

Hi @brianjohnhaas

Q1: I want to use ctat-mutations to find RNA editing events, and I only have RNA-Seq data. Does your implementation detect variants across all genomic regions, or does it favor specific regions of the genome (e.g. coding regions)? I ask because, for some reason, my data show a high rate of intronic mapping.

Q2: Should I use a boosting method for my use case? If so, which one do you recommend? I saw in your code that the RNAEditing column is removed when the data are prepared for boosting. Does that mean that variants annotated with RNAEDIT will not be in the final VCF file? And that this VCF will likely not contain other RNA editing events either, because the model will filter them out? Do you think boosting is likely to remove potential RNA editing events from the VCF by classifying them as FPs?

Q3: Suppose I do not use boosting. Do you think the HC hard filters are likely to remove potential RNA editing sites from the VCF by treating them as FPs?

Q4: Do you plan to update REDIportal to V2?

Thanks!
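Q2 and Q3 both come down to how many candidate editing sites survive into the final VCF. A quick way to compare runs with and without boosting or hard filtering is to count SNVs carrying the RNAEDIT annotation and SNVs with the canonical A-to-I signature (A>G on the plus strand, T>C on the minus strand). The sketch below is a minimal, hypothetical helper, not part of ctat-mutations; it assumes the annotation appears as an INFO key literally named `RNAEDIT` (the name used in the question above) and parses the VCF columns directly rather than through any ctat-mutations API.

```python
import gzip
from collections import Counter

# A-to-I editing is read as A>G on the plus strand and T>C on the minus strand.
CANONICAL_EDITS = {("A", "G"), ("T", "C")}


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def summarize_editing(path, info_key="RNAEDIT"):
    """Count SNV alleles, canonical A>G/T>C alleles, and alleles whose record
    carries `info_key` in INFO. `info_key` defaults to an assumed name."""
    counts = Counter()
    with open_vcf(path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            ref, alts, info = fields[3], fields[4].split(","), fields[7]
            # INFO entries are "KEY=VALUE" or bare flags, separated by ";".
            info_keys = {item.split("=", 1)[0] for item in info.split(";")}
            for alt in alts:
                if len(ref) != 1 or len(alt) != 1:
                    continue  # only single-nucleotide changes are considered
                counts["snv"] += 1
                if (ref.upper(), alt.upper()) in CANONICAL_EDITS:
                    counts["a_to_g_or_t_to_c"] += 1
                if info_key in info_keys:
                    counts["flagged_" + info_key] += 1
    return counts
```

Running `summarize_editing` on the VCF from each configuration gives a rough before/after picture; it does not say why a site was dropped, only how many of each kind remain.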

@brianjohnhaas
Collaborator

brianjohnhaas commented Jan 18, 2022 via email

@kokyriakidis
Author

Hi @brianjohnhaas

I get the following errors when I try to run the pipeline:

[2022-01-19 05:51:12,10] [error] Failed to hash "/ctat_genome_lib_dir/ctat_mutation_lib/ref_annot.splice_adj.bed.gz": Cannot hash file /ctat_genome_lib_dir/ctat_mutation_lib/ref_annot.splice_adj.bed.gz because it can't be found

[2022-01-19 05:51:12,10] [error] d062b032:annotate_variants_wf.annotate_splice_distance:-1:1: Hash error (Cannot hash file /ctat_genome_lib_dir/ctat_mutation_lib/ref_annot.splice_adj.bed.gz because it can't be found), disabling call caching for this job.

There is no such file inside GRCh38.mutation_lib_supplement.Jul272020, so the workflow fails.

Could not localize /ctat_genome_lib_dir/ctat_mutation_lib/ref_annot.splice_adj.bed.gz -> /output/C4E/cromwell-executions/ctat_mutations/9c151408-da17-4155-afa9-0b83b9d81bbd/call-AnnotateVariants/annotate_variants_wf/d062b032-21b1-4df8-943a-03a20344967a/call-annotate_splice_distance/inputs/-1376602862/ref_annot.splice_adj.bed.gz:
	/ctat_genome_lib_dir/ctat_mutation_lib/ref_annot.splice_adj.bed.gz doesn't exist
	File not found /output/C4E/cromwell-executions/ctat_mutations/9c151408-da17-4155-afa9-0b83b9d81bbd/call-AnnotateVariants/annotate_variants_wf/d062b032-21b1-4df8-943a-03a20344967a/call-annotate_splice_distance/inputs/-1376602862/ref_annot.splice_adj.bed.gz -> /ctat_genome_lib_dir/ctat_mutation_lib/ref_annot.splice_adj.bed.gz
	File not found /ctat_genome_lib_dir/ctat_mutation_lib/ref_annot.splice_adj.bed.gz
	File not found /ctat_genome_lib_dir/ctat_mutation_lib/ref_annot.splice_adj.bed.gz
	at common.validation.Validation$ValidationTry$.toTry$extension1(Validation.scala:94)
	at common.validation.Validation$ValidationTry$.toTry$extension0(Validation.scala:90)
	at cromwell.backend.standard.StandardAsyncExecutionActor.instantiatedCommand(StandardAsyncExecutionActor.scala:668)
	... 35 more

[2022-01-19 05:51:47,86] [info] WorkflowManagerActor WorkflowActor-9c151408-da17-4155-afa9-0b83b9d81bbd is in a terminal state: WorkflowFailedState
[2022-01-19 05:52:10,91] [info] SingleWorkflowRunnerActor workflow finished with status 'Failed'.
[2022-01-19 05:52:15,35] [info] SingleWorkflowRunnerActor writing metadata to /tmp/tmpn8b6knbi.json
[2022-01-19 05:52:15,37] [info] Workflow polling stopped
[2022-01-19 05:52:15,39] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2022-01-19 05:52:15,39] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2022-01-19 05:52:15,39] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2022-01-19 05:52:15,39] [info] Aborting all running workflows.
[2022-01-19 05:52:15,39] [info] 0 workflows released by cromid-007ba5c
[2022-01-19 05:52:15,39] [info] JobExecutionTokenDispenser stopped
[2022-01-19 05:52:15,39] [info] WorkflowStoreActor stopped
[2022-01-19 05:52:15,40] [info] WorkflowLogCopyRouter stopped
[2022-01-19 05:52:15,40] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2022-01-19 05:52:15,40] [info] WorkflowManagerActor stopped
[2022-01-19 05:52:15,40] [info] WorkflowManagerActor All workflows finished
[2022-01-19 05:52:15,54] [info] Connection pools shut down
[2022-01-19 05:52:15,54] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2022-01-19 05:52:15,54] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2022-01-19 05:52:15,54] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2022-01-19 05:52:15,54] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2022-01-19 05:52:15,54] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2022-01-19 05:52:15,54] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2022-01-19 05:52:15,54] [info] SubWorkflowStoreActor stopped
[2022-01-19 05:52:15,54] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2022-01-19 05:52:15,54] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2022-01-19 05:52:15,54] [info] KvWriteActor Shutting down: 0 queued messages to process
[2022-01-19 05:52:15,54] [info] JobStoreActor stopped
[2022-01-19 05:52:15,54] [info] CallCacheWriteActor stopped
[2022-01-19 05:52:15,54] [info] IoProxy stopped
[2022-01-19 05:52:15,54] [info] ServiceRegistryActor stopped
[2022-01-19 05:52:15,55] [info] DockerHashActor stopped
[2022-01-19 05:52:15,58] [info] Database closed
[2022-01-19 05:52:15,58] [info] Stream materializer shut down
[2022-01-19 05:52:15,59] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2022-01-19 05:52:15,59] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2022-01-19 05:52:15,59] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2022-01-19 05:52:15,59] [info] Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
[2022-01-19 05:52:15,60] [info] WDL HTTP import resolver closed
Workflow 9c151408-da17-4155-afa9-0b83b9d81bbd transitioned to state Failed

@kokyriakidis
Author

My bad! The integration step did not run properly the first time!
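Because an incomplete integration step only surfaced as a localization failure partway through the workflow, a pre-flight check of the genome lib can catch it before launch. This is a minimal sketch under an explicit assumption: the only required path known from the error log above is `ctat_mutation_lib/ref_annot.splice_adj.bed.gz`; any other entries you add to `REQUIRED_FILES` are hypothetical and should come from the ctat-mutations documentation for your lib version.

```python
from pathlib import Path

# Only this path is confirmed by the error log in this thread; extend the
# tuple with other files (hypothetical here) that your workflow expects.
REQUIRED_FILES = (
    "ctat_mutation_lib/ref_annot.splice_adj.bed.gz",
)


def missing_lib_files(genome_lib_dir, required=REQUIRED_FILES):
    """Return the required relative paths that are not regular files
    under genome_lib_dir."""
    root = Path(genome_lib_dir)
    return [rel for rel in required if not (root / rel).is_file()]
```

Calling `missing_lib_files("/ctat_genome_lib_dir")` before starting the workflow and aborting on a non-empty result would have flagged the incomplete integration up front instead of after the earlier tasks had already run.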

@kokyriakidis
Author

kokyriakidis commented Jan 20, 2022

The Annotate BLAT ED step takes more than 5 hours to complete and uses only one CPU core. Since pblat is multithreaded, shouldn't it use all of the specified cores?

@kokyriakidis kokyriakidis reopened this Jan 20, 2022
@brianjohnhaas
Collaborator

brianjohnhaas commented Jan 20, 2022 via email

@kokyriakidis
Author

Hmm... It starts out using all cores, but after 1-2 hours it continues on only one core and then takes several more hours to finish.

@brianjohnhaas
Collaborator

brianjohnhaas commented Jan 20, 2022 via email

@kokyriakidis
Author

It is probably in your code rather than in pblat:

+ echo '########### Annotate BLAT ED #############'
+ /usr/local/src/ctat-mutations/src/annotate_ED.py --input_vcf /output/AD3E/cromwell-executions/ctat_mutations/6162ec39-8de3-4584-889f-cac0f9a5d904/call-AnnotateVariants/annotate_variants_wf/d4df6ce4-e64b-43ab-a1d3-8f2ff9c302cd/call-annotate_blat_ED/inputs/1608256378/AD3E.splice_distance.vcf.gz --output_vcf AD3E.blat_ED.vcf --reference /output/AD3E/cromwell-executions/ctat_mutations/6162ec39-8de3-4584-889f-cac0f9a5d904/call-AnnotateVariants/annotate_variants_wf/d4df6ce4-e64b-43ab-a1d3-8f2ff9c302cd/call-annotate_blat_ED/inputs/817306935/ref_genome.fa --temp_dir /output/AD3E/cromwell-executions/ctat_mutations/6162ec39-8de3-4584-889f-cac0f9a5d904/call-AnnotateVariants/annotate_variants_wf/d4df6ce4-e64b-43ab-a1d3-8f2ff9c302cd/call-annotate_blat_ED/tmp.a535223e --threads 15
15:24:57 : INFO : 
################################
 Annotating VCF: Calculating ED 
################################

15:24:57 : INFO : Processing VCF Positions
15:25:03 : INFO : Running samtools faidx
15:25:07 : INFO : Running Blat
15:59:34 : INFO : Processing Output
16:00:25 : INFO : Creating ED features

It started at 15:24; it is now 19:22 and it is still running on one core.
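The timestamps in the annotate_ED.py log above are enough to see which stage is slow. The sketch below is a hypothetical helper, not part of ctat-mutations: it assumes log lines of the form `HH:MM:SS : INFO : message` exactly as shown above, treats each message as the start of a stage that lasts until the next timestamp, and lets you pass the current time to close the still-running last stage. The log carries no date, so it assumes at most one midnight crossing between consecutive lines.

```python
from datetime import datetime, timedelta


def stage_durations(log_lines, now=None):
    """Return (stage, minutes) pairs from 'HH:MM:SS : INFO : msg' lines.
    The last stage is closed at `now` ("HH:MM:SS") if given, else dropped."""
    events = []
    for line in log_lines:
        parts = line.split(" : ", 2)
        # Keep only INFO lines that carry a non-empty message.
        if len(parts) == 3 and parts[1] == "INFO" and parts[2].strip():
            events.append((datetime.strptime(parts[0], "%H:%M:%S"), parts[2].strip()))
    if now is not None:
        events.append((datetime.strptime(now, "%H:%M:%S"), None))
    result = []
    for (start, name), (end, _) in zip(events, events[1:]):
        delta = end - start
        if delta < timedelta(0):
            delta += timedelta(days=1)  # assume the run crossed midnight once
        result.append((name, delta.total_seconds() / 60))
    return result
```

Applied to the log posted above with `now="19:22:00"`, "Running Blat" accounts for about 34 minutes, while "Creating ED features" has been running for more than 200 minutes, which fits the observation that the single-core phase comes after BLAT finishes.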

@brianjohnhaas
Collaborator

brianjohnhaas commented Jan 20, 2022 via email

@kokyriakidis
Author

For the record, it took 12 hours to complete, from the start of Annotate BLAT ED to the end of the run.

@brianjohnhaas
Collaborator

brianjohnhaas commented Jan 21, 2022 via email

@kokyriakidis
Author

I was looking for something like this! Thanks so much for bringing it up!

@brianjohnhaas
Collaborator

brianjohnhaas commented Jan 21, 2022 via email

@kokyriakidis
Author

Hi @brianjohnhaas !

What does ED=-1 mean?

@brianjohnhaas
Collaborator

brianjohnhaas commented Jan 25, 2022 via email

@kokyriakidis
Author

I see a lot of them but not a ton.

What about entropy? Can you please explain what it means and how I can evaluate it?
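The maintainer's reply is not reproduced here, so the following is only a general illustration under a stated assumption: that "entropy" refers to the Shannon entropy of the reference bases around a variant, a common way to flag low-complexity sequence where alignment artifacts are more frequent. For DNA it ranges from 0 bits (a homopolymer) to 2 bits (all four bases equally frequent). The `flank` window size below is hypothetical and not taken from ctat-mutations, and the window includes the variant position itself.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits) of base composition: 0 for a homopolymer,
    2.0 when A, C, G and T are equally frequent."""
    seq = seq.upper()
    if not seq:
        return 0.0
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())


def flank_entropy(reference, pos0, flank=10):
    """Entropy of up to `flank` bases on each side of 0-based position pos0
    (window size is a hypothetical choice)."""
    start = max(0, pos0 - flank)
    return shannon_entropy(reference[start:pos0 + flank + 1])
```

Under this reading, a variant whose window scores near 0 sits in a repetitive or homopolymer stretch and deserves more skepticism than one whose window scores near 2.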

@brianjohnhaas
Collaborator

brianjohnhaas commented Jan 27, 2022 via email


2 participants