-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RNAEditing Analysis #107
Comments
hi,
responses below
On Mon, Jan 17, 2022 at 3:06 AM Konstantinos Kyriakidis < ***@***.***> wrote:
Hi @brianjohnhaas <https://github.com/brianjohnhaas>
Q1: I want to use ctat-mutations to find RNA Editing events. I have *only*
RNA Seq data. Does your implementation find most of the variants around all
genomic regions or does it have a preference for variants in specific
regions of the genome (e.g. coding). I am telling this because my data have
high intron mapping for some reason.
it doesn't restrict to specific genomic regions
Q2: Should I use a boosting method for my use case? If yes, which one do
you propose?
The default 'xgboost' tends to work the best.
Q3: Is it in your plan to update REDIportal to V2?
we will plan to update rediportal later this year for the next release. It
requires that we redo our benchmarking and testing.
best of luck!
… Thanks!
—
Reply to this email directly, view it on GitHub
<#107>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABZRKX76JRBTFSSZTRQ46IDUWPEXNANCNFSM5MD76ALA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
I get the following errors when I try to run the pipeline
There in no such file inside the
|
My bad! The integrations step did not run properly the first time! |
|
hi,
Can you see the pblat command that's running? ie. psauxww | grep pblat
and does it have threads set to 1 ?
The workflow is supposed to propagate the CPU parameter setting to the
command here. I'm curious if it's not happening.
best,
~b
…On Thu, Jan 20, 2022 at 5:09 AM Konstantinos Kyriakidis < ***@***.***> wrote:
Reopened #107 <#107>.
—
Reply to this email directly, view it on GitHub
<#107 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABZRKX2UXENOOWSGEIYUJDDUW7NPJANCNFSM5MD76ALA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
Hmm... It starts to run using all cores and after an 1-2 hours it continues with only one core. Then, it takes several hours to finish. |
ok, I'm wondering if it's the pblat step that's to blame here, or if it's
our code that analyzes the results of pblat. I'll have to look into it
some more. Most of that step should be using multiple threads. It is time
consuming though - one of the slowest steps in the whole pipeline.
…On Thu, Jan 20, 2022 at 11:44 AM Konstantinos Kyriakidis < ***@***.***> wrote:
Hmm... It starts to run using all cores and after an 1-2 hours it
continues with only one core. Then, it takes several hours to finish.
—
Reply to this email directly, view it on GitHub
<#107 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABZRKX4JEV5VCK5K6ASRQ7LUXA3XZANCNFSM5MD76ALA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
It is probably your code:
It started 15:24 and it is now 19:22 and it is still running using 1 core. |
We had done some testing on this step to ensure it was using multiple
threads and once upon a time it did scale. Leave this open and we'll
revisit it before the next release.
…On Thu, Jan 20, 2022 at 12:22 PM Konstantinos Kyriakidis < ***@***.***> wrote:
It is probably your code:
+ echo '########### Annotate BLAT ED #############'
+ /usr/local/src/ctat-mutations/src/annotate_ED.py --input_vcf /output/AD3E/cromwell-executions/ctat_mutations/6162ec39-8de3-4584-889f-cac0f9a5d904/call-AnnotateVariants/annotate_variants_wf/d4df6ce4-e64b-43ab-a1d3-8f2ff9c302cd/call-annotate_blat_ED/inputs/1608256378/AD3E.splice_distance.vcf.gz --output_vcf AD3E.blat_ED.vcf --reference /output/AD3E/cromwell-executions/ctat_mutations/6162ec39-8de3-4584-889f-cac0f9a5d904/call-AnnotateVariants/annotate_variants_wf/d4df6ce4-e64b-43ab-a1d3-8f2ff9c302cd/call-annotate_blat_ED/inputs/817306935/ref_genome.fa --temp_dir /output/AD3E/cromwell-executions/ctat_mutations/6162ec39-8de3-4584-889f-cac0f9a5d904/call-AnnotateVariants/annotate_variants_wf/d4df6ce4-e64b-43ab-a1d3-8f2ff9c302cd/call-annotate_blat_ED/tmp.a535223e --threads 15
15:24:57 : INFO :
################################
Annotating VCF: Calculating ED
################################
15:24:57 : INFO : Processing VCF Positions
15:25:03 : INFO : Running samtools faidx
15:25:07 : INFO : Running Blat
15:59:34 : INFO : Processing Output
16:00:25 : INFO : Creating ED features
It started 15:24 and it is now 19:22 and it is still running using 1 core.
—
Reply to this email directly, view it on GitHub
<#107 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABZRKX45WE2YWIHOOU3EQFDUXBAHFANCNFSM5MD76ALA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
Just for the record, it did 12h to complete. From |
in the meantime, if you don't need boosting (can set boosting method =
'none'), then you don't necessarily require the ED annotation. You can turn
off the annotations that you don't in the main wdl under WDL/
ctat_mutations.wdl
# annotation options
Boolean incl_snpEff = true
Boolean incl_dbsnp = true
Boolean incl_gnomad = true
Boolean incl_rna_editing = true
Boolean include_read_var_pos_annotations = true
Boolean incl_repeats = true
Boolean incl_homopolymers = true
Boolean incl_splice_dist = true
Boolean incl_blat_ED = true
Boolean incl_cosmic = true
Boolean incl_cravat = true
…On Fri, Jan 21, 2022 at 2:41 AM Konstantinos Kyriakidis < ***@***.***> wrote:
Just for the record, it did 12h to complete. From Annotate BLAT ED to
finish.
—
Reply to this email directly, view it on GitHub
<#107 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABZRKXYFRC7WBQ3SICLHUSLUXEEYZANCNFSM5MD76ALA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
I was looking for something like this! Thanks so much for bringing it up! |
sure thing!
…On Fri, Jan 21, 2022 at 9:37 AM Konstantinos Kyriakidis < ***@***.***> wrote:
I was looking for something like this! Thanks so much for bringing it up!
—
Reply to this email directly, view it on GitHub
<#107 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABZRKX4U5OTIOOCNTZDRDSTUXFVRFANCNFSM5MD76ALA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
Hi @brianjohnhaas ! What does ED=-1 means? |
Pretty sure this means it's overly repetitive - it's the default in case
blat doesn't return hits, which would mean that it has more hits than are
within the range of reporting. @maxwell Brown ***@***.***> ,
can you confirm?
Are you seeing a ton of these?
…On Tue, Jan 25, 2022 at 2:43 PM Konstantinos Kyriakidis < ***@***.***> wrote:
Hi @brianjohnhaas <https://github.com/brianjohnhaas> !
What does ED=-1 means?
—
Reply to this email directly, view it on GitHub
<#107 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABZRKX6UVLRR2QYT26EWKQDUX34N5ANCNFSM5MD76ALA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
I see a lot of them but not a ton. WHat about |
entropy is just a measure of sequence complexity. If it's close to 2.0,
then it's highly complex. If it's closer to 1 or below, then it's low
complexity (like a polyA run).
We use all these annotations for boosting prediction accuracy with the
downstream machine learning steps (ie. XGBoost).
…On Thu, Jan 27, 2022 at 12:22 PM Konstantinos Kyriakidis < ***@***.***> wrote:
I see a lot of them but not a ton.
WHat about entropy? Can you please explain what it means and how can I
evaluate it?
—
Reply to this email directly, view it on GitHub
<#107 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABZRKX6EZUJJAXBH2FUL63DUYF5O5ANCNFSM5MD76ALA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas <http://broad.mit.edu/~bhaas>
|
Hi @brianjohnhaas
Q1: I want to use ctat-mutations to find RNA Editing events. I have only RNA Seq data. Does your implementation find most of the variants around all genomic regions or does it have a preference for variants in specific regions of the genome (e.g. coding). I am telling this because my data have high intron mapping for some reason.
Q2: Should I use a boosting method for my use case? If yes, which one do you propose? I saw in your code that it removes RNAEditing column when it prepares for boosting. Does that mean that the variants that are annotated with RNAEDIT will not be in the final vcf file? And this vcf file will likely not contain other RNAEditing events because they will be filtered by the model? Do you think that boosting will likely remove potential RNAEditing events from the vcf because they will be considered as FPs?
Q3: Let's say I do not use boosting. Do you thing HC hard filters will likely remove potential RNAEditing sites from the vcf because they will be considered as FPs by the filters?
Q4: Is it in your plan to update REDIportal to V2?
Thanks!
The text was updated successfully, but these errors were encountered: