GoodCigarFilter + tests #380
@vruano is on vacation. @lbergelson please review |
Done with my review. I'm not assigning it back to @akiezun in case @lbergelson wants to add. |
I think we should write the most readable code for now and optimize with a profiler later. The regexp machinery in the Java libraries definitely does a better job of optimizing the state machine than a person can. |
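To make the readability argument concrete, here is a minimal sketch of a regex-based syntax check. It assumes the CIGAR grammar given in the SAM specification (one or more length/operator pairs); the class and method names are hypothetical and are not taken from this pull request.

```java
import java.util.regex.Pattern;

public class CigarSyntaxSketch {
    // Grammar from the SAM specification: one or more <length><operator> pairs.
    // Assumption: the "*" (CIGAR unavailable) form is deliberately not accepted here.
    private static final Pattern CIGAR_SYNTAX = Pattern.compile("([0-9]+[MIDNSHPX=])+");

    /** Returns true when the whole string is a syntactically well-formed CIGAR. */
    public static boolean isWellFormed(String cigar) {
        return cigar != null && CIGAR_SYNTAX.matcher(cigar).matches();
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("10M2I5M")); // well-formed
        System.out.println(isWellFormed("10M2"));    // trailing length without an operator
        System.out.println(isWellFormed("M10"));     // operator before its length
    }
}
```

This only checks syntax. Semantic rules (for example, where clipping operators may appear) would still need separate checks.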
Finished with my review back to @akiezun |
Please check my general comments posted at the end of the second commit; I thought they would be posted here as well. |
@vruano you're raising 3 issues:
For 1, I'm going to create a new ticket - it's not under discussion here. My first goal in this ticket is to port the existing filter. If we can improve the readability at the same time, that's great. Fixing any bugs or deficiencies of the algorithm is strictly future work (if it has been broken all this time in GATK3, it's ok for it to be broken for a few more weeks in GATK4). |
@eitanbanks and @vdauwera I heard from @droazen that you guys are working on the BadCigarFilter in GATK3. Can you summarize the work? We need to synchronize the changes/improvements across the 2 codebases. |
BadCigarFilter was made a default Walker filter and functionality was added to it so that it now also filters out reads whose cigar length does not equal the read length. |
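The length check described above can be sketched as follows. This is an illustration only, not the GATK3 code: it assumes a syntactically well-formed CIGAR string, counts the operators that consume read bases according to the SAM specification (M, I, S, = and X), and uses hypothetical class and method names.

```java
public class CigarLengthSketch {
    /**
     * Sums the lengths of the CIGAR elements that consume read bases
     * (M, I, S, = and X). Assumes the input is a well-formed CIGAR string.
     */
    public static int readLengthFromCigar(String cigar) {
        int total = 0;
        int length = 0;
        for (char c : cigar.toCharArray()) {
            if (Character.isDigit(c)) {
                // Accumulate the decimal length that precedes the operator.
                length = length * 10 + (c - '0');
            } else {
                if ("MIS=X".indexOf(c) >= 0) {
                    total += length;
                }
                length = 0;
            }
        }
        return total;
    }

    /** True when the read length implied by the CIGAR equals the number of bases. */
    public static boolean cigarMatchesRead(String cigar, String bases) {
        return readLengthFromCigar(cigar) == bases.length();
    }

    public static void main(String[] args) {
        System.out.println(readLengthFromCigar("3S5M2D4M"));       // 3 + 5 + 4 = 12
        System.out.println(cigarMatchesRead("4M1I3M", "ACGTACGT")); // 8 bases on both sides
    }
}
```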
Is the code in?
|
See https://github.com/broadinstitute/gsa-unstable/pull/930 The title is misleading -- see the last few comments in the conversation for a summary of the work done. |
why not rename the issue then? |
@eitanbanks thanks. the code and tests are now in hellbender too. |
@vruano can you give an example for "Also it does accept non empty CIGAR with out any operator that consume read bases such as N or P. That would not happen with the GATK3 code." ? |
What I meant is that, for example, "10P" or "101N" would fail in the GATK3 code but they would not in the current pull-request. The reason is that the P and N operators return false for the method "consumesReadBases()" and the GATK3 code pays attention to that; it requires that there is at least one operator that consumes read bases in the non-clipping core section of the cigar (I, M, EQ or X). |
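The requirement as stated in this comment can be sketched like this. It only illustrates the stated rule (at least one of I, M, EQ or X in the cigar) and makes no claim about what the GATK3 code actually does; the class and method names are hypothetical, and the input is assumed to be a well-formed CIGAR string.

```java
public class ReadBaseConsumptionSketch {
    /**
     * True if the CIGAR contains at least one non-clipping operator that
     * consumes read bases: M, I, = (EQ) or X. Digits and all other
     * operators (D, N, S, H, P) are ignored.
     */
    public static boolean hasNonClippingReadBases(String cigar) {
        for (char c : cigar.toCharArray()) {
            if ("MI=X".indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasNonClippingReadBases("10P"));   // P does not consume read bases
        System.out.println(hasNonClippingReadBases("101N"));  // N does not consume read bases
        System.out.println(hasNonClippingReadBases("5S10M")); // M does
    }
}
```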
alright @vruano @droazen |
"@vruano to clarify. both 10P and 101N pass on GATK3. I just checked." I see... it seems that this might be caused by yet another bug in the GATK3 code, but I guess that is to be left for the other issue. From this code within the second loop:
if (!hasMeaningfulElements && op.consumesReadBases()) {
    hasMeaningfulElements = true;
}
one would think that having a meaningful element means having one that consumes read bases. However, it turns out that this if condition is never true, because
boolean hasMeaningfulElements = (firstOp != CigarOperator.H && firstOp != CigarOperator.S);
is guaranteed to initialize it to true. So that is some code you can get rid of for now, but I guess that it needs to be fixed eventually. |
Does anyone know why there are 3 implementations of validity for Cigars in GATK3?
|
the third one is in htsjdk obviously. anyway, i'm merging the first two |
I clarified the semantics of Good CIGAR. It does not make sense to have 'good cigars' that are invalid according to htsjdk so I clarified the semantics to state that a "good" cigar is a valid cigar that passes additional criteria:
(now, we can argue about those criteria but those are the criteria in GATK3 and the current code and doc makes this explicit.) I rewrote the whole code again to reflect this change. @vruano back to you. |
The code is far easier to understand now. Mostly minor changes and a suggestion to improve the readability of one of the supporting private methods. CigarUtils.countRefBasesBasedOnCigar has some issues, but these are not really part of your code, so it might be fair to address them separately in another pull-request. Back to @akiezun. |
This filter did not deserve a special file after I moved the isGood method to CigarUtils. So I moved this filter to ReadFilterLibrary and moved tests accordingly. back to @vruano |
@akiezun please rebase, squash and merge. |
3589239
to
175fd37
ported BadCigarFilter (in Hellbender filters have positive names, in concordance with Java filter semantics) + tests (added 2 tests to cover 2 more branches, 1 branch seems unreachable).
@vruano please review
addresses #373