Improvements and refactoring of Nucleotide.java #4846

vruano · 2018-06-01T16:37:29Z

Rename the methods valueOf and toBase to decode and encodeAsByte. The reason is that valueOf (taking a String, for example) has an established meaning in Java enums: it should throw an IllegalArgumentException if no such constant exists, and we don't do that here (we return Nucleotide#X instead).

Apart from that, I added a few more methods to cover functionality that I need as part of a larger pull request for SV.
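
To illustrate the naming concern, here is a minimal sketch contrasting the strict lookup that every Java enum gets from `valueOf(String)` with a lenient `decode` that falls back to an invalid-call constant. The class `DecodeSketch` and its cut-down `Nuc` enum are hypothetical stand-ins, not the actual `Nucleotide` code from this PR.

```java
// Hypothetical, cut-down illustration; not the GATK Nucleotide class.
final class DecodeSketch {

    enum Nuc { A, C, G, T, X }

    // Lenient lookup: unknown encodings map to X instead of throwing.
    static Nuc decode(final byte base) {
        switch (Character.toUpperCase((char) (base & 0xFF))) {
            case 'A': return Nuc.A;
            case 'C': return Nuc.C;
            case 'G': return Nuc.G;
            case 'T': return Nuc.T;
            default:  return Nuc.X;
        }
    }

    // Returns true if the built-in Enum valueOf rejects the given name.
    static boolean valueOfThrows(final String name) {
        try {
            Nuc.valueOf(name);
            return false;
        } catch (final IllegalArgumentException ex) {
            return true;
        }
    }

    public static void main(final String[] args) {
        System.out.println("valueOf(\"Z\") throws: " + valueOfThrows("Z"));
        System.out.println("decode('Z') returns: " + decode((byte) 'Z'));
    }
}
```

Keeping the lenient behavior under a different name leaves the built-in `valueOf` contract untouched for callers who expect it.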

vruano · 2018-06-01T16:37:45Z

@samuelklee please take a look.

vruano · 2018-06-01T20:48:00Z

There was a Travis issue solvable by rebasing (the automatic merge didn't compile). It should work now.

codecov-io · 2018-06-01T21:40:48Z

Codecov Report

Merging #4846 into master will increase coverage by 0.023%.
The diff coverage is 94.318%.

@@               Coverage Diff               @@
##              master     #4846       +/-   ##
===============================================
+ Coverage     86.661%   86.684%   +0.023%     
- Complexity     29043     29179      +136     
===============================================
  Files           1808      1808               
  Lines         134662    135021      +359     
  Branches       14935     14986       +51     
===============================================
+ Hits          116700    117042      +342     
- Misses         12550     12563       +13     
- Partials        5412      5416        +4

Impacted Files	Coverage Δ	Complexity Δ
...llbender/utils/reference/FastaReferenceWriter.java	`92.414% <0%> (ø)`	`51 <0> (ø)`	⬇️
...ers/readorientation/LearnReadOrientationModel.java	`96% <100%> (ø)`	`30 <0> (ø)`	⬇️
...org/broadinstitute/hellbender/utils/RandomDNA.java	`95.556% <100%> (ø)`	`23 <0> (ø)`	⬇️
...er/formats/collections/AllelicCountCollection.java	`100% <100%> (ø)`	`5 <0> (ø)`	⬇️
...lbender/tools/copynumber/CollectAllelicCounts.java	`92% <100%> (ø)`	`9 <0> (ø)`	⬇️
...dinstitute/hellbender/utils/RandomDNAUnitTest.java	`93.151% <100%> (ø)`	`37 <0> (ø)`	⬇️
...llbender/tools/copynumber/PreprocessIntervals.java	`100% <100%> (ø)`	`19 <2> (ø)`	⬇️
...adorientation/LearnReadOrientationModelEngine.java	`94.872% <100%> (ø)`	`37 <0> (ø)`	⬇️
...ols/walkers/readorientation/CollectF1R2Counts.java	`75.641% <100%> (ø)`	`29 <6> (ø)`	⬇️
...s/walkers/readorientation/F1R2FilterConstants.java	`83.333% <100%> (ø)`	`4 <0> (ø)`	⬇️
... and 5 more

samuelklee

Just some typos and minor comments. I did not check closely for correctness, but your tests look OK to me.

samuelklee · 2018-06-05T17:51:28Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+
+    // Extended codes:
+    // CODE(included nucs)
+    R(A,G), // Purine


Nitpicking, but perhaps make the descriptions here a little more consistent in style? Currently, some are plural, some end in periods, etc.

samuelklee · 2018-06-05T17:52:36Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+    }
+
+    /**
+     * Checks whether the nucleotide refer to an ambiguous base.


refer -> refers

samuelklee · 2018-06-05T17:53:05Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+    /**
+     * Checks whether this nucleotide code encloses all possible nucleotides for another code.
+     * @param other the other nucleotide to compare to.
+     * @return {@code true} iff any nucleotide in {@code other} is enclosed it this code.


samuelklee · 2018-06-05T17:53:22Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+    }
+
+    /**
+     * Checks whether to base encodings make reference to the same {@link #Nucleotide}


Still need to fix this.

Oops, I missed that ok.

samuelklee · 2018-06-05T17:54:00Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+     *     each possible nucleotide in this code.
+     * </p>
+     * <p>
+     *     The complement of the {@link #INVALID} nucleotide its itself.


samuelklee · 2018-06-05T17:54:30Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+    }
+
+    /**
+     * Returns the instance that would include all possible transition mutation from this one.


mutation -> mutations

(or perhaps: transition mutation -> transitions)

samuelklee · 2018-06-05T17:55:52Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+    }
+
+    /**
+     * Returns the instance that would include all possible tranversions mutation from this one.


similar here, transversions mutation -> transversion mutations (or transversions)
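
For readers unfamiliar with the terms in these Javadoc lines, here is a rough sketch of how inclusion, transitions and transversions could be computed over bit masks. The bit assignment (A=1, C=2, G=4, T=8) and the class `MaskSketch` are assumptions made for illustration; the PR's actual implementation may differ.

```java
// Hypothetical sketch; the bit layout below is an assumption, not taken from the PR.
final class MaskSketch {

    static final int A = 1, C = 2, G = 4, T = 8;
    static final int PURINES = A | G;      // IUPAC code R
    static final int PYRIMIDINES = C | T;  // IUPAC code Y

    // True iff every nucleotide in 'other' is also present in 'code'.
    static boolean includes(final int code, final int other) {
        return (code & other) == other;
    }

    // Transitions stay within the same chemical class: A<->G and C<->T.
    static int transition(final int code) {
        int result = 0;
        if ((code & A) != 0) result |= G;
        if ((code & G) != 0) result |= A;
        if ((code & C) != 0) result |= T;
        if ((code & T) != 0) result |= C;
        return result;
    }

    // Transversions cross classes: a purine may become either pyrimidine and vice versa.
    static int transversion(final int code) {
        int result = 0;
        if ((code & PURINES) != 0) result |= PYRIMIDINES;
        if ((code & PYRIMIDINES) != 0) result |= PURINES;
        return result;
    }

    public static void main(final String[] args) {
        System.out.println(transition(A) == G);
        System.out.println(transversion(A) == PYRIMIDINES);
    }
}
```

In this sketch an empty mask (no nucleotides) maps to an empty mask under both operations, which mirrors the documented idea that the invalid code propagates through such operations.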

samuelklee · 2018-06-05T17:56:28Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

        public long sum() {
            return LongStream.of(counts).sum();
        }
    }
+


white space

samuelklee · 2018-06-05T17:58:02Z

src/test/java/org/broadinstitute/hellbender/utils/NucleotideUnitTest.java

-        Assert.assertEquals(Nucleotide.X.toBase(), (byte)'X');
+    @Test(dataProvider = "values")
+    public void testEncodeAsByte(final Nucleotide nuc) {
+        // Will always use the first letter of the constant as the one byt


one byt -> ?

samuelklee · 2018-06-05T17:58:41Z

src/test/java/org/broadinstitute/hellbender/utils/NucleotideUnitTest.java

            }
-            Assert.assertSame(Nucleotide.valueOf(i), expected, "Failed with base " + i + " returning nucleotide " + Nucleotide.valueOf(i));
+        }
+


white space

magicDGS · 2018-07-18T15:20:55Z

Hello @vruano - the code that you have here is quite nice, and I would like to include some bits in the next version of htsjdk (htsjdk-next-beta). Do you have any problem with that?

vruano · 2018-08-18T06:17:56Z

@samuelklee
Sorry for the delay. I have addressed your comments and resolved some conflicts from a rebase.
Please let me know if you want me to change anything else.

vruano · 2018-08-18T06:19:21Z

@magicDGS sure, anything for the common good.

samuelklee · 2018-08-21T18:25:46Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+            final int lowerCaseIndex = nucleotide.lowerCaseByteEncoding & 0xFF;
+            final int upperCaseIndex = nucleotide.upperCaseByteEncoding & 0xFF;
+            maskToValue[nucleotide.mask]
+                    = baseToValue[lowerCaseIndex]


Not sure if multiple assignment here and below adheres to the Java style guide.

A quick Google search does not show anything against multiple assignments; perhaps you can post a link to it.

I will split the assignment to maskToValue from the other two assignments to baseToValue, putting them on separate lines, since they go into different arrays; it is perhaps clearer that way.
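
The split described in the reply above might look roughly like the following. This is a simplified sketch: the class `LookupTableSketch`, its four-constant `Nuc` enum and the mask values are assumptions for illustration, while the array names echo those in the diff snippet.

```java
// Hypothetical sketch of filling two lookup tables; not the PR's actual code.
final class LookupTableSketch {

    enum Nuc {
        A(1), C(2), G(4), T(8);

        final int mask;

        Nuc(final int mask) {
            this.mask = mask;
        }
    }

    static final Nuc[] baseToValue = new Nuc[256];
    static final Nuc[] maskToValue = new Nuc[16];

    static {
        for (final Nuc nucleotide : Nuc.values()) {
            final int upperCaseIndex = nucleotide.name().charAt(0) & 0xFF;
            final int lowerCaseIndex = Character.toLowerCase(nucleotide.name().charAt(0)) & 0xFF;
            // One statement per target array: the mask table on its own line...
            maskToValue[nucleotide.mask] = nucleotide;
            // ...and a chained assignment only for the two slots of the same array.
            baseToValue[lowerCaseIndex] = baseToValue[upperCaseIndex] = nucleotide;
        }
    }

    public static void main(final String[] args) {
        System.out.println(baseToValue['g'] + " " + maskToValue[4]);
    }
}
```

Chained assignment is legal Java; the question raised in the review is purely one of readability.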

samuelklee · 2018-08-21T18:26:04Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

    }

    private final int mask;
    private final boolean isStandard;
+
+    // Some properties initialized after construction as these depends on some static arrays.


depends -> depend

samuelklee · 2018-08-21T18:29:43Z

A couple more minor comments. I will take your word that your changes improve performance!

vruano · 2018-08-21T21:58:32Z

@samuelklee
I addressed your comments + improved docs + added support for char and String encodings. Won't add anything else after your final review, promised :P

samuelklee

Just a few more minor suggestions, which you can take or leave. Good to merge after!

samuelklee · 2018-08-27T11:34:44Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java


 /**
 * Represents the nucleotide alphabet with support for IUPAC ambiguity codes.
 *
 * <p>
 *    This enumeration not only contains standard (non-ambiguous) nucleotides, but also
- *    values to represent ambiguous and invalid codes.
+ *    values to represent ambiguous and an invalid nucleotide call ({@link #X} aka {@link #INVALID}.


Perhaps "...but also contains ambiguous nucleotides, as well as a code {@link #X} (a.k.a. {@link #INVALID}) for invalid nucleotide calls."

samuelklee · 2018-08-27T11:35:38Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

 * </p>
 *
+ * <p>
+ *     You can query whether a value refers to a non-ambiguous nucleotide with {@link #isStandard()} or
+ *     {@link #isAmbiguous()} depending of your preference. Notice that the special value {@link #X}


X is grouped together with the ambiguous nucleotides below in line 95. Perhaps add a line break to avoid confusion.

samuelklee · 2018-08-27T11:37:19Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+ * <p>
+ *     Querying the {@link #X} value for its {@link #complement}, {@link #transition} or
+ *     {@link #transversion} or using it in other operations
+ *     likes {@link #intersect} would result in returning also a {@link #X}; similar to {@link Double#NaN} in


likes intersect would result in returning also a X -> like (or perhaps "such as") intersect will return an X

samuelklee · 2018-08-27T11:38:41Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+ *     Finally, notice that there is no code of the "gap nucleotide" that may appear in aligned sequences as in fact that is not a nucleotide.
+ *     A base encoding using the typical gap representation such as '.' or '-' would
+ *     be interpreted as an {@link #INVALID} (i.e. {@link #X}) call which is probably not what you want.
+ *     So code that mus support those will need to do so outside this {@code enum}.


So code that mus support - > So code to support?

BTW, thanks for adding all of this documentation!

samuelklee · 2018-08-27T11:38:52Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+ *     convenient longer form names constant aliases (e.g. {@link #ADENINE} for {@link #A}, {@link #PURINE} for {@link #R}, etc.).
+ * </p>
+ * <p>
+ *     Finally, notice that there is no code of the "gap nucleotide" that may appear in aligned sequences as in fact that is not a nucleotide.


code of -> code for

samuelklee · 2018-08-27T11:46:02Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+
+    /**
+     * Returns this nucleotide's exclusive upper-case {@code byte} encoding.
+     * @return <i>ditto</i>.


I think you can probably just leave out the @return lines here and below.

samuelklee · 2018-08-27T11:48:02Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

-    public static Nucleotide decode(final char base) {
-        return baseToValue[base & 0xFF];
+    public static Nucleotide decode(final char ch) {
+        if ((ch & 0xFF00) != 0) {


Perhaps extract the operations used in this method.
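
One way to read this suggestion is to give the high-byte check its own named helper. The sketch below assumes a hypothetical class `CharDecodeSketch` with a cut-down `Nuc` enum and a helper named `fitsInOneByte`; none of these names come from the PR. It also shows why masking with `0xFF` alone is risky: a `char` such as `'\u0141'` has `0x41` (`'A'`) in its low byte.

```java
import java.util.Arrays;

// Hypothetical sketch; names and table layout are assumptions for illustration.
final class CharDecodeSketch {

    enum Nuc { A, C, G, T, X }

    private static final Nuc[] BASE_TO_VALUE = new Nuc[256];

    static {
        // Every byte value is invalid unless it is a known letter.
        Arrays.fill(BASE_TO_VALUE, Nuc.X);
        for (final Nuc n : new Nuc[] {Nuc.A, Nuc.C, Nuc.G, Nuc.T}) {
            final char upper = n.name().charAt(0);
            BASE_TO_VALUE[upper] = n;
            BASE_TO_VALUE[Character.toLowerCase(upper)] = n;
        }
    }

    // Extracted check: true iff the upper 8 bits of the char are all zero.
    static boolean fitsInOneByte(final char ch) {
        return (ch & 0xFF00) == 0;
    }

    // Chars that do not fit in one byte are treated as invalid calls.
    static Nuc decode(final char ch) {
        return fitsInOneByte(ch) ? BASE_TO_VALUE[ch] : Nuc.X;
    }

    public static void main(final String[] args) {
        System.out.println(decode('a'));
        System.out.println(decode('\u0141'));
    }
}
```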

samuelklee · 2018-08-27T11:49:06Z

src/main/java/org/broadinstitute/hellbender/utils/Nucleotide.java

+     * Transform a single-letter character string into the corresponding nucleotide.
+     * <p>
+     *    {@code Null}, empty or multi-letter input will result in an {@link IllegalArgumentException}.
+     *    These are not simply invalid encodings as the fact that are not a single character is


encodings as the fact that are not a single character -> encodings, and the fact that they are not a single character?
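
The behavior this Javadoc describes could be sketched as follows: a single-character input is decoded, while null, empty or multi-letter input is rejected with an `IllegalArgumentException` rather than mapped to the invalid code. The class `StringDecodeSketch` and its simplified `Nuc` enum are hypothetical, and the use of `CharSequence` as the parameter type is an assumption.

```java
// Hypothetical sketch of single-letter string decoding; not the PR's actual code.
final class StringDecodeSketch {

    enum Nuc { A, C, G, T, X }

    static Nuc decode(final char ch) {
        switch (Character.toUpperCase(ch)) {
            case 'A': return Nuc.A;
            case 'C': return Nuc.C;
            case 'G': return Nuc.G;
            case 'T': return Nuc.T;
            default:  return Nuc.X;
        }
    }

    // Input that is not exactly one character is a caller error, not an invalid encoding.
    static Nuc decode(final CharSequence seq) {
        if (seq == null || seq.length() != 1) {
            throw new IllegalArgumentException("expected exactly one character");
        }
        return decode(seq.charAt(0));
    }

    public static void main(final String[] args) {
        System.out.println(decode("c"));
        System.out.println(decode("?"));
    }
}
```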

samuelklee · 2018-08-27T11:50:20Z

src/test/java/org/broadinstitute/hellbender/utils/NucleotideUnitTest.java

+
+    /**
+     * Checks the assumption that each nuc canonical name is a single upper-case letter.
+     * If this test fails that would  indicates that you are modifying {@link Nucleotide} in a way


that would indicates -> that would indicate

samuelklee · 2018-08-27T11:50:58Z

src/test/java/org/broadinstitute/hellbender/utils/NucleotideUnitTest.java

+            Assert.assertEquals(subject.get(n), (long) shadow.getOrDefault(n, 0));
+        }
+        Assert.assertEquals(subject.sum(), shadow.values().stream().mapToLong(l -> l).sum());
+    }


Thanks for adding these tests!

…hars and CharSequences). Also, the Counter now has methods to add `char` and String typed encodings. I have added tests for these. I fixed a bug where the upper byte of `char` encodings was ignored (high-ordinal chars should be considered INVALID). The counter sum implementation is now more efficient, avoiding the use of streams. Improved documentation.
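
As a small illustration of the stream-free sum mentioned in this commit message, the sketch below puts a plain loop next to the `LongStream` version shown earlier in the thread. The class `CounterSumSketch` and its method names are hypothetical, and no performance difference has been measured here.

```java
import java.util.stream.LongStream;

// Hypothetical sketch; the real Counter class in the PR may be structured differently.
final class CounterSumSketch {

    // Stream-based sum, as in the earlier diff snippet.
    static long sumWithStream(final long[] counts) {
        return LongStream.of(counts).sum();
    }

    // Plain-loop sum that avoids creating a stream.
    static long sumWithLoop(final long[] counts) {
        long total = 0;
        for (final long count : counts) {
            total += count;
        }
        return total;
    }

    public static void main(final String[] args) {
        final long[] counts = {1, 2, 3, 4};
        System.out.println(sumWithLoop(counts) == sumWithStream(counts));
    }
}
```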

vruano assigned samuelklee Jun 1, 2018

vruano requested a review from samuelklee June 1, 2018 16:37

vruano force-pushed the vrr_nucs branch from ec7029d to 3d7a36c Compare June 1, 2018 20:45

samuelklee requested changes Jun 5, 2018

samuelklee assigned vruano and unassigned samuelklee Jun 20, 2018

vruano force-pushed the vrr_nucs branch from 3d7a36c to 139f0b7 Compare August 18, 2018 06:15

vruano assigned samuelklee and unassigned vruano Aug 18, 2018

vruano force-pushed the vrr_nucs branch 2 times, most recently from b6636e8 to 6cf0639 Compare August 21, 2018 01:15

samuelklee reviewed Aug 21, 2018

vruano force-pushed the vrr_nucs branch 3 times, most recently from c5d5284 to 45eab69 Compare August 21, 2018 20:59

vruano force-pushed the vrr_nucs branch from 042b50b to b8f982e Compare August 21, 2018 22:23

samuelklee approved these changes Aug 27, 2018

vruano added 4 commits August 27, 2018 11:09

Improvements and refactoring of Nucleotide.java

977ba8a

Some corrections and performance improvements (not benchmarked)

7db79b5

Final set of corrections (hopefully).

e72732d

vruano force-pushed the vrr_nucs branch from b8f982e to e72732d Compare August 27, 2018 15:09

vruano merged commit 9271542 into master Aug 27, 2018

vruano deleted the vrr_nucs branch August 27, 2018 19:34
