-
Notifications
You must be signed in to change notification settings - Fork 597
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added a simple TSV/CSV/XSV writer with cloud write support #5930
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5930 +/- ##
==============================================
- Coverage 86.84% 86.823% -0.017%
- Complexity 32326 32344 +18
==============================================
Files 1991 1993 +2
Lines 149342 149460 +118
Branches 16482 16502 +20
==============================================
+ Hits 129689 129766 +77
- Misses 13646 13679 +33
- Partials 6007 6015 +8
|
a868db7
to
bc0a7c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to you with my comments @jamesemery -- go ahead and merge after addressing them
* | ||
* Why didn't I use a tableWriter here? Who really holds the patent on the wheel anyway? Certainly not TableWriter. | ||
*/ | ||
class SimpleCSVWriterWrapperWithHeader implements Closeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Class should be public
, and renamed to SimpleXSVWriter
import java.util.*; | ||
|
||
/** | ||
* A simple helper class wrapper around CSVWriter that has the ingrained concept of a header line with indexed fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change description to something like: "A simple TSV/CSV/XSV writer with support for writing to the cloud"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and mention that the delimiter is configurable)
* construct a new line call {@link #getNewLineBuilder} to get a line builder for each line, which then has convienent | ||
* methods for individually assigning column values based on the header line etc. Once a line is finished being mutated | ||
* one simply needs to call write() on the line to validate and finalize the line. | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a couple lines of working example code here showing how to initialize the writer, set the header, and write an entire line by passing in all column values at once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also describe the format of the header in the output file.
* methods for individually assigning column values based on the header line etc. Once a line is finished being mutated | ||
* one simply needs to call write() on the line to validate and finalize the line. | ||
* | ||
* Why didn't I use a tableWriter here? Who really holds the patent on the wheel anyway? Certainly not TableWriter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd replace this with a brief 1-2 sentence explanation of the deficiencies of TableWriter that motivated the creation of this class.
src/main/java/org/broadinstitute/hellbender/utils/tsv/SimpleCSVWriterWrapperWithHeader.java
Outdated
Show resolved
Hide resolved
/** | ||
* @param row complete line corresponding to this row of the tsv | ||
*/ | ||
public SimpleCSVWriterLineBuilder setRow(final String[] row) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also add an overload that takes List<String>
* @param row complete line corresponding to this row of the tsv | ||
*/ | ||
public SimpleCSVWriterLineBuilder setRow(final String[] row) { | ||
checkAlteration(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkAlteration()
-> checkAlterationAfterWrite()
|
||
public class SimpleCSVWriterWrapperWithHeaderUnitTest extends GATKBaseTest { | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate the cloud and local file test cases. Also add tests covering setRow()
, fill()
, the version of setColumn()
that takes an index, and the other methods you're not currently testing.
SimpleCSVWriterWrapperWithHeader.SimpleCSVWriterLineBuilder bucketLine = bucketWriter.getNewLineBuilder(); | ||
SimpleCSVWriterWrapperWithHeader.SimpleCSVWriterLineBuilder localLine = localWriter.getNewLineBuilder(); | ||
Arrays.stream(header).forEach(column -> { | ||
double rand = Math.random(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use random data -- write known data in a pattern, then assert that you get the right values back on read.
} | ||
|
||
bucketWriter.close(); | ||
localWriter.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Close using try-with-resources instead.
* methods for individually assigning column values based on the header line etc. Once a line is finished being mutated | ||
* one simply needs to call write() on the line to validate and finalize the line. | ||
* | ||
* Header lines are encoded in the same format as each row, a single row of delimeted column titles as the first row in the table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delimited
…rnative to TableWriter (broadinstitute#5930)
…rnative to TableWriter (broadinstitute#5930)
This is what you asked for @droazen. This is one approach to table writing and importantly I found this exponentially less bothersome than the restrictions placed upon me by
TableWriter
when handling the table outputs in #5913. This probably needs more tests and I can add some more comprehensive ones if you would like. Let me know your thoughts