Added a simple TSV/CSV/XSV writer with cloud write support #5930

Merged
merged 3 commits into from
May 10, 2019

Conversation

jamesemery
Collaborator

This is what you asked for, @droazen. This is one approach to table writing, and importantly I found it far less bothersome than the restrictions TableWriter placed on me when handling the table outputs in #5913. This probably needs more tests, and I can add more comprehensive ones if you would like. Let me know your thoughts.

@codecov

codecov bot commented May 9, 2019

Codecov Report

Merging #5930 into master will decrease coverage by 0.017%.
The diff coverage is 64.407%.

@@              Coverage Diff               @@
##             master     #5930       +/-   ##
==============================================
- Coverage     86.84%   86.823%   -0.017%     
- Complexity    32326     32344       +18     
==============================================
  Files          1991      1993        +2     
  Lines        149342    149460      +118     
  Branches      16482     16502       +20     
==============================================
+ Hits         129689    129766       +77     
- Misses        13646     13679       +33     
- Partials       6007      6015        +8
Impacted Files                                          Coverage Δ               Complexity Δ
.../tsv/SimpleCSVWriterWrapperWithHeaderUnitTest.java   48.077% <48.077%> (ø)    7 <7> (?)
...nstitute/hellbender/utils/tsv/SimpleXSVWriter.java   77.273% <77.273%> (ø)    11 <11> (?)
...lotypecaller/readthreading/ReadThreadingGraph.java   88.971% <0%> (+0.245%)   159% <0%> (ø) ⬇️

@jamesemery jamesemery force-pushed the je_addHackyTSVWriter branch from a868db7 to bc0a7c3 Compare May 10, 2019 17:22
@droazen droazen changed the title Added hackey alternative to TableWriter with a test of cloud write support Added a simple TSV/CSV/XSV writer with cloud write support May 10, 2019
Contributor

@droazen droazen left a comment

Back to you with my comments @jamesemery -- go ahead and merge after addressing them

*
* Why didn't I use a tableWriter here? Who really holds the patent on the wheel anyway? Certainly not TableWriter.
*/
class SimpleCSVWriterWrapperWithHeader implements Closeable {
Contributor

Class should be public, and renamed to SimpleXSVWriter

import java.util.*;

/**
* A simple helper class wrapper around CSVWriter that has the ingrained concept of a header line with indexed fields
Contributor

Change description to something like: "A simple TSV/CSV/XSV writer with support for writing to the cloud"

Contributor

(and mention that the delimiter is configurable)

* construct a new line call {@link #getNewLineBuilder} to get a line builder for each line, which then has convienent
* methods for individually assigning column values based on the header line etc. Once a line is finished being mutated
* one simply needs to call write() on the line to validate and finalize the line.
*
Contributor

Add a couple lines of working example code here showing how to initialize the writer, set the header, and write an entire line by passing in all column values at once.

Contributor

Also describe the format of the header in the output file.
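
As a rough illustration of the usage pattern requested above, here is a minimal, self-contained sketch. `MiniXsvWriter`, `MiniXsvDemo`, and every method signature in it are hypothetical stand-ins written for this example, not the class under review and not its actual API. It assumes a configurable single-character delimiter, a header written as the first row in the same delimited format as the data rows, and a line builder that refuses changes once the line has been written.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the writer under review; NOT the GATK class.
class MiniXsvWriter implements AutoCloseable {
    private final PrintWriter out;
    private final String delimiter;
    private final Map<String, Integer> columnIndex = new LinkedHashMap<>();

    MiniXsvWriter(final Writer sink, final char delimiter) {
        this.out = new PrintWriter(sink);
        this.delimiter = String.valueOf(delimiter);
    }

    // The header is the first row of the output, delimited exactly like a data row.
    void setHeaderLine(final String... columns) {
        if (!columnIndex.isEmpty()) {
            throw new IllegalStateException("header already set");
        }
        for (int i = 0; i < columns.length; i++) {
            if (columnIndex.put(columns[i], i) != null) {
                throw new IllegalArgumentException("duplicate column: " + columns[i]);
            }
        }
        out.print(String.join(delimiter, columns) + "\n");
    }

    LineBuilder getNewLineBuilder() {
        if (columnIndex.isEmpty()) {
            throw new IllegalStateException("set the header before building lines");
        }
        return new LineBuilder();
    }

    @Override
    public void close() {
        out.close();
    }

    class LineBuilder {
        private final String[] values = new String[columnIndex.size()];
        private boolean written = false;

        // Sets every column at once; the row must match the header length.
        LineBuilder setRow(final String[] row) {
            checkAlterationAfterWrite();
            if (row.length != values.length) {
                throw new IllegalArgumentException("expected " + values.length + " columns");
            }
            System.arraycopy(row, 0, values, 0, row.length);
            return this;
        }

        // Sets one column, looked up by its header name.
        LineBuilder setColumn(final String heading, final String value) {
            checkAlterationAfterWrite();
            final Integer index = columnIndex.get(heading);
            if (index == null) {
                throw new IllegalArgumentException("unknown column: " + heading);
            }
            values[index] = value;
            return this;
        }

        // Validates that every column is set, then emits the line.
        void write() {
            checkAlterationAfterWrite();
            if (Arrays.asList(values).contains(null)) {
                throw new IllegalStateException("not every column has a value");
            }
            out.print(String.join(delimiter, values) + "\n");
            written = true;
        }

        private void checkAlterationAfterWrite() {
            if (written) {
                throw new IllegalStateException("line was already written");
            }
        }
    }
}

class MiniXsvDemo {
    // Initializes the writer, sets the header, then writes one whole row and one row by column name.
    static String run() {
        final StringWriter sink = new StringWriter();
        try (MiniXsvWriter writer = new MiniXsvWriter(sink, '\t')) {
            writer.setHeaderLine("sample", "depth");
            writer.getNewLineBuilder().setRow(new String[] {"NA12878", "30"}).write();
            writer.getNewLineBuilder().setColumn("depth", "12").setColumn("sample", "NA12891").write();
        }
        return sink.toString();
    }
}
```

The sketch writes to an in-memory `StringWriter` to stay self-contained; the real class targets a file or cloud path instead.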

* methods for individually assigning column values based on the header line etc. Once a line is finished being mutated
* one simply needs to call write() on the line to validate and finalize the line.
*
* Why didn't I use a tableWriter here? Who really holds the patent on the wheel anyway? Certainly not TableWriter.
Contributor

I'd replace this with a brief 1-2 sentence explanation of the deficiencies of TableWriter that motivated the creation of this class.

/**
* @param row complete line corresponding to this row of the tsv
*/
public SimpleCSVWriterLineBuilder setRow(final String[] row) {
Contributor

Also add an overload that takes List<String>
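
One plausible shape for such an overload is a thin delegate that converts the list and forwards to the array version, so validation lives in one place. The `RowHolder` class below is a hypothetical stand-in used only to keep the example self-contained; it is not the line builder from this PR.

```java
import java.util.List;

// Hypothetical holder used only to illustrate the overload; not the PR's line builder.
class RowHolder {
    private String[] row = new String[0];

    // Array version: the single place where the row is actually stored.
    RowHolder setRow(final String[] row) {
        this.row = row.clone();
        return this;
    }

    // List overload: converts and delegates, so both entry points behave identically.
    RowHolder setRow(final List<String> row) {
        return setRow(row.toArray(new String[0]));
    }

    String[] getRow() {
        return row.clone();
    }
}
```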

* @param row complete line corresponding to this row of the tsv
*/
public SimpleCSVWriterLineBuilder setRow(final String[] row) {
checkAlteration();
Contributor

checkAlteration() -> checkAlterationAfterWrite()


public class SimpleCSVWriterWrapperWithHeaderUnitTest extends GATKBaseTest {


Contributor

Separate the cloud and local file test cases. Also add tests covering setRow(), fill(), the version of setColumn() that takes an index, and the other methods you're not currently testing.

SimpleCSVWriterWrapperWithHeader.SimpleCSVWriterLineBuilder bucketLine = bucketWriter.getNewLineBuilder();
SimpleCSVWriterWrapperWithHeader.SimpleCSVWriterLineBuilder localLine = localWriter.getNewLineBuilder();
Arrays.stream(header).forEach(column -> {
double rand = Math.random();
Contributor

Don't use random data -- write known data in a pattern, then assert that you get the right values back on read.
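
A sketch of what a deterministic version could look like: each cell value encodes its own row and column, so a misplaced or dropped value shows up immediately when the data is read back. The class and method names are hypothetical, and the round trip goes through an in-memory buffer purely to keep the example self-contained; the actual test would write to, and read from, a local or bucket path.

```java
import java.io.BufferedReader;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper illustrating patterned test data; not part of the PR.
class DeterministicRoundTrip {
    // A predictable value that records where it belongs, e.g. "r1c2".
    static String cell(final int row, final int column) {
        return "r" + row + "c" + column;
    }

    // Builds the header line followed by `rows` lines of patterned values.
    static List<String> expectedLines(final String[] header, final int rows) {
        final List<String> lines = new ArrayList<>();
        lines.add(String.join("\t", header));
        for (int r = 0; r < rows; r++) {
            final String[] values = new String[header.length];
            for (int c = 0; c < header.length; c++) {
                values[c] = cell(r, c);
            }
            lines.add(String.join("\t", values));
        }
        return lines;
    }

    // Writes the lines out and reads them back (in memory here; a file or bucket in a real test).
    static List<String> roundTrip(final List<String> lines) {
        final StringWriter sink = new StringWriter();
        try (PrintWriter out = new PrintWriter(sink)) {
            for (final String line : lines) {
                out.print(line + "\n");
            }
        }
        return new BufferedReader(new StringReader(sink.toString()))
                .lines()
                .collect(Collectors.toList());
    }
}
```

A test built on this would compare `roundTrip(expected)` against `expected` line by line instead of comparing two outputs that were both generated from the same random numbers.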

}

bucketWriter.close();
localWriter.close();
Contributor

Close using try-with-resources instead.
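
For reference, a small sketch of the try-with-resources form with two writers. `TrackedResource` is a hypothetical class that only records what happens to it; the variable names mirror the test above, but nothing else here is taken from the PR. Resources declared in the same `try` header are closed in reverse order of declaration, and they are closed even if the body throws.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource that logs its lifecycle; stands in for the two writers in the test.
class TrackedResource implements AutoCloseable {
    private final String name;
    private final List<String> log;

    TrackedResource(final String name, final List<String> log) {
        this.name = name;
        this.log = log;
        log.add("open " + name);
    }

    void writeLine(final String line) {
        log.add(name + " wrote " + line);
    }

    @Override
    public void close() {
        log.add("close " + name);
    }
}

class TryWithResourcesDemo {
    static List<String> run() {
        final List<String> log = new ArrayList<>();
        // Both resources are closed automatically when the block exits, normally or not.
        try (TrackedResource bucketWriter = new TrackedResource("bucket", log);
             TrackedResource localWriter = new TrackedResource("local", log)) {
            bucketWriter.writeLine("x");
            localWriter.writeLine("x");
        }
        return log;
    }
}
```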

@droazen droazen assigned jamesemery and unassigned droazen May 10, 2019
* methods for individually assigning column values based on the header line etc. Once a line is finished being mutated
* one simply needs to call write() on the line to validate and finalize the line.
*
* Header lines are encoded in the same format as each row, a single row of delimeted column titles as the first row in the table.
Contributor

delimited

@jamesemery jamesemery merged commit 350a6bf into master May 10, 2019
@jamesemery jamesemery deleted the je_addHackyTSVWriter branch May 10, 2019 20:24
RoriCremer pushed a commit to RoriCremer/gatk that referenced this pull request May 15, 2019
RoriCremer pushed a commit to RoriCremer/gatk that referenced this pull request May 15, 2019
2 participants