-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tabulate by Batch, and readiness for Tabulate By X #816
Conversation
1fd1860
to
d97838b
Compare
@yezr this has been lightly tested but I could use some help gathering test data, test configurations, and checking edge cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems great. My feedback is mainly around the language. The verb phrase "tabulate by field" is OK, but could be clearer. There and elsewhere, my concern is mostly about how generic the word "field" is.
Logger.severe("tabulateByPrecinct may not be used with CDF files."); | ||
for (TabulateByField tabulateByField : enabledFields()) { | ||
validationErrors.add(ValidationError.CVR_CDF_TABULATE_BY_DISAGREEMENT); | ||
Logger.severe("%s may not be used with CDF files.", tabulateByField); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should maybe note that this is just because we haven't chosen to (fully) implement it.
@@ -1291,5 +1332,21 @@ public String getInternalLabel() { | |||
} | |||
} | |||
|
|||
enum TabulateByField { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this name kind of confusing because it sounds like a verb phrase (as in "I want to tabulate by field"). Maybe TabulationBucketingField or TabulationGroupingField or TabulationSliceField?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like slice
, and think it will make sense to s/field/slice throughout this PR
@@ -9,7 +9,7 @@ | |||
|
|||
/* | |||
* Purpose: Ingests tabulation results and generates various summary report files. | |||
* Design: Generates per-precinct files if specified. | |||
* Design: Generates per-field files if specified. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would clarify what "per-field" means in this comment. Ideally we can choose a global term for this subgroups, like one of the terms I suggested above or something similar. Bucket, group, subgroup, slice, subset. "Pivot" and "metadata" feel maybe a bit too technical. Anyway, there are a bunch of places in this PR where we're just saying "field" or "fieldId", and I find that confusingly generic. I'd prefer for all of those to say something more along the lines of "bucketType"/"bucketId" or whatever to convey that that's the purpose of the fields and the IDs.
@@ -63,8 +66,8 @@ class ResultsWriter { | |||
|
|||
// number of rounds needed to elect winner(s) | |||
private int numRounds; | |||
// all precinct Ids which may appear in the output cvrs | |||
private Set<String> precinctIds; | |||
// all Field Ids which may appear in the output cvrs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/which/that
@@ -565,7 +573,8 @@ String writeRctabCvrCsv( | |||
outputFile.getAbsolutePath()); | |||
CSVPrinter csvPrinter; | |||
BufferedWriter writer = Files.newBufferedWriter(outputFile.toPath()); | |||
csvPrinter = new CSVPrinter(writer, CSVFormat.DEFAULT); | |||
CSVFormat format = CSVFormat.DEFAULT.builder().setNullString("").build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convenient
private final Map<String, Map<Integer, RoundTally>> precinctRoundTallies = new HashMap<>(); | ||
private final RoundTallies roundTallies = new RoundTallies(); | ||
// roundTalliesByField is a map from a tabulate-by field to roundTallies for that field | ||
private final BreakdownByField<RoundTallies> roundTalliesByField = new BreakdownByField<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I misreading the code, or does this mean that if you're using more than one breakdown (with the current code, that would be both batch and precinct), there will be a collision if there's any overlap in the IDs? E.g. if there's a batch 1 and also a precinct 1, they'll be writing to and reading from the same tallies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, no, now I see that roundTalliesByField
is a map from field to fieldId to round to vote count. Maybe roundTalliesByFields
(plural) would be more clear?
String precinctId = cvr.getPrecinct(); | ||
if (precinctId != null) { | ||
precinctIds.add(precinctId); | ||
for (TabulateByField field : config.enabledFields()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good example of the ambiguity here: without context, I would never guess that config.enabledFields()
is giving me a list of enabled breakdown fields
@@ -789,8 +798,8 @@ void generateSummaryFiles(String timestamp) throws IOException { | |||
} | |||
} | |||
|
|||
Set<String> getPrecinctIds() throws IOException { | |||
return precinctIds; | |||
FieldIdSet getFieldIds() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should fieldIds
/getFieldIds
clarify that it's referring to the enabled ones? And why do we need the fieldIds
instance variable instead of just using config.enabledFields()
? Aren't we throwing an exception if there's a discrepancy between the config and the CVRs in this regard anyway?
* Breaks down the templated type T by a specific field. | ||
* The field type is the outer map (TabulateByField), and the field ID is the inner map (String). | ||
*/ | ||
static class BreakdownByField<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so you made this generic so that you could reuse it for both tallies and tally transfers.
@tarheel all comments are addressed! |
Request for comment on language.
As a noun, I call it a "a tabulate-by field"
As a verb, I call it "Tabulate By Field"
I can imagine preferences for other names, by "Tabulate By Pivot" or "Tabulate By Metadata Field" or something else, but "Tabulate By Field" seems sufficiently clear to me. Thoughts? Any other words that may reduce ambiguity?
Closes #807