Individual CVR Files or Batch Manifests #45

adghayes · 2023-08-01T21:03:42Z

Organization Name: VotingWorks

Organization Type: 2, Manufacturer

Document (e.g., CastVoteRecords): CastVoteRecords

Reference (Include section and paragraph number): -

Comment (Include rationale for comment):

When exported from a scanner at the end of election day, the Cast Vote Records CDF is one massive file. Including linked images, it can be gigabytes of data. For us, this introduces two significant problems:

1 - VVSG Backup Requirement Compatibility

VVSG 1.2-H:

The voting system must withstand, without loss of data, the failure of any data input or storage device.
The intent of this requirement is to prevent votes from being permanently lost due to the failure of a storage device that contains votes. For example, if a scanner fails, the voting system must have the ability to swap in a replacement data input device without losing the cast vote records that were previously recorded by the failed scanner.

This means that we need to be exporting CVR data to some other storage medium continuously throughout a scanner's usage, not just at the end of the day. We could use some internal format, but ideally we would use the CVR CDF format. That way, the CVRs are immediately consumable by a tabulator.

The current format does not really allow a continuous export, since it's a single JSON (or XML) blob. We have no way to append a CVR to an existing CVR CDF file, because it's not a newline-delimited format. And even if we could, appending would reveal the order in which CVRs were cast, which is a privacy concern.
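A minimal sketch of the kind of continuous export described above, assuming each CVR is already available as a JSON-serializable dict. Every CVR goes to its own file the moment it is recorded: it is written under a temporary name, flushed with `fsync`, and then renamed into place, so a crash never leaves a half-written CVR under its final name. The helper `export_cvr` and the directory layout are hypothetical, not part of the CVR CDF. Random UUID file names keep cast order out of the names themselves, but file timestamps and, on some file systems such as FAT, directory entry order can still reveal the order in which CVRs were written.

```python
import json
import os
import uuid
from pathlib import Path


def export_cvr(cvr: dict, export_dir: str) -> Path:
    """Write a single CVR to its own file as soon as it is recorded (hypothetical helper)."""
    directory = Path(export_dir)
    directory.mkdir(parents=True, exist_ok=True)

    # A random name means the file name itself says nothing about cast order.
    name = f"{uuid.uuid4()}.json"
    final_path = directory / name
    tmp_path = directory / f"{name}.tmp"

    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(cvr, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to the device before publishing

    # Rename into place; on POSIX this is atomic within one file system,
    # so readers see either no file or the complete CVR.
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        for i in range(3):
            # Placeholder dicts stand in for real CVR objects.
            export_cvr({"placeholder": i}, d)
        print(sorted(p.name for p in Path(d).iterdir()))
```

Because each file is complete on its own, a replacement device (or a tabulator) can pick up whatever CVRs exist at the moment of failure without parsing a partially written blob.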

2 - Export Time

Because we can't export to CDF files incrementally, everything has to be exported at the end of the night. This can take a very long time when there are gigabytes of data and standard-quality storage media. The export time is another reason to find some way to export continuously.

Suggested Change:

Allow CVR Data Distributed Across Files

Ultimately, this suggestion is similar to the suggestion in #30. We need a way to place some CVRs into their own files - that way they can be exported discretely. To make that work, we need some sort of CVR Manifest or Batch Manifest file that links to the various CVRs or at least their containing directories.

There are many possible implementation paths. Roughly speaking, I think there could be an "individual CVR" file format that contains only the CVR.CVR data. There could be a separate or related "batch of CVRs" file format that contains the CVRs plus some batch metadata. For simplicity, these could be one file format with some options at the top level. Then there would need to be a manifest file, or a manifest section in the metadata file, that also contains the CVR.Election metadata and other top-level fields.
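One possible shape for the batch-file-plus-manifest layout proposed above, sketched in Python with JSON. Each batch file carries batch metadata and its CVRs, a single manifest carries the `Election` metadata and other top-level report fields exactly once, and a reader reassembles one logical report. The file names (`manifest.json`, `batch-<id>.json`) and the keys `BatchId` and `BatchFiles` are hypothetical; only `CVR` and `Election` echo the CDF element names mentioned above.

```python
import json
from pathlib import Path

MANIFEST_NAME = "manifest.json"  # hypothetical file name


def write_batch(export_dir: str, batch_id: str, cvrs: list) -> str:
    """Write one batch file holding batch metadata plus its CVRs; return its name."""
    name = f"batch-{batch_id}.json"
    (Path(export_dir) / name).write_text(
        json.dumps({"BatchId": batch_id, "CVR": cvrs}), encoding="utf-8"
    )
    return name


def write_manifest(export_dir: str, top_level: dict, batch_files: list) -> None:
    """Write election metadata and other top-level fields once, plus links to batches."""
    manifest = dict(top_level, BatchFiles=batch_files)
    (Path(export_dir) / MANIFEST_NAME).write_text(json.dumps(manifest), encoding="utf-8")


def assemble_report(export_dir: str) -> dict:
    """Rebuild a single report-shaped dict from the manifest and its batch files."""
    directory = Path(export_dir)
    manifest = json.loads((directory / MANIFEST_NAME).read_text(encoding="utf-8"))
    cvrs = []
    for name in manifest["BatchFiles"]:
        batch = json.loads((directory / name).read_text(encoding="utf-8"))
        cvrs.extend(batch["CVR"])
    # Top-level fields come from the manifest; the link list is not part of the report.
    report = {k: v for k, v in manifest.items() if k != "BatchFiles"}
    report["CVR"] = cvrs
    return report


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        b1 = write_batch(d, "1", [{"placeholder": 1}, {"placeholder": 2}])
        b2 = write_batch(d, "2", [{"placeholder": 3}])
        write_manifest(d, {"Election": [{"placeholder": "election"}]}, [b1, b2])
        print(len(assemble_report(d)["CVR"]))
```

In this sketch batches can be exported throughout the day and the manifest written (or rewritten) last, so the election metadata is never duplicated and the relationship between files is explicit rather than implied by directory layout.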

Why This Matters

We'd really like to stick to the CDF format as faithfully as possible, but the current format of one JSON blob + linked files for a day's worth of scanning will not work for our purposes. We're hoping we can create some sort of CDF extension that works for our use cases.

Organization Type: 1 = Federal, 2 = Industry, 3 = Academia, 4 = Self, 5 = Other


adghayes · 2023-08-01T21:21:42Z

Point of Clarification

To export individual CVRs or batches, we could just export each CVR or batch as its own CDF-compliant file. I don't think this is ideal because:

It de-normalizes and duplicates a lot of election metadata
The overall group of CVR files still has no formal CDF description (how to relate the different files)

JDziurlaj · 2023-08-02T11:22:09Z

Hi @adghayes, we are looking into this!

adghayes · 2023-08-03T15:21:06Z

@JDziurlaj great! Thank you! Please do involve us or request more detail if that would be helpful, we'd be excited to help.

JDziurlaj · 2023-08-22T17:37:29Z

To restate the problem space:

The first issue, "VVSG Backup Requirement Compatibility", refers to
VVSG 1.2-H:

The voting system must withstand, without loss of data, the failure of any data input or storage device.
The intent of this requirement is to prevent votes from being permanently lost due to the failure of a storage device that contains votes. For example, if a scanner fails, the voting system must have the ability to swap in a replacement data input device without losing the cast vote records that were previously recorded by the failed scanner.

It sounds like you are interpreting this as requiring the scanner to write out
each cast vote record as soon as it can be constructed, i.e., from an
in-memory representation to a permanent storage location.

The second issue, "Export Time", is the assertion that the structure of
the CDFs is not well suited to export as a single file. You say the
raw CVR data can be gigabytes and that you would prefer to export each
CVR as its own file.

Analysis

The CVR CDF supports export of multiple CVRs per CDF instance (e.g., a
file). The intended use is for all CVRs for a vote capture device to
exist in a single file, as it minimizes overhead.

We discuss our potential approaches to solving issues (1) and (2) below:

Approach 1: Use proprietary format for CVRs

This approach is for implementers to devise their own format
for internal CVR storage. There is no requirement to use the CVR CDF for internal
representation.

Pros:

No changes to the specification are needed

Cons:

Does not solve the export time issue
You would prefer to use the CVR CDF internally

Approach 2: Generate separate NIST CVR 1.0 instance for each individual CVR

This approach is for implementers to use the CVR CDF such
that there is a one-to-one correspondence between CVRs and CVR CDF
instances (i.e., files).

Pros:

No changes to the specification are needed

Cons:

Each file would contain its own metadata section (i.e., the
non-CVR-specific part containing data about the potential contests and
options in the election), which would lead to large duplication
between files.
Because metadata is not centralized, it could change from file to
file, which could lead to conflicting definitions for contests,
options and other election definition data.
You have already said you do not prefer this approach
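The two main drawbacks of Approach 2 can be made concrete with a small Python sketch. Assuming each per-CVR file is a JSON report with its own `Election` section, it totals the bytes of metadata repeated after the first file and lists files whose metadata disagrees with the first one. The function name `audit_per_cvr_files` and its return shape are illustrative only.

```python
import json
from pathlib import Path


def audit_per_cvr_files(paths: list) -> tuple:
    """Return (duplicated_metadata_bytes, conflicting_files) for per-CVR reports."""
    reference = None
    duplicated_bytes = 0
    conflicts = []
    for path in paths:
        report = json.loads(Path(path).read_text(encoding="utf-8"))
        election = report.get("Election")
        if reference is None:
            reference = election  # the first file defines the expected metadata
            continue
        # Every later copy of the metadata is pure duplication.
        duplicated_bytes += len(json.dumps(election, sort_keys=True).encode("utf-8"))
        if election != reference:
            conflicts.append(str(path))
    return duplicated_bytes, conflicts
```

A consumer of Approach 2 output would need some check like this before tabulating, because nothing in the format itself prevents the metadata from drifting between files.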

Approach 3a: Use XIncludes

XIncludes are essentially include directives, as found in programming
languages. One primary file would be built containing includes of the
other files. When processed, the inclusions would be dereferenced and a
single logical CVR report would be constructed.

e.g.

<CastVoteRecordReport xmlns="http://itl.nist.gov/ns/voting/1500-103/v1" xmlns:xi="http://www.w3.org/2001/XInclude">
	<xi:include href="cvr_fragment_1.xml"/>
	<xi:include href="cvr_fragment_2.xml"/>
	<xi:include href="cvr_fragment_3.xml"/>
	<!-- … -->
</CastVoteRecordReport>
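For the XML case, Python's standard library includes limited XInclude support in `xml.etree.ElementInclude`. A minimal sketch, assuming the primary file and fragment files sit in the same directory and each fragment's root is a `CVR` element in the CVR namespace: a custom loader resolves each `href` relative to the primary file, and `ElementInclude.include` replaces every `xi:include` element with the fragment it points to. The helper names and the placeholder fragments are hypothetical, and the expanded result is not validated against the CVR schema here.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from xml.etree import ElementInclude

CVR_NS = "http://itl.nist.gov/ns/voting/1500-103/v1"
XI_NS = "http://www.w3.org/2001/XInclude"


def expand_report(main_path: str) -> ET.Element:
    """Parse the primary file and dereference its xi:include elements."""
    base_dir = Path(main_path).resolve().parent

    def loader(href, parse, encoding=None):
        # Resolve each href relative to the primary file, then load it normally.
        return ElementInclude.default_loader(str(base_dir / href), parse, encoding)

    root = ET.parse(main_path).getroot()
    ElementInclude.include(root, loader=loader)  # replaces xi:include in place
    return root


def write_demo_files(directory: str) -> str:
    """Write placeholder fragments plus a primary file; return the primary path."""
    hrefs = [f"cvr_fragment_{i}.xml" for i in (1, 2, 3)]
    for href in hrefs:
        # Placeholder fragment: an empty CVR element, not a complete CVR.
        Path(directory, href).write_text(f'<CVR xmlns="{CVR_NS}"/>', encoding="utf-8")
    includes = "".join(f'<xi:include href="{h}"/>' for h in hrefs)
    main_path = Path(directory, "report.xml")
    main_path.write_text(
        f'<CastVoteRecordReport xmlns="{CVR_NS}" xmlns:xi="{XI_NS}">'
        f"{includes}</CastVoteRecordReport>",
        encoding="utf-8",
    )
    return str(main_path)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        root = expand_report(write_demo_files(d))
        print(len(root.findall(f"{{{CVR_NS}}}CVR")))  # 3 placeholder CVRs
```

The expansion happens entirely on the consumer side, which is why the export-time concern remains: something still has to produce one combined, schema-conformant document before it reaches a required interface.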

Pros:
- Theoretically, so long as the CVRs presented at required
  interface points are fully expanded, no changes to the specs are
  required
Cons:
- This may not fully resolve the export time issue. The files will
  still need to be combined to conform to CVR CDF schema.
- It is unclear if you are using XML or JSON in this use-case
- Requires an XML processor that supports the XInclude
  specification
- We are not currently aware of a JSON equivalent

Would any of these approaches work for you?

benadida · 2023-08-25T11:43:17Z

Thanks for these suggestions @JDziurlaj .

I want to pop up a level: I think we may have a problem on our hands if, by following the VVSG2 standard, we are forced to do one of the following:

duplicate batch metadata for every CVR, which, as you point out, dramatically blows up the storage size and introduces a risk of data inconsistency. Not to mention that it's unclear whether a compliant CDF implementation would know how to reassemble these disparate files properly, since that is not an obvious compliance requirement.
drop support for JSON as a serialization format, which, let's be frank, is the serialization format almost any modern system will prefer.
not use a standard at all for a critical point of interoperability in a voting system.

I understand, of course, that there are good reasons to resist changes to the standard, but I want to suggest that maybe the standard has been frozen too soon. We wouldn't have been able to contribute this input to the standard until we attempted this implementation, and thus it seems premature to freeze the standard before there are at least two successful compliant implementations.

The only way we can meet the VVSG2 requirements, if CVR CDF remains unchanged, is to not comply with CVR CDF. That seems like the worst possible outcome for the standard.

adghayes changed the title ~~Individual CVR Exports or Batch Manifests~~ Individual CVR Files or Batch Manifests Aug 1, 2023

arsalansufi mentioned this issue Oct 4, 2023

Add feature flag for excluding redundant metadata in individual CVR reports votingworks/vxsuite#4036

Merged


mattroe mentioned this issue Nov 17, 2023

Switch from environment variables to system settings for the CAST_VOTE_RECORD_OPTIMIZATION_EXCLUDE_REDUNDANT_METADATA and CAST_VOTE_RECORD_OPTIMIZATION_EXCLUDE_ORIGINAL_SNAPSHOTS feature flags votingworks/vxsuite#4234

Closed

mattroe mentioned this issue Jul 24, 2024

VxAdmin: Strict CDF votingworks/vxsuite#4953

Closed
