
fix(ballot-interpreter): improve vertical streak detection #5522


eventualbuddha (Collaborator)

Overview

VxCentralScan sometimes produces images with gray areas outside the ballot paper that get binarized to black. If that black region extends far enough into the image and is consistent enough from top to bottom, it can cause false positives when detecting vertical streaks. To fix this, I increased the width of the border ignored at the left and right edges from 5px to 20px. A sufficiently wide black edge could still cause the same issue, but it is now less likely.
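To make the exclusion concrete, here is a minimal Rust sketch of which columns a detector would scan for a given border width. The helper name `columns_to_scan` and the 1700px image width are hypothetical, and the sketch assumes the same number of columns is skipped at both the left and right edges; it is not the ballot-interpreter's actual code.

```rust
use std::ops::Range;

/// Hypothetical helper: the columns (x coordinates) a vertical streak
/// detector would examine after skipping `border` columns at both the left
/// and right edges of an image that is `width` pixels wide.
fn columns_to_scan(width: u32, border: u32) -> Range<u32> {
    // If the two borders cover the whole image, there is nothing to scan.
    if width <= border.saturating_mul(2) {
        return 0..0;
    }
    border..width - border
}

fn main() {
    // Hypothetical 1700px-wide scan, comparing the old and new border widths.
    let old = columns_to_scan(1700, 5);
    let new = columns_to_scan(1700, 20);
    println!("old: {:?} ({} columns)", old, old.len());
    println!("new: {:?} ({} columns)", new, new.len());
}
```

Under these assumptions, raising the border from 5 to 20 removes 15 more columns from each side, 30 in total, which is the region where a binarized gray edge would otherwise be examined.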

Alternatives

I initially considered swapping the order of streak detection and timing mark detection, reasoning that we could search for streaks only within the area bounded by the timing marks and so avoid counting a black edge as a real streak. However, @jonahkagan pointed out that this would prevent us from detecting streaks that intersect the left/right timing marks if those streaks also interfered with timing mark detection. I believe that situation is likely: because the allowed rotation is limited, a streak is likely to line up fairly well with the column of timing marks. Therefore, I opted instead to simply increase the number of pixels we ignore at each edge when detecting streaks.
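The rotation argument can be quantified with a small calculation. A scanner streak runs straight down the image, while the column of timing marks on a ballot rotated by angle θ shifts horizontally by about height × tan θ from top to bottom. When that shift is small compared to a timing mark's width, a streak that crosses one timing mark probably crosses most of them. The 2200px height and 0.5° angle below are hypothetical; the actual rotation limit is not stated here.

```rust
/// Horizontal offset, in pixels, accumulated over `height_px` rows when a
/// straight line (such as the column of timing marks) is rotated by
/// `angle_degrees` relative to the image's vertical axis.
fn horizontal_drift(height_px: f64, angle_degrees: f64) -> f64 {
    height_px * angle_degrees.to_radians().tan()
}

fn main() {
    // Hypothetical numbers: a 2200px-tall scan rotated by 0.5 degrees.
    let drift = horizontal_drift(2200.0, 0.5);
    println!("timing mark column drifts {drift:.1}px from top to bottom");
}
```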

Demo Video or Screenshot

The screenshots below are debug images of the same scanned ballot, one with 5px ignored at each edge and one with 20px ignored. The dark cyan area in each image is the region that was excluded from vertical streak detection.

Old ignored border pixel value (5px)

One incorrect streak detected with this value.
[Screenshot: debug image with 5px ignored border]

New ignored border pixel value (20px)

No streaks detected with this value.
[Screenshot: debug image with 20px ignored border]

Testing Plan

Tested with NH Test Ballots on the fi-7180 scanner. Added automated tests covering three cases: finding real mid-ballot streaks, ignoring edge "streaks", and finding streaks that intersect the left/right timing marks.
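The three test scenarios can be illustrated with a toy model. This Rust sketch is not the project's test suite: `BinaryImage`, `detect_streaks`, and the image geometry are hypothetical, and the detector is reduced to one rule (a column outside the excluded border is a streak if at least 75% of its pixels are black, borrowing 0.75 from `PERCENT_BLACK_PIXELS_IN_STREAK` on the assumption that it is a per-column fraction). The `MAX_WHITE_GAP_PIXELS` limit is ignored here.

```rust
use std::ops::Range;

/// Toy binarized image, row-major; `true` is a black pixel.
/// Hypothetical stand-in, not the ballot-interpreter's image type.
struct BinaryImage {
    width: usize,
    height: usize,
    black: Vec<bool>,
}

impl BinaryImage {
    fn white(width: usize, height: usize) -> Self {
        Self { width, height, black: vec![false; width * height] }
    }

    /// Paints every pixel in the given column and row ranges black.
    fn paint(&mut self, xs: Range<usize>, ys: Range<usize>) {
        for y in ys {
            for x in xs.clone() {
                self.black[y * self.width + x] = true;
            }
        }
    }
}

/// Simplified detector: a column outside the excluded borders is a streak
/// if at least `min_black_fraction` of its pixels are black.
fn detect_streaks(img: &BinaryImage, border: usize, min_black_fraction: f32) -> Vec<usize> {
    if img.width <= 2 * border {
        return Vec::new();
    }
    (border..img.width - border)
        .filter(|&x| {
            let black = (0..img.height).filter(|&y| img.black[y * img.width + x]).count();
            black as f32 / img.height as f32 >= min_black_fraction
        })
        .collect()
}

fn main() {
    const BORDER: usize = 20;
    const FRACTION: f32 = 0.75;

    // 1. A real streak in the middle of the ballot is found.
    let mut mid = BinaryImage::white(200, 100);
    mid.paint(100..101, 0..100);
    assert_eq!(detect_streaks(&mid, BORDER, FRACTION), vec![100]);

    // 2. A 15px-wide black edge is ignored with a 20px border, but not with 5px.
    let mut edge = BinaryImage::white(200, 100);
    edge.paint(0..15, 0..100);
    assert!(detect_streaks(&edge, BORDER, FRACTION).is_empty());
    assert!(!detect_streaks(&edge, 5, FRACTION).is_empty());

    // 3. A streak through a column of timing marks is still found, while the
    //    timing marks alone (50% black per column) are not flagged.
    let mut marks = BinaryImage::white(200, 100);
    for y in (0..100).step_by(10) {
        marks.paint(25..40, y..y + 5);
    }
    marks.paint(30..31, 0..100);
    assert_eq!(detect_streaks(&marks, BORDER, FRACTION), vec![30]);

    println!("all toy streak scenarios passed");
}
```

Scenario 3 places the timing marks just inside the 20px border, so a wider border would eventually start hiding them; that trade-off is the reason the border is not simply made much larger.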

When scanning in VxCentralScan, we sometimes get wider gray areas outside the ballot paper than expected. Most of this gray is binarized to black, which leads to false positives when detecting streaks. This change reduces the likelihood of false positives while preserving detection of streaks within the ballot, including those passing through the timing marks.
Binarizes the debug image to make it clearer what the streak detector was working with.
The comment says to draw on side B, so I updated the code to do that.
Ensures that streaks through timing marks are still detected and that moderately wide edge "streaks" do not cause false positives.
@@ -293,7 +293,7 @@ pub fn detect_vertical_streaks(
 ) -> Vec<PixelPosition> {
     const PERCENT_BLACK_PIXELS_IN_STREAK: f32 = 0.75;
     const MAX_WHITE_GAP_PIXELS: PixelUnit = 15;
-    const BORDER_COLUMNS_TO_EXCLUDE: PixelUnit = 5;
+    const BORDER_COLUMNS_TO_EXCLUDE: PixelUnit = 20;
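One plausible reading of the other two constants in this hunk is that a column counts as a streak when at least 75% of its pixels are black and no run of white pixels is longer than 15. The Rust sketch below encodes that reading; it is an assumption about the semantics, not the actual body of `detect_vertical_streaks`, and it uses `usize` in place of `PixelUnit`, whose definition is not shown.

```rust
/// Values copied from the diff; their exact semantics in the real detector
/// are assumed here, not confirmed.
const PERCENT_BLACK_PIXELS_IN_STREAK: f32 = 0.75;
const MAX_WHITE_GAP_PIXELS: usize = 15;

/// Hypothetical predicate: `column` holds one column of a binarized image,
/// top to bottom, with `true` meaning black. The column counts as a streak if
/// enough of it is black and no run of white pixels exceeds the gap limit.
fn is_streak_column(column: &[bool]) -> bool {
    if column.is_empty() {
        return false;
    }
    let black = column.iter().filter(|&&b| b).count();
    if (black as f32) < PERCENT_BLACK_PIXELS_IN_STREAK * column.len() as f32 {
        return false;
    }
    // Reject the column as soon as a white run grows past the limit.
    let mut gap = 0;
    for &is_black in column {
        if is_black {
            gap = 0;
        } else {
            gap += 1;
            if gap > MAX_WHITE_GAP_PIXELS {
                return false;
            }
        }
    }
    true
}

fn main() {
    // 100 pixels, 80 black, with the 20 white pixels split into two 10px gaps.
    let mut split_gaps = vec![true; 100];
    split_gaps[10..20].fill(false);
    split_gaps[60..70].fill(false);

    // Same 80% black, but the 20 white pixels form one 20px gap.
    let mut one_gap = vec![true; 100];
    one_gap[40..60].fill(false);

    println!("{} {}", is_streak_column(&split_gaps), is_streak_column(&one_gap));
}
```

Run as written, it prints `true false`: both columns are 80% black, but only the second contains a white gap longer than 15 pixels.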
Collaborator

I think this is a reasonable approach and am happy to move forward with it.

I wonder, however, whether we could keep this threshold lower if we cropped all black columns from the left and right edges of the image (similar to our cropping logic for the top and bottom).
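A minimal Rust sketch of the cropping idea, assuming a binarized image stored as columns of booleans (`true` = black). `black_edge_columns` is a hypothetical helper that counts how many fully black columns sit at each side edge, which is roughly what such cropping would remove before streak detection.

```rust
/// True if every pixel in the column is black.
fn is_all_black(column: &[bool]) -> bool {
    column.iter().all(|&b| b)
}

/// Hypothetical helper for the cropping idea: counts the fully black columns
/// at the left and right edges. `columns[x]` holds column x, top to bottom,
/// with `true` meaning black.
fn black_edge_columns(columns: &[Vec<bool>]) -> (usize, usize) {
    let left = columns.iter().take_while(|c| is_all_black(c.as_slice())).count();
    // Only look at what remains so an all-black image is not counted twice.
    let right = columns[left..]
        .iter()
        .rev()
        .take_while(|c| is_all_black(c.as_slice()))
        .count();
    (left, right)
}

fn main() {
    // Toy 30-column image, 50px tall: 3 pure-black columns on the left and
    // 2 on the right, white everywhere else.
    let height = 50;
    let mut columns = vec![vec![false; height]; 30];
    for x in [0, 1, 2, 28, 29] {
        columns[x] = vec![true; height];
    }
    let (left, right) = black_edge_columns(&columns);
    println!("crop {left} columns on the left and {right} on the right");
}
```

Because the scan from each side stops at the first column containing any white pixel, an edge whose outermost columns are only mostly black would not be cropped at all.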

Collaborator Author

We might be able to, but in at least one example I've seen, the first 4 pixels at the left edge actually contain some white, while the next 7 are pure black. That's why that example failed with 5: cropping only fully black columns would not remove anything there, and a 5px border still leaves some of the pure-black columns inside the scanned area. A value of 12 would work for that particular ballot, but only barely. It's possible that 15 would be sufficient, but I judged that the benefit of detecting streaks in that region was not worth the risk of false positives.
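The example above can be checked with a toy model. Assuming columns 0 through 3 are the partly white columns and columns 4 through 10 are the pure-black ones, this Rust sketch counts how many pure-black columns remain inside the scanned area for a given border. `exposed_black_columns` is hypothetical, and the model ignores row-to-row variation in real scans.

```rust
/// Toy model of one described ballot edge: columns 0..4 contain some white,
/// columns 4..11 are pure black, and everything after that is ballot paper.
/// Returns how many pure-black columns are still scanned when the first
/// `border` columns are excluded.
fn exposed_black_columns(border: usize) -> usize {
    const PARTLY_WHITE: usize = 4; // columns 0..4
    const PURE_BLACK: usize = 7; // columns 4..11
    let black_start = PARTLY_WHITE;
    let black_end = PARTLY_WHITE + PURE_BLACK; // exclusive
    black_end.saturating_sub(border.max(black_start))
}

fn main() {
    for border in [5, 12, 15, 20] {
        println!("border {border:>2}: {} pure-black columns scanned", exposed_black_columns(border));
    }
}
```

Under these assumptions, a border of 5 leaves 6 pure-black columns to be scanned, while 12, 15, and 20 leave none. The last pure-black column is column 10, so a border of 12 leaves only one column of slack, consistent with it working "only barely".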

However, this is not based on much evidence and I'm speculating. If we think it's worth doing, I could do a more thorough analysis with the fi-7180 to see what the distribution of black edge widths looks like.
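If that analysis is done, one way to summarize it is to measure, for each scan and edge, how far the black region reaches, then look at a high percentile or the maximum. This Rust sketch assumes per-column black fractions have already been computed; `black_edge_width`, `percentile`, the 40-column search window, and the 0.75 threshold are all hypothetical choices, and the sample profile is made up to echo the example in this thread.

```rust
/// Hypothetical per-scan measurement: given the black fraction of each
/// column starting at one edge, return how far the black edge region reaches
/// (one past the last column, within `window`, that is at least `threshold`
/// black). Returns 0 if no such column exists.
fn black_edge_width(black_fractions: &[f32], window: usize, threshold: f32) -> usize {
    let searched = &black_fractions[..window.min(black_fractions.len())];
    searched.iter().rposition(|&f| f >= threshold).map_or(0, |i| i + 1)
}

/// Nearest-rank percentile (0 < p <= 100) of the measured widths.
fn percentile(widths: &mut [usize], p: f64) -> Option<usize> {
    if widths.is_empty() {
        return None;
    }
    widths.sort_unstable();
    let rank = ((p / 100.0) * widths.len() as f64).ceil() as usize;
    Some(widths[rank.clamp(1, widths.len()) - 1])
}

fn main() {
    // Made-up profile: 4 partly white columns, 7 pure-black columns, then paper.
    let mut scan = vec![0.5_f32; 4];
    scan.extend(std::iter::repeat(1.0).take(7));
    scan.extend(std::iter::repeat(0.0).take(50));
    let width = black_edge_width(&scan, 40, 0.75);

    // In a real analysis, `widths` would hold one measurement per edge per scan.
    let mut widths = vec![width];
    println!("p100 black edge width: {:?}", percentile(&mut widths, 100.0));
}
```

For the toy profile it prints `p100 black edge width: Some(11)`; with real measurements, a margin could then be set with some slack above the observed maximum.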

Collaborator

Ah, OK, interesting that there's some white at the very edge first. Given what you've seen, a margin of 20 pixels seems reasonable.

We could also consider setting different margin values for different scanning hardware, but it seems we don't currently have strong evidence that this is needed.

@eventualbuddha eventualbuddha merged commit 0171ffa into main Oct 16, 2024
62 checks passed
@eventualbuddha eventualbuddha deleted the 5512-address-vertical-streak-detection-false-positives-and-properly-account-for-in-vxcentralscan-ui branch October 16, 2024 16:09
Linked issue (may be closed by this pull request): Address vertical streak detection false positives and properly account for in VxCentralScan UI