-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: scytl connector #737
Conversation
Hi @agreenspan24! Thanks so much for this contribution. I haven't had a chance to go through the whole PR yet but I'm hoping to get to it and leave a review later this week. I appreciate you having thorough tests for you connector! I am a little hesitant about the 25-ish test files though. Are they all necessary? Might they cause issues for long-term maintainability if there are formatting changes? |
Hi Shauna! Makes total sense - was trying to keep the 18 county files to match the real results, but I think you're right to reduce them. I cut out all but two and adjusted the tests - run a good bit faster now too! For maintainability, those files are just direct copies of real election results as Scytl holds them. If they change their format, I'm hoping we can just drop newer versions of those files into the tests and adjust the parser accordingly. But let me know your thoughts! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Mostly docs issues here.
ex: Clarke | ||
""" | ||
|
||
def __init__(self, state: str, election_id: str, county=""): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unusual for anything other than auth info to get passed in to a Parsons connector, although it certainly does happen, and it makes sense to pass in state
and election_id
here so they don't have to be passed in with get_summary_results
, etc.
That said, county
in particular confuses me. How does the county parameter here fit with the counties passed in with get_detailed_results_for_participating_counties
? Does this county parameter get used at any other point? It seems like the other two are always state level but maybe I'm misunderstanding. Does passing in the county on init mean that get_summary_results
and get_county_results
apply only to the county?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the setup is kinda confusing, would love your help navigating it.
Essentially, they have a concept of an election (at the state or county level), which can have summary results and detailed results. The detailed results show results by county for state level elections, while they show precinct results for a county-level election.
State-level elections can also have sub-elections in each county, and link to each of those county elections, which each have summary and detailed results. So, a county-level election won't have participating counties and the method won't make sense.
The main reason I wanted to add the state and election_id at the class level instead of the method is so the class could track the previous version- I have a feeling people would use this for polling and I want to make that as easy as possible for them to see if results have updated.
One idea might be to split this up into a StateScytlElection
and CountyScytlElection
class, which would have some of the same implementation, but not all of it. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that was a bit of a word salad. Does it make sense to you to split up the state-wide election and county-wide election into two separate connectors, since they have slightly different functions and outputs?
Separately, if we're using this for polling, to prevent fetching unnecessary results, does it make sense to you to store the previous version in the class or return it with the results and have them pass it back in the function?
@agreenspan24 I think your test fixes look good! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few notes.
test/test_scytl/test_scytl.py
Outdated
scy = Scytl(TEST_STATE, TEST_ELECTION_ID) | ||
result = scy.get_summary_results() | ||
|
||
with open(f'{_dir}/114729_summary_expected.csv', 'r') as expected: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it hard to understand if the data is correct since it's hard to compare the source and destination data files in this diff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's a reasonable point. I think it's important to use the zip files since unzipping is a big part of what we're testing.
I guess it's somewhat helpful to just read the expected CSV and make sure all the data looks good and reasonable, but not ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alxmrs Thanks for the review! Still pretty new to proper python conventions, so I appreciate the deeper look! A couple of ideas here sound interesting to me, like typing and dataclasses, but I don't want to introduce new patterns without approval from maintainers, so @shaunagm let us know your thoughts!
test/test_scytl/test_scytl.py
Outdated
scy = Scytl(TEST_STATE, TEST_ELECTION_ID) | ||
result = scy.get_summary_results() | ||
|
||
with open(f'{_dir}/114729_summary_expected.csv', 'r') as expected: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's a reasonable point. I think it's important to use the zip files since unzipping is a big part of what we're testing.
I guess it's somewhat helpful to just read the expected CSV and make sure all the data looks good and reasonable, but not ideal.
Responded re: dataclasses inline. The question of typing is a bigger one. We currently don't require any sort of type annotations and certainly not any strict enforcement, but perhaps that's something we could at least ask of contributors going forward. We don't have capacity to retroactively add typing to the existing codebase but incremental improvements are better than nothing. Thanks for raising these questions @agreenspan24 & @alxmrs |
@shaunagm any idea what's causing |
I don't think it's an issue with your PR. @SorenSpicknall? This and a couple other open PRs are failing with this same error |
I'm working on this now; definitely looks like a CircleCI config breakage. If we can't figure it out in the next day or so, we may just merge some of these newer PRs without the additional testing step in order to not pause ongoing work, since the PRs generally test green aside from the CI issue. |
@SorenSpicknall I know you submitted a PR to fix the CI issue (#752), what do we need to do to get this PR passing? Do we need to rebase it onto the updated main branch? |
Yes, changes from |
Let me know if you need help rebasing, @agreenspan24. It can be tricky. |
Hi @agreenspan24! Just checking in. I know a lot of folks are busy with the election, so I wanted to offer a few different options here: (a) I can rebase your PR for you (I should be able to as a maintainer) Totally up to you! |
@shaunagm so sorry I missed this request, I can find some time in the next few days to rebase |
@shaunagm all done! |
Thank you so much for the effort you put into this PR! Excited to finally get to merge it. :) |
Scytl, or Clarity Elections, is a company that creates a tool for publishing election results in real-time. It's used in the U.S. by several states and over 800 counties, including Georgia, Colorado, Arkansas, and Dallas County, Texas.
For each participating election administrator, they publish a site with results that can be published on election night. Unfortunately, while that site contains downloadable data, that data is formatted in a complex way, making it difficult for developers to fetch election results. In general, their results either come zipped in either an unformatted text file or a complex XML document. Summary results come as a zipped CSV, but just contain top-line results. The JSON that powers the site results is even more complicated.
This connector provides methods to download the latest election results from their site and formats them into readable lists of dictionaries, which can easily be converted into a Parsons Table or Pandas dataframe.
Because this connector is can be useful for live reporting, it also contains a polling feature. As long as the class is instantiated, it will only fetch results that are new since the previous fetch of that method. To skip this feature, set force_update to true on any of the fetch methods.
Note: please do not confuse with Clarity Campaign Labs
Example microsite: https://results.enr.clarityelections.com/CO/105975/web.276603/