Change approach to fake test data #835

Open
shaunagm opened this issue Jun 6, 2023 · 3 comments
Labels
- enhancement: Impact - something should be added to or changed about Parsons that isn't causing a current breakage
- medium priority: Priority - this doesn't need to be addressed immediately, but will broadly impact Parsons users
- needs discussion: needs community input and/or maintainer discussion

Comments

shaunagm (Collaborator) commented Jun 6, 2023

Currently, our connector tests involve large amounts of fake data, usually in JSON format (but occasionally stored as Python dicts, CSVs, or other formats). Sometimes this data is incorporated into the tests themselves, making them hard to read. Sometimes it's put in separate files, which is better, but it's still not ideal to have, say, a 400-line test data file to test just one connector.

Are there other approaches that might be more readable, easier to maintain, and easier to write? (I know generating the test data is often the most annoying part of writing tests for connectors.)

I'm aware of tools like Factory Boy, but that's for Python objects, not really for data. There's also Faker, which seems more promising.
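To make the idea concrete, here is a dependency-free sketch of the kind of generated contact records a tool like Faker would produce; the field names and the `fake_person` helper are hypothetical, and seeding keeps the "random" data reproducible across test runs:

```python
import random
import string

def fake_person(rng: random.Random) -> dict:
    """Generate one fake contact record; field names are illustrative only."""
    first = "".join(rng.choices(string.ascii_lowercase, k=6)).title()
    last = "".join(rng.choices(string.ascii_lowercase, k=8)).title()
    return {
        "id": rng.randint(1, 10_000),
        "first_name": first,
        "last_name": last,
        "email": f"{first.lower()}.{last.lower()}@example.com",
    }

# A fixed seed means every test run sees identical "fake" data.
rng = random.Random(835)
records = [fake_person(rng) for _ in range(3)]
```

The appeal over hand-written 400-line fixture files is that the shape of the data is defined once, in a few lines, and the volume is a parameter.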

Another option might be making use of JSON Schemas, although "validate the schema" isn't a huge part of the tests we're doing.
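For illustration, here is a minimal hand-rolled type check of the sort a schema-based approach formalizes; the `jsonschema` package provides real JSON Schema validation, and everything below (the `SCHEMA` dict and `matches_schema` helper) is a hypothetical stdlib stand-in:

```python
# Hypothetical, minimal schema check: field name -> expected Python type.
SCHEMA = {"id": int, "email": str}

def matches_schema(record: dict, schema: dict) -> bool:
    """Return True if every schema field is present with the expected type."""
    return all(
        isinstance(record.get(field), expected)
        for field, expected in schema.items()
    )
```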

(I don't love that any of these approaches would involve adding another dependency - maybe it's time to separate out the handful of dev dependencies, like we do the docs dependencies?)

Whatever we do, we should make sure to document it really well so that it makes the lives of people writing Parsons tests easier rather than harder and more confusing.

What do folks think?

shaunagm added the enhancement, medium priority, and needs discussion labels on Jun 6, 2023
corasaurus-hex commented Jun 7, 2023

What do you think about something like hypothesis-jsonschema? The plus side to using something like this is that you can default to running just one example per test but can, in CI or otherwise, use more examples to stress test.
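The "one example locally, more in CI" idea can be sketched without Hypothesis itself, e.g. with an environment variable controlling how many generated records a test loops over; the variable name and `check_record` assertion below are hypothetical:

```python
import os
import random

# Hypothetical knob: 1 example by default, more when CI sets it higher.
NUM_EXAMPLES = int(os.environ.get("PARSONS_TEST_EXAMPLES", "1"))

def check_record(record: dict) -> None:
    """Stand-in for a real connector assertion."""
    assert 0 <= record["id"] <= 100

rng = random.Random(0)
for _ in range(NUM_EXAMPLES):
    check_record({"id": rng.randint(0, 100)})
```

Hypothesis does this properly (shrinking failing examples, per-profile example counts), but the sketch shows the cost/benefit trade-off being discussed.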

shaunagm (Collaborator, Author) commented:

@corasaurus-hex great suggestion. I haven't used hypothesis-jsonschema before, so my main concern is around usability for people who don't have engineering backgrounds. There's also the general time to implement vs. other solutions. But this definitely deserves consideration!

corasaurus-hex commented:
@shaunagm that's an extremely fair take; it's definitely more challenging and time-consuming to implement, and maybe a little confusing if a test fails in one instance and not in another because the data is all generated. So, consider that suggestion retracted.

As a side note, it looks like Factory Boy can create dicts, which I wasn't aware of.
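Factory Boy does ship a `factory.DictFactory` for exactly this. As a hedged, dependency-free sketch of the pattern it supports (the `contact_factory` name and its default fields are hypothetical):

```python
def contact_factory(**overrides) -> dict:
    """Return a fake contact dict; keyword args override the defaults.

    A stdlib stand-in for the pattern Factory Boy's DictFactory supports.
    """
    record = {
        "id": 1,
        "first_name": "Test",
        "email": "test@example.com",
    }
    record.update(overrides)
    return record

# Tests only spell out the fields they actually care about:
vip = contact_factory(email="vip@example.com")
```

This keeps individual tests readable: each one states only the fields relevant to what it asserts, and the shared defaults live in one place.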
