Using decorators to perform schema validations on DataFrames #141
base: main
Conversation
Hey @edmondop, I'll take a look at this!
@edmondop, @MrPowers To an end user, I'm not sure this difference would be obvious. Would it be possible to make the existing functions work as optional decorators? Let me know what y'all think.
Hi @jeffbrennan, I tried a couple of strategies, but they don't work well. The problem seems to be that Python has a limitation around overloading that the standard library doesn't fix. If you define, let's say…
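For illustration, a minimal sketch of the limitation being described (the first signature follows quinn's existing `validate_presence_of_columns`; the decorator signature is hypothetical). Plain Python has no overloading, so a second `def` with the same name simply replaces the first:

```python
from pyspark.sql import DataFrame

def validate_presence_of_columns(df: DataFrame, required_col_names: list[str]) -> None:
    ...  # existing eager form: checks df.columns directly

def validate_presence_of_columns(required_col_names: list[str]):
    ...  # intended decorator form: returns a decorator

# The second def silently shadows the first, so the original call style
# validate_presence_of_columns(df, ["name", "age"]) now fails with a TypeError.
```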
So this pull request adds a few functions to the public interface: … I am actually curious what would happen with …

quinn already has the following functions: …

From what I understand, we can't use the existing names because Python doesn't support overloading. It seems like …

From a naming perspective, I definitely don't think we should have "presence_of_columns" and "columns_present".

I am on the fence about this one. I really like the decorator syntax. If we didn't already have something, this would be a no-brainer. I am concerned that introducing another way to accomplish the same task might confuse users. I am interested in thoughts from the community.
There are workarounds, but I believe they all come with trade-offs. In particular, I think that if we use https://pypi.org/project/multimethod/, we can achieve the desired behavior of a single, consistent API (see the first sketch after this comment); I just didn't feel it was appropriate to add a new dependency without asking. I tried with …

Other ideas: …
The use case I have in mind is a complex Spark pipeline, all lazy, where someone wants explicit points at which the schema is checked, almost as an instrumentation system for a Spark job.
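Picking up the multimethod idea: a rough, untested sketch of how a single name could dispatch on argument types, so the eager form and the decorator form coexist (the validation body is illustrative, not quinn's actual implementation):

```python
import functools
from multimethod import multimethod
from pyspark.sql import DataFrame

@multimethod
def validate_presence_of_columns(df: DataFrame, required_col_names: list) -> None:
    # Eager form: dispatched when the first argument is a DataFrame.
    missing = set(required_col_names) - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {missing}")

@multimethod
def validate_presence_of_columns(required_col_names: list):
    # Decorator form: dispatched when called with only the column names.
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            validate_presence_of_columns(result, required_col_names)
            return result
        return wrapper
    return decorator
```

The trade-off is the one noted above: quinn would pick up multimethod as a runtime dependency.

And a sketch of the instrumentation idea, assuming the decorator simply wraps quinn's existing `validate_presence_of_columns` (the `check_columns` helper and the example pipeline are hypothetical). Because column checks read only DataFrame metadata, these checkpoints don't force the lazy pipeline to execute:

```python
import functools
import quinn
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

def check_columns(required_col_names: list[str]):
    """Hypothetical checkpoint decorator built on quinn's eager check."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> DataFrame:
            result = func(*args, **kwargs)
            quinn.validate_presence_of_columns(result, required_col_names)
            return result
        return wrapper
    return decorator

spark = SparkSession.builder.getOrCreate()

@check_columns(["id", "amount"])
def load_orders() -> DataFrame:
    return spark.createDataFrame([(1, 9.99)], ["id", "amount"])

@check_columns(["id", "amount", "amount_eur"])
def to_eur(df: DataFrame) -> DataFrame:
    return df.withColumn("amount_eur", F.col("amount") * 0.92)

# Each stage boundary is now an explicit schema checkpoint.
orders = to_eur(load_orders())
```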
@edmondop - I am really happy that you opened this pull request and put this on my radar. Decorators could be really nice for Python programmers. I created a separate issue where we can brainstorm this more generically. I'm interested in brainstorming the ideal decorator interface for PySpark programmers, seeing how it helps for common pipelines, and exploring any potential downsides of the approach. I think we should detach that "ideal end state" brainstorming exercise from the "dealing with legacy code" discussion, with all its semver/deprecation/public-interface messiness. Let's brainstorm the ideal end state in the other issue!
Addresses #140