Ensured that schema validation matches nested structures that are in different order #144
Enables nested structures to be compared regardless of order.
@skestle - thanks for submitting this PR. Is the "Just a test at this stage" comment still the latest? A high-level overview of the current behavior and what this PR changes would be great! Thank you so much for helping!
case reqSchema: StructType =>
  namedField.dataType match {
    case fieldSchema: StructType =>
      diff(reqSchema, fieldSchema).isEmpty
This is the only place basicMatch is necessary - the && should happen here and the value should be lazy. (Nitpick)
Thanks - I'd forgotten to update the description.
@skestle - thanks for following up. I'm still trying to figure out the high-level description of the issue. Here's what I can tell from the other PR and this one. Suppose you have two DataFrames. Here's the schema of
Here's the schema of
You're proposing changing this behavior so that these schemas are treated as matching by default. Is that correct? What's your proposal for schemas that are not nested, but also out of order? Does this PR impact that scenario as well? Thanks for the help!
The issue was that the nested structs were just tested for equality. The test verifies that non-nested, nested, and double-nested out-of-order structures are handled consistently. So yes, I'm proposing changing the nested behaviour to consider the schemas the same by default, since that makes it consistent with the non-nested schema verification.
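To make the proposed behaviour concrete, here is a minimal sketch of a recursive, order-insensitive check. It is not the code from this PR: `DataType`, `StructField`, and `StructType` below are simplified hypothetical stand-ins for the Spark classes so the example runs without a Spark dependency, and `SchemaCheck.diff` is an assumed shape for the helper, borrowing only the `diff(reqSchema, fieldSchema).isEmpty` idea from the diff excerpt above.

```scala
// Hypothetical stand-ins for Spark's DataType, StructField and StructType,
// kept minimal so the sketch compiles without Spark on the classpath.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType

object SchemaCheck {
  // Returns the required fields that have no matching field in `actual`.
  // Fields are matched by name, so their position in the struct is irrelevant.
  def diff(required: StructType, actual: StructType): Seq[StructField] =
    required.fields.filterNot { req =>
      actual.fields.exists(f => f.name == req.name && sameType(req.dataType, f.dataType))
    }

  // Nested structs recurse into diff instead of being compared with ==,
  // which would be sensitive to field order.
  private def sameType(required: DataType, actual: DataType): Boolean =
    (required, actual) match {
      case (r: StructType, a: StructType) => diff(r, a).isEmpty
      case _                              => required == actual
    }
}
```

Under these assumptions, a required schema with a nested struct of `city` then `zip` yields an empty diff against an actual schema whose nested struct lists `zip` then `city`, while a type mismatch inside the nested struct still reports the enclosing field as missing.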
Thanks and sorry for the delay in getting this merged!
…different order (mrpowers-io#144)
* Highlighted issues with schema validation's order sensitivity of nested structures.
* Implemented recursive schema checker
Enables nested structures to be compared regardless of order.
Adds a test for nested out-of-order structures and resolves it with recursive checking.
This was found on (and is "necessary" for) Spark 3.2, because it seems that Scala 2.13 has changed the hashing function, and the .groupBy call results in the existing nested validation test failing due to the schema being shuffled while being validated.
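The ordering problem described above can be illustrated with a small sketch. It assumes only standard Scala collections: `Field` and `GroupByOrder` are hypothetical names, not part of this project. `groupBy` returns an immutable `Map`, whose iteration order is unspecified, so any comparison that relies on the regrouped sequence keeping its original order can break between Scala versions.

```scala
object GroupByOrder {
  // Hypothetical simplified field: a name plus a type label.
  final case class Field(name: String, dataType: String)

  val fields: Seq[Field] = Seq(
    Field("zip", "int"),
    Field("city", "string"),
    Field("street", "string"),
    Field("country", "string"),
    Field("state", "string")
  )

  // groupBy yields a Map; nothing guarantees its entries come back
  // in the order the fields were declared.
  val byName: Map[String, Seq[Field]] = fields.groupBy(_.name)

  // Flattening the Map's values may therefore reorder the fields.
  val regrouped: Seq[Field] = byName.values.flatten.toSeq

  // An order-insensitive comparison holds regardless of Map ordering.
  val sameFields: Boolean = regrouped.toSet == fields.toSet
}
```

Comparing `regrouped` to `fields` with `==` is the kind of order-sensitive check that can start failing after a hashing change, whereas the set comparison does not depend on iteration order.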