I spent quite a bit of time today trying to track this down. The first step was to run the window test suite, but surprisingly it doesn't fail when running just those tests. It does fail when running all the tests. I was able to trace the start of the failures to #2919, which is very odd since nothing in that PR has anything to do with canonicalization. However, the tests regularly fail after that PR and succeed before it.
Interestingly, if I revert just the test cases from that PR, canonicalization testing reliably passes again. So somehow state from previous tests is leaking into these tests. I've also seen odd things on other window tests, like CPU canonicalization failing but GPU working, or vice versa. It doesn't always fail in the same way, but it does consistently seem to fail in some way after the tests in #2919.
The problem lies in the Spark 3.0.x logical optimizer. I verified that it is not deterministic in the order in which it processes windows that use the same range. This even bears out in the test failure output above; note that it's the CPU that is failing to canonicalize in one case:
It turns out we are almost always failing to canonicalize these range queries on Spark 3.0.x, but the tests weren't failing before because both the CPU and GPU were failing to canonicalize, and the test compares the compares (the CPU comparison result against the GPU comparison result) to determine whether it fails. false == false, so it passed. Somehow #2919 was perturbing the logical optimizer so that it sometimes came up with the same plan twice in a row for these tests, and that's why they would fail. I verified that Spark 3.1+ always produces the same logical plan for a particular window query, even if it has the same ranges.
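To make the masking effect concrete, here is a minimal Python sketch of the check as I understand it. This is not the actual test code: `plans_match`, `check_passes`, and the tuple-based "plans" are hypothetical stand-ins, and a plan is modeled only as the order in which the optimizer emitted two same-range windows.

```python
def plans_match(first_plan, second_plan):
    """Hypothetical stand-in for comparing two canonicalized plans.

    A plan is modeled as a tuple of window names in optimizer output order,
    so two plans for the same query match only if the order is the same.
    """
    return first_plan == second_plan


def check_passes(cpu_plans, gpu_plans):
    """Hypothetical model of the test: it passes when the CPU and GPU
    canonicalization comparisons agree, even if both of them are False."""
    cpu_ok = plans_match(*cpu_plans)
    gpu_ok = plans_match(*gpu_plans)
    return cpu_ok == gpu_ok


# Two orderings a nondeterministic optimizer might emit for the same query.
plan_ab = ("window_a", "window_b")
plan_ba = ("window_b", "window_a")

# Both CPU and GPU fail to canonicalize: false == false, so the test passes.
both_fail = check_passes((plan_ab, plan_ba), (plan_ab, plan_ba))

# The GPU happens to get the same plan twice while the CPU does not:
# false != true, so the test fails.
cpu_only_fails = check_passes((plan_ab, plan_ba), (plan_ab, plan_ab))

print(both_fail, cpu_only_fails)
```

Under this model the test only fails when one side happens to see the same ordering twice in a row and the other does not, which would explain why the failures look intermittent and flip between CPU and GPU.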
Window function unit tests fail on Spark 3.0.3+ (3.1.x is not affected)
To Reproduce,
The Error,