There is also `SymmetricHashJoinExec`, but I suppose its memory management is a bit more complicated and may be beyond the scope of this issue:
- It is somewhat isolated: at the moment, the DataFusion planner has no option for planning this operator.
- I don't think that simply throwing an error and aborting execution is acceptable for `SymmetricHashJoinExec`. If my understanding is correct, the main use case for this operator is joining two unbounded sources (streaming jobs), so it doesn't make much sense to limit memory without a spilling fallback (subjectively, it doesn't seem correct to fail a streaming job on an attempted memory overallocation).

My proposal is to file a separate issue for `SymmetricHashJoinExec` memory management and (as I see it) to implement memory limiting together with data spilling. We could start on it once we have reliable spilling for `HashJoinExec`, though doing it earlier is also an option.
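To make the "limit memory, but spill instead of failing" idea concrete, here is a minimal sketch in plain Rust. It is not DataFusion code: `MemoryBudget`, `BufferedSide`, and the in-memory `spilled` queue (a stand-in for spill files on disk) are all hypothetical names chosen for illustration, and batches are modeled as raw byte vectors.

```rust
use std::collections::VecDeque;

/// Hypothetical fixed-size memory budget (an assumption for this sketch,
/// not DataFusion's memory pool API).
struct MemoryBudget {
    limit: usize,
    used: usize,
}

impl MemoryBudget {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Reserve `bytes` if they fit under the limit; return false otherwise.
    fn try_grow(&mut self, bytes: usize) -> bool {
        if self.used + bytes <= self.limit {
            self.used += bytes;
            true
        } else {
            false
        }
    }

    /// Release a previously reserved amount.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// One buffered input of a hypothetical streaming join.
struct BufferedSide {
    in_memory: VecDeque<Vec<u8>>,
    /// Stand-in for spill files; a real operator would write to disk.
    spilled: VecDeque<Vec<u8>>,
}

impl BufferedSide {
    fn new() -> Self {
        Self { in_memory: VecDeque::new(), spilled: VecDeque::new() }
    }

    /// Buffer a batch without ever failing on memory pressure:
    /// evict the oldest in-memory batches to the spill area until the
    /// new batch fits, or spill the new batch itself if nothing is left
    /// to evict.
    fn push(&mut self, batch: Vec<u8>, budget: &mut MemoryBudget) {
        let size = batch.len();
        while !budget.try_grow(size) {
            match self.in_memory.pop_front() {
                Some(oldest) => {
                    budget.shrink(oldest.len());
                    self.spilled.push_back(oldest);
                }
                None => {
                    // Nothing left to evict on this side: spill directly.
                    self.spilled.push_back(batch);
                    return;
                }
            }
        }
        self.in_memory.push_back(batch);
    }
}
```

The point of the sketch is only the control flow: a failed reservation triggers spilling rather than an error, so an unbounded stream keeps running. How spilled data would be read back during probing is left out here.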
@ozankabak, @metesynnada it would be great to hear your thoughts on it.
Originally posted by @korowa in #5220 (comment)