-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-50558][SQL] Add configurable logging limits for the number of elements in InSet and In #49177
[SPARK-50558][SQL] Add configurable logging limits for the number of elements in InSet and In #49177
Conversation
…nts in InSet and In
cc @cloud-fan |
@@ -487,7 +487,14 @@ case class In(value: Expression, list: Seq[Expression]) extends Predicate { | |||
} | |||
} | |||
|
|||
override def toString: String = s"$value IN ${list.mkString("(", ",", ")")}" | |||
override def toString: String = { | |||
val limit = SQLConf.get.optimizerInSetMemberLoggingLimit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of the config confuses slightly. How is it related to optimizer
and Logging
?
@@ -291,6 +291,15 @@ object SQLConf { | |||
"for using switch statements in InSet must be non-negative and less than or equal to 600") | |||
.createWithDefault(400) | |||
|
|||
val OPTIMIZER_INSET_MEMBER_LOGGING_LIMIT = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we reuse the config MAX_TO_STRING_FIELDS
?
} | ||
|
||
override def toString: String = simpleString(SQLConf.get.maxToStringFields) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxGekk @cloud-fan I guess the question is now if we want this or expect clients to use simpleString.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it's risky to access a SQLConf in such a basic API toString
. I'd refer to call simpleString(Int.Max)
in toString
and then ask the clients to use simpleString
if they care about verbose strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I made the change
val limit = SQLConf.get.maxToStringFields | ||
if (list.length <= limit) { | ||
s"$value IN ${list.mkString("(", ",", ")")}" | ||
} else { | ||
s"$value IN ${list.take(limit).mkString("(", ",", ", ...)")}" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
val limit = SQLConf.get.maxToStringFields | |
if (list.length <= limit) { | |
s"$value IN ${list.mkString("(", ",", ")")}" | |
} else { | |
s"$value IN ${list.take(limit).mkString("(", ",", ", ...)")}" | |
} | |
s"$value IN ${truncatedString(list, "(", ",", ")", SQLConf.get.maxToStringFields)}" |
override def toString: String = s"$value IN ${list.mkString("(", ",", ")")}" | ||
override def simpleString(maxFields: Int): String = { | ||
if (list.length <= maxFields) { | ||
s"$value IN ${list.mkString("(", ",", ")")}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s"$value IN ${list.mkString("(", ",", ")")}" | |
s"$value IN ${list.mkString("(", ", ", ")")}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we consider the plan output stable API (which I would do) then adding a comma here changes that and in contrast to my truncation change there is no flag to disable it. If we want to change the output format, which is nice but less valuable because not performance relevant, we should really do this separately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we consider the plan output stable API
We don't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not stable API, but I don't see why we should update it in an unrelated PR. This PR is for truncation, shall we open a new PR to add the space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olaky How many places do you have to update? If just a couple in explain.sql.out
, it is ok I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's likely quite some golden files. Let's do it in followup to not block this PR.
if (list.length <= maxFields) { | ||
s"$value IN ${list.mkString("(", ",", ")")}" | ||
} else { | ||
s"$value IN ${list.take(maxFields).mkString("(", ",", ", ...)")}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s"$value IN ${list.take(maxFields).mkString("(", ",", ", ...)")}" | |
s"$value IN ${list.take(maxFields).mkString("(", ", ", ", ...)")}" |
thanks, merging to master! |
What changes were proposed in this pull request?
Introduce a new SQL conf
OPTIMIZER_INSET_MEMBER_LOGGING_LIMIT
that allows to limit the number of elements logged for In and InSet in toString. The current implementation does log arbitrarily many elements and in the case of InSet even sorts them before, which can lead to a lot of unwanted log volumes and have significant performance impact.If clients require all elements, they can set to conf to a very high value or preferably use the
sql
function.Why are the changes needed?
I have observed performance impact of the logging
Does this PR introduce any user-facing change?
Teh logging string changes. Given that there is also the
sql
function, clients should not depend on the output of the logging function.How was this patch tested?
New tests have been added
Was this patch authored or co-authored using generative AI tooling?
No