Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-50558][SQL] Add configurable logging limits for the number of elements in InSet and In #49177

Conversation

olaky
Copy link
Contributor

@olaky olaky commented Dec 13, 2024

What changes were proposed in this pull request?

Introduce a new SQL conf OPTIMIZER_INSET_MEMBER_LOGGING_LIMIT that allows to limit the number of elements logged for In and InSet in toString. The current implementation does log arbitrarily many elements and in the case of InSet even sorts them before, which can lead to a lot of unwanted log volumes and have significant performance impact.
If clients require all elements, they can set to conf to a very high value or preferably use the sql function.

Why are the changes needed?

I have observed performance impact of the logging

Does this PR introduce any user-facing change?

Teh logging string changes. Given that there is also the sql function, clients should not depend on the output of the logging function.

How was this patch tested?

New tests have been added

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Dec 13, 2024
@olaky
Copy link
Contributor Author

olaky commented Dec 13, 2024

cc @cloud-fan

@@ -487,7 +487,14 @@ case class In(value: Expression, list: Seq[Expression]) extends Predicate {
}
}

override def toString: String = s"$value IN ${list.mkString("(", ",", ")")}"
override def toString: String = {
val limit = SQLConf.get.optimizerInSetMemberLoggingLimit
Copy link
Member

@MaxGekk MaxGekk Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the config confuses slightly. How is it related to optimizer and Logging?

@@ -291,6 +291,15 @@ object SQLConf {
"for using switch statements in InSet must be non-negative and less than or equal to 600")
.createWithDefault(400)

val OPTIMIZER_INSET_MEMBER_LOGGING_LIMIT =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we reuse the config MAX_TO_STRING_FIELDS?

}

override def toString: String = simpleString(SQLConf.get.maxToStringFields)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MaxGekk @cloud-fan I guess the question is now if we want this or expect clients to use simpleString.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it's risky to access a SQLConf in such a basic API toString. I'd refer to call simpleString(Int.Max) in toString and then ask the clients to use simpleString if they care about verbose strings.

Copy link
Contributor Author

@olaky olaky Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I made the change

@HyukjinKwon HyukjinKwon changed the title [SPARK-50558] Add configurable logging limits for the number of elements in InSet and In [SPARK-50558][SQL] Add configurable logging limits for the number of elements in InSet and In Dec 16, 2024
Comment on lines 491 to 496
val limit = SQLConf.get.maxToStringFields
if (list.length <= limit) {
s"$value IN ${list.mkString("(", ",", ")")}"
} else {
s"$value IN ${list.take(limit).mkString("(", ",", ", ...)")}"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
val limit = SQLConf.get.maxToStringFields
if (list.length <= limit) {
s"$value IN ${list.mkString("(", ",", ")")}"
} else {
s"$value IN ${list.take(limit).mkString("(", ",", ", ...)")}"
}
s"$value IN ${truncatedString(list, "(", ",", ")", SQLConf.get.maxToStringFields)}"

override def toString: String = s"$value IN ${list.mkString("(", ",", ")")}"
override def simpleString(maxFields: Int): String = {
if (list.length <= maxFields) {
s"$value IN ${list.mkString("(", ",", ")")}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
s"$value IN ${list.mkString("(", ",", ")")}"
s"$value IN ${list.mkString("(", ", ", ")")}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we consider the plan output stable API (which I would do) then adding a comma here changes that and in contrast to my truncation change there is no flag to disable it. If we want to change the output format, which is nice but less valuable because not performance relevant, we should really do this separately

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we consider the plan output stable API

We don't

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not stable API, but I don't see why we should update it in an unrelated PR. This PR is for truncation, shall we open a new PR to add the space?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@olaky How many places do you have to update? If just a couple in explain.sql.out, it is ok I believe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's likely quite some golden files. Let's do it in followup to not block this PR.

if (list.length <= maxFields) {
s"$value IN ${list.mkString("(", ",", ")")}"
} else {
s"$value IN ${list.take(maxFields).mkString("(", ",", ", ...)")}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
s"$value IN ${list.take(maxFields).mkString("(", ",", ", ...)")}"
s"$value IN ${list.take(maxFields).mkString("(", ", ", ", ...)")}"

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in f4c88ca Dec 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants