[SPARK-47594] Connector module: Migrate logInfo with variables to structured logging framework #46022
Conversation
s"Spark Connect server started at: " + | ||
s"${isa.getAddress.getHostAddress}:${isa.getPort}") | ||
log"Spark Connect server started at: " + | ||
log"${MDC(RPC_ADDRESS, isa.getAddress.getHostAddress)}:${MDC(PORT, isa.getPort)}") |
I'm not sure if calling it `HOST_PORT` would be more appropriate.
Either way seems fine. Or, we can consider unifying them.
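For background on this thread: the pattern under discussion concatenates `log"..."` interpolated parts, with each variable wrapped in an `MDC(key, value)` tag. The sketch below is a simplified, self-contained model of how such an interpolator can carry both the rendered text and the key/value context; the key names mirror the diff above, but none of this is Spark's actual implementation.

```scala
// Simplified sketch (not Spark's real structured-logging framework) of a
// `log` interpolator that pairs each interpolated value with an MDC key.
object StructuredLoggingSketch {
  // Hypothetical log keys, mirroring RPC_ADDRESS and PORT from the diff.
  sealed trait LogKey
  case object RPC_ADDRESS extends LogKey
  case object PORT extends LogKey

  // A value tagged with its structured-logging key.
  case class MDC(key: LogKey, value: Any)

  // The rendered message plus the key/value context it carries.
  case class MessageWithContext(message: String, context: Map[LogKey, Any]) {
    def +(other: MessageWithContext): MessageWithContext =
      MessageWithContext(message + other.message, context ++ other.context)
  }

  implicit class LogInterpolator(val sc: StringContext) extends AnyVal {
    // `log"... ${MDC(KEY, value)} ..."` builds text and context together.
    def log(args: MDC*): MessageWithContext = {
      val message = sc.parts.zipAll(args.map(_.value.toString), "", "")
        .map { case (part, arg) => part + arg }
        .mkString
      MessageWithContext(message, args.map(a => a.key -> a.value).toMap)
    }
  }

  def logInfo(entry: MessageWithContext): Unit =
    println(s"INFO ${entry.message} context=${entry.context}")

  def main(args: Array[String]): Unit = {
    val (host, port) = ("10.0.0.1", 15002)
    logInfo(log"Spark Connect server started at: " +
      log"${MDC(RPC_ADDRESS, host)}:${MDC(PORT, port)}")
  }
}
```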
```diff
@@ -70,7 +70,7 @@ class KafkaTestUtils(
   private val JAVA_AUTH_CONFIG = "java.security.auth.login.config"

   private val localHostNameForURI = Utils.localHostNameForURI()
-  logInfo(s"Local host name is $localHostNameForURI")
+  logInfo(log"Local host name is ${MDC(LogKey.URI, localHostNameForURI)}")
```
Because the LogKey `URI` is duplicated with the Java class, we will write it directly as `LogKey.XXX`. At the same time, we will delete the import `import org.apache.spark.internal.LogKey.ERROR`.
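A small illustration of the name clash being avoided (simplified stand-ins, not Spark's real `LogKey` type): importing the key by name would shadow `java.net.URI`, so the key is referenced fully qualified.

```scala
// Simplified stand-in for Spark's LogKey object; illustration only.
object LogKey { case object URI }

object AvoidImportClash {
  import java.net.URI // the Java class keeps its short name

  def main(args: Array[String]): Unit = {
    val endpoint = new URI("sc://localhost:15002")
    // Fully qualified reference: no `import LogKey.URI`, so no shadowing.
    println(s"key=${LogKey.URI} value=$endpoint")
  }
}
```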
```diff
@@ -95,7 +95,7 @@ private[connect] class SparkConnectExecutionManager() extends Logging {
     sessionHolder.addExecuteHolder(executeHolder)
     executions.put(executeHolder.key, executeHolder)
     lastExecutionTimeMs = None
-    logInfo(s"ExecuteHolder ${executeHolder.key} is created.")
+    logInfo(log"ExecuteHolder ${MDC(LogKey.EXECUTE_HOLDER_KEY, executeHolder.key)} is created.")
```
logInfo(log"ExecuteHolder ${MDC(LogKey.EXECUTE_HOLDER_KEY, executeHolder.key)} is created.") | |
logInfo(log"ExecuteHolder ${MDC(LogKey.EXECUTE_KEY, executeHolder.key)} is created.") |
```diff
@@ -122,7 +122,7 @@ private[connect] class SparkConnectExecutionManager() extends Logging {
     if (executions.isEmpty) {
       lastExecutionTimeMs = Some(System.currentTimeMillis())
     }
-    logInfo(s"ExecuteHolder $key is removed.")
+    logInfo(log"ExecuteHolder ${MDC(LogKey.EXECUTE_HOLDER_KEY, key)} is removed.")
```
logInfo(log"ExecuteHolder ${MDC(LogKey.EXECUTE_HOLDER_KEY, key)} is removed.") | |
logInfo(log"ExecuteHolder ${MDC(LogKey.EXECUTE_KEY, key)} is removed.") |
logInfo(s"Adding new query to the cache. Query Id ${query.id}, value $value.") | ||
logInfo( | ||
log"Adding new query to the cache. Query Id ${MDC(QUERY_ID, query.id)}, " + | ||
log"value ${MDC(QUERY_CACHE, value)}.") |
log"value ${MDC(QUERY_CACHE, value)}.") | |
log"value ${MDC(QUERY_CACHE_VALUE, value)}.") |
s"$walTime nanos." | ||
logInfo(log"From Kafka ${MDC(CONSUMER, kafkaMeta)} read " + | ||
log"${MDC(TOTAL_RECORDS_READ, totalRecordsRead)} records through " + | ||
log"${MDC(COUNT_POLL, numPolls)} polls " + |
`KAFKA_PULLS_COUNT`?
logInfo(log"From Kafka ${MDC(CONSUMER, kafkaMeta)} read " + | ||
log"${MDC(TOTAL_RECORDS_READ, totalRecordsRead)} records through " + | ||
log"${MDC(COUNT_POLL, numPolls)} polls " + | ||
log"(polled out ${MDC(COUNT_RECORDS_POLL, numRecordsPolled)} records), " + |
`KAFKA_RECORDS_PULLED_COUNT`
```diff
@@ -325,7 +327,8 @@ private[spark] class DirectKafkaInputDStream[K, V](

   override def restore(): Unit = {
     batchForTime.toSeq.sortBy(_._1)(Time.ordering).foreach { case (t, b) =>
-      logInfo(s"Restoring KafkaRDD for time $t ${b.mkString("[", ", ", "]")}")
+      logInfo(log"Restoring KafkaRDD for time ${MDC(TIME, t)} " +
```
QQ: is the time here using ms?
Yeah, here `t` is an instance of `Time`, and `Time` defaults to outputting the time in ms, as follows:
```scala
override def toString: String = (millis.toString + " ms")
```
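For reference, a minimal stand-in for the `Time` class (assuming it simply wraps epoch milliseconds, as the quoted `toString` suggests) shows why the logged value renders with an `ms` suffix:

```scala
// Minimal stand-in for Spark Streaming's Time; assumes it wraps epoch millis.
case class Time(millis: Long) {
  override def toString: String = millis.toString + " ms"
}

object TimeDemo {
  def main(args: Array[String]): Unit = {
    val t = Time(1714000000000L)
    // Prints: Restoring KafkaRDD for time 1714000000000 ms
    println(s"Restoring KafkaRDD for time $t")
  }
}
```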
@panbingkun Thanks for the work. LGTM except for some minor comments.
@gengliangwang |
Thanks, merging to master
```diff
@@ -25,7 +25,8 @@ import org.apache.hadoop.fs.{FileSystem, FSDataOutputStream, Path}

 import org.apache.spark.SparkConf
 import org.apache.spark.deploy.SparkHadoopUtil
-import org.apache.spark.internal.Logging
+import org.apache.spark.internal.LogKey.PATH
+import org.apache.spark.internal.{Logging, MDC}
```
Unfortunately, this import-ordering issue was missed because `dev/scalastyle` didn't include this module. Here is the fix for `dev/scalastyle` and for this.
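For context, Spark's import-ordering convention, which `dev/scalastyle` checks, groups imports with a blank line between groups — `java`/`javax`, `scala`, third-party, then `org.apache.spark` — each group sorted alphabetically. A sketch of a correctly ordered header (the specific imports are illustrative and assume Spark's build classpath):

```scala
// Group 1: java / javax
import java.io.File
import java.util.Locale

// Group 2: scala
import scala.collection.mutable
import scala.util.Random

// Group 3: third-party libraries
import org.apache.hadoop.fs.Path

// Group 4: org.apache.spark, as in the diffs above
import org.apache.spark.SparkConf
import org.apache.spark.internal.LogKey.PATH
import org.apache.spark.internal.{Logging, MDC}
```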
```diff
@@ -23,7 +23,8 @@ import scala.util.Random

 import org.apache.spark.SparkConf
 import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}
-import org.apache.spark.internal.Logging
+import org.apache.spark.internal.LogKey.EXECUTOR_ID
+import org.apache.spark.internal.{Logging, MDC}
```
ditto. Import ordering issue.
Thank you very much for fixing it.
… `jvm-profiler` modules

### What changes were proposed in this pull request?
This PR aims to fix `dev/scalastyle` to check the `hadoop-cloud` and `jvm-profiler` modules. Also, the detected scalastyle issues are fixed.

### Why are the changes needed?
To prevent future scalastyle issues. A Scala style violation was introduced here, but we missed it because we didn't check all optional modules.
- #46022

The `jvm-profiler` module was newly added in Apache Spark 4.0.0, but we missed adding it to `dev/scalastyle`. Note that there were no Scala style issues in that module at that time.
- #44021

The `hadoop-cloud` module was added in Apache Spark 2.3.0.
- #17834

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs with the newly revised `dev/scalastyle`.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46376 from dongjoon-hyun/SPARK-48127.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
This PR aims to migrate `logInfo` with variables in the `Connector` module to the structured logging framework (a minimal before/after sketch of the pattern appears below).

### Why are the changes needed?
To enhance Apache Spark's logging system by implementing structured logging.
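As a hedged illustration of the migration pattern, mirroring this PR's diffs: the key name `EXECUTE_KEY` follows the reviewer's suggestion above, and the snippet is written as it would appear inside Spark's own modules rather than as external API usage.

```scala
import org.apache.spark.internal.{Logging, LogKey, MDC}

class ExampleService extends Logging {
  def onCreated(key: String): Unit = {
    // Before: plain interpolation flattens the variable into the message text.
    logInfo(s"ExecuteHolder $key is created.")

    // After: the variable is emitted as a named MDC field that downstream
    // log pipelines can query, in addition to the rendered text.
    logInfo(log"ExecuteHolder ${MDC(LogKey.EXECUTE_KEY, key)} is created.")
  }
}
```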
### Does this PR introduce any user-facing change?
No.

### How was this patch tested?

### Was this patch authored or co-authored using generative AI tooling?
No.