-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix duplicate counted metrics like op time for GpuCoalesceBatches #11062
fix duplicate counted metrics like op time for GpuCoalesceBatches #11062
Conversation
… dup Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
build |
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuExec.scala
Outdated
Show resolved
Hide resolved
@@ -27,31 +29,75 @@ object NvtxWithMetrics { | |||
} | |||
} | |||
|
|||
object ThreadLocalMetrics { | |||
val addressOrdering: Ordering[GpuMetric] = Ordering.by(System.identityHashCode(_)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be simpler to have each GpuMetric track itself? This feels like we are adding in a lot of overhead with the ThreadLocal values.
sealed abstract class GpuMetric extends Serializable {
def value: Long
def set(v: Long): Unit
def +=(v: Long): Unit
def add(v: Long): Unit
private var isTimerActive = false
final def tryActivateTimer(): Bool = {
if (!isTimerActive) {
isTimerActive = true
true
} else {
// output a warning if this is not NoopMetric
false
}
}
final def deactivateTimer(duration: Long): Unit = {
if (isActive) {
isActive = false
add(duration)
}
}
final def ns[T](f: => T): T = {
if (tryActivateTimer()) {
val start = System.nanoTime()
try {
f
} finally {
deactivateTimer(System.nanoTime() - start)
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add that this is mostly about code complexity. I don't think the performance difference will be significant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Bobby, refined the PR accordingly. Threadlocal removed
Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
build |
* with call site print, not good because some test cases by design will dup Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * done Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * add file Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix comiple Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * address review comments Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
* with call site print, not good because some test cases by design will dup Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * done Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * add file Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix comiple Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * address review comments Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
…IDIA#11062) * with call site print, not good because some test cases by design will dup Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * done Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * add file Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * fix comiple Signed-off-by: Hongbin Ma (Mahone) <[email protected]> * address review comments Signed-off-by: Hongbin Ma (Mahone) <[email protected]> --------- Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
We observed that
op time
metrics for GpuCoalesceBatches is often larger than expected.After some digging I found that in some cases GpuMetrics will be duplicately counted. (a test case example is shown in the PR)
I tried to to fix the duplicate issue GpuCoalesceBatches, but I have no idea how many other duplicates are there for other operators. So I refined NvtxWithMetrics, MetricRange and GpuMetric#ns a little bit to avoid duplicate counting elapsed time.
closes #11063