Enhance performance tool #710
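@harryteng9527 Could this PR wait a bit? Next we first need to
Hi, I ran into a problem while working on this PR: every way I have come up with to return the number of sticky partitions is rather ugly. For example, passing an object that stores the sticky partition count from the main function down to the consumer, and also handing the same object to the tracker and the file writer. I'd like to ask whether using
What does this sentence mean? If it is the number of partitions each consumer subscribes to, that can be obtained directly from metrics.
I want to record the number of partitions a consumer keeps (sticky) before and after a re-balance.
So you want to record the "exact" partition information, for example topica-0, topicc-22, rather than just the "count"?
For now I only want to record
If this is needed for observation, you could put a global static member inside ConsumerThread, for example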
common/src/main/java/org/astraea/common/consumer/TopicsBuilder.java
@harryteng9527 Please take a look at #796, which uses a simple technique to keep the necessary information.
OK, thanks for the review. I'll take a look.
@harryteng9527 Thanks for the update. I have a few thoughts, please take a look.
common/src/main/java/org/astraea/common/consumer/TopicsBuilder.java
app/src/main/java/org/astraea/app/performance/TrackerThread.java
@@ -141,9 +142,13 @@ boolean tryToPrint(Duration duration) {
      for (var i = 0; i < reports.size(); ++i) {
        var report = reports.get(i);
        var ms = metrics.stream().filter(m -> m.clientId().equals(report.clientId())).findFirst();
        var clientId = report.clientId() == null ? "forTest" : report.clientId();
        var stickyNumber =
            ConsumerThread.CLIENT_ID_STICKY_PARTITIONS.getOrDefault(clientId, Set.of()).size();
So this is not tracking which partitions are currently being consumed?
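This uses the client id to look up the number of sticky partitions.
> So this is not tracking which partitions are currently being consumed?
Which partitions each consumer is consuming is tracked by a different Map: CLIENT_ID_PARTITIONS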
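The lookup described above can be sketched roughly as follows. This is only an illustration: `ConsumerThreadSketch` and `stickyNumber` are hypothetical names, partitions are represented as plain strings such as "topica-0", and the real `ConsumerThread` in the project may be shaped differently.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the ConsumerThread discussed above; not the project's actual class.
class ConsumerThreadSketch {
  // client id -> partitions the consumer kept across the last re-balance (e.g. "topica-0").
  // A concurrent map is assumed because consumer threads write while the tracker reads.
  static final Map<String, Set<String>> CLIENT_ID_STICKY_PARTITIONS = new ConcurrentHashMap<>();

  // Returns the sticky-partition count for a client, or 0 when nothing has been recorded yet.
  static int stickyNumber(String clientId) {
    return CLIENT_ID_STICKY_PARTITIONS.getOrDefault(clientId, Set.of()).size();
  }

  public static void main(String[] args) {
    CLIENT_ID_STICKY_PARTITIONS.put("consumer-1", Set.of("topica-0", "topicc-22"));
    System.out.println(stickyNumber("consumer-1")); // prints 2
    System.out.println(stickyNumber("unknown")); // prints 0
  }
}
```

Note that `ConcurrentHashMap` rejects null keys, which is presumably why the diff substitutes "forTest" when the report has no client id.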
        Utils.sleep(Duration.ofSeconds(1));
        continue;
      }
      if (!assignments.containsAll(CLIENT_ID_PARTITIONS.get(clientId))
          || generationId < consumer.generationId()) {
        // check whether re-balance has occurred or not.
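I'd like to discuss the motivation here a bit more. If the goal is to determine whether a "re-balance" has occurred, wouldn't it be more convenient to do that through a listener?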
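> Wouldn't it be more convenient to do that through a listener?
Right, I originally recorded the number of sticky partitions in the listener, which makes it easy to know when a re-balance happens.
But I later changed it to compute this inside the consumer, because:
- The Consumer already records its assigned partitions, so I moved the job of recording sticky partitions into the Consumer.
- Recording sticky partitions in the listener would make the logic somewhat more complex, because three cases have to be told apart before the sticky partitions can be computed:
  - consumer leaves the group => only onPartitionsRevoked is called
  - consumer joins the group => only onPartitionsAssigned is called
  - consumer simply re-balances => both onPartitionsAssigned and onPartitionsRevoked are called
The three cases have to be told apart because the consumer invokes the listener's callback functions differently depending on whether it is leaving, joining, or simply re-balancing.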
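For reference, one way the three cases above could be folded into listener-style callbacks is sketched below. It is a simplified assumption, not the PR's code: `StickyTrackingListenerSketch` is a hypothetical class that only mirrors the two callback names (it does not implement Kafka's `ConsumerRebalanceListener`), partitions are plain strings, and the revoke callback is assumed to carry every partition the consumer owned so far.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical listener-shaped class; it does not depend on the Kafka client library.
class StickyTrackingListenerSketch {
  private Set<String> lastRevoked = Set.of();
  private Set<String> sticky = Set.of();

  // Invoked on leave and on a plain re-balance.
  // Assumption: the argument holds everything the consumer owned before the re-balance.
  void onPartitionsRevoked(Collection<String> partitions) {
    lastRevoked = Set.copyOf(partitions);
    sticky = Set.of(); // nothing is sticky until a new assignment arrives
  }

  // Invoked on join and on a plain re-balance.
  void onPartitionsAssigned(Collection<String> partitions) {
    var kept = new HashSet<>(lastRevoked);
    kept.retainAll(partitions); // partitions owned both before and after
    sticky = Set.copyOf(kept);
    lastRevoked = Set.of(); // a later join without a revoke starts from scratch
  }

  Set<String> sticky() {
    return sticky;
  }
}
```

Under these assumptions a join yields an empty sticky set (nothing was revoked first), a leave clears it, and a plain re-balance yields the overlap between the revoked and the newly assigned partitions.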
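> Right, I originally recorded the number of sticky partitions in the listener, which makes it easy to know when a re-balance happens.
Is this meant to capture the start and end time of a re-balance, and then derive how long the whole re-balance takes?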
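There is currently no plan to compute the re-balance time. I saw that the tracker already has a beanObject fetching the re-balance time, so I am not adding a feature that computes detailed re-balance timing.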
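> There is currently no plan to compute the re-balance time. I saw that the tracker already has a beanObject fetching the re-balance time, so I am not adding a feature that computes detailed re-balance timing.
In that case, what is the purpose of the partition collection added in this change? My intuition says there are two possible purposes:
- Knowing which partitions each consumer is currently handling, but that is already collected
- Knowing the re-balance time, but that can be obtained from metrics
So is there a third use?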
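> What about when a pattern is used, or when partitions are added dynamically?
But shouldn't a consumer's sticky partitions be the intersection of the partitions assigned in the previous generation and the partitions assigned in the next generation?
Even if more partitions are assigned next time, those extra partitions are being assigned for the first time.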
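The intersection described above can be written down in a few lines. This is a minimal sketch under the assumption that partitions are represented as plain strings like "topica-0"; `StickyPartitionsSketch` and `sticky` are hypothetical names, not part of the project.

```java
import java.util.Set;
import java.util.TreeSet;

class StickyPartitionsSketch {
  // Partitions present in both the previous and the current generation's assignment.
  static Set<String> sticky(Set<String> previous, Set<String> current) {
    var result = new TreeSet<>(previous);
    result.retainAll(current);
    return result;
  }

  public static void main(String[] args) {
    var previous = Set.of("topica-0", "topica-1");
    // The new generation has more partitions, e.g. because the topic grew.
    var current = Set.of("topica-1", "topica-2", "topica-3");
    System.out.println(sticky(previous, current)); // prints [topica-1]
  }
}
```

Newly added partitions never count as sticky here, because they cannot appear in the previous generation's assignment.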
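> I think the focus should be on two things: "efficiency getting worse" and "changes in the subscribed partitions". There is no need to fixate on sticky partitions. In other words, what we want to say is "if the assignor algorithm causes a large change in partition subscriptions, efficiency suffers".
So we only need to care about the difference in the number of partitions between two consecutive assignments, and not about how many partitions the consumer keeps sticky?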
> and not about how many partitions the consumer keeps sticky
Yes. Or could you share what you want to convey with this data? Maybe there is something I have overlooked.
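> Could you share what you want to convey with this data?
Originally I only wanted to record how much the assigned partitions get shuffled.
Since we now only need to look at the difference between two consecutive partition assignments, I will not record the number of sticky partitions.
I will revise this PR so that it only adds pattern subscription support for the consumer.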
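Please also check whether the partition change rate I mentioned earlier can be done. That is, we want to observe whether the degree of change in partition subscriptions clearly affects performance.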
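The thread does not spell out how the change rate is defined, so the following is only one possible reading, labeled as an assumption: the fraction of partitions (across two consecutive assignments) that were either gained or lost. `PartitionChangeRateSketch` and `changeRate` are hypothetical names and this is not the implementation referenced later in the thread.

```java
import java.util.HashSet;
import java.util.Set;

class PartitionChangeRateSketch {
  // Assumed definition: (partitions gained or lost) / (partitions seen in either assignment).
  // Returns 0.0 when both assignments are empty.
  static double changeRate(Set<String> previous, Set<String> current) {
    var union = new HashSet<>(previous);
    union.addAll(current);
    if (union.isEmpty()) return 0.0;
    var unchanged = new HashSet<>(previous);
    unchanged.retainAll(current);
    return (double) (union.size() - unchanged.size()) / union.size();
  }
}
```

With this definition an identical assignment gives 0.0 and a completely different one gives 1.0, so the value can be plotted next to throughput to see whether heavier reshuffling coincides with worse performance.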
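@harryteng9527 How is this going?
This PR was going to be closed, with the work split up and done separately:
The partition change rate is already done - #958
@harryteng9527 Then please close this PR when you see fit.
@chia7712 OK, sorry about that.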
#689 (comment)
This PR adds the following features to the performance tool: