Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

migrate selfmon to eru-core #528

Merged
merged 2 commits into from
Jan 5, 2022

Conversation

DuodenumL
Copy link
Contributor

做了以下事情:

  1. 把selfmon迁移到core里了,保留了之前分布式锁的设计。
  2. node.Available不再从node metadata里读取,而是通过有无node status来动态判断。
  3. SetNode的时候如果!node.Available,不再删除node status。

@DuodenumL DuodenumL force-pushed the feature/selfmon branch 2 times, most recently from 1100709 to 1b77385 Compare December 30, 2021 09:44
@jschwinger233
Copy link
Member

怎么上线? 先上 core, 然后把旧的 selfmon 都关掉, 然后再上 agent, 就算上 core 和关闭 selfmon agent 之间有一点小间隔应该也没事.

@DuodenumL
Copy link
Contributor Author

怎么上线? 先上 core, 然后把旧的 selfmon 都关掉, 然后再上 agent, 就算上 core 和关闭 selfmon agent 之间有一点小间隔应该也没事.

这个顺序应该没问题

@DuodenumL DuodenumL force-pushed the feature/selfmon branch 2 times, most recently from 8734c78 to c3dc08c Compare December 31, 2021 05:19
store/etcdv3/node.go Outdated Show resolved Hide resolved
log.Errorf(ctx, "[doGetNodes] failed to get node status of %v, err: %v", node.Name, err)
}
node.Available = err == nil

if (!node.IsDown() || all) && utils.FilterWorkload(node.Labels, labels) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsDown 的名字要不改改, 里面的判断其实包含了 bypass, 所以返回 true 的时候并不一定是 down

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsDown 还改吗

cluster/calcium/selfmon.go Show resolved Hide resolved
cluster/calcium/selfmon.go Show resolved Hide resolved
cluster/calcium/selfmon.go Show resolved Hide resolved
cluster/calcium/selfmon.go Outdated Show resolved Hide resolved
cluster/calcium/selfmon.go Show resolved Hide resolved
cluster/calcium/selfmon.go Outdated Show resolved Hide resolved
for _, n := range nodes {
node := n
pool.Go(ctx, func() {
_, err := m.GetNodeStatus(ctx, node.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

。。。这样就别整2行了

logger.Errorf(ctx, "[ExecuteWorkload] Failed to get workload: %+v", err)
return
}
if workload.Engine == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个实在是太危险了, 过两个月恐怕自己都要忘记, 你想做成 Engine() error 这种 getter 吗, 然后把 engine 改为 internal member. 要是不想改就这样也行.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

暂时先放成这样吧,先把这个功能上上去。要做成getter这种需要比较大的改动,因为node里没有store,得想办法把创建engine这个事儿挪到外面才行。

目前的想法是getNode的时候把ca / cert / key一并取了,应该是可以实现的。

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我说的是

func (w *Workload) Engine() (engine.API, error) {
  if w.engine == nil { return Err }
  return w.engine
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

比如 calcium/helper.go 里的 node.Engine 就忘了做判断, 我看了一下那里的调用栈, 完全是可能产生 panic 的, 用 getter 就可以杜绝.

@CMGS
Copy link
Contributor

CMGS commented Jan 4, 2022

41 files changed……

@jschwinger233
Copy link
Member

自测的时候发现了一些问题, 经过反复讨论决定继续改善一下 cache, 把 engine cache 封装成一个自检活的 client 连接池, 这样可以无脑取 cache 来用而不必再检活.

Copy link
Member

@jschwinger233 jschwinger233 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

虽然感觉还有一些历史遗留没有捋顺 但是这次这样应该可以 只要别有什么大bug..

@jschwinger233
Copy link
Member

那就合并啦

@jschwinger233 jschwinger233 merged commit dfef999 into projecteru2:master Jan 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants