Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor wal stage 1 #561

Merged
merged 3 commits into from
Mar 24, 2022
Merged

refactor wal stage 1 #561

merged 3 commits into from
Mar 24, 2022

Conversation

CMGS
Copy link
Contributor

@CMGS CMGS commented Mar 11, 2022

No description provided.

wal2/hydro_test.go Outdated Show resolved Hide resolved
@anrs
Copy link
Contributor

anrs commented Mar 11, 2022

LGTM

@CMGS CMGS changed the title [WIP] refactor wal2 refactor wal stage 1 Mar 11, 2022
}
}

// Event .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

打错了...

@@ -329,3 +319,7 @@ func (h *ProcessingCreatedHandler) Handle(ctx context.Context, raw interface{})
logger.Infof(ctx, "obsolete processing deleted")
return
}

func getReplayContext(ctx context.Context) (context.Context, context.CancelFunc) {
return context.WithTimeout(ctx, time.Second*32) // TODO why 32?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx这块确实有点乱了,以后要花点精力重整一下...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这种我都已经……弃疗了,你整理下吧

@DuodenumL
Copy link
Contributor

跟这个PR无关的一个疑问:Calcium.RemoveWorkload里有个step参数,是干啥用的...看上去没有用

@DuodenumL
Copy link
Contributor

跟这个PR无关的一些其它疑问:

  1. Calcium.doRemoveWorkloadSync里没有处理删除失败的情况
  2. cluster/calcium/remove.go里L48似乎应该是if err = c.doRemoveWorkload(ctx, workload, force); err == nil?感觉写反了。
  3. CreateWorkloadHandler.Handle里的部分逻辑看起来跟doRemoveWorkloadSync是重合的,应该不需要重新写一下。
  4. CreateWorkloadHandler.Handle这个方法里L182有一句log: logger.Infof(ctx, "workload with meta removed"),但是似乎这里并不能保证workload meta被删除了。(而且日志里也应该有workload ID会更好些)

@CMGS CMGS changed the title refactor wal stage 1 [WIP] refactor wal stage 1 Mar 13, 2022
@jschwinger233
Copy link
Member

跟这个PR无关的一个疑问:Calcium.RemoveWorkload里有个step参数,是干啥用的...看上去没有用

曾经的语义是批处理的粒度, 防止对 docker 的并发请求太大挂掉, 不过后来改成了先按照 node 聚类并且引入了 goroutine pool 限制并发就没有使用这个参数了.

@jschwinger233
Copy link
Member

而且日志里也应该有workload ID会更好些

logger := log.WithField("WAL.Handle", "CreateWorkload").WithField("ID", wrk.ID).WithField("nodename", wrk.Nodename) 保证了 logger.Infof 里有 ID

@CMGS
Copy link
Contributor Author

CMGS commented Mar 14, 2022

Calcium.doRemoveWorkloadSync 里没有处理删除失败的情况

这里只能根据 Success 来判断是否 remove 成功,感觉不太好处理

@CMGS
Copy link
Contributor Author

CMGS commented Mar 14, 2022

cluster/calcium/remove.go里L48似乎应该是if err = c.doRemoveWorkload(ctx, workload, force); err == nil?感觉写反了。

对的

@CMGS
Copy link
Contributor Author

CMGS commented Mar 14, 2022

我改了4点钟的一部分 @DuodenumL

@CMGS CMGS changed the title [WIP] refactor wal stage 1 refactor wal stage 1 Mar 14, 2022
@DuodenumL
Copy link
Contributor

发现现在wal的覆盖范围不是很完整,例如remove / dissociate / realloc里都可能会导致资源状态不一致。所以应该把eventWorkloadResourceAllocated类型改成eventWorkloadResourceChanged,然后加到所有这些地方去。

@DuodenumL
Copy link
Contributor

今天发现wal还有一个问题:create里的wal commit是在defer里面做的,假设下面的代码panic了,defer依然会被执行,导致wal commit也被执行了,下次重启的时候不会做相关的处理。

目前想到的解决方法就是在defer里加一个判断,类似于这样,可以防止在panic的时候提交wal:

defer func() {
        if err := recover(); err != nil {
                panic(err)
        }
	if resourceCommit != nil {
		if err := resourceCommit(); err != nil {
			logger.Errorf(ctx, "commit wal failed: %s, %+v", eventWorkloadResourceAllocated, err)
		} else {
			logger.Errorf(ctx, "commit wal success: %s", eventWorkloadResourceAllocated)
		}
	}
}()

不过这样做虽然work,但是看起来很丑,不知道有没有优雅的做法。

@CMGS
Copy link
Contributor Author

CMGS commented Mar 23, 2022

今天发现wal还有一个问题:create里的wal commit是在defer里面做的,假设下面的代码panic了,defer依然会被执行,导致wal commit也被执行了,下次重启的时候不会做相关的处理。

目前想到的解决方法就是在defer里加一个判断,类似于这样,可以防止在panic的时候提交wal:

defer func() {
        if err := recover(); err != nil {
                panic(err)
        }
	if resourceCommit != nil {
		if err := resourceCommit(); err != nil {
			logger.Errorf(ctx, "commit wal failed: %s, %+v", eventWorkloadResourceAllocated, err)
		} else {
			logger.Errorf(ctx, "commit wal success: %s", eventWorkloadResourceAllocated)
		}
	}
}()

不过这样做虽然work,但是看起来很丑,不知道有没有优雅的做法。

想了下,目前只能这样写……

@CMGS
Copy link
Contributor Author

CMGS commented Mar 24, 2022

我先合了,后续 wal 改动再搞新 PR

@CMGS CMGS merged commit 43d7381 into master Mar 24, 2022
@DuodenumL
Copy link
Contributor

我先合了,后续 wal 改动再搞新 PR

嗯嗯

@CMGS CMGS deleted the wal2 branch March 24, 2022 04:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants