
Overhaul wal for create workload #534

Merged
6 commits merged on Jan 29, 2022

Conversation

jschwinger233
Member

No description provided.

@jschwinger233
Member Author

While testing I found some fairly serious gaps. To pick one of them, here is the table of possible workload states:

| # | Instance state | Meta (data) state |
|---|----------------|-------------------|
| 1 | exists | missing |
| 2 | missing | exists |
| 3 | missing | missing |
| 4 | exists, but not started | exists |
| 5 | exists, started | exists |

A careful reading of the code shows that all five states above can occur, depending entirely on when the crash happens.
States 3 and 5 need no handling and can simply be skipped.
The other three states are trickier: when querying the state we also have to tell whether an error came from a runtime failure, and since recovery is a multi-step API call, a partial success can itself cause a state transition (for example from state 4 to state 1). All of these cases need careful handling, so this needs more work.
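
For illustration only, here is a toy Go sketch of the per-state decisions described above. The state names and the removeInstance/removeMeta helpers are invented for this example; they are not eru APIs, and the real recovery also has to distinguish runtime errors, which this sketch ignores.

```go
package main

import "fmt"

// WorkloadState enumerates the five instance/meta combinations a crash can
// leave behind during workload creation (numbered as in the table above).
type WorkloadState int

const (
	InstanceOnly       WorkloadState = iota + 1 // 1: instance exists, meta missing
	MetaOnly                                    // 2: instance missing, meta exists
	Nothing                                     // 3: neither exists
	NotStartedWithMeta                          // 4: instance exists but not started, meta exists
	StartedWithMeta                             // 5: instance started, meta exists
)

// removeInstance and removeMeta are placeholders for engine/store calls.
func removeInstance() error { fmt.Println("remove instance"); return nil }
func removeMeta() error     { fmt.Println("remove meta"); return nil }

// recoverWorkload sketches the per-state decision: states 3 and 5 are skipped,
// the rest need cleanup, and a partial cleanup of state 4 (meta removed, then
// a crash) degrades it into state 1.
func recoverWorkload(state WorkloadState) error {
	switch state {
	case Nothing, StartedWithMeta:
		return nil // nothing to do
	case InstanceOnly:
		return removeInstance()
	case MetaOnly:
		return removeMeta()
	case NotStartedWithMeta:
		if err := removeMeta(); err != nil {
			return err
		}
		return removeInstance()
	default:
		return fmt.Errorf("unknown state %d", state)
	}
}

func main() {
	for s := InstanceOnly; s <= StartedWithMeta; s++ {
		fmt.Println(s, recoverWorkload(s))
	}
}
```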

@CMGS
Contributor

CMGS commented Jan 19, 2022

How did state 1 come about...

@jschwinger233
Member Author

> How did state 1 come about...

In the rollback steps the meta is deleted first and the instance second, so a crash in between leaves us in state 1. That said, I have thought of a simpler way to do this.

@jschwinger233
Member Author

jschwinger233 commented Jan 20, 2022

Regarding the create WAL issues:

  1. Converged the states: in every state both the workload meta and the instance get deleted. An uncommitted WAL entry means the call never returned successfully to the API caller, so even if everything is consistent (container started, meta exists, node resources allocated), the user still considers the creation failed, and it should be removed.
  2. Improved the semantics of engine.VirtualizationRemove: when the id does not exist it no longer returns an error, but instead reports the number of removed instances through a new return value. This makes WAL recovery much easier, and it also makes the behaviour in calcium.RemoveWorkload more precise.
  3. During WAL recovery, the cases where the meta exists and where it does not are handled separately, because calcium.RemoveWorkload only works correctly when the workload meta exists. (A sketch of this flow follows below.)
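
A minimal sketch of the direction described in points 2 and 3. The VirtualizationRemove signature matches the diff in this PR, but the Store interface, the recoverCreate function, and the in-memory fakes are simplified stand-ins, not the actual eru code.

```go
package main

import (
	"context"
	"fmt"
)

// Engine mirrors the interface change in this PR: VirtualizationRemove now
// reports how many instances it removed instead of returning an error when
// the ID does not exist.
type Engine interface {
	VirtualizationRemove(ctx context.Context, ID string, volumes, force bool) (int, error)
}

// Store is a simplified stand-in for the metadata store, not the eru API.
type Store interface {
	MetaExists(ctx context.Context, ID string) (bool, error)
	RemoveMeta(ctx context.Context, ID string) error
}

// recoverCreate sketches the converged cleanup: whatever state the crash left
// behind, both meta and instance are removed, because an uncommitted WAL entry
// means the API caller never saw a successful response.
func recoverCreate(ctx context.Context, eng Engine, store Store, ID string) error {
	exists, err := store.MetaExists(ctx, ID)
	if err != nil {
		return err
	}
	if exists {
		// Only with the meta present can a RemoveWorkload-style path run;
		// otherwise we fall through to removing the bare instance.
		if err := store.RemoveMeta(ctx, ID); err != nil {
			return err
		}
	}
	// Safe even if the instance is already gone: a missing ID is no longer
	// an error, the removed count simply comes back as 0.
	removed, err := eng.VirtualizationRemove(ctx, ID, true, true)
	if err != nil {
		return err
	}
	fmt.Printf("recovered %s: removed %d instance(s)\n", ID, removed)
	return nil
}

// Trivial in-memory fakes so the sketch runs end to end.
type fakeEngine struct{ instances map[string]bool }

func (f fakeEngine) VirtualizationRemove(_ context.Context, ID string, _, _ bool) (int, error) {
	if f.instances[ID] {
		delete(f.instances, ID)
		return 1, nil
	}
	return 0, nil
}

type fakeStore struct{ meta map[string]bool }

func (f fakeStore) MetaExists(_ context.Context, ID string) (bool, error) { return f.meta[ID], nil }
func (f fakeStore) RemoveMeta(_ context.Context, ID string) error         { delete(f.meta, ID); return nil }

func main() {
	eng := fakeEngine{instances: map[string]bool{"w1": true}}
	store := fakeStore{meta: map[string]bool{}} // state 1: instance exists, meta missing
	fmt.Println(recoverCreate(context.Background(), eng, store, "w1"))
}
```

The design point here is that making removal idempotent (a missing instance yields a count of 0 rather than an error) lets the recovery path delete unconditionally without first probing the runtime.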

@jschwinger233 jschwinger233 changed the title [WIP] overhaul wal for create workload Overhaul wal for create workload Jan 21, 2022
@jschwinger233 jschwinger233 requested a review from anrs January 21, 2022 08:43
@@ -55,14 +55,33 @@ func (c *Calcium) doCreateWorkloads(ctx context.Context, opts *types.DeployOptio
 	defer func() {
 		cctx, cancel := context.WithTimeout(utils.InheritTracingInfo(ctx, context.TODO()), c.config.GlobalTimeout)
 		for nodename := range deployMap {
-			if e := c.store.DeleteProcessing(cctx, opts.GetProcessing(nodename)); e != nil {
+			processing := opts.GetProcessing(nodename)
+			if e := c.store.DeleteProcessing(cctx, processing); e != nil {
Contributor

Unify the err variable naming here.

engine/engine.go Outdated
@@ -41,7 +41,7 @@ type API interface {
 	VirtualizationCopyTo(ctx context.Context, ID, target string, content []byte, uid, gid int, mode int64) error
 	VirtualizationStart(ctx context.Context, ID string) error
 	VirtualizationStop(ctx context.Context, ID string, gracefulTimeout time.Duration) error
-	VirtualizationRemove(ctx context.Context, ID string, volumes, force bool) error
+	VirtualizationRemove(ctx context.Context, ID string, volumes, force bool) (int, error)
Contributor

You're sneaking in an unrelated change here; actually I think defining a constant might be more intuitive.

@jschwinger233 jschwinger233 changed the title Overhaul wal for create workload [WIP] Overhaul wal for create workload Jan 27, 2022
@jschwinger233 jschwinger233 changed the title [WIP] Overhaul wal for create workload Overhaul wal for create workload Jan 27, 2022
@CMGS
Contributor

CMGS commented Jan 28, 2022

LGTM

var processingCommits map[string]wal.Commit
defer func() {
for nodename := range processingCommits {
if err := processingCommits[nodename](); err != nil {
Contributor

Can we make sure processingCommits[nodename] is callable or nil when c.wal.Log returns an error?

Member Author

Added a check for that.
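
For reference, a toy sketch of what such a guard can look like; this is an assumption about the shape of the fix, not the committed code, and Commit here is a local stand-in for wal.Commit.

```go
package main

import "fmt"

// Commit mimics wal.Commit: a callback that marks a WAL entry as committed.
type Commit func() error

func main() {
	// Simulate what c.wal.Log might leave behind: one node got a commit
	// callback, another failed to log and its map entry stays nil.
	processingCommits := map[string]Commit{
		"node1": func() error { return nil },
		"node2": nil,
	}

	defer func() {
		for nodename := range processingCommits {
			// The guard discussed above: never invoke a nil callback left
			// behind by a failed wal.Log call.
			if processingCommits[nodename] == nil {
				continue
			}
			if err := processingCommits[nodename](); err != nil {
				fmt.Printf("commit processing log for %s failed: %v\n", nodename, err)
			}
		}
	}()
}
```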

@anrs
Contributor

anrs commented Jan 29, 2022

LGTM

@CMGS CMGS merged commit 10b0708 into projecteru2:master Jan 29, 2022