Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

Land experimental "newstore" as formal feature #2239

Merged
merged 11 commits into from
Jan 14, 2020

Conversation

WeiZhang555
Copy link
Member

Fixes #803

The "newstore" feature has had been a "experimental" feature for long time, it aims to re-organize the persistent data on disk and merging separate files into one "persist.json", and also better guarantee the compatibility.

I have been working on this for a long time and preparing its landing before Kata 2.0, and now I think it's the time. Let's land it as a formal feature!

I'm not sure when is the exact date of release 2.0, but since I can't put enough energy on this work, I hope to merge it ASAP in case nobody else would like to pick up.

You can see I removed lots of code, the "newstore" will make code clearer and easier to read, I hope everyone likes it. :-)

Signed-off-by: Wei Zhang [email protected]

@WeiZhang555
Copy link
Member Author

/test

@WeiZhang555
Copy link
Member Author

Depends on: kata-containers/tests#2113

CI failed since I remove "/var/lib/vc" dir but test cases always check this dir for active pod info.

"/var/lib/vc" isn't necessary, we can put all persist informations in "/var/run". Suppose the server meets a reboot, all qemu and kata processes will be dead, but garbage information will be left in "/var/lib/vc". Since "/run/vc" is tmpfs, it will be automatically cleaned up after reboot.

@WeiZhang555
Copy link
Member Author

Ping @kata-containers/runtime for more eyes. Please help me merge the test PR first 😄

@WeiZhang555

This comment has been minimized.

@codecov

This comment has been minimized.

@WeiZhang555

This comment has been minimized.

@WeiZhang555 WeiZhang555 force-pushed the persist-storage branch 2 times, most recently from e153e7c to 98edf03 Compare November 24, 2019 15:41
@WeiZhang555

This comment has been minimized.

@WeiZhang555 WeiZhang555 changed the title Land experimental "newstore" as formal feature [WIP] Land experimental "newstore" as formal feature Nov 25, 2019
@WeiZhang555

This comment has been minimized.

2 similar comments
@WeiZhang555

This comment has been minimized.

@WeiZhang555

This comment has been minimized.

@WeiZhang555

This comment has been minimized.

@WeiZhang555

This comment has been minimized.

@devimc
Copy link

devimc commented Nov 26, 2019

@bergwolf @lifupan @WeiZhang555 @egernst @chavafg this PR breaks backward compatibility, I ran a container with master, then applied this patch and tried to stop the container, but I got the following error:

Error response from daemon: cannot stop container: 8be8ef9e3058: Cannot kill container 8be8ef9e3058228dda9ace31f0ae1942d1c72b7eb400aed484bd18ffa8e61461: OCI runtime state failed: open /run/vc/sbs/8be8ef9e3058228dda9ace31f0ae1942d1c72b7eb400aed484bd18ffa8e61461/persist.json: no such file or directory: unknown

I'm working in a test to check backward compatibility in the CI

@egernst egernst self-assigned this Nov 26, 2019
@WeiZhang555
Copy link
Member Author

@devimc Yes, this will break backward compatibility, AS expected. This PR will finally remove all VCStore and replace it with experimental "newstore" feature which changed persistent data structure totally.

And I'm happy you're adding the test, we should add backward compatibility test cases from long time ago.

@WeiZhang555

This comment has been minimized.

@WeiZhang555

This comment has been minimized.

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see code being removed! 😄

First pass review.

virtcontainers/acrn.go Outdated Show resolved Hide resolved
virtcontainers/acrn.go Show resolved Hide resolved
virtcontainers/acrn.go Show resolved Hide resolved
virtcontainers/acrn.go Outdated Show resolved Hide resolved
virtcontainers/acrn.go Outdated Show resolved Hide resolved
virtcontainers/acrn.go Outdated Show resolved Hide resolved
virtcontainers/acrn.go Outdated Show resolved Hide resolved
virtcontainers/persist.go Outdated Show resolved Hide resolved
virtcontainers/persist/fs/fs.go Show resolved Hide resolved
virtcontainers/sandbox.go Outdated Show resolved Hide resolved
@WeiZhang555
Copy link
Member Author

WeiZhang555 commented Jan 8, 2020

@kata-containers/architecture-committee @egernst @gnawux @bergwolf

Add one commit 2270c13 for fixing the backward compatibility with some redundant codes, now it should be safe to move from old store to new store, it won't break any running workloads any more.

@WeiZhang555
Copy link
Member Author

/test

@WeiZhang555
Copy link
Member Author

/test

Keep old store restore functions for keeping backward compatibility, if
old store files are found from disk, restore them with old store first.

Signed-off-by: Wei Zhang <[email protected]>
@WeiZhang555
Copy link
Member Author

/test

@devimc
Copy link

devimc commented Jan 10, 2020

@WeiZhang555 compatibility test is ready kata-containers/tests#2135
I tested this PR manually and it's not backward compatible,
do we really want this PR backward compatible? next release can be 2.0

@WeiZhang555
Copy link
Member Author

WeiZhang555 commented Jan 10, 2020

@devimc It's not? I tried in my local machine, new kata-runtime can clean old containers started with old kata-runtime. Are you testing this PR based on latest (With commit
290339d)?

My opinion is the new store is added for better keep backward compatibility, because it's so easy to break with old store, so the earlier we merge this, the more benefit we can enjoy.

I did a lots of cleanup work to remove old storage, but if we want to keep the compatibility, we have to add them back, this could make the code base kind of messy and hard to maintain.

@WeiZhang555
Copy link
Member Author

@devimc I can see there is foreward compatibility test in your PR, it's good but I have no idea how to pass it, because that means you need to let old kata-runtime recognize new persist storage structure. I believe it will fail for this one at least.

One solution I can find is that we add the feature in one release but not enable it, then enable it in the following release...But I hope we can get this in earlier :-(

@egernst
Copy link
Member

egernst commented Jan 10, 2020

Forward compat. is a bit harder to handle, and I'm not sure in a production environment when you'd realistically hit this. Based on your local tests, I'm happy with this @WeiZhang555, but I want to wait on an ACK from @devimc (Julio - thanks for putting these tests in).

@devimc
Copy link

devimc commented Jan 10, 2020

@WeiZhang555 @egernst I'm using the latest version of this PR

INFO: Running backward compatibility test
+ runtime_path=/usr/local/bin/kata-runtime
+ master_runtime_path=/tmp/kata/master/bin/kata-runtime
+ runtime_backup_path=/usr/local/bin/kata-runtime.backup
+ sudo cp /usr/local/bin/kata-runtime /usr/local/bin/kata-runtime.backup
+ cont_name=backward_test
+ docker run -d --name backward_test --runtime=kata-runtime busybox tail -f /dev/null
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
bdbbaa22dec6: Pull complete 
Digest: sha256:6915be4043561d64e0ab0f8f098dc2ac48e077fe23f488ac24b665166898115a
Status: Downloaded newer image for busybox:latest
e01741b12998f9088e6cebae7d4e266023f5f76bc4f2ad9e54b471c6f16d9754
+ sudo cp -a /tmp/kata/master/bin/kata-runtime /usr/local/bin/kata-runtime
+ sync
+ docker exec backward_test true
OCI runtime state failed: open /run/vc/sbs/e01741b12998f9088e6cebae7d4e266023f5f76bc4f2ad9e54b471c6f16d9754/lock: no such file or directory: unknown
++ handle_error 75
++ local exit_code=126
++ local line_number=75
++ echo 'Failed at 75: docker exec "${cont_name}" true'
Failed at 75: docker exec "${cont_name}" true

@devimc
Copy link

devimc commented Jan 10, 2020

/test-ubuntu

@WeiZhang555
Copy link
Member Author

WeiZhang555 commented Jan 13, 2020

@devimc I think you're testing the backward compatibility in wrong direction(correct me if I'm wrong).

The backward compatibility testing process should be:

  1. start container with "master" kata-runtime
  2. replace kata-runtime from PR
  3. docker exec and docker rm the container with PR kata-runtime to check if it breaks master.

I think your "backward compatibility" test is actually "forward compatibility" and vice versa.

I modified the test case according to my environment and run it locally, the script:

#!/bin/bash

test_backward_compatibility() {
	echo "Running backward compatibility test"

	runtime_path="$1"
	master_runtime_path="$2"

	# start container with old runtime
	cp $runtime_path kata-runtime && sync

	# run a container with the current runtime
	cont_name="backward_test"
	docker run -d --name "${cont_name}" --net none --runtime=kata busybox tail -f /dev/null

	# debug info
	# print kata-runtime version
	kata-runtime version

	# print dir content
	sudo tree /run/vc/sbs/

	# switch to master runtime
	cp $master_runtime_path kata-runtime && sync
	# print kata-runtime version
	kata-runtime version

	# exec
	docker exec "${cont_name}" true

	# stop and remove container
	docker rm -f "${cont_name}"

	# start new container  and check dir content
	docker run -d --name "${cont_name}" --net none --runtime=kata busybox tail -f /dev/null
	sudo tree /run/vc/sbs/
	docker rm -f "${cont_name}"
}

test_backward_compatibility ./kata-runtime-old ./kata-runtime-new

result:

Running backward compatibility test
8a3260687b5b3ddf032d0bd92e3ada321eefaeb0c3e563ca57217e1aff6cf2cd
kata-runtime  : 1.10.0-rc0
   commit   : 3ea3d3201b3a0b6fc35dea48192e73a021b4734b
   OCI specs: 1.0.1-dev
/run/vc/sbs/
└── 8a3260687b5b3ddf032d0bd92e3ada321eefaeb0c3e563ca57217e1aff6cf2cd
    ├── 8a3260687b5b3ddf032d0bd92e3ada321eefaeb0c3e563ca57217e1aff6cf2cd
    │   ├── devices.json
    │   ├── lock
    │   ├── mounts.json
    │   ├── process.json
    │   ├── raw
    │   └── state.json
    ├── agent.json
    ├── devices.json
    ├── hypervisor.json
    ├── lock
    ├── network.json
    ├── proxy.sock
    ├── raw
    └── state.json

4 directories, 12 files
kata-runtime  : 1.10.0-rc0
   commit   : 290339da6b9d094a8c73e1252ccafa42ca75c6b1
   OCI specs: 1.0.1-dev
backward_test
2e5df935b811a68313791814863eb777f79119836251b6d2dd8ff52c3775c8a9
/run/vc/sbs/
└── 2e5df935b811a68313791814863eb777f79119836251b6d2dd8ff52c3775c8a9
    ├── 2e5df935b811a68313791814863eb777f79119836251b6d2dd8ff52c3775c8a9
    │   └── persist.json
    ├── persist.json
    └── proxy.sock

2 directories, 3 files
backward_test

PR kata-runtime 290339da6b9d094a8c73e1252ccafa42ca75c6b1 is backward compatible to master 3ea3d3201b3a0b6fc35dea48192e73a021b4734b

@WeiZhang555
Copy link
Member Author

/test-fc

@devimc
Copy link

devimc commented Jan 13, 2020

@WeiZhang555

I think your "backward compatibility" test is actually "forward compatibility" and vice versa.

yes, but what about forward compatibility? should kata support this? what happen when users run containers with latest kata but they rollback because something went wrong?

@egernst
Copy link
Member

egernst commented Jan 13, 2020

If its passing backward, i'm okay with this. Handling the forward upgrade is a very niche problem, and if we're fighting that, we're in a more serious issue imo.

If it passes backward, let's plan on merging this. @devimc can we update the test for just backward compat (at least for now)?

@devimc
Copy link

devimc commented Jan 13, 2020

/test-ubuntu

@devimc
Copy link

devimc commented Jan 13, 2020

@WeiZhang555 @egernst test fixed, ubuntu CI is passing but FC and initrd CIs are failing

@WeiZhang555
Copy link
Member Author

WeiZhang555 commented Jan 14, 2020

@devimc I think initrd and firecracker CI are unstable, for FC test, it fails quite randomly, besides I don't have a complete env for testing and debugging this.

Is there anyone can help?

@Pennyzct Any idea on FC test failure? Found similar error log from your PR: #2379 (comment)

@WeiZhang555
Copy link
Member Author

WeiZhang555 commented Jan 14, 2020

/test-fc

FC CI passed with nothing modified, randomly 😓

@bergwolf
Copy link
Member

Nice work @WeiZhang555 !

@bergwolf bergwolf merged commit c3629d3 into kata-containers:master Jan 14, 2020
@WeiZhang555
Copy link
Member Author

Wow 🎆 👏

Thanks everyone for help review!

@WeiZhang555 WeiZhang555 deleted the persist-storage branch January 14, 2020 05:28
@egernst
Copy link
Member

egernst commented Jan 14, 2020

WEEEEEE IT MADE IT!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] baseline persist data for live-upgrade
6 participants