This repository has been archived by the owner on May 12, 2021. It is now read-only.

persist: baseline persist data format #883

Merged


WeiZhang555
Member

@WeiZhang555 commented Nov 5, 2018

Fixes #803

This demonstrates how to use the new virtcontainers/persist/api/ package introduced by #874.

This PR is only a demo; please don't merge it.

This is ready for review and merge now :-)

  • state.json is removed, contents moved to persist.json
  • remove network.json
  • remove agent.json
  • remove devices.json
  • remove hypervisor.json
  • remove mounts.json
  • remove process.json

@WeiZhang555
Member Author

/test

@WeiZhang555 force-pushed the rfc-persist-data-standard-wip branch 3 times, most recently from c2c472c to 06728ae on November 6, 2018 02:58
@WeiZhang555
Member Author

/test

@WeiZhang555 force-pushed the rfc-persist-data-standard-wip branch from 06728ae to 5f57ee5 on November 26, 2018 01:34
@raravena80
Member

@WeiZhang555 any updates?

@WeiZhang555 force-pushed the rfc-persist-data-standard-wip branch 3 times, most recently from 209428c to 80827f9 on January 7, 2019 03:32
@WeiZhang555 force-pushed the rfc-persist-data-standard-wip branch 2 times, most recently from df17755 to e58557b on January 7, 2019 14:34
@WeiZhang555 force-pushed the rfc-persist-data-standard-wip branch from e58557b to f47580b on January 8, 2019 10:59
Contributor

@jodh-intel left a comment

Hi @WeiZhang555 - This is interesting! A few initial comments (more tomorrow probably).

A couple of other thoughts:

  • We'll need super-careful tests for this.

    If we do use CurPersistVersion, we need a set of valid and invalid state files for every value of CurPersistVersion so that we can assert the expected behaviour in unit tests.

  • Features like this are hard to test.

    Hence, it would be very useful to be able to request that the runtime dump its state at any time (to the journal or an arbitrary file), either by:

    • adding a new sub-command (kata-runtime kata-state show or something).
    • adding a SIGUSR2 signal handler (killall -USR2 kata-runtime).

virtcontainers/persist/fs/fs.go (resolved)
)

// PersistVersion set persist data version to current version in runtime
func (s *Sandbox) PersistVersion() {
Contributor

The names of these functions suggest that as soon as you call them, the data will be persisted to disk. But that doesn't appear to be what is happening - persistence only occurs when Dump() is called. This is potentially misleading.

How about:

  • Renaming Dump() to Save() (clearer as Dump almost suggests some sort of debug operation to me).
  • Renaming Persist*() to AddToTransaction() or something like that (AddToTrn()?) as that makes it clear this isn't actually saving any data (yet).

Member Author

Nice suggestion! I'll rename the functions, thanks!

Member Author

I just reworked the whole PR and renamed PersistVersion to versionSaveCallback and Dump to ToDisk. I'm not sure whether this is clear enough for users.
Anyway, the comments are addressed.
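The split discussed in this thread, where registering data only stages it and a separate call writes it, can be sketched as follows. Only the names `versionSaveCallback` and `ToDisk` come from the thread; `persistDriver`, `RegisterHook`, the `write` field, `curPersistVersion` and the callback bodies are assumptions for illustration.

```go
package main

import "errors"

// SandboxState is a hypothetical, simplified persisted state.
type SandboxState struct {
	PersistVersion uint
	State          string
}

// saveCallback fills in part of the state that is about to be saved.
type saveCallback func(*SandboxState) error

// persistDriver only stages callbacks; nothing is written until ToDisk.
type persistDriver struct {
	callbacks []saveCallback
	write     func(SandboxState) error // e.g. marshal and write persist.json
}

// RegisterHook stages a callback without touching the disk.
func (p *persistDriver) RegisterHook(cb saveCallback) {
	p.callbacks = append(p.callbacks, cb)
}

// ToDisk runs every staged callback against a fresh state, then writes the
// result exactly once. Callbacks stay registered for the next ToDisk call.
func (p *persistDriver) ToDisk() error {
	if p.write == nil {
		return errors.New("persistDriver: no write function configured")
	}
	var ss SandboxState
	for _, cb := range p.callbacks {
		if err := cb(&ss); err != nil {
			return err
		}
	}
	return p.write(ss)
}

// curPersistVersion is an assumed value for illustration.
const curPersistVersion = 1

// versionSaveCallback reuses the name from this thread; its body is a guess.
func versionSaveCallback(ss *SandboxState) error {
	ss.PersistVersion = curPersistVersion
	return nil
}
```

Keeping the callbacks registered means each component hooks in once, and every later `ToDisk()` call collects a consistent snapshot from all of them.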

virtcontainers/sandbox.go (outdated, resolved)
virtcontainers/sandbox.go (outdated, resolved)
virtcontainers/persist/fs/fs.go (outdated, resolved)

fs.containerState[cid] = cstate
}
return nil
Contributor

I appreciate this is a demo only, but:

  • There is no validation performed on the decoded values.
  • There is no check on whether CurPersistVersion looks reasonable (what if it were set to 99 in the on-disk file?).
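Restore-time validation of the kind requested here could decode just the version field first and refuse anything newer than the runtime understands. This is a sketch: `CurPersistVersion` is assumed to be 1, and `persistHeader`, the `persistVersion` JSON key and `checkPersistVersion` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CurPersistVersion is assumed to be 1 here; the real constant lives in
// virtcontainers/persist/api.
const CurPersistVersion uint = 1

// persistHeader decodes only the version field; the JSON key is hypothetical.
type persistHeader struct {
	PersistVersion uint `json:"persistVersion"`
}

// checkPersistVersion rejects persist data that is malformed, has no
// version, or was written by a newer runtime (e.g. version 99).
func checkPersistVersion(data []byte) (uint, error) {
	var h persistHeader
	if err := json.Unmarshal(data, &h); err != nil {
		return 0, fmt.Errorf("decode persist data: %w", err)
	}
	switch {
	case h.PersistVersion == 0:
		return 0, fmt.Errorf("persist data has no version")
	case h.PersistVersion > CurPersistVersion:
		return 0, fmt.Errorf("persist data version %d is newer than supported version %d",
			h.PersistVersion, CurPersistVersion)
	}
	return h.PersistVersion, nil
}
```

The per-version valid and invalid state files suggested in the first review map naturally onto a table-driven unit test, with one row per fixture fed to `checkPersistVersion`.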

Member Author

CurPersistVersion is hardcoded, so it will always be valid. Restoring from disk could yield a version number such as 99, though, so I can add some validation during restore.

Member Author

Not sure currently. Let's keep it as it is for now :-)

Contributor

@jodh-intel left a comment

Just a few comments.

Ping @kata-containers/runtime - this is a big PR which needs more eyes on it!

You can start by looking just at virtcontainers/persist/api/ (or just heading over to the corresponding #874).

virtcontainers/persist/manager.go (outdated, resolved)
virtcontainers/persist/manager.go (outdated, resolved)
virtcontainers/persist/api/interface.go (outdated, resolved)
@WeiZhang555
Member Author

@jodh-intel Thanks for reviewing!

I've been busy with other things recently. The PR works, but the test cases need some fixes. I will try to rebase and rework it soon.

@raravena80
Member

@WeiZhang555 ping, any updates? Thx

@WeiZhang555 force-pushed the rfc-persist-data-standard-wip branch from a373ba2 to 122fe74 on February 8, 2019 14:52
@WeiZhang555 requested a review from a team as a code owner on February 8, 2019 14:52
@WeiZhang555 force-pushed the rfc-persist-data-standard-wip branch 3 times, most recently from 78aad02 to 3a4a50f on February 8, 2019 18:22
Save and restore state from persist.json instead of state.json

Signed-off-by: Wei Zhang <[email protected]>
Address some comments:
* fix persist driver func names for better understanding
* modify some logic, add some returned error etc

Signed-off-by: Wei Zhang <[email protected]>
Set new persist storage driver "virtcontainers/persist/" as "experimental"
feature.
One day when this can fully work and we're ready to move to 2.0, we'll move
it from "experimental" feature to formal feature.
At that time, the "virtcontainers/filesystem_resource_storage.go" can be removed
completely.

Signed-off-by: Wei Zhang <[email protected]>
For experimental features, state.json won't be updated, so modify some
unit tests to skip those checks.

Signed-off-by: Wei Zhang <[email protected]>
add more unit tests.

Signed-off-by: Wei Zhang <[email protected]>
Address review comments

Signed-off-by: Wei Zhang <[email protected]>
@WeiZhang555 force-pushed the rfc-persist-data-standard-wip branch from 5ae8dd5 to 5c384f3 on April 19, 2019 07:38
* Fix potential panic by nil pointer.
* Address comments.

Signed-off-by: Wei Zhang <[email protected]>
@WeiZhang555 force-pushed the rfc-persist-data-standard-wip branch from 5c384f3 to 40bc2ca on April 19, 2019 08:05
@WeiZhang555
Member Author

/test

@WeiZhang555
Member Author

All tests are green now 🍾 🎆


The last commit enables the experimental new store driver for testing only and must be dropped before merging. I'll remove it so that the tests also check compatibility with the original store.

Last commit:

diff --git a/Makefile b/Makefile
index 072aca5..13c81fa 100644
--- a/Makefile
+++ b/Makefile
@@ -158,7 +158,7 @@ DEFMEMSLOTS := 10
 DEFBRIDGES := 1
 DEFDISABLEGUESTSECCOMP := true
 #Default experimental features enabled
-DEFAULTEXPFEATURES := []
+DEFAULTEXPFEATURES := "[\"newstore\"]"

@WeiZhang555 force-pushed the rfc-persist-data-standard-wip branch from 40bc2ca to 3262da0 on April 19, 2019 14:05
@WeiZhang555
Member Author

/test

@sboeuf left a comment

@WeiZhang555 The PR looks pretty good (two comments in the code), but I have two concerns:

  • You use the same "store" wording that virtcontainers already uses for saving sandbox and container data through the store API in virtcontainers/store.
  • This PR duplicates the way data is stored, which should rely on the virtcontainers/store code instead. Is this something you're planning to fix in a follow-up PR?

errContainerPersistNotExist = errors.New("container doesn't exist in persist data")
)

func (s *Sandbox) dumpState(ss *persistapi.SandboxState, cs map[string]persistapi.ContainerState) error {

Global comment for this file:
Having new Sandbox methods outside of sandbox.go seems pretty confusing to me.

Member Author

@WeiZhang555 Apr 20, 2019

I can move them to sandbox.go, but I'm worried that sandbox.go is already too large: the single file has over 1800 lines. That's why I put the store part into a separate file.

I can merge this file into sandbox.go in a follow-up PR if you insist.


Of course, you can do that in a follow-up PR :)
But I'd like to have input from @bergwolf @amshinde on this too.

Member Author
ping @bergwolf @amshinde again, let's finish the discussion :)

virtcontainers/persist/manager.go (outdated, resolved)
@WeiZhang555
Member Author

/test

@WeiZhang555
Member Author

@sboeuf

> You use the same store wording that is already part of virtcontainers to store sandbox and containers data already with the store API located at virtcontainers/store.
> This PR duplicates the way to store data, which should rely on the virtcontainers/store code instead. Is it something that you're planning fix as follow up PR?

No, this PR is meant to replace virtcontainers/store. The two store data in different ways, so they can't share interfaces and logic. The final goal is to remove virtcontainers/store completely and rename newStore to store in release 2.0.

@sboeuf

sboeuf commented Apr 20, 2019

@WeiZhang555

> No, the PR is meant to replace virtcontainers/store, they store data in different ways so can't share interface and logics, the final goal is to remove virtcontainers/store totally and rename newStore to store in release 2.0

Ah, that makes sense, thanks for the clarification. Make sure this is well documented, as it might create some confusion from a code base perspective.

@sboeuf

sboeuf commented Apr 20, 2019

LGTM

Update the license header year from 2018 to 2019.

Signed-off-by: Wei Zhang <[email protected]>
@WeiZhang555 force-pushed the rfc-persist-data-standard-wip branch from 5ce73fa to 989b373 on April 20, 2019 02:05
@WeiZhang555
Member Author

/test

@gnawux merged commit b218229 into kata-containers:master on Apr 20, 2019
@WeiZhang555
Member Author

ping @kata-containers/ci

We need a separate CI job for testing experimental features, since an experimental feature can run a different code path, as the one in this PR does.

I think we can reuse an existing job, such as jenkins-ci-ubuntu-18-04. The chosen job must be "required", since breaking experimental features is also not allowed.

Enabling this experimental feature only requires adding a single line to configuration.toml:

experimental=["newstore"]

Thanks!
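Because `experimental=["newstore"]` switches the runtime onto a different persistence code path, the choice can be pictured as a simple membership check on the decoded config. This is a hedged sketch: `RuntimeConfig`, `knownExperimentalFeatures`, `validateExperimental`, `featureEnabled` and `storeDriverFor` are hypothetical, and TOML decoding of configuration.toml is assumed to have already happened.

```go
package main

import "fmt"

// RuntimeConfig is a hypothetical slice of the runtime configuration after
// configuration.toml (e.g. `experimental=["newstore"]`) has been decoded.
type RuntimeConfig struct {
	Experimental []string
}

// knownExperimentalFeatures lists the feature names this sketch accepts.
var knownExperimentalFeatures = map[string]bool{"newstore": true}

// validateExperimental rejects unknown names such as "new-store" early.
func validateExperimental(c RuntimeConfig) error {
	for _, f := range c.Experimental {
		if !knownExperimentalFeatures[f] {
			return fmt.Errorf("unknown experimental feature %q", f)
		}
	}
	return nil
}

// featureEnabled reports whether name appears in the experimental list.
func (c RuntimeConfig) featureEnabled(name string) bool {
	for _, f := range c.Experimental {
		if f == name {
			return true
		}
	}
	return false
}

// storeDriverFor picks the code path used to persist sandbox state:
// "persist" (virtcontainers/persist) or "store" (virtcontainers/store).
func storeDriverFor(c RuntimeConfig) string {
	if c.featureEnabled("newstore") {
		return "persist"
	}
	return "store"
}
```

A CI job without the line only ever exercises the `"store"` branch, which is why a separate required job with the feature switched on is needed.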

@WeiZhang555 deleted the rfc-persist-data-standard-wip branch on April 20, 2019 04:35
@chavafg
Contributor

chavafg commented Apr 25, 2019

@WeiZhang555 I have added an environment variable KATA_EXPERIMENTAL_FEATURES=true to the Fedora jobs, which I will set as "required" in GitHub.
I opened PR kata-containers/tests#1501 in the tests repo to make the necessary changes to configuration.toml. PTAL.

@WeiZhang555
Member Author

WeiZhang555 commented Apr 26, 2019

Thanks @chavafg !

Cross refs to kata-containers/tests#1501 and copying comments here:

I checked the CI job matrix: https://github.com/kata-containers/ci#ci-job-matrix and found that the Fedora job doesn't cover all test cases. Is it possible to also enable the K8s tests in the Fedora job? My WIP PR (#1575) for the newstore feature was failing the K8s tests, and the Fedora job doesn't cover them.


Successfully merging this pull request may close these issues.

[RFC] baseline persist data for live-upgrade
10 participants