Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

persist: manage "hypervisor.json" with new store #1889

Merged
merged 4 commits into from
Jul 23, 2019

Conversation

WeiZhang555
Copy link
Member

@WeiZhang555 WeiZhang555 commented Jul 18, 2019

Fixes #803

This PR moved 3 files to new store, now new store can manage more parts:

  • hypervisor.json
  • agent.json
  • network.json

Signed-off-by: Wei Zhang [email protected]

@WeiZhang555
Copy link
Member Author

/test

@codecov
Copy link

codecov bot commented Jul 18, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@d987a30). Click here to learn what that means.
The diff coverage is 43.08%.

@@            Coverage Diff            @@
##             master    #1889   +/-   ##
=========================================
  Coverage          ?   52.32%           
=========================================
  Files             ?      108           
  Lines             ?    14102           
  Branches          ?        0           
=========================================
  Hits              ?     7379           
  Misses            ?     5848           
  Partials          ?      875

@WeiZhang555
Copy link
Member Author

/test

@WeiZhang555
Copy link
Member Author

/test

@WeiZhang555
Copy link
Member Author

/test

@codecov
Copy link

codecov bot commented Jul 19, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@688732a). Click here to learn what that means.
The diff coverage is 29.84%.

@@            Coverage Diff            @@
##             master    #1889   +/-   ##
=========================================
  Coverage          ?   52.19%           
=========================================
  Files             ?      108           
  Lines             ?    14192           
  Branches          ?        0           
=========================================
  Hits              ?     7408           
  Misses            ?     5905           
  Partials          ?      879

@WeiZhang555
Copy link
Member Author

/test

}

func (q *qemu) load(s persistapi.HypervisorState) {
q.state.UUID = s.UUID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qemu pid is not restored. Then why do we need to persist it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fc and ACRN use it.

@@ -574,8 +572,20 @@ func newSandbox(ctx context.Context, sandboxConfig SandboxConfig, factory Factor
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the defer function should be combined with the above one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, nice catch!

s.devManager = deviceManager.NewDeviceManager(sandboxConfig.HypervisorConfig.BlockDeviceDriver, nil)

// Ignore the error. Restore can fail for a new sandbox
s.Restore()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it safe to ignore errors here? The old store relies on restore to determine if it should really create new guest. What is the workflow of newstore here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're almost same. s.Restore() always fail when it's first sandbox.

https://github.com/kata-containers/runtime/pull/1889/files#diff-bca2c2b5b04e0e74533b492fd23a3cc8R478

this part will check if s.state.State=="", if state is empty, then we need to create a new sandbox; else this is an existing sandbox.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about capturing the error and adding it to a logger.WithError(err).Debug("found expected restore error")?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jodh-intel
Actually I think it can be expected or unexpected, we can't know so we can't print debug message saying "it's expected"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack ;)

@WeiZhang555 WeiZhang555 force-pushed the persist-data branch 2 times, most recently from e747833 to 3dad0b1 Compare July 22, 2019 04:02
@WeiZhang555
Copy link
Member Author

/test

ping @kata-containers/runtime @bergwolf @sboeuf @egernst

Please help review it, before I make the PR too huge...

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @WeiZhang555.

This is a large PR so if you can add some more unit tests to assert the expected behaviour, that would be great.

ID: endpoint.NetPair.TapInterface.ID,
Name: endpoint.NetPair.TapInterface.Name,
TAPIface: persistapi.NetworkInterface{
Name: endpoint.NetPair.TapInterface.TAPIface.Name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be clearer if you create a variable to refer to endpoint.NetPair (and maybe others for referring to endpoint.NetPair.TapInterface and endpoint.NetPair.VirtIface) I think.

Copy link
Member Author

@WeiZhang555 WeiZhang555 Jul 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copy is deep so it's hard to be clear, I don't think separate some variables would help 😂

Fixing now.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've separated common parts to several functions, and added unit tests.
Please take another look. @jodh-intel


func (endpoint *BridgedMacvlanEndpoint) save() (s persistapi.NetworkEndpoint) {
s.Type = string(endpoint.Type())
s.BridgedMacvlan = &persistapi.BridgedMacvlanEndpoint{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Would be clearer to add a blank line after s.Type = ....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

func (k *kataAgent) save() (s persistapi.AgentState) {
s.ProxyPid = k.state.ProxyPid
s.URL = k.state.URL
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is of course correct, but would look more "conventional" as:

return persistapi.AgentState{
    ProxyPid: k.state.ProxyPid,
    URL: k.state.URL,
}

Same comment for load() below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, let me do this.


func (endpoint *MacvtapEndpoint) save() (s persistapi.NetworkEndpoint) {
s.Type = string(endpoint.Type())
s.Macvtap = &persistapi.MacvtapEndpoint{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, maybe just:

return &persistapi.MacvtapEndpoint{
  // ...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

case IPVlanEndpointType:
ep = &IPVlanEndpoint{}
default:
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's worth adding atleast a warning log message here as this scenario shouldn't occur I think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it's good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This should probably be a Warn() though since we can recover from it (by ignoring it ;)

@@ -0,0 +1,43 @@
// Copyright (c) 2016 Intel Corporation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect copyright header?

}
} else if q.state.UUID == "" { // new store
create = true
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This set of tets could do with a unit test if possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how, let me take a shot.

s.devManager = deviceManager.NewDeviceManager(sandboxConfig.HypervisorConfig.BlockDeviceDriver, nil)

// Ignore the error. Restore can fail for a new sandbox
s.Restore()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about capturing the error and adding it to a logger.WithError(err).Debug("found expected restore error")?

@WeiZhang555
Copy link
Member Author

/test

@WeiZhang555 WeiZhang555 force-pushed the persist-data branch 3 times, most recently from 3d12efe to 4c972f5 Compare July 23, 2019 06:25
@WeiZhang555
Copy link
Member Author

/test

@WeiZhang555
Copy link
Member Author

/test

Fixes kata-containers#803

Merge "hypervisor.json" into "persist.json", so the new store can take
care of hypervisor data now.

Signed-off-by: Wei Zhang <[email protected]>
Manage "agent.json" with new store.

Signed-off-by: Wei Zhang <[email protected]>
Merge "network.json" into "persist.json" so that new store can manage
network part.

Signed-off-by: Wei Zhang <[email protected]>
Address some comments for code readability, also add some unit tests.

Signed-off-by: Wei Zhang <[email protected]>
@WeiZhang555
Copy link
Member Author

/test

Copy link
Member

@bergwolf bergwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @WeiZhang555 ! Nice work!

Copy link
Contributor

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the tests. It would be good to add that qemu one if possible but this...

lgtm

case IPVlanEndpointType:
ep = &IPVlanEndpoint{}
default:
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This should probably be a Warn() though since we can recover from it (by ignoring it ;)

s.devManager = deviceManager.NewDeviceManager(sandboxConfig.HypervisorConfig.BlockDeviceDriver, nil)

// Ignore the error. Restore can fail for a new sandbox
s.Restore()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack ;)

@egernst egernst merged commit 6ce5f30 into kata-containers:master Jul 23, 2019
@WeiZhang555 WeiZhang555 deleted the persist-data branch July 23, 2019 15:30
@WeiZhang555
Copy link
Member Author

Thanks! @egernst @jodh-intel @bergwolf

egernst pushed a commit to egernst/runtime that referenced this pull request Feb 9, 2021
This updates grpc-go vendor package to v1.11.3 release, to fix server.Stop()
handling so that server.Serve() does not wait blindly.

Full commit list:
d11072e (tag: v1.11.3) Change version to 1.11.3
d06e756 clientconn: add support for unix network in DialContext. (kata-containers#1883)
452c2a7 Change version to 1.11.3-dev
d89cded (tag: v1.11.2) Change version to 1.11.2
98ac976 server: add grpc.Method function for extracting method from context (kata-containers#1961)
0f5fa28 Change version to 1.11.2-dev
1e2570b (tag: v1.11.1) Change version to 1.11.1
d28faca client: Fix race when using both client-side default CallOptions and per-call CallOptions (kata-containers#1948)
48b7669 Change version to 1.11.1-dev
afc05b9 (tag: v1.11.0) Change version to 1.11.0
f2620c3 resolver: keep full unparsed target string if scheme in parsed target is not registered (kata-containers#1943)
9d2250f status: rename Status to GRPCStatus to avoid name conflicts (kata-containers#1944)
2756956 status: Allow external packages to produce status-compatible errors (kata-containers#1927)
0ff1b76 routeguide: reimplement distance calculation
dfbefc6 service reflection can lookup enum, enum val, oneof, and field symbols (kata-containers#1910)
32d9ffa Documentation: Fix broken link in rpc-errors.md (kata-containers#1935)
d5126f9 Correct Go 1.6 support policy (kata-containers#1934)
5415d18 Add documentation and example of adding details to errors (kata-containers#1915)
57640c0 Allow storing alternate transport.ServerStream implementations in context (kata-containers#1904)
031ee13 Fix Test: Update the deadline since small deadlines are prone to flakes on Travis. (kata-containers#1932)
2249df6 gzip: Add ability to set compression level (kata-containers#1891)
8124abf credentials/alts: Remove the enable_untrusted_alts flag (kata-containers#1931)
b96718f metadata: Fix bug where AppendToOutgoingContext could modify another context's metadata (kata-containers#1930)
738eb6b fix minor typos and remove grpc.Codec related code in TestInterceptorCanAccessCallOptions (kata-containers#1929)
211a7b7 credentials/alts: Update ALTS "New" APIs (kata-containers#1921)
fa28bef client: export types implementing CallOptions for access by interceptors (kata-containers#1902)
ec9275b travis: add Go 1.10 and run vet there instead of 1.9 (kata-containers#1913)
13975c0 stream: split per-attempt data from clientStream (kata-containers#1900)
2c2d834 stats: add BeginTime to stats.End (kata-containers#1907)
3a9e1ba Reset ping strike counter right before sending out data. (kata-containers#1905)
90dca43 resolver: always fall back to default resolver when target does not follow URI scheme (kata-containers#1889)
9aba044 server: Convert all non-status errors to codes.Unknown (kata-containers#1881)
efcc755 credentials/alts: change ALTS protos to match the golden version (kata-containers#1908)
0843fd0 credentials/alts: fix infinite recursion bug [in custom error type] (kata-containers#1906)
207e276 Fix test race: Atomically access minConnecTimout in testing environment. (kata-containers#1897)
3ae2a61 interop: Add use_alts flag to client and server binaries (kata-containers#1896)
5190b06 ALTS: Simplify "New" APIs (kata-containers#1895)
7c5299d Fix flaky test: TestCloseConnectionWhenServerPrefaceNotReceived (kata-containers#1870)
f0a1202 examples: Replace context.Background with context.WithTimeout (kata-containers#1877)
a1de3b2 alts: Change ALTS proto package name (kata-containers#1886)
2e7e633 Add ALTS code (kata-containers#1865)
583a630 Expunge error codes that shouldn't be returned from library (kata-containers#1875)
2759199 Small spelling fixes (unknow -> unknown) (kata-containers#1868)
12da026 clientconn: fix a typo in GetMethodConfig documentation (kata-containers#1867)
dfa1834 Change version to 1.11.0-dev (kata-containers#1863)
46fd263 benchmarks: add flag to benchmain to use bufconn instead of network (kata-containers#1837)
3926816 addrConn: Report underlying connection error in RPC error (kata-containers#1855)
445b728 Fix data race in TestServerGoAwayPendingRPC (kata-containers#1862)
e014063 addrConn: keep retrying even on non-temporary errors (kata-containers#1856)
484b3eb transport: fix race causing flow control discrepancy when sending messages over server limit (kata-containers#1859)
6c48c7f interop test: Expect io.EOF from stream.Send() (kata-containers#1858)
08d6261 metadata: provide AppendToOutgoingContext interface (kata-containers#1794)
d50734d Add status.Convert convenience function (kata-containers#1848)
365770f streams: Stop cleaning up after orphaned streams (kata-containers#1854)
7646b53 transport: support stats.Handler in serverHandlerTransport (kata-containers#1840)
104054a Fix connection drain error message (kata-containers#1844)
d09ec43 Implement unary functionality using streams (kata-containers#1835)
37346e3 Revert "Add WithResolverUserOptions for custom resolver build options" (kata-containers#1839)
424e3e9 Stream: do not cancel ctx created with service config timeout (kata-containers#1838)
f9628db Fix lint error and typo (kata-containers#1843)
0bd008f stats: Fix bug causing trailers-only responses to be reported as headers (kata-containers#1817)
5769e02 transport: remove unnecessary rstReceived (kata-containers#1834)
0848a09 transport: remove redundant check of stream state in Write (kata-containers#1833)
c22018a client: send RST_STREAM on client-side errors to prevent server from blocking (kata-containers#1823)
82e9f61 Use keyed fields for struct initializers (kata-containers#1829)
5ba054b encoding: Introduce new method for registering and choosing codecs (kata-containers#1813)
4f7a2c7 compare atomic and mutex performance in case of contention. (kata-containers#1788)
b71aced transport: Fix a data race when headers are received while the stream is being closed (kata-containers#1814)
46bef23 Write should fail when the stream was done but context wasn't cancelled. (kata-containers#1792)
10598f3 Explain target format in DialContext's documentation (kata-containers#1785)
08b7bd3 gzip: add Name const to avoid typos in usage (kata-containers#1804)
8b02d69 remove .please-update (kata-containers#1800)
1cd2346 Documentation: update broken wire.html link in metadata package. (kata-containers#1791)
6913ad5 Document that all errors from RPCs are status errors (kata-containers#1782)
8a8ac82 update const order (kata-containers#1770)
e975017 Don't set reconnect parameters when the server has already responded. (kata-containers#1779)
7aea499 credentials: return Unavailable instead of Internal for per-RPC creds errors (kata-containers#1776)
c998149 Avoid copying headers/trailers in unary RPCs unless requested by CallOptions (kata-containers#1775)
8246210 Update version to 1.10.0-dev (kata-containers#1777)
17c6e90 compare atomic and mutex performance for incrementing/storing one variable (kata-containers#1757)
65c901e Fix flakey test. (kata-containers#1771)
7f2472b grpclb: Remove duplicate init() (kata-containers#1764)
09fc336 server: fix bug preventing Serve from exiting when Listener is closed (kata-containers#1765)
035eb47 Fix TestGracefulStop flakiness (kata-containers#1767)
2720857 server: fix race between GracefulStop and new incoming connections (kata-containers#1745)
0547980 Notify parent ClientConn to re-resolve in grpclb (kata-containers#1699)
e6549e6 Add dial option to set balancer (kata-containers#1697)
6610f9a Fix test: Data race while resetting global var. (kata-containers#1748)
f4b5237 status: add Code convenience function (kata-containers#1754)
47bddd7 vet: run golint on _string files (kata-containers#1749)
45088c2 examples: fix concurrent map accesses in route_guide server (kata-containers#1752)
4e393e0 grpc: fix deprecation comments to conform to standard (kata-containers#1691)
0b24825 Adjust keepalive paramenters in the test such that scheduling delays don't cause false failures too often. (kata-containers#1730)
f9390a7 fix typo (kata-containers#1746)
6ef45d3 fix stats flaky test (kata-containers#1740)
98b17f2 relocate check for shutdown in ac.tearDown() (kata-containers#1723)
5ff10c3 fix flaky TestPickfirstOneAddressRemoval (kata-containers#1731)
2625f03 bufconn: allow readers to receive data after writers close (kata-containers#1739)
b0e0950 After sending second goaway close conn if idle. (kata-containers#1736)
b8cf13e Make sure all goroutines have ended before restoring global vars. (kata-containers#1732)
4742c42 client: fix race between server response and stream context cancellation (kata-containers#1729)
8fba5fc In gracefull stop close server transport only after flushing status of the last stream. (kata-containers#1734)
d1fc8fa Deflake tests that rely on Stop() then Dial() not reconnecting (kata-containers#1728)
dba60db Switch balancer to grpclb when at least one address is grpclb address (kata-containers#1692)
ca1b23b Update CONTRIBUTING.md to CNCF CLA
2941ee1 codes: Add UnmarshalJSON support to Code type (kata-containers#1720)
ec61302 naming: Fix build constraints for go1.6 and go1.7 (kata-containers#1718)
b8191e5 remove stringer and go generate (kata-containers#1715)
ff1be3f Add WithResolverUserOptions for custom resolver build options (kata-containers#1711)
580defa Fix grpc basics link in route_guide example (kata-containers#1713)
b7dc71e Optimize codes.String() method using a switch instead of a slice of indexes (kata-containers#1712)
1fc873d Disable ccBalancerWrapper when it is closed (kata-containers#1698)
bf35f1b Refactor roundrobin to support custom picker (kata-containers#1707)
4308342 Change parseTimeout to not handle non-second durations (kata-containers#1706)
be07790 make load balancing policy name string case-insensitive (kata-containers#1708)
cd563b8 protoCodec: avoid buffer allocations if proto.Marshaler/Unmarshaler (kata-containers#1689)
61c6740 Add comments to ClientConn/SubConn interfaces to indicate new methods may be added (kata-containers#1680)
ddbb27e client: backoff before reconnecting if an HTTP2 server preface was not received (kata-containers#1648)
a4bf341 use the request context with net/http handler (kata-containers#1696)
c6b4608 transport: fix race sending RPC status that could lead to a panic (kata-containers#1687)
00383af Fix misleading default resolver scheme comments (kata-containers#1703)
a62701e Eliminate data race in ccBalancerWrapper (kata-containers#1688)
1e1a47f Re-resolve target when one connection becomes TransientFailure (kata-containers#1679)
2ef021f New grpclb implementation (kata-containers#1558)
10873b3 Fix panics on balancer and resolver updates (kata-containers#1684)
646f701 Change version to 1.9.0-dev (kata-containers#1682)

Fixes: kata-containers#307

Signed-off-by: Peng Tao <[email protected]>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] baseline persist data for live-upgrade
4 participants