Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prepare for nydusd daemonless #540

Merged
merged 5 commits into from
Jul 14, 2022
Merged

Conversation

kevinXYin
Copy link
Contributor

This patchset is used for support nydusd daemonless for fscache scenario. After all blob data has been loaded to local storage, data flow is handled in kernel, nydusd daemon can exit.

Thit patchset basically does these following things:

  1. improve blob prefetch, add retry mechanism to make sure all prefetch requests can be handled correctly.
  2. add a  periodic task to check all BlobCacheMgr's status , if all data is ready stops the prefetch workers, and mark this BlobCacheMgr is "data ready".
  3. add data_all_ready field in metrics, then user can get data status of BlobCacheMgr, and kill nydusd after all data is ready.

It should be noted that, when cachefiles devfd released , the fscache volume will become DEAD state , then local cache can not be accessed by erofs. For now I hold the devfd in other user daemon via usd to walk around this limitation. I will figure out in-kernel solution to support this case later.

@yqleng1987
Copy link
Contributor

@kevinXYin , a new test job has been submitted. Please wait in patience.

@yqleng1987
Copy link
Contributor

@kevinXYin , The CI test is completed, please check result:

Test CaseTest Result
merge-target-branch✅SUCCESS
build-docker-image✅SUCCESS
compile-nydus❌FAIL

Sorry, your test job failed. Please get the details in the link.

@yqleng1987
Copy link
Contributor

@kevinXYin , your pull request has been updated. A new test job will be submitted. Please wait in patience.

@yqleng1987
Copy link
Contributor

@kevinXYin , the test job has been submitted. Please wait in patience.

@yqleng1987
Copy link
Contributor

@kevinXYin , The CI test is completed, please check result:

Test CaseTest Result
merge-target-branch✅SUCCESS
build-docker-image✅SUCCESS
compile-nydus❌FAIL

Sorry, your test job failed. Please get the details in the link.

@yqleng1987
Copy link
Contributor

@kevinXYin , your pull request has been updated. A new test job will be submitted. Please wait in patience.

@yqleng1987
Copy link
Contributor

@kevinXYin , the test job has been submitted. Please wait in patience.

@yqleng1987
Copy link
Contributor

@kevinXYin , The CI test is completed, please check result:

Test CaseTest Result
merge-target-branch✅SUCCESS
build-docker-image✅SUCCESS
compile-nydus✅SUCCESS
compile-ctr-remote✅SUCCESS
compile-nydus-snapshotter✅SUCCESS
start-nydus-snapshotter-config-containerd✅SUCCESS
run-container-with-nydus-image✅SUCCESS

Congratulations, your test job passed!

@xujihui1985
Copy link
Contributor

@kevinXYin if nydusdaemon was deleted after blob downloaded complete, how can you garbage collect blobcache? for example, when diskusage is high and you need to release some blobcache to reduce the diskusage.

@kevinXYin
Copy link
Contributor Author

Yes , indeed in this case we will lose gc capacity for blob cache, so whether to kill nydusd depends on the needs of the user.

On the other hand , I think we need logic to restart nydusd, for example, if we need to pull a new image after nydusd exit. Maybe we should also restart it when we need gc.

@xujihui1985
Copy link
Contributor

@kevinXYin it's complicate that we need to decide when to restart the daemon, it's before the gc or after gc, probably before gc other wise there is chance EIO will happened, also the snapshotter should have a mechanism to know which daemon to restart, this is also complicate because you can hardly tell from the blobcache which daemon it belongs to.

@kevinXYin
Copy link
Contributor Author

@kevinXYin it's complicate that we need to decide when to restart the daemon, it's before the gc or after gc, probably before gc other wise there is chance EIO will happened, also the snapshotter should have a mechanism to know which daemon to restart, this is also complicate because you can hardly tell from the blobcache which daemon it belongs to.
For fscache scenario , only one daemon is allowed , this is a limitation of kernel. what do you mean "which daemon it belongs to"? did I miss some here?

Xin Yin added 2 commits July 8, 2022 14:09
when get error during blob prefetch request processing, resend a new
prefetch request. Make sure all the request can be handled correctly.

Signed-off-by: Xin Yin <[email protected]>
Add a new fn check_stat() for BlobCacheMgr, this is used to check data
chunk status. If all data chunks are ready stop prefetch workers
belonging to the BlobCacheMgr, and mark all data ready in metrics.

So far, only implement this func for fscache.

Signed-off-by: Xin Yin <[email protected]>
@yqleng1987
Copy link
Contributor

@kevinXYin , your pull request has been updated. A new test job will be submitted. Please wait in patience.

@yqleng1987
Copy link
Contributor

@kevinXYin , the test job has been submitted. Please wait in patience.

@yqleng1987
Copy link
Contributor

@kevinXYin , The CI test is completed, please check result:

Test CaseTest Result
merge-target-branch✅SUCCESS
build-docker-image✅SUCCESS
compile-nydus✅SUCCESS
compile-ctr-remote✅SUCCESS
compile-nydus-snapshotter✅SUCCESS
start-nydus-snapshotter-config-containerd✅SUCCESS
run-container-with-nydus-image✅SUCCESS

Congratulations, your test job passed!

@kevinXYin
Copy link
Contributor Author

@changweige sorry for replying late~. I have updated the pull request which includes the following changes as we discussed.

  1. use filter as rust style in get_blobs_num().
  2. make blob prefetch retry asynchronous.
  3. use store instead of compare_exchange in check_stat().
  4. use an async context to fill bootstrap cache file on open cmd.

@changweige
Copy link
Contributor

Shall we remove WIP telling reviewers it is ready to be merged or do we have further works?

@kevinXYin kevinXYin changed the title [WIP]: prepare for nydusd daemonless prepare for nydusd daemonless Jul 13, 2022
@kevinXYin
Copy link
Contributor Author

Shall we remove WIP telling reviewers it is ready to be merged or do we have further works?

yeah , updated , I have tested the current changes does not affect normal workflow. The further work will introduce an in-kernel solution for walk around fscache volume DEAD limitation. But I'm busy with other things now , so it may be delayed for a while.

Xin Yin added 2 commits July 14, 2022 15:10
We need a periodic task to check all the BlobCacheMgr's data cache
status for fscache scenario, reuse the cache-flusher thread.

Signed-off-by: Xin Yin <[email protected]>
whan handle open requests for bootstrap, feed all data at once.For
supporting daemonless we need fill all cache files before nydusd exit.
but bootstrap does not support prefetch. Also it may improve
performance.

Signed-off-by: Xin Yin <[email protected]>
@yqleng1987
Copy link
Contributor

@kevinXYin , your pull request has been updated. A new test job will be submitted. Please wait in patience.

@yqleng1987
Copy link
Contributor

@kevinXYin , your test job has passed, and no need to test again.

Copy link
Contributor

@changweige changweige left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@changweige changweige merged commit 91d196c into dragonflyoss:master Jul 14, 2022
@kevinXYin kevinXYin deleted the daemonless branch August 1, 2022 02:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants