Documentation for CRIU Project #14620

Open
dsouzai opened this issue Mar 1, 2022 · 6 comments
Assignees
Labels
beta: Used to track items that will be included in a feature beta release
criu: Used to track CRIU snapshot related work

Comments

@dsouzai
Contributor

dsouzai commented Mar 1, 2022

This issue tracks the topics that need documentation, either as markdown in the project or as blog posts over at https://blog.openj9.org/.

The following is a list, separated into high-level categories, of some of the work that probably deserves mention in one or more blogs.

Basic Implementation

JAVA API

Security Considerations and Changes

Rootless CRIU

Container Engine Considerations

Docker

  • Changes in Docker needed to pass CAP_CHECKPOINT_RESTORE
  • Running with --security-opt systempaths=unconfined --security-opt apparmor=unconfined in order to ensure docker mounts the proc filesystem as r/w.

Podman

OCP

Testing Environment Challenges

RAS

@dsouzai dsouzai added the criu Used to track CRIU snapshot related work label Mar 1, 2022
@dsouzai
Contributor Author

dsouzai commented Mar 1, 2022

fyi @vijaysun-omr @tajila

@vijaysun-omr
Contributor

vijaysun-omr commented Mar 2, 2022

Thanks for getting this started @dsouzai. I agree that the list of items you linked offers a good view into the areas we need to address. The specifics of what the hooks are being designed to handle (e.g. Random, Timers, environment variables and JCE), and how each is handled, may be further sub-topics under the main "hook" topic.

Taking a step back, I think we may need to establish the goal of the documentation clearly. In my mind, the goal is to ease a user of OpenJ9 or Java into this very different world of CRIU and snapshot/restore slowly, so that they do not have to read N documents strewn all over the web about CRIU, the operating system details, etc. In other words, we probably want the documentation to have a sequence and flow along the lines of what a good discussion ought to follow: "why", "what", "when" and then finally the "how" of the topic.

Most of the items in your starting list above go in the "how" part of the flow. In the "how" section, we need information clearly called out about "prerequisites" (supported OS versions, supported platforms, etc.), and we also need a page on "limitations" as well as one on "what will behave differently", so that a user can find this information directly rather than having to derive it from the other articles that we have.

The "what" section probably needs thought about the topics we would mention there, i.e. we are offering an API and hooks for frameworks or other applications to use, but that isn't going to be usable on its own: higher level code has to call into our API. This point could be made with a simple example program that calls into our APIs to generate a snapshot, together with the command for restoring from it. You could use the same example program to make various further points (why you need hooks, different arguments, particular OS versions, etc.) as you make it more and more complicated: point out a problem, then introduce the solution (with a link to the article that describes it) and show how it allows you to proceed further with your small example.
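As a rough illustration of the kind of small running example described above, here is a hedged sketch. It assumes the OpenJ9 class `org.eclipse.openj9.criu.CRIUSupport` with a static `isCRIUSupportEnabled()` method, a `CRIUSupport(Path)` constructor and a `checkpointJVM()` method; those names are not confirmed anywhere in this thread and may differ in the release that ships. The class is looked up reflectively so the sketch also compiles and runs on a JVM without CRIU support. `SnapshotSketch`, `tryCheckpoint`, `restoreCommand` and the `checkpointData` directory are hypothetical names used only for illustration.

```java
import java.lang.reflect.Method;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class SnapshotSketch {
    // Assumed name of the OpenJ9 CRIU API class; not confirmed by this thread.
    static final String CRIU_CLASS = "org.eclipse.openj9.criu.CRIUSupport";

    // Builds the criu command line that would restore from the given image directory.
    static List<String> restoreCommand(Path imageDir) {
        return List.of("criu", "restore", "-D", imageDir.toString(), "--shell-job");
    }

    // Attempts a checkpoint through the assumed API.
    // Returns false if the API is missing, disabled, or the checkpoint fails.
    static boolean tryCheckpoint(Path imageDir) {
        try {
            Class<?> criu = Class.forName(CRIU_CLASS);
            Method enabled = criu.getMethod("isCRIUSupportEnabled");
            if (!(Boolean) enabled.invoke(null)) {
                return false;
            }
            Object support = criu.getConstructor(Path.class).newInstance(imageDir);
            criu.getMethod("checkpointJVM").invoke(support);
            // On a restored process, execution would resume here.
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Path imageDir = Paths.get("checkpointData");
        System.out.println("Pre-checkpoint work done");
        if (tryCheckpoint(imageDir)) {
            System.out.println("Running after checkpoint/restore");
        } else {
            System.out.println("CRIU support not available in this JVM");
        }
        System.out.println("To restore: " + String.join(" ", restoreCommand(imageDir)));
    }
}
```

Starting from something this small, each later article could add one complication (a hook, a required capability, a container flag) and show what breaks without it.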

@vijaysun-omr
Contributor

If you guys feel this sort of a flow would be good to have, then we may be able to divide up the work such that we focus on the main flow that works with the "why", "what" and the running example, and in parallel ask some of the folks who implemented the different features/hooks/fixups/prerequisites to describe their own problem/solution via standalone articles.

@vijaysun-omr
Contributor

A different organization could be as some set of "how to" articles, but I'll let you express your preferences.

@dsouzai
Contributor Author

dsouzai commented Mar 2, 2022

I think the standard layout of writing about the motivation (why), followed by a description of the capability (what), followed by a simple example, is probably a good first entry to have, especially as a landing point. This entry would be quite broadly consumable by all levels of the software stack. It could also talk about when we expect to have this available for people to try out.

After that, it probably makes sense to have a single entry giving a broad description of the implementation of this capability (how), but without getting too deep into the details. Keeping it a high-level description (to the extent possible) of how the JVM implements this would make it as consumable as possible, while possibly alluding to the ultimate goal, which is to have this work in the container workflow and, as such, follow the standard best practices.

The next logical step then is to focus on the various efforts to get rootless checkpoint/restore, from changes required both within the JVM and the environment (CRIU, Docker, etc.).

I think only after these three does it make sense to have the various low-level entries, because at the very least the previous three entries will have provided, to the broadest possible audience, the "why", "what" and "how". Anyone then interested in the nitty-gritty details is free to read the various blogs/entries that pop up over time. These could be, I suppose, organized as a set of "how to" articles, talking about the problem and how we decided to tackle it.

@vijaysun-omr
Contributor

vijaysun-omr commented Mar 2, 2022

Okay, that sounds like a practical approach. We need to get the broad articles done before the beta, and then the rest can roll in afterwards or in parallel (at least they are not as critical). @tajila there may be some text from disclosure documents and such that we have written internally that could perhaps serve as a starting point for these broad articles?

@tajila tajila added the beta Used to track items that will be included in a feature beta release label Mar 2, 2022
@dsouzai dsouzai self-assigned this Mar 9, 2022
@dsouzai dsouzai moved this to In Progress in J9 CRIU Support Mar 9, 2022
@tajila tajila self-assigned this Jun 29, 2022
Projects
Status: In Progress

3 participants