Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support time namespace #3876

Merged
merged 1 commit into from
Aug 10, 2023
Merged

Conversation

chethanah
Copy link
Contributor

@chethanah chethanah commented May 22, 2023

Background

Support time ns in runc

Unit test

  • Launch container with boottime offsets
# jq '.linux.timeOffsets["boottime"]' config.json 
{
  "secs": 999999999,
  "nanosecs": 0
}

# jq '.linux.namespaces[5]' config.json 
{
  "type": "time"
}

# jq '.process.args' config.json 
[
  "uptime",
  "-s"
]

# uptime -s; ./runc run test 
2023-05-22 17:05:57
1991-09-13 09:49:18

Discussion points

  • The time offset is updated to /proc/self/timens_offsets
    • is it important to set stage1_pid/stage2_pid like done in update_uidmap?
  • Is it advisable to set offsets elsewhere apart from STAGE_CHILD?

@chethanah
Copy link
Contributor Author

CC: @KentaTada
Create PR for supporting time ns in runc

@chethanah
Copy link
Contributor Author

  • Kindly review and suggest for changes or additional discussion points.
  • Test cases and documentation will be added if core implementation is finalized

@chethanah
Copy link
Contributor Author

CC: @AkihiroSuda

@chethanah chethanah changed the title [WIP] Support time namespace Support time namespace Jun 2, 2023
@AkihiroSuda
Copy link
Member

CI is failing

@chethanah chethanah force-pushed the support-time-ns branch 2 times, most recently from c87c035 to d6c0d52 Compare July 11, 2023 10:42
@chethanah
Copy link
Contributor Author

The CI is now passing after removing logrus print statement.

@chethanah
Copy link
Contributor Author

Additional merge commit added while rebasing changes from main. I'll update my branch by removing the merge commit.

@AkihiroSuda AkihiroSuda added this to the 1.2.0 milestone Jul 16, 2023
Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not looking at this earlier, only one somewhat significant thing needs to be changed. And this unfortunately needs a rebase due to the just-merged id-mapped-mounts work.

This could also really use some integration tests. If you're not comfortable setting those up, I can do them in a follow-up PR.

@@ -212,6 +212,9 @@ type Config struct {
// RootlessCgroups is set when unlikely to have the full access to cgroups.
// When RootlessCgroups is set, cgroups errors are ignored.
RootlessCgroups bool `json:"rootless_cgroups,omitempty"`

// TimeOffsets specifies the offset for supporting time namespaces.
TimeOffsets map[string]specs.LinuxTimeOffset `json:"timeOffsets,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match the rest of the state json:

Suggested change
TimeOffsets map[string]specs.LinuxTimeOffset `json:"timeOffsets,omitempty"`
TimeOffsets map[string]specs.LinuxTimeOffset `json:"time_offsets,omitempty"`

Comment on lines 2253 to 2335
// write boottime and monotonic offsets
// 0 is default value in /proc/PID/timens_offsets
// if 0, do not bootstrap data
if c.config.TimeOffsets != nil {

if c.config.TimeOffsets["boottime"].Nanosecs != 0 || c.config.TimeOffsets["boottime"].Secs != 0 {
b, err := encodeTimeNs(c.config.TimeOffsets["boottime"])
if err != nil {
return nil, err
}
r.AddData(&Bytemsg{
Type: BootTimeNsAttr,
Value: append([]byte("boottime "), b...),
})
}

if c.config.TimeOffsets["monotonic"].Nanosecs != 0 || c.config.TimeOffsets["monotonic"].Secs != 0 {
b, err := encodeTimeNs(c.config.TimeOffsets["monotonic"])
if err != nil {
return nil, err
}
r.AddData(&Bytemsg{
Type: MonotonicNsAttr,
Value: append([]byte("monotonic "), b...),
})
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These can be written with one string by separating them with newlines, so there's no need to add multiple netlink entries for this. Just generate one string into a TimeOffsetsAttr and write it in one shot in nsexec.c.

* Update timens offsets
* set boottime and monotonic offsets
*/
write_log(DEBUG, "set timens offsets %s", config.boottime);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're outputting this debug information twice (both here and in update_timens). You can remove this one.

Comment on lines 2258 to 2259
if c.config.TimeOffsets["boottime"].Nanosecs != 0 || c.config.TimeOffsets["boottime"].Secs != 0 {
b, err := encodeTimeNs(c.config.TimeOffsets["boottime"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You use 3 map lookups here, while 1 should be enough. Make encodeTimeNs check if the fields are all 0 and return empty string in such case, add a check for empty string below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also looks like this would be a good place to write something like

for k, v := c.Config.TimeOffsets {
...
}

making the code more generic and avoiding to use specific strings.

@@ -95,6 +95,13 @@ struct nlconfig_t {
/* Mount sources opened outside the container userns. */
char *mountsources;
size_t mountsources_len;

/* Time NS settings for boottime and monotonic */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing period.

Comment on lines 1087 to 1088
* Update timens offsets
* set boottime and monotonic offsets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing periods. Please write complete sentences in comments, otherwise it's harder to read (and maintain).

Comment on lines 439 to 441
if spec.Linux.TimeOffsets != nil {
config.TimeOffsets = spec.Linux.TimeOffsets
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here if is not needed -- worst case scenario is you assign nil to config.TimeOffsets.

(In other places if is needed to avoid nil dereferences, but not here).

@chethanah
Copy link
Contributor Author

Thank you for detailed review. I'll update as per comments.

Comment on lines 2324 to 2334
// write boottime and monotonic time ns offsets.
if c.config.TimeOffsets != nil {
timeOffset := append([]byte("monotonic"), []byte(fmt.Sprintf(" %d %d\n", c.config.TimeOffsets["monotonic"].Secs, c.config.TimeOffsets["monotonic"].Nanosecs))...)
timeOffset = append(timeOffset, []byte("boottime")...)
r.AddData(&Bytemsg{
Type: TimeOffsetsAttr,
Value: append(timeOffset, []byte(fmt.Sprintf(" %d %d", c.config.TimeOffsets["boottime"].Secs, c.config.TimeOffsets["boottime"].Nanosecs))...),
})
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little difficult to read. Maybe something more like this (untested):

Suggested change
// write boottime and monotonic time ns offsets.
if c.config.TimeOffsets != nil {
timeOffset := append([]byte("monotonic"), []byte(fmt.Sprintf(" %d %d\n", c.config.TimeOffsets["monotonic"].Secs, c.config.TimeOffsets["monotonic"].Nanosecs))...)
timeOffset = append(timeOffset, []byte("boottime")...)
r.AddData(&Bytemsg{
Type: TimeOffsetsAttr,
Value: append(timeOffset, []byte(fmt.Sprintf(" %d %d", c.config.TimeOffsets["boottime"].Secs, c.config.TimeOffsets["boottime"].Nanosecs))...),
})
}
// write boottime and monotonic time ns offsets.
if c.config.TimeOffsets != nil {
var offsetSpec bytes.Buffer
for clock, offset := range c.config.TimeOffsets {
fmt.Fprintf(&offsetSpec, "%s %d %d\n", clock, offset.Secs, offset.Nanosecs)
}
r.AddData(&Bytemsg{
Type: TimeOffsetsAttr,
Value: offsetSpec.Bytes(),
})
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Go, printf-like functions are notoriously slow for some reason. I'd prefer string concatenation, but it's not a big deal since it's only done once during the container start here.

"time" namespace was introduced in Linux v5.6
support new time namespace to set boottime and monotonic time offset

Example runtime spec

"timeOffsets": {
    "monotonic": {
        "secs": 172800,
        "nanosecs": 0
    },
    "boottime": {
        "secs": 604800,
        "nanosecs": 0
    }
}

Signed-off-by: Chethan Suresh <[email protected]>
Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

return;
write_log(DEBUG, "update /proc/self/timens_offsets to '%s'", map);
if (write_file(map, map_len, "/proc/self/timens_offsets") < 0) {
if (errno != EPERM)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EPERM means The caller does not have the CAP_SYS_TIME capability.
Why ignore this error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I think this was mistakenly copied from the /proc/self/uid_map and gid_map helpers (which call the setuid newuidmap/newgidmap helper programs in case of -EPERM).

return;
write_log(DEBUG, "update /proc/self/timens_offsets to '%s'", map);
if (write_file(map, map_len, "/proc/self/timens_offsets") < 0) {
if (errno != EPERM)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I think this was mistakenly copied from the /proc/self/uid_map and gid_map helpers (which call the setuid newuidmap/newgidmap helper programs in case of -EPERM).

Comment on lines +766 to +769
if (write_file(map, map_len, "/proc/self/timens_offsets") < 0) {
if (errno != EPERM)
bail("failed to update /proc/self/timens_offsets");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (write_file(map, map_len, "/proc/self/timens_offsets") < 0) {
if (errno != EPERM)
bail("failed to update /proc/self/timens_offsets");
}
if (write_file(map, map_len, "/proc/self/timens_offsets") < 0)
bail("failed to update /proc/self/timens_offsets");

@@ -2321,6 +2321,18 @@ func (c *Container) bootstrapData(cloneFlags uintptr, nsMaps map[configs.Namespa
})
}

// write boottime and monotonic time ns offsets.
if c.config.Namespaces.Contains(configs.NEWTIME) && c.config.TimeOffsets != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we want this check to be done in the validator (at is stands, the code will silently ignore timensOffsets being configured without CLONE_NEWTIME enabled). But I can do this in a separate PR when I add the necessary integration tests for this feature.

Comment on lines +439 to +441

// update timens offsets
config.TimeOffsets = spec.Linux.TimeOffsets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: perhaps you can move this to be right after these lines:

                  config.MaskPaths = spec.Linux.MaskedPaths
                  config.ReadonlyPaths = spec.Linux.ReadonlyPaths
                  config.MountLabel = spec.Linux.MountLabel
                  config.Sysctl = spec.Linux.Sysctl

and drop the comment since it's kind of obvious what we do here.

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly OK to me, left a couple of nits.

Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just merge this, I'll send a PR to fix up the remaining nits (including the incorrect EPERM ignore logic) so we don't need to do more rounds of review.

Thanks @chethanah!

@akhilerm
Copy link
Contributor

Any reason why the timenamespace was not added into the config.json that is generated by runc spec. Also found that we are adding the ns into the config during tests here instead of using it directly from the config.

cc: @cyphar @chethanah

@cyphar
Copy link
Member

cyphar commented Dec 30, 2024

The default config is more of an example than a source of best practices (it doesn't have seccomp rules, for instance) so we don't always update it to include every new setting but I suspect that the PR author simply forgot to add it.

That being said, adding it would've resulted in errors on older kernels so there is a slight argument for not including it by default at the time, but we added support for this quite late and pre-4.18 kernels are very old at this point. Maybe we should just include it now.

@akhilerm
Copy link
Contributor

I have an initial set of changes in my fork https://github.com/akhilerm/runc/tree/add-time-ns-example, but still need to fix a few more tests. Will create the PR once the CI is passing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support time namespaces
6 participants