Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
[in vfs.git#fixes] copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied. For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to. The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe. Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor torvalds#128, despite torvalds#64 being observably not open. The best way to fix that is in dup_fd() - there we both can easily check whether we might need to fix anything up and see which word needs the upper bits cleared. Reproducer follows: #define __GNU_SOURCE #include <linux/close_range.h> #include <unistd.h> #include <fcntl.h> #include <signal.h> #include <sched.h> #include <stdio.h> #include <stdbool.h> #include <sys/mman.h> void is_open(int fd) { printf("#%d is %s\n", fd, fcntl(fd, F_GETFD) >= 0 ? "opened" : "not opened"); } int child(void *unused) { while(1) { } return 0; } int main(void) { char *stack; pid_t pid; stack = mmap(NULL, 1024*1024, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0); if (stack == MAP_FAILED) { perror("mmap"); return -1; } pid = clone(child, stack + 1024*1024, CLONE_FILES | SIGCHLD, NULL); if (pid == -1) { perror("clone"); return -1; } for (int i = 2; i < 128; i++) dup2(0, i); close_range(64, ~0U, CLOSE_RANGE_UNSHARE); is_open(64); printf("dup(0) => %d, expected 64\n", dup(0)); kill(pid, 9); return 0; } Cc: [email protected] Signed-off-by: Al Viro <[email protected]>
- Loading branch information