Add system calls lecture #13

Merged

tavip merged 8 commits into linux-kernel-labs:master from tavip:lktp-syscalls

Feb 27, 2018

Member

tavip commented Feb 27, 2018

This PR also adds an asciicast directive for asciinema playback.

tavip added 7 commits

February 26, 2018 03:15


          Documentation: teaching: convert TABS to spaces in ditaa directives

a16e12f

This makes it easier to edit ditaa directives since insertion will
always move the rest of the row by one.

Signed-off-by: Octavian Purdila <[email protected]>


          Documentation: teaching: intro lecture: fix list unindent

1c39ca7

Fixes the following warning:

Documentation/teaching/lectures/intro.rst:737: WARNING: Bullet list
ends without a blank line; unexpected unindent

Signed-off-by: Octavian Purdila <[email protected]>


          Documentation: conf.py: use add_stylesheet instead of html_context

8a4b0a4

As noted in Sphinx #2442, new CSS added by extensions is rendered
ineffective if html_context is changed. So, instead, use add_stylesheet
to add theme_overridesc.css

Signed-off-by: Octavian Purdila <[email protected]>


          Documentation: add asciicast directive

Add asciicast directive that allows inserting asciinema "asciicasts" in
docs. The directive accepts a single mandatory parameter which is the
filename that stores the asciicast:

.. asciicast:: ascii.cast

Signed-off-by: Octavian Purdila <[email protected]>


          Documentation: ditaa: stop on errors

4cdab6d

Don't catch ditaa errors, let the user see them so that it is easier to
understand the root cause of failures.

Signed-off-by: Octavian Purdila <[email protected]>


          Documentation: teaching: conf.py: add non breakable space substitution

3689baa

Signed-off-by: Octavian Purdila <[email protected]>


          tools: labs: install gdb scripts

0f38c30

Signed-off-by: Octavian Purdila <[email protected]>

tavip requested review from dbaluta and razvand

February 27, 2018 04:34

dbaluta reviewed


Documentation/teaching/lectures/syscalls.rst Outdated

    
              For security reasons, when a user to kernel mode transitions occurs,

              the CPU's PC is set at specific kernel entry points. For system calls,

              this entry point will push the parameters on stack so that they are

              accesible by the system call functions and then it will run the system

Member

dbaluta Feb 27, 2018

So, this is not clear: are the parameters passed in registers and then saved to the kernel stack?
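The sequence being asked about can be pictured with a small user-space sketch in C. It is only an illustration under stated assumptions: `struct saved_regs`, `demo_table`, `demo_dispatch`, and the value of `DEMO_ENOSYS` are hypothetical names made up here, not kernel symbols. The idea is that arguments arrive in registers, the entry code stores them in a frame on the kernel stack, and the dispatcher reads them back from that frame.

```c
#include <stddef.h>

/* Hypothetical stand-in for the register frame saved at kernel entry. */
struct saved_regs {
    long nr;       /* system call number, as placed in a register by user space */
    long args[3];  /* first three arguments, also passed in registers */
    long ret;      /* value handed back to user space on return */
};

typedef long (*syscall_fn)(long, long, long);

/* Two made-up "system calls" for the sketch. */
static long demo_add(long a, long b, long c) { (void)c; return a + b; }
static long demo_neg(long a, long b, long c) { (void)b; (void)c; return -a; }

static const syscall_fn demo_table[] = { demo_add, demo_neg };

/* Assumed error code for an unknown call number. */
#define DEMO_ENOSYS 38

/* Dispatcher: pick the function by number and feed it the saved arguments. */
void demo_dispatch(struct saved_regs *regs)
{
    size_t count = sizeof(demo_table) / sizeof(demo_table[0]);

    if (regs->nr < 0 || (size_t)regs->nr >= count) {
        regs->ret = -DEMO_ENOSYS;
        return;
    }
    regs->ret = demo_table[regs->nr](regs->args[0], regs->args[1],
                                     regs->args[2]);
}
```

Filling a `struct saved_regs` with `nr = 0` and arguments 2 and 3 and calling `demo_dispatch()` leaves 5 in `ret`; an out-of-range number leaves `-DEMO_ENOSYS`.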

Documentation/teaching/lectures/syscalls.rst Outdated

    
                 * The kernel entry point saves registers on the kernel stack

                 * The system call dispatches identifies the system call function and

Member

dbaluta Feb 27, 2018

s/dispatches/dispatcher

Documentation/teaching/lectures/syscalls.rst Outdated

    
                 * When the system call function returns the userspace registers are

                   restored, execution is switched back to user mode and the

                   userspace application resumes

Member

dbaluta Feb 27, 2018

So, to get to kernel mode we use a trap. How do we switch back to userspace? Just manually change the CPL?

Documentation/teaching/lectures/syscalls.rst

    
                    __SYSCALL_I386(2, sys_fork, )

                    #else

                    __SYSCALL_I386(2, sys_fork, )

                    #endif

Member

dbaluta Feb 27, 2018

The entries for sys_fork look identical to me on both branches of #ifdef. Is this intended?

Member Author

tavip Feb 27, 2018

Yes. This code is automatically generated, which may be why it looks like this.
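A rough sketch of why generated table entries can look redundant: if a generator always emits a guarded pair per entry, the two branches differ only when a separate variant of the call exists. Everything below (`DEMO_SYSCALL`, `DEMO_NATIVE_32`, the `demo_*` functions) is hypothetical and only mimics the shape of the quoted lines.

```c
#include <stddef.h>

typedef long (*demo_call_t)(void);

/* Made-up handlers; the return values only identify which one ran. */
long demo_exit(void)        { return 1; }
long demo_exit_compat(void) { return 101; }
long demo_fork(void)        { return 2; }

/* Each generated line expands to a designated initializer for its slot. */
#define DEMO_SYSCALL(nr, sym) [nr] = sym,

const demo_call_t demo_calls[] = {
/* Entry with a distinct variant: the branches differ. */
#ifdef DEMO_NATIVE_32
    DEMO_SYSCALL(1, demo_exit)
#else
    DEMO_SYSCALL(1, demo_exit_compat)
#endif
/* Entry without one: the generator still emits both branches,
 * so they come out identical. */
#ifdef DEMO_NATIVE_32
    DEMO_SYSCALL(2, demo_fork)
#else
    DEMO_SYSCALL(2, demo_fork)
#endif
};
```

With `DEMO_NATIVE_32` undefined, slot 1 resolves to `demo_exit_compat` while slot 2 is `demo_fork` either way; slot 0 is left as a null pointer.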

Documentation/teaching/lectures/syscalls.rst Outdated

    
              For example, lets consider the case where such a check is not made for

              the read or write system calls. If the user passes a kernel-space

              pointer to a write sysytem call then it can get access to kernel data

Member

dbaluta Feb 27, 2018

s/sysytem/system

Documentation/teaching/lectures/syscalls.rst Outdated

    
                    * Invalid syscall pointer: the faulting address is in kernel

              	space; the fault address is in userspace and it is invalid

Member

dbaluta Feb 27, 2018

I also don't get the "invalid syscall pointer" bullet. :)

Do you mean an invalid user space pointer that is accessed from the kernel?

Documentation/teaching/lectures/syscalls.rst Outdated

    
              	space; the fault address is in userspace and it is invalid

                    * Kernel bug: same as above

Member

dbaluta Feb 27, 2018

Here, too, we should add more words, even if it means repeating things said above. What is the difference between a kernel bug and an invalid userspace pointer?

Documentation/teaching/lectures/syscalls.rst Outdated

    
                 * The exact instructions that accesses user space are recorded in

                   a table (exception table)

Member

dbaluta Feb 27, 2018

accesses -> access
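As a hedged illustration of the exception table idea in the quoted lines, the sketch below keeps a sorted list of instruction addresses that are allowed to touch user memory and looks up the faulting instruction in it. The structure, the addresses, and `demo_search_extable()` are invented for the example; the real kernel layout differs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: an instruction that may access user space, and the
 * address where execution resumes if that instruction faults. */
struct demo_extable_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

/* Kept sorted by insn so the lookup can be a binary search. */
static const struct demo_extable_entry demo_extable[] = {
    { 0x1000, 0x9000 },
    { 0x1040, 0x9010 },
    { 0x2200, 0x9020 },
};

/* Returns the fixup address for a faulting instruction, or 0 when the
 * instruction is not listed. In this sketch, "not listed" stands for a
 * kernel bug, while a hit stands for a bad user space pointer. */
uintptr_t demo_search_extable(uintptr_t fault_ip)
{
    size_t lo = 0;
    size_t hi = sizeof(demo_extable) / sizeof(demo_extable[0]);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (demo_extable[mid].insn == fault_ip)
            return demo_extable[mid].fixup;
        if (demo_extable[mid].insn < fault_ip)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

The search only runs on the fault path, so a valid pointer pays nothing for it.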

Documentation/teaching/lectures/syscalls.rst Outdated

    
                 +------------------+-----------------------+------------------------+

                 | Cost             |  Pointer checks       | Fault handling         |

                 +==================+=======================+========================+

                 | Valid address    | address space search  | 0                      |

Member

dbaluta Feb 27, 2018

[valid address - fault handling] I think it is O(1) because we still need to use copy_from_user which does some limits checking.
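The kind of constant-time limits check mentioned here can be sketched as a simple range comparison. This is an assumption-laden illustration, not the kernel's implementation: `DEMO_USER_LIMIT` (a 3G/1G split) and `demo_range_ok()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed boundary between user and kernel addresses (3G/1G split). */
#define DEMO_USER_LIMIT ((uint64_t)0xC0000000u)

/* True when [addr, addr + size) lies entirely below the limit.
 * Written as two comparisons so that addr + size cannot overflow. */
bool demo_range_ok(uint64_t addr, uint64_t size)
{
    return size <= DEMO_USER_LIMIT && addr <= DEMO_USER_LIMIT - size;
}
```

The cost does not depend on how many mappings the process has, which is the O(1) point: the check rejects kernel-space addresses but says nothing about whether a user address is actually mapped.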

Documentation/teaching/lectures/syscalls.rst

    
                 | Invalid address  | address space search  | exception table search |

                 +------------------+-----------------------+------------------------+

Member

dbaluta Feb 27, 2018

It is not clear up till now which is the preferred method used by the kernel.

Member Author

tavip Feb 27, 2018

It does say so in the paragraph above


          Documentation: teaching: add system calls lecture

bdb7300

Signed-off-by: Octavian Purdila <[email protected]>

tavip force-pushed the lktp-syscalls branch from 3d89da2 to bdb7300

February 27, 2018 12:45

tavip pushed a commit to linux-kernel-labs/linux-kernel-labs.github.io that referenced this pull request


          Publish lktp-syscalls (built from linux-kernel-labs/linux#13)

1b15232

dbaluta approved these changes


tavip merged commit 1c634de into linux-kernel-labs:master

dbaluta pushed a commit to dbaluta/linux that referenced this pull request


          IB/core: Fix querying total rdma stats

a379d1c

rdma_counter_init() may fail for a device. In such case while calculating
total sum, ignore NULL hstats.

This fixes below observed call trace.

BUG: kernel NULL pointer dereference, address: 00000000000000a0
PGD 8000001009b30067 P4D 8000001009b30067 PUD 10549c9067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 55 PID: 20887 Comm: cat Kdump: loaded Not tainted 5.2.0-rc6-jdc+ #13
RIP: 0010:rdma_counter_get_hwstat_value+0xf2/0x150 [ib_core]
Call Trace:
 show_hw_stats+0x5e/0x130 [ib_core]
 dev_attr_show+0x15/0x50
 sysfs_kf_seq_show+0xc6/0x1a0
 seq_read+0x132/0x370
 vfs_read+0x89/0x140
 ksys_read+0x5c/0xd0
 do_syscall_64+0x5a/0x240
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: f34a55e ("RDMA/core: Get sum value of all counters when perform a sysfs stat read")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Mark Zhang <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
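The fix described in this commit message amounts to skipping entries whose statistics pointer is NULL while summing. A minimal sketch of that pattern, using hypothetical types (`struct demo_stats`, `struct demo_port`) rather than the real ib_core structures:

```c
#include <stddef.h>
#include <stdint.h>

struct demo_stats {
    uint64_t value;
};

struct demo_port {
    struct demo_stats *hstats; /* NULL when counter init failed for this port */
};

/* Sum the counter over all ports, ignoring ports without stats. */
uint64_t demo_total(const struct demo_port *ports, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (!ports[i].hstats)
            continue; /* skip instead of dereferencing NULL */
        sum += ports[i].hstats->value;
    }
    return sum;
}
```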

dbaluta pushed a commit to dbaluta/linux that referenced this pull request


          security: Fix the default value of secid_to_secctx hook

625236b

security_secid_to_secctx is called by the bpf_lsm hook and a successful
return value (i.e. 0) implies that the parameter will be consumed by the
LSM framework. With CONFIG_BPF_LSM enabled, the current behaviour returns
success even when the pointer isn't initialized, because of the default
return value from kernel/bpf/bpf_lsm.c.

This is the internal error:

[ 1229.341488][ T2659] usercopy: Kernel memory exposure attempt detected from null address (offset 0, size 280)!
[ 1229.374977][ T2659] ------------[ cut here ]------------
[ 1229.376813][ T2659] kernel BUG at mm/usercopy.c:99!
[ 1229.378398][ T2659] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 1229.380348][ T2659] Modules linked in:
[ 1229.381654][ T2659] CPU: 0 PID: 2659 Comm: systemd-journal Tainted: G    B   W         5.7.0-rc5-next-20200511-00019-g864e0c6319b8-dirty #13
[ 1229.385429][ T2659] Hardware name: linux,dummy-virt (DT)
[ 1229.387143][ T2659] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--)
[ 1229.389165][ T2659] pc : usercopy_abort+0xc8/0xcc
[ 1229.390705][ T2659] lr : usercopy_abort+0xc8/0xcc
[ 1229.392225][ T2659] sp : ffff000064247450
[ 1229.393533][ T2659] x29: ffff000064247460 x28: 0000000000000000
[ 1229.395449][ T2659] x27: 0000000000000118 x26: 0000000000000000
[ 1229.397384][ T2659] x25: ffffa000127049e0 x24: ffffa000127049e0
[ 1229.399306][ T2659] x23: ffffa000127048e0 x22: ffffa000127048a0
[ 1229.401241][ T2659] x21: ffffa00012704b80 x20: ffffa000127049e0
[ 1229.403163][ T2659] x19: ffffa00012704820 x18: 0000000000000000
[ 1229.405094][ T2659] x17: 0000000000000000 x16: 0000000000000000
[ 1229.407008][ T2659] x15: 0000000000000000 x14: 003d090000000000
[ 1229.408942][ T2659] x13: ffff80000d5b25b2 x12: 1fffe0000d5b25b1
[ 1229.410859][ T2659] x11: 1fffe0000d5b25b1 x10: ffff80000d5b25b1
[ 1229.412791][ T2659] x9 : ffffa0001034bee0 x8 : ffff00006ad92d8f
[ 1229.414707][ T2659] x7 : 0000000000000000 x6 : ffffa00015eacb20
[ 1229.416642][ T2659] x5 : ffff0000693c8040 x4 : 0000000000000000
[ 1229.418558][ T2659] x3 : ffffa0001034befc x2 : d57a7483a01c6300
[ 1229.420610][ T2659] x1 : 0000000000000000 x0 : 0000000000000059
[ 1229.422526][ T2659] Call trace:
[ 1229.423631][ T2659]  usercopy_abort+0xc8/0xcc
[ 1229.425091][ T2659]  __check_object_size+0xdc/0x7d4
[ 1229.426729][ T2659]  put_cmsg+0xa30/0xa90
[ 1229.428132][ T2659]  unix_dgram_recvmsg+0x80c/0x930
[ 1229.429731][ T2659]  sock_recvmsg+0x9c/0xc0
[ 1229.431123][ T2659]  ____sys_recvmsg+0x1cc/0x5f8
[ 1229.432663][ T2659]  ___sys_recvmsg+0x100/0x160
[ 1229.434151][ T2659]  __sys_recvmsg+0x110/0x1a8
[ 1229.435623][ T2659]  __arm64_sys_recvmsg+0x58/0x70
[ 1229.437218][ T2659]  el0_svc_common.constprop.1+0x29c/0x340
[ 1229.438994][ T2659]  do_el0_svc+0xe8/0x108
[ 1229.440587][ T2659]  el0_svc+0x74/0x88
[ 1229.441917][ T2659]  el0_sync_handler+0xe4/0x8b4
[ 1229.443464][ T2659]  el0_sync+0x17c/0x180
[ 1229.444920][ T2659] Code: aa1703e2 aa1603e1 910a8260 97ecc860 (d4210000)
[ 1229.447070][ T2659] ---[ end trace 400497d91baeaf51 ]---
[ 1229.448791][ T2659] Kernel panic - not syncing: Fatal exception
[ 1229.450692][ T2659] Kernel Offset: disabled
[ 1229.452061][ T2659] CPU features: 0x240002,20002004
[ 1229.453647][ T2659] Memory Limit: none
[ 1229.455015][ T2659] ---[ end Kernel panic - not syncing: Fatal exception ]---

Rework the hook so the default return value is -EOPNOTSUPP.

There are likely other callbacks, such as security_inode_getsecctx(), that
may have the same problem; someone who understands the code better needs
to audit them.

Thank you Arnd for helping me figure out what went wrong.

Fixes: 98e828a ("security: Refactor declaration of LSM hooks")
Signed-off-by: Anders Roxell <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: James Morris <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
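Why the default return value matters can be shown with a small sketch. The names here (`demo_hook_t`, `demo_default_old`, `demo_default_new`, `demo_caller_copies_null`, and the numeric value of `DEMO_EOPNOTSUPP`) are assumptions for illustration and do not reproduce the LSM API: a default hook that reports success without filling in the output pointer leads the caller to use a NULL buffer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed numeric value, for the sketch only. */
#define DEMO_EOPNOTSUPP 95

typedef int (*demo_hook_t)(unsigned int id, const char **out, size_t *len);

/* Old default: claims success but never sets *out. */
int demo_default_old(unsigned int id, const char **out, size_t *len)
{
    (void)id; (void)out; (void)len;
    return 0;
}

/* New default: reports that the operation is not supported. */
int demo_default_new(unsigned int id, const char **out, size_t *len)
{
    (void)id; (void)out; (void)len;
    return -DEMO_EOPNOTSUPP;
}

/* Models a caller that copies the data out whenever the hook returns 0.
 * Returns true when that copy would read from a NULL pointer. */
bool demo_caller_copies_null(demo_hook_t hook)
{
    const char *data = NULL;
    size_t len = 280;

    if (hook(1, &data, &len) != 0)
        return false;    /* error path: the copy is skipped */
    return data == NULL; /* "success", but nothing was filled in */
}
```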

dbaluta pushed a commit to dbaluta/linux that referenced this pull request


          mtd: Fix mtd not registered due to nvmem name collision

7b01b72

When the nvmem framework is enabled, a nvmem device is created per mtd
device/partition.

It is not uncommon for a device to have multiple mtd devices with
partitions that have the same name, e.g. when a DT overlay is allowed
and the same device with mtd is attached twice.

Under those circumstances, the mtd fails to register due to a name
duplication in the nvmem framework.

With this patch we use the mtdX name instead of the partition name,
which is unique.

[    8.948991] sysfs: cannot create duplicate filename '/bus/nvmem/devices/Production Data'
[    8.948992] CPU: 7 PID: 246 Comm: systemd-udevd Not tainted 5.5.0-qtec-standard #13
[    8.948993] Hardware name: AMD Dibbler/Dibbler, BIOS 05.22.04.0019 10/26/2019
[    8.948994] Call Trace:
[    8.948996]  dump_stack+0x50/0x70
[    8.948998]  sysfs_warn_dup.cold+0x17/0x2d
[    8.949000]  sysfs_do_create_link_sd.isra.0+0xc2/0xd0
[    8.949002]  bus_add_device+0x74/0x140
[    8.949004]  device_add+0x34b/0x850
[    8.949006]  nvmem_register.part.0+0x1bf/0x640
...
[    8.948926] mtd mtd8: Failed to register NVMEM device

Fixes: c4dfa25 ("mtd: add support for reading MTD devices via the nvmem API")
Signed-off-by: Ricardo Ribalda Delgado <[email protected]>
Acked-by: Miquel Raynal <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>

dbaluta pushed a commit to dbaluta/linux that referenced this pull request


          highmem: fix checks in __kmap_local_sched_{in,out}

66f133c

When CONFIG_DEBUG_KMAP_LOCAL is enabled __kmap_local_sched_{in,out} check
that even slots in the tsk->kmap_ctrl.pteval are unmapped.  The slots are
initialized with 0 value, but the check is done with pte_none.  A 0 pte,
however, does not necessarily mean that pte_none will return true; e.g.
on xtensa it returns false, resulting in the following runtime warnings:

 WARNING: CPU: 0 PID: 101 at mm/highmem.c:627 __kmap_local_sched_out+0x51/0x108
 CPU: 0 PID: 101 Comm: touch Not tainted 5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13
 Call Trace:
   dump_stack+0xc/0x40
   __warn+0x8f/0x174
   warn_slowpath_fmt+0x48/0xac
   __kmap_local_sched_out+0x51/0x108
   __schedule+0x71a/0x9c4
   preempt_schedule_irq+0xa0/0xe0
   common_exception_return+0x5c/0x93
   do_wp_page+0x30e/0x330
   handle_mm_fault+0xa70/0xc3c
   do_page_fault+0x1d8/0x3c4
   common_exception+0x7f/0x7f

 WARNING: CPU: 0 PID: 101 at mm/highmem.c:664 __kmap_local_sched_in+0x50/0xe0
 CPU: 0 PID: 101 Comm: touch Tainted: G        W         5.17.0-rc7-00010-gd3a1cdde80d2-dirty #13
 Call Trace:
   dump_stack+0xc/0x40
   __warn+0x8f/0x174
   warn_slowpath_fmt+0x48/0xac
   __kmap_local_sched_in+0x50/0xe0
   finish_task_switch$isra$0+0x1ce/0x2f8
   __schedule+0x86e/0x9c4
   preempt_schedule_irq+0xa0/0xe0
   common_exception_return+0x5c/0x93
   do_wp_page+0x30e/0x330
   handle_mm_fault+0xa70/0xc3c
   do_page_fault+0x1d8/0x3c4
   common_exception+0x7f/0x7f

Fix it by replacing !pte_none(pteval) with pte_val(pteval) != 0.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5fbda3e ("sched: highmem: Store local kmaps in task struct")
Signed-off-by: Max Filippov <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
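The difference between the two checks can be sketched as follows. This assumes, purely for illustration, an architecture whose "no mapping" PTE encoding is a nonzero bit pattern; `demo_pte_t`, `DEMO_PTE_NONE_BITS`, and the helper names are hypothetical and do not reflect the actual xtensa definitions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t val;
} demo_pte_t;

/* Assumption for the sketch: "none" is encoded as a nonzero value. */
#define DEMO_PTE_NONE_BITS 0x3u

bool demo_pte_none(demo_pte_t pte)
{
    return pte.val == DEMO_PTE_NONE_BITS;
}

/* Old check: a zero-initialized slot is not "none", so it looks used. */
bool demo_slot_used_old(demo_pte_t pte)
{
    return !demo_pte_none(pte);
}

/* New check: compare the raw value with the 0 the slots start out with. */
bool demo_slot_used_new(demo_pte_t pte)
{
    return pte.val != 0;
}
```

With a zero-initialized slot, the old check reports it as used (the false positive behind the warnings), while the new check reports it as free.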
