Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add system calls lecture #13

Merged
merged 8 commits into from
Feb 27, 2018
Merged

Conversation

tavip
Copy link
Member

@tavip tavip commented Feb 27, 2018

This PR also adds an asciicast directive for asciiname playback.

This makes it easier to edit ditaa directives since insertion will
always move the rest of the row by one.

Signed-off-by: Octavian Purdila <[email protected]>
Fixes the following warning:

Documentation/teaching/lectures/intro.rst:737: WARNING: Bullet list
ends without a blank line; unexpected unindent

Signed-off-by: Octavian Purdila <[email protected]>
As noted in Sphinx #2442 new CSS added by extensions are rendered
innefective if html_context its changed. So, instead, use add_stylesheet
to add theme_overridesc.css

Signed-off-by: Octavian Purdila <[email protected]>
Add asciicast directive that allows inserting asciinema "asciicasts" in
docs. The directive accepts a single mandatory parameter which is the
filename that stores the asciicast:

.. asciicast:: ascii.cast

Signed-off-by: Octavian Purdila <[email protected]>
Don't catch ditaa errors, let the user see them so that it is easier to
understand the root cause of failures.

Signed-off-by: Octavian Purdila <[email protected]>
Signed-off-by: Octavian Purdila <[email protected]>
@tavip tavip requested review from dbaluta and razvand February 27, 2018 04:34
For security reasons, when a user to kernel mode transitions occurs,
the CPU's PC is set at specific kernel entry points. For system calls,
this entry point will push the parameters on stack so that they are
accesible by the system call functions and then it will run the system
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, it is not clear. parameters are passed in registers and then saved to kernel stack?


* The kernel entry point saves registers on the kernel stack

* The system call dispatches identifies the system call function and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/dispatches/dispatcher


* When the system call function returns the userspace registers are
restored, execution is switched back to user mode and the
userspace application resumes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, to get to kernel mode we use a trap. How do we switch back to userspace? Just manually change the CPL?

__SYSCALL_I386(2, sys_fork, )
#else
__SYSCALL_I386(2, sys_fork, )
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entries for sys_fork look identical to me on both branches of #ifdef. Is this intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. This code is automatically generate and that why maybe it looks like this.


For example, lets consider the case where such a check is not made for
the read or write system calls. If the user passes a kernel-space
pointer to a write sysytem call then it can get access to kernel data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/sysytem/system


* Invalid syscall pointer: the faulting address is in kernel
space; the fault address is in userspace and it is invalid

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't get "invalid syscall pointer" bullet. :)

So do you mean invalid user space pointer that is accessed from kernel.

space; the fault address is in userspace and it is invalid

* Kernel bug: same as above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also here we should add words even if it means copying thing said above. What is the difference between kernel bug and invalid userspace pointer.


* The exact instructions that accesses user space are recorded in
a table (exception table)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

accesses -> access

+------------------+-----------------------+------------------------+
| Cost | Pointer checks | Fault handling |
+==================+=======================+========================+
| Valid address | address space search | 0 |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[valid address - fault handling] I think it is O(1) because we still need to use copy_from_user which does some limits checking.

| Invalid address | address space search | exception table search |
+------------------+-----------------------+------------------------+


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear up till now which is the preferred method used by the kernel.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does say so in the paragraph above

tavip pushed a commit to linux-kernel-labs/linux-kernel-labs.github.io that referenced this pull request Feb 27, 2018
@tavip tavip merged commit 1c634de into linux-kernel-labs:master Feb 27, 2018
dbaluta pushed a commit to dbaluta/linux that referenced this pull request Aug 9, 2019
rdma_counter_init() may fail for a device. In such case while calculating
total sum, ignore NULL hstats.

This fixes below observed call trace.

BUG: kernel NULL pointer dereference, address: 00000000000000a0
PGD 8000001009b30067 P4D 8000001009b30067 PUD 10549c9067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 55 PID: 20887 Comm: cat Kdump: loaded Not tainted 5.2.0-rc6-jdc+ linux-kernel-labs#13
RIP: 0010:rdma_counter_get_hwstat_value+0xf2/0x150 [ib_core]
Call Trace:
 show_hw_stats+0x5e/0x130 [ib_core]
 dev_attr_show+0x15/0x50
 sysfs_kf_seq_show+0xc6/0x1a0
 seq_read+0x132/0x370
 vfs_read+0x89/0x140
 ksys_read+0x5c/0xd0
 do_syscall_64+0x5a/0x240
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: f34a55e ("RDMA/core: Get sum value of all counters when perform a sysfs stat read")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Parav Pandit <[email protected]>
Reviewed-by: Mark Zhang <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
dbaluta pushed a commit to dbaluta/linux that referenced this pull request Jun 17, 2020
security_secid_to_secctx is called by the bpf_lsm hook and a successful
return value (i.e 0) implies that the parameter will be consumed by the
LSM framework. The current behaviour return success when the pointer
isn't initialized when CONFIG_BPF_LSM is enabled, with the default
return from kernel/bpf/bpf_lsm.c.

This is the internal error:

[ 1229.341488][ T2659] usercopy: Kernel memory exposure attempt detected from null address (offset 0, size 280)!
[ 1229.374977][ T2659] ------------[ cut here ]------------
[ 1229.376813][ T2659] kernel BUG at mm/usercopy.c:99!
[ 1229.378398][ T2659] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
[ 1229.380348][ T2659] Modules linked in:
[ 1229.381654][ T2659] CPU: 0 PID: 2659 Comm: systemd-journal Tainted: G    B   W         5.7.0-rc5-next-20200511-00019-g864e0c6319b8-dirty linux-kernel-labs#13
[ 1229.385429][ T2659] Hardware name: linux,dummy-virt (DT)
[ 1229.387143][ T2659] pstate: 80400005 (Nzcv daif +PAN -UAO BTYPE=--)
[ 1229.389165][ T2659] pc : usercopy_abort+0xc8/0xcc
[ 1229.390705][ T2659] lr : usercopy_abort+0xc8/0xcc
[ 1229.392225][ T2659] sp : ffff000064247450
[ 1229.393533][ T2659] x29: ffff000064247460 x28: 0000000000000000
[ 1229.395449][ T2659] x27: 0000000000000118 x26: 0000000000000000
[ 1229.397384][ T2659] x25: ffffa000127049e0 x24: ffffa000127049e0
[ 1229.399306][ T2659] x23: ffffa000127048e0 x22: ffffa000127048a0
[ 1229.401241][ T2659] x21: ffffa00012704b80 x20: ffffa000127049e0
[ 1229.403163][ T2659] x19: ffffa00012704820 x18: 0000000000000000
[ 1229.405094][ T2659] x17: 0000000000000000 x16: 0000000000000000
[ 1229.407008][ T2659] x15: 0000000000000000 x14: 003d090000000000
[ 1229.408942][ T2659] x13: ffff80000d5b25b2 x12: 1fffe0000d5b25b1
[ 1229.410859][ T2659] x11: 1fffe0000d5b25b1 x10: ffff80000d5b25b1
[ 1229.412791][ T2659] x9 : ffffa0001034bee0 x8 : ffff00006ad92d8f
[ 1229.414707][ T2659] x7 : 0000000000000000 x6 : ffffa00015eacb20
[ 1229.416642][ T2659] x5 : ffff0000693c8040 x4 : 0000000000000000
[ 1229.418558][ T2659] x3 : ffffa0001034befc x2 : d57a7483a01c6300
[ 1229.420610][ T2659] x1 : 0000000000000000 x0 : 0000000000000059
[ 1229.422526][ T2659] Call trace:
[ 1229.423631][ T2659]  usercopy_abort+0xc8/0xcc
[ 1229.425091][ T2659]  __check_object_size+0xdc/0x7d4
[ 1229.426729][ T2659]  put_cmsg+0xa30/0xa90
[ 1229.428132][ T2659]  unix_dgram_recvmsg+0x80c/0x930
[ 1229.429731][ T2659]  sock_recvmsg+0x9c/0xc0
[ 1229.431123][ T2659]  ____sys_recvmsg+0x1cc/0x5f8
[ 1229.432663][ T2659]  ___sys_recvmsg+0x100/0x160
[ 1229.434151][ T2659]  __sys_recvmsg+0x110/0x1a8
[ 1229.435623][ T2659]  __arm64_sys_recvmsg+0x58/0x70
[ 1229.437218][ T2659]  el0_svc_common.constprop.1+0x29c/0x340
[ 1229.438994][ T2659]  do_el0_svc+0xe8/0x108
[ 1229.440587][ T2659]  el0_svc+0x74/0x88
[ 1229.441917][ T2659]  el0_sync_handler+0xe4/0x8b4
[ 1229.443464][ T2659]  el0_sync+0x17c/0x180
[ 1229.444920][ T2659] Code: aa1703e2 aa1603e1 910a8260 97ecc860 (d4210000)
[ 1229.447070][ T2659] ---[ end trace 400497d91baeaf51 ]---
[ 1229.448791][ T2659] Kernel panic - not syncing: Fatal exception
[ 1229.450692][ T2659] Kernel Offset: disabled
[ 1229.452061][ T2659] CPU features: 0x240002,20002004
[ 1229.453647][ T2659] Memory Limit: none
[ 1229.455015][ T2659] ---[ end Kernel panic - not syncing: Fatal exception ]---

Rework the so the default return value is -EOPNOTSUPP.

There are likely other callbacks such as security_inode_getsecctx() that
may have the same problem, and that someone that understand the code
better needs to audit them.

Thank you Arnd for helping me figure out what went wrong.

Fixes: 98e828a ("security: Refactor declaration of LSM hooks")
Signed-off-by: Anders Roxell <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Acked-by: James Morris <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
dbaluta pushed a commit to dbaluta/linux that referenced this pull request Jun 17, 2020
When the nvmem framework is enabled, a nvmem device is created per mtd
device/partition.

It is not uncommon that a device can have multiple mtd devices with
partitions that have the same name. Eg, when there DT overlay is allowed
and the same device with mtd is attached twice.

Under that circumstances, the mtd fails to register due to a name
duplication on the nvmem framework.

With this patch we use the mtdX name instead of the partition name,
which is unique.

[    8.948991] sysfs: cannot create duplicate filename '/bus/nvmem/devices/Production Data'
[    8.948992] CPU: 7 PID: 246 Comm: systemd-udevd Not tainted 5.5.0-qtec-standard linux-kernel-labs#13
[    8.948993] Hardware name: AMD Dibbler/Dibbler, BIOS 05.22.04.0019 10/26/2019
[    8.948994] Call Trace:
[    8.948996]  dump_stack+0x50/0x70
[    8.948998]  sysfs_warn_dup.cold+0x17/0x2d
[    8.949000]  sysfs_do_create_link_sd.isra.0+0xc2/0xd0
[    8.949002]  bus_add_device+0x74/0x140
[    8.949004]  device_add+0x34b/0x850
[    8.949006]  nvmem_register.part.0+0x1bf/0x640
...
[    8.948926] mtd mtd8: Failed to register NVMEM device

Fixes: c4dfa25 ("mtd: add support for reading MTD devices via the nvmem API")
Signed-off-by: Ricardo Ribalda Delgado <[email protected]>
Acked-by: Miquel Raynal <[email protected]>
Signed-off-by: Richard Weinberger <[email protected]>
dbaluta pushed a commit to dbaluta/linux that referenced this pull request May 3, 2022
When CONFIG_DEBUG_KMAP_LOCAL is enabled __kmap_local_sched_{in,out} check
that even slots in the tsk->kmap_ctrl.pteval are unmapped.  The slots are
initialized with 0 value, but the check is done with pte_none.  0 pte
however does not necessarily mean that pte_none will return true.  e.g.
on xtensa it returns false, resulting in the following runtime warnings:

 WARNING: CPU: 0 PID: 101 at mm/highmem.c:627 __kmap_local_sched_out+0x51/0x108
 CPU: 0 PID: 101 Comm: touch Not tainted 5.17.0-rc7-00010-gd3a1cdde80d2-dirty linux-kernel-labs#13
 Call Trace:
   dump_stack+0xc/0x40
   __warn+0x8f/0x174
   warn_slowpath_fmt+0x48/0xac
   __kmap_local_sched_out+0x51/0x108
   __schedule+0x71a/0x9c4
   preempt_schedule_irq+0xa0/0xe0
   common_exception_return+0x5c/0x93
   do_wp_page+0x30e/0x330
   handle_mm_fault+0xa70/0xc3c
   do_page_fault+0x1d8/0x3c4
   common_exception+0x7f/0x7f

 WARNING: CPU: 0 PID: 101 at mm/highmem.c:664 __kmap_local_sched_in+0x50/0xe0
 CPU: 0 PID: 101 Comm: touch Tainted: G        W         5.17.0-rc7-00010-gd3a1cdde80d2-dirty linux-kernel-labs#13
 Call Trace:
   dump_stack+0xc/0x40
   __warn+0x8f/0x174
   warn_slowpath_fmt+0x48/0xac
   __kmap_local_sched_in+0x50/0xe0
   finish_task_switch$isra$0+0x1ce/0x2f8
   __schedule+0x86e/0x9c4
   preempt_schedule_irq+0xa0/0xe0
   common_exception_return+0x5c/0x93
   do_wp_page+0x30e/0x330
   handle_mm_fault+0xa70/0xc3c
   do_page_fault+0x1d8/0x3c4
   common_exception+0x7f/0x7f

Fix it by replacing !pte_none(pteval) with pte_val(pteval) != 0.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: 5fbda3e ("sched: highmem: Store local kmaps in task struct")
Signed-off-by: Max Filippov <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: "Peter Zijlstra (Intel)" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants