-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
5.4.210 velinux #3
Closed
liuxuepinggit
wants to merge
5,345
commits into
openvelinux:5.4.143-velinux
from
liuxuepinggit:5.4.210-velinux
Closed
5.4.210 velinux #3
liuxuepinggit
wants to merge
5,345
commits into
openvelinux:5.4.143-velinux
from
liuxuepinggit:5.4.210-velinux
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Added support for batch requests, per crypto engine. A new callback is added, do_batch_requests, which executes a batch of requests. This has the crypto_engine structure as argument (for cases when more than one crypto-engine is used). The crypto_engine_alloc_init_and_set function, initializes crypto-engine, but also, sets the do_batch_requests callback. On crypto_pump_requests, if do_batch_requests callback is implemented in a driver, this will be executed. The link between the requests will be done in driver, if possible. do_batch_requests is available only if the hardware has support for multiple request. Signed-off-by: Iuliana Prodan <[email protected]> Signed-off-by: Herbert Xu <[email protected]> (cherry picked from commit 8d90822) Signed-off-by: lei he <[email protected]>
Fix wrong test data at testmgr.h, it seems to be caused by ignoring the last '\0' when calling sizeof. Signed-off-by: Lei He <[email protected]> Signed-off-by: Herbert Xu <[email protected]> (cherry picked from commit 39ef085) Signed-off-by: lei he <[email protected]>
According to the BER encoding rules, integer value should be encoded as two's complement, and if the highest bit of a positive integer is 1, should add a leading zero-octet. The kernel's built-in RSA algorithm cannot recognize negative numbers when parsing keys, so it can pass this test case. Export the key to file and run the following command to verify the fix result: openssl asn1parse -inform DER -in /path/to/key/file Signed-off-by: Lei He <[email protected]> Signed-off-by: Herbert Xu <[email protected]> (cherry picked from commit a988701) Signed-off-by: lei he <[email protected]>
Base on the lastest virtio crypto spec, define VIRTIO_CRYPTO_NOSPC. Reviewed-by: Gonglei <[email protected]> Signed-off-by: zhenwei pi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 13d640a) Signed-off-by: lei he <[email protected]>
Since crypto is a modern-only device, tag config space fields as having little endian-ness. Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> (cherry picked from commit 24bcf35) Signed-off-by: lei he <[email protected]>
Introduce asymmetric service definition, asymmetric operations and several well known algorithms. Co-developed-by: lei he <[email protected]> Signed-off-by: lei he <[email protected]> Signed-off-by: zhenwei pi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Gonglei <[email protected]> (cherry picked from commit 24e1959) Signed-off-by: lei he <[email protected]>
The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Cc: Gonglei <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Cc: Herbert Xu <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Herbert Xu <[email protected]> (cherry picked from commit d01a9f7) Signed-off-by: lei he <[email protected]>
Support rsa & pkcs1pad(rsa,sha1) with priority 150. Test with QEMU built-in backend, it works fine. 1, The self-test framework of crypto layer works fine in guest kernel 2, Test with Linux guest(with asym support), the following script test(note that pkey_XXX is supported only in a newer version of keyutils): - both public key & private key - create/close session - encrypt/decrypt/sign/verify basic driver operation - also test with kernel crypto layer(pkey add/query) All the cases work fine. rm -rf *.der *.pem *.pfx modprobe pkcs8_key_parser # if CONFIG_PKCS8_PRIVATE_KEY_PARSER=m rm -rf /tmp/data dd if=/dev/random of=/tmp/data count=1 bs=226 openssl req -nodes -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem -subj "/C=CN/ST=BJ/L=HD/O=qemu/OU=dev/CN=qemu/[email protected]" openssl pkcs8 -in key.pem -topk8 -nocrypt -outform DER -out key.der openssl x509 -in cert.pem -inform PEM -outform DER -out cert.der PRIV_KEY_ID=`cat key.der | keyctl padd asymmetric test_priv_key @s` echo "priv key id = "$PRIV_KEY_ID PUB_KEY_ID=`cat cert.der | keyctl padd asymmetric test_pub_key @s` echo "pub key id = "$PUB_KEY_ID keyctl pkey_query $PRIV_KEY_ID 0 keyctl pkey_query $PUB_KEY_ID 0 echo "Enc with priv key..." keyctl pkey_encrypt $PRIV_KEY_ID 0 /tmp/data enc=pkcs1 >/tmp/enc.priv echo "Dec with pub key..." keyctl pkey_decrypt $PRIV_KEY_ID 0 /tmp/enc.priv enc=pkcs1 >/tmp/dec cmp /tmp/data /tmp/dec echo "Sign with priv key..." keyctl pkey_sign $PRIV_KEY_ID 0 /tmp/data enc=pkcs1 hash=sha1 > /tmp/sig echo "Verify with pub key..." keyctl pkey_verify $PRIV_KEY_ID 0 /tmp/data /tmp/sig enc=pkcs1 hash=sha1 echo "Enc with pub key..." keyctl pkey_encrypt $PUB_KEY_ID 0 /tmp/data enc=pkcs1 >/tmp/enc.pub echo "Dec with priv key..." keyctl pkey_decrypt $PRIV_KEY_ID 0 /tmp/enc.pub enc=pkcs1 >/tmp/dec cmp /tmp/data /tmp/dec echo "Verify with pub key..." keyctl pkey_verify $PUB_KEY_ID 0 /tmp/data /tmp/sig enc=pkcs1 hash=sha1 Conflicts & Solve: - drivers/crypto/virtio/Kconfig Except that 'select CRYPTO_SKCIPHER' is removed, all changes from HEAD and cherry-picked commit are both kept. - drivers/crypto/virito/virtio_crypto_core.c There is an upstream commit that changed macro 'virito_cread' to 'virito_cread_le'. Since this commit introduced a lot of changes (and conflicts), here just modified the cherry-picked commit, let it keep use 'virito_cread'. - drivers/crypto/virtio/virtio_crypto_akcipher_algs.c Replace kfree_sensitive with kzfree, see: https://patchwork.kernel.org/project/target-devel/patch/[email protected]/ [1 compiling warning during development] Reported-by: kernel test robot <[email protected]> Co-developed-by: lei he <[email protected]> Signed-off-by: lei he <[email protected]> Signed-off-by: zhenwei pi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Gonglei <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]> #Kconfig tweaks Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 59ca6c9) Signed-off-by: lei he <[email protected]>
Suggested by Gonglei, rename virtio_crypto_algs.c to virtio_crypto_skcipher_algs.c. Also minor changes for function name. Thus the function of source files get clear: skcipher services in virtio_crypto_skcipher_algs.c and akcipher services in virtio_crypto_akcipher_algs.c. Signed-off-by: zhenwei pi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Gonglei <[email protected]> (cherry picked from commit ea993de) Signed-off-by: lei he <[email protected]>
Use temporary variable to make code easy to read and maintain. /* Pad cipher's parameters */ vcrypto->ctrl.u.sym_create_session.op_type = cpu_to_le32(VIRTIO_CRYPTO_SYM_OP_CIPHER); vcrypto->ctrl.u.sym_create_session.u.cipher.para.algo = vcrypto->ctrl.header.algo; vcrypto->ctrl.u.sym_create_session.u.cipher.para.keylen = cpu_to_le32(keylen); vcrypto->ctrl.u.sym_create_session.u.cipher.para.op = cpu_to_le32(op); --> sym_create_session = &ctrl->u.sym_create_session; sym_create_session->op_type = cpu_to_le32(VIRTIO_CRYPTO_SYM_OP_CIPHER); sym_create_session->u.cipher.para.algo = ctrl->header.algo; sym_create_session->u.cipher.para.keylen = cpu_to_le32(keylen); sym_create_session->u.cipher.para.op = cpu_to_le32(op); The new style shows more obviously: - the variable we want to operate. - an assignment statement in a single line. Conflicts & Solve: - drivers/crypto/virito/virtio_crypto_skcipher_algs.c Use 'kzfree' instead of 'kfree_sensitive' because currently kfree_sensitive is not imported, see: 'https://patchwork.kernel.org/project/target-devel/patch/[email protected]/' Cc: Michael S. Tsirkin <[email protected]> Cc: Jason Wang <[email protected]> Cc: Gonglei <[email protected]> Reviewed-by: Gonglei <[email protected]> Signed-off-by: zhenwei pi <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 6fd763d) Signed-off-by: lei he <[email protected]>
Originally, all of the control requests share a single buffer( ctrl & input & ctrl_status fields in struct virtio_crypto), this allows queue depth 1 only, the performance of control queue gets limited by this design. In this patch, each request allocates request buffer dynamically, and free buffer after request, so the scope protected by ctrl_lock also get optimized here. It's possible to optimize control queue depth in the next step. A necessary comment is already in code, still describe it again: /* * Note: there are padding fields in request, clear them to zero before * sending to host to avoid to divulge any information. * Ex, virtio_crypto_ctrl_request::ctrl::u::destroy_session::padding[48] */ So use kzalloc to allocate buffer of struct virtio_crypto_ctrl_request. Conflicts & Solve: - drivers/crypto/virtio/virtio_crypto_skcipher_algs.c 1. Use changes from cherry-picked commit 2. replace the kfree_sensitive with kzfree Potentially dereferencing uninitialized variables: Reported-by: kernel test robot <[email protected]> Reported-by: Dan Carpenter <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: Jason Wang <[email protected]> Cc: Gonglei <[email protected]> Reviewed-by: Gonglei <[email protected]> Signed-off-by: zhenwei pi <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 0756ad1) Signed-off-by: lei he <[email protected]>
Originally, after submitting request into virtio crypto control queue, the guest side polls the result from the virt queue. This works like following: CPU0 CPU1 ... CPUx CPUy | | | | \ \ / / \--------spin_lock(&vcrypto->ctrl_lock)-------/ | virtqueue add & kick | busy poll virtqueue | spin_unlock(&vcrypto->ctrl_lock) ... There are two problems: 1, The queue depth is always 1, the performance of a virtio crypto device gets limited. Multi user processes share a single control queue, and hit spin lock race from control queue. Test on Intel Platinum 8260, a single worker gets ~35K/s create/close session operations, and 8 workers get ~40K/s operations with 800% CPU utilization. 2, The control request is supposed to get handled immediately, but in the current implementation of QEMU(v6.2), the vCPU thread kicks another thread to do this work, the latency also gets unstable. Tracking latency of virtio_crypto_alg_akcipher_close_session in 5s: usecs : count distribution 0 -> 1 : 0 | | 2 -> 3 : 7 | | 4 -> 7 : 72 | | 8 -> 15 : 186485 |************************| 16 -> 31 : 687 | | 32 -> 63 : 5 | | 64 -> 127 : 3 | | 128 -> 255 : 1 | | 256 -> 511 : 0 | | 512 -> 1023 : 0 | | 1024 -> 2047 : 0 | | 2048 -> 4095 : 0 | | 4096 -> 8191 : 0 | | 8192 -> 16383 : 2 | | This means that a CPU may hold vcrypto->ctrl_lock as long as 8192~16383us. To improve the performance of control queue, a request on control queue waits completion instead of busy polling to reduce lock racing, and gets completed by control queue callback. CPU0 CPU1 ... CPUx CPUy | | | | \ \ / / \--------spin_lock(&vcrypto->ctrl_lock)-------/ | virtqueue add & kick | ---------spin_unlock(&vcrypto->ctrl_lock)------ / / \ \ | | | | wait wait wait wait Test this patch, the guest side get ~200K/s operations with 300% CPU utilization. Cc: Michael S. Tsirkin <[email protected]> Cc: Jason Wang <[email protected]> Cc: Gonglei <[email protected]> Reviewed-by: Gonglei <[email protected]> Signed-off-by: zhenwei pi <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 977231e) Signed-off-by: lei he <[email protected]>
For some akcipher operations(eg, decryption of pkcs1pad(rsa)), the length of returned result maybe less than akcipher_req->dst_len, we need to recalculate the actual dst_len through the virt-queue protocol. Cc: Michael S. Tsirkin <[email protected]> Cc: Jason Wang <[email protected]> Cc: Gonglei <[email protected]> Reviewed-by: Gonglei <[email protected]> Signed-off-by: lei he <[email protected]> Signed-off-by: zhenwei pi <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit a36bd0a) Signed-off-by: lei he <[email protected]>
Enable retry for virtio-crypto-dev, so that crypto-engine can process cipher-requests parallelly. Cc: Michael S. Tsirkin <[email protected]> Cc: Jason Wang <[email protected]> Cc: Gonglei <[email protected]> Reviewed-by: Gonglei <[email protected]> Signed-off-by: lei he <[email protected]> Signed-off-by: zhenwei pi <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> (cherry picked from commit 4e0d352) Signed-off-by: lei he <[email protected]>
Now, in crypto-engine, if hardware queue is full (-ENOSPC), requeue request regardless of MAY_BACKLOG flag. If hardware throws any other error code (like -EIO, -EINVAL, -ENOMEM, etc.) only MAY_BACKLOG requests are enqueued back into crypto-engine's queue, since the others can be dropped. The latter case can be fatal error, so those cannot be recovered from. For example, in CAAM driver, -EIO is returned in case the job descriptor is broken, so there is no possibility to fix the job descriptor. Therefore, these errors might be fatal error, so we shouldn’t requeue the request. This will just be pass back and forth between crypto-engine and hardware. Fixes: 6a89f49 ("crypto: engine - support for parallel requests based on retry mechanism") Signed-off-by: Iuliana Prodan <[email protected]> Reported-by: Horia Geantă <[email protected]> Reviewed-by: Horia Geantă <[email protected]> Signed-off-by: Herbert Xu <[email protected]> (cherry picked from commit d1c72f6) Signed-off-by: lei he <[email protected]>
According to PKCS#1 standard, the 'otherPrimeInfos' field contains the information for the additional primes r_3, ..., r_u, in order. It shall be omitted if the version is 0 and shall contain at least one instance of OtherPrimeInfo if the version is 1, see: https://www.rfc-editor.org/rfc/rfc3447#page-44 Replace the version number '1' with 0, otherwise, some drivers may not pass the run-time tests. Signed-off-by: lei he <[email protected]> (cherry picked from commit 0bb8f12) Signed-off-by: lei he <[email protected]>
…stat of cgroup v2 commit 673520f upstream. There are already statistics of {pgscan,pgsteal}_kswapd and {pgscan,pgsteal}_direct of memcg event here, but now only the sum of the two is displayed in memory.stat of cgroup v2. In order to obtain more accurate information during monitoring and debugging, and to align with the display in /proc/vmstat, it better to display {pgscan,pgsteal}_kswapd and {pgscan,pgsteal}_direct separately. Also, for forward compatibility, we still display pgscan and pgsteal items so that it won't break existing applications. [[email protected]: add comment for memcg_vm_event_stat (suggested by Michal)] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: fix the doc, thanks to Johannes] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Qi Zheng <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Muchun Song <[email protected]> Acked-by: Shakeel Butt <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
commit bcb1704 upstream. Two new statistics are introduced to show the internal of burst feature and explain why burst helps or not. nr_bursts: number of periods bandwidth burst occurs burst_time: cumulative wall-time (in nanoseconds) that any cpus has used above quota in respective periods Co-developed-by: Shanpei Chen <[email protected]> Signed-off-by: Shanpei Chen <[email protected]> Co-developed-by: Tianchen Ding <[email protected]> Signed-off-by: Tianchen Ding <[email protected]> Signed-off-by: Huaixin Chang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Daniel Jordan <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hao Jia <[email protected]>
commit 7ebc7dc upstream. Since commit d6a3b24 these three lines are merged into one by the RST processor, making it hard to read. Use bullet points to separate the entries, like it's done in other similar places. Signed-off-by: Kir Kolyshkin <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]> Signed-off-by: Hao Jia <[email protected]>
commit d73df88 upstream. Basic description of usage and effect for CFS Bandwidth Control Burst. Co-developed-by: Shanpei Chen <[email protected]> Signed-off-by: Shanpei Chen <[email protected]> Co-developed-by: Tianchen Ding <[email protected]> Signed-off-by: Tianchen Ding <[email protected]> Signed-off-by: Huaixin Chang <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Daniel Jordan <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hao Jia <[email protected]>
commit a6d210d upstream. Using bitops makes bit masks and shifts easier to read. Tested-by: Brad Campbell <[email protected]> Tested-by: Bernhard Gebetsberger <[email protected]> Tested-by: Holger Kiehl <[email protected]> Tested-by: Michael Larabel <[email protected]> Tested-by: Jonathan McDowell <[email protected]> Tested-by: Ken Moffat <[email protected]> Tested-by: Darren Salt <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Xie Haocheng <[email protected]>
commit d547552 upstream. Convert driver to use devm_hwmon_device_register_with_info to simplify the code and to reduce its size. Old size (x86_64): text data bss dec hex filename 8247 4488 64 12799 31ff drivers/hwmon/k10temp.o New size: text data bss dec hex filename 6778 2792 64 9634 25a2 drivers/hwmon/k10temp.o Tested-by: Brad Campbell <[email protected]> Tested-by: Bernhard Gebetsberger <[email protected]> Tested-by: Holger Kiehl <[email protected]> Tested-by: Michael Larabel <[email protected]> Tested-by: Jonathan McDowell <[email protected]> Tested-by: Ken Moffat <[email protected]> Tested-by: Darren Salt <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Xie Haocheng <[email protected]>
commit c757938 upstream. Zen2 reports reporting temperatures per CPU die (called Core Complex Dies, or CCD, by AMD). Add support for it to the k10temp driver. Tested-by: Brad Campbell <[email protected]> Tested-by: Bernhard Gebetsberger <[email protected]> Tested-by: Holger Kiehl <[email protected]> Tested-by: Michael Larabel <[email protected]> Tested-by: Jonathan McDowell <[email protected]> Tested-by: Ken Moffat <[email protected]> Tested-by: Darren Salt <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Xie Haocheng <[email protected]>
commit b00647c upstream. Ryzen CPUs report core and SoC voltages and currents. Add support for it to the k10temp driver. For the time being, only report voltages and currents for Ryzen CPUs. Threadripper and EPYC appear to use a different mechanism. Tested-by: Brad Campbell <[email protected]> Tested-by: Bernhard Gebetsberger <[email protected]> Tested-by: Holger Kiehl <[email protected]> Tested-by: Michael Larabel <[email protected]> Tested-by: Jonathan McDowell <[email protected]> Tested-by: Ken Moffat <[email protected]> Tested-by: Darren Salt <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Xie Haocheng <[email protected]>
commit 70831c8 upstream. The maximum Tdie or Tctl is not published for Ryzen CPUs. What is known, however, is that the traditional value of 70 degrees C is no longer correct. On top of that, the limit applies to Tctl, not to Tdie. Displaying it in either context is meaningless, confusing, and wrong. Stop doing it. Tested-by: Brad Campbell <[email protected]> Tested-by: Holger Kiehl <[email protected]> Tested-by: Michael Larabel <[email protected]> Tested-by: Jonathan McDowell <[email protected]> Tested-by: Ken Moffat <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Xie Haocheng <[email protected]>
commit 9c4a38f upstream. Show thermal and SVI registers for Family 17h CPUs. Tested-by: Sebastian Reichel <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Xie Haocheng <[email protected]>
commit fd8bdb2 upstream. In HWiNFO, we see support for Tccd1, Tccd3, Tccd5, and Tccd7 temperature sensors on Zen2 based Threadripper CPUs. Checking register maps on Threadripper 3970X confirms SMN register addresses and values for those sensors. Register values observed in an idle system: 0x059950: 00000000 00000abc 00000000 00000ad8 0x059960: 00000000 00000ade 00000000 00000ae4 Under load: 0x059950: 00000000 00000c02 00000000 00000c14 0x059960: 00000000 00000c30 00000000 00000c22 More analysis shows that EPYC CPUs support up to 8 CCD temperature sensors. EPYC 7601 supports three CCD temperature sensors. Unlike Zen2 CPUs, the register space in Zen1 CPUs supports a maximum of four sensors, so only search for a maximum of four sensors on Zen1 CPUs. On top of that, in thm_10_0_sh_mask.h in the Linux kernel, we find definitions for THM_DIE{1-3}_TEMP__VALID_MASK, set to 0x00000800, as well as matching SMN addresses. This lets us conclude that bit 11 of the respective registers is a valid bit. With this assumption, the temperature offset is now 49 degrees C. This conveniently matches the documented temperature offset for Tdie, again suggesting that above registers indeed report temperatures sensor values. Assume that bit 11 is indeed a valid bit, and add support for the additional sensors. With this patch applied, output from 3970X (idle) looks as follows: k10temp-pci-00c3 Adapter: PCI adapter Tdie: +55.9°C Tctl: +55.9°C Tccd1: +39.8°C Tccd3: +43.8°C Tccd5: +43.8°C Tccd7: +44.8°C Tested-by: Michael Larabel <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Xie Haocheng <[email protected]>
commit b02c685 upstream. Traditionally, the temperature displayed by k10temp was Tctl. On Family 17h CPUs, Tdie was displayed instead. To reduce confusion, Tctl was added later as second temperature. This resulted in Tdie being reported as temp1_input, and Tctl as temp2_input. This is different to non-Ryzen CPUs, where Tctl is displayed as temp1_input. Swap temp1_input and temp2_input on Family 17h CPUs, such that Tctl is now reported as temp1_input and Tdie is reported as temp2_input, to align with other CPUs, streamline the code, and make it less confusing. Coincidentally, this also aligns the code with its documentation, which states that Tdie is reported as temp2_input. Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Xie Haocheng <[email protected]>
commit 6046524 upstream. Use a bit map to describe if temperature channels are supported, and use it for all temperature channels. Use a separate flag, independent of Tdie support, to indicate if the system is running on a Ryzen CPU. Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Xie Haocheng <[email protected]>
commit 0e786f3 upstream. Fix the following sparse warning: drivers/hwmon/k10temp.c:189:12: warning: symbol 'k10temp_temp_label' was not declared. Should it be static? drivers/hwmon/k10temp.c:202:12: warning: symbol 'k10temp_in_label' was not declared. Should it be static? drivers/hwmon/k10temp.c:207:12: warning: symbol 'k10temp_curr_label' was not declared. Should it be static? Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Xie Haocheng <[email protected]>
commit ce4b815 upstream. s_inodes is superblock-specific resource, which should be protected by sb's specific lock s_inode_list_lock. Link: https://lore.kernel.org/r/TYCP286MB23238380DE3B74874E8D78ABCA299@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM Fixes: 7d41963 ("erofs: Support sharing cookies in the same domain") Reviewed-by: Yue Hu <[email protected]> Reviewed-by: Jia Zhu <[email protected]> Reviewed-by: Jingbo Xu <[email protected]> Signed-off-by: Dawei Li <[email protected]> Signed-off-by: Gao Xiang <[email protected]> Signed-off-by: Jia Zhu <[email protected]>
commit 39bfcb8 upstream. When erofs instance is remounted with fsid or domain_id mount option specified, the original fsid and domain_id string pointer in sbi->opt is directly overridden with the fsid and domain_id string in the new fs_context, without freeing the original fsid and domain_id string. What's worse, when the new fsid and domain_id string is transferred to sbi, they are not reset to NULL in fs_context, and thus they are freed when remount finishes, while sbi is still referring to these strings. Reconfiguration for fsid and domain_id seems unusual. Thus clarify this restriction explicitly and dump a warning when users are attempting to do this. Besides, to fix the use-after-free issue, move fsid and domain_id from erofs_mount_opts to outside. Fixes: c6be2bd ("erofs: register fscache volume") Fixes: 8b7adf1 ("erofs: introduce fscache-based domain") Signed-off-by: Jingbo Xu <[email protected]> Reviewed-by: Jia Zhu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]> Signed-off-by: Jia Zhu <[email protected]>
commit 37020bb upstream. The xarray iteration only holds the RCU read lock and thus may encounter XA_RETRY_ENTRY if there's process modifying the xarray concurrently. This will cause oops when referring to the invalid entry. Fix this by adding the missing xas_retry(), which will make the iteration wind back to the root node if XA_RETRY_ENTRY is encountered. Fixes: d435d53 ("erofs: change to use asynchronous io for fscache readpage/readahead") Suggested-by: David Howells <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Jia Zhu <[email protected]> Signed-off-by: Jingbo Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]> Signed-off-by: Jia Zhu <[email protected]>
commit 27f2a2d upstream. When shared domain is enabled, doing mount twice with the same fsid and domain_id will trigger sysfs warning as shown below: sysfs: cannot create duplicate filename '/fs/erofs/d0,meta.bin' CPU: 15 PID: 1051 Comm: mount Not tainted 6.1.0-rc6+ openvelinux#1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Call Trace: <TASK> dump_stack_lvl+0x38/0x49 dump_stack+0x10/0x12 sysfs_warn_dup.cold+0x17/0x27 sysfs_create_dir_ns+0xb8/0xd0 kobject_add_internal+0xb1/0x240 kobject_init_and_add+0x71/0xa0 erofs_register_sysfs+0x89/0x110 erofs_fc_fill_super+0x98c/0xaf0 vfs_get_super+0x7d/0x100 get_tree_nodev+0x16/0x20 erofs_fc_get_tree+0x20/0x30 vfs_get_tree+0x24/0xb0 path_mount+0x2fa/0xa90 do_mount+0x7c/0xa0 __x64_sys_mount+0x8b/0xe0 do_syscall_64+0x30/0x60 entry_SYSCALL_64_after_hwframe+0x46/0xb0 The reason is erofs_fscache_register_cookie() doesn't guarantee the primary data blob (aka fsid) is unique in the shared domain and erofs_register_sysfs() invoked by the second mount will fail due to the duplicated fsid in the shared domain and report warning. It would be better to check the uniqueness of fsid before doing erofs_register_sysfs(), so adding a new flags parameter for erofs_fscache_register_cookie() and doing the uniqueness check if EROFS_REG_COOKIE_NEED_NOEXIST is enabled. After the patch, the error in dmesg for the duplicated mount would be: erofs: ...: erofs_domain_register_cookie: XX already exists in domain YY Reviewed-by: Jia Zhu <[email protected]> Reviewed-by: Jingbo Xu <[email protected]> Reviewed-by: Chao Yu <[email protected]> Signed-off-by: Hou Tao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Fixes: 7d41963 ("erofs: Support sharing cookies in the same domain") Signed-off-by: Gao Xiang <[email protected]> Signed-off-by: Jia Zhu <[email protected]>
commit 0130e4e upstream. erofs over fscache doesn't support the compressed layout yet. It will cause NULL crash if there are compressed inodes contained when working in fscache mode. So far in the erofs based container image distribution scenarios (RAFS v6), the compressed RAFS v6 images are downloaded and then decompressed on demand as an uncompressed erofs image. Then the erofs image is mounted in fscache mode for containers to use. IOWs, currently compressed data is decompressed on the userspace side instead and uncompressed erofs images will be finally cached. The fscache support for the compressed layout is still under development and it will be used for runtime decompression feature. Anyway, to avoid the potential crash, let's leave the compressed inodes unsupported in fscache mode until we support it later. Fixes: 1442b02 ("erofs: implement fscache-based data read for non-inline layout") Signed-off-by: Jeffle Xu <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]> Signed-off-by: Jia Zhu <[email protected]>
commit e6d9f9b upstream. For unmapped range, the returned map.m_llen is zero, and thus the calculated count is unexpected zero. Prior to the refactoring introduced by commit 1ae9470 ("erofs: clean up .read_folio() and .readahead() in fscache mode"), only the readahead routine suffers from this. With the refactoring of making .read_folio() and .readahead() calling one common routine, both read_folio and readahead have this issue now. Fix this by calculating count separately in unmapped condition. Fixes: c665b39 ("erofs: implement fscache-based data readahead") Fixes: 1ae9470 ("erofs: clean up .read_folio() and .readahead() in fscache mode") Signed-off-by: Jingbo Xu <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]> Signed-off-by: Jia Zhu <[email protected]>
…_lock commit 598ad0b upstream. Taking sb_writers whilst holding mmap_lock isn't allowed and will result in a lockdep warning like that below. The problem comes from cachefiles needing to take the sb_writers lock in order to do a write to the cache, but being asked to do this by netfslib called from readpage, readahead or write_begin[1]. Fix this by always offloading the write to the cache off to a worker thread. The main thread doesn't need to wait for it, so deadlock can be avoided. This can be tested by running the quick xfstests on something like afs or ceph with lockdep enabled. WARNING: possible circular locking dependency detected 5.15.0-rc1-build2+ #292 Not tainted ------------------------------------------------------ holetest/65517 is trying to acquire lock: ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5 but task is already holding lock: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> openvelinux#2 (&mm->mmap_lock#2){++++}-{3:3}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b __might_fault+0x87/0xb1 strncpy_from_user+0x25/0x18c removexattr+0x7c/0xe5 __do_sys_fremovexattr+0x73/0x96 do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> openvelinux#1 (sb_writers#10){.+.+}-{0:0}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b cachefiles_write+0x2b3/0x4bb netfs_rreq_do_write_to_cache+0x3b5/0x432 netfs_readpage+0x2de/0x39d filemap_read_page+0x51/0x94 filemap_get_pages+0x26f/0x413 filemap_read+0x182/0x427 new_sync_read+0xf0/0x161 vfs_read+0x118/0x16e ksys_read+0xb8/0x12e do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (mapping.invalidate_lock#3){.+.+}-{3:3}: check_noncircular+0xe4/0x129 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b down_read+0x40/0x4a filemap_fault+0x276/0x7a5 __do_fault+0x96/0xbf do_fault+0x262/0x35a __handle_mm_fault+0x171/0x1b5 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c exc_page_fault+0x85/0xa5 asm_exc_page_fault+0x1e/0x30 other info that might help us debug this: Chain exists of: mapping.invalidate_lock#3 --> sb_writers#10 --> &mm->mmap_lock#2 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_lock#2); lock(sb_writers#10); lock(&mm->mmap_lock#2); lock(mapping.invalidate_lock#3); *** DEADLOCK *** 1 lock held by holetest/65517: #0: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c stack backtrace: CPU: 0 PID: 65517 Comm: holetest Not tainted 5.15.0-rc1-build2+ #292 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: dump_stack_lvl+0x45/0x59 check_noncircular+0xe4/0x129 ? print_circular_bug+0x207/0x207 ? validate_chain+0x461/0x4a8 ? add_chain_block+0x88/0xd9 ? hlist_add_head_rcu+0x49/0x53 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 ? check_prev_add+0x3a4/0x3a4 ? mark_lock+0xa5/0x1c6 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b ? filemap_fault+0x276/0x7a5 ? rcu_read_unlock+0x59/0x59 ? add_to_page_cache_lru+0x13c/0x13c ? lock_is_held_type+0x7b/0xd3 down_read+0x40/0x4a ? filemap_fault+0x276/0x7a5 filemap_fault+0x276/0x7a5 ? pagecache_get_page+0x2dd/0x2dd ? __lock_acquire+0x8bc/0x949 ? pte_offset_kernel.isra.0+0x6d/0xc3 __do_fault+0x96/0xbf ? do_fault+0x124/0x35a do_fault+0x262/0x35a ? handle_pte_fault+0x1c1/0x20d __handle_mm_fault+0x171/0x1b5 ? handle_pte_fault+0x20d/0x20d ? __lock_release+0x151/0x254 ? mark_held_locks+0x1f/0x78 ? rcu_read_unlock+0x3a/0x59 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c ? pgtable_bad+0x70/0x70 ? rcu_read_lock_bh_held+0xab/0xab exc_page_fault+0x85/0xa5 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x40192f Code: ff 48 89 c3 48 8b 05 50 28 00 00 48 85 ed 7e 23 31 d2 4b 8d 0c 2f eb 0a 0f 1f 00 48 8b 05 39 28 00 00 48 0f af c2 48 83 c2 01 <48> 89 1c 01 48 39 d5 7f e8 8b 0d f2 27 00 00 31 c0 85 c9 74 0e 8b RSP: 002b:00007f9931867eb0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00007f9931868700 RCX: 00007f993206ac00 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007ffc13e06ee0 RBP: 0000000000000100 R08: 0000000000000000 R09: 00007f9931868700 R10: 00007f99318689d0 R11: 0000000000000202 R12: 00007ffc13e06ee0 R13: 0000000000000c00 R14: 00007ffc13e06e00 R15: 00007f993206a000 Fixes: 726218f ("netfs: Define an interface to talk to a cache") Signed-off-by: David Howells <[email protected]> Tested-by: Jeff Layton <[email protected]> cc: Jan Kara <[email protected]> cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected]/ [1] Link: https://lore.kernel.org/r/163887597541.1596626.2668163316598972956.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Jia Zhu <[email protected]>
commit 1122f40 upstream. For now, enqueuing and dequeuing on-demand requests all start from idx 0, this makes request distribution unfair. In the weighty concurrent I/O scenario, the request stored in higher idx will starve. Searching requests cyclically in cachefiles_ondemand_daemon_read, makes distribution fairer. Fixes: c838305 ("cachefiles: notify the user daemon when looking up cookie") Reported-by: Yongqing Li <[email protected]> Signed-off-by: Xin Yin <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Jeffle Xu <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Link: https://lore.kernel.org/r/[email protected]/ # v1 Link: https://lore.kernel.org/r/[email protected]/ # v2 Signed-off-by: Jia Zhu <[email protected]>
commit c93ccd6 upstream. The cache_size field of copen is specified by the user daemon. If cache_size < 0, then the OPEN request is expected to fail, while copen itself shall succeed. However, returning 0 is indeed unexpected when cache_size is an invalid error code. Fix this by returning error when cache_size is an invalid error code. Changes ======= v4: update the code suggested by Dan v3: update the commit log suggested by Jingbo. Fixes: c838305 ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Sun Ke <[email protected]> Suggested-by: Jeffle Xu <[email protected]> Suggested-by: Dan Carpenter <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Jingbo Xu <[email protected]> Reviewed-by: Dan Carpenter <[email protected]> Link: https://lore.kernel.org/r/[email protected]/ # v2 Link: https://lore.kernel.org/r/[email protected]/ # v3 Link: https://lore.kernel.org/r/[email protected]/ # v4 Signed-off-by: Jia Zhu <[email protected]>
... in prep for the following failover feature for the on-demand mode of Cachefiles. Signed-off-by: Jia Zhu <[email protected]>
The cachefiles framework use dio for local cache files filling and reading, which may affects the performance for on-demand IO path. Change to use buffer IO for cache files filling, and first try to find data in the pagecache during cache files reading. After the pagecache for cache files is recycled, we will not suffer from double caching issue. Signed-off-by: Xin Yin <[email protected]>
Prior to fscache refactoring, the volume key in fscache_volume is a string with trailing NUL; while after refactoring, the volume key in fscache_cookie is actually a string without trailing NUL. Thus the current volume key setup for cachefiles_open may cause oops since it attempts to access volume key from the bad address. This can be reproduced by specifying "fsid" with 10 characters, e.g. "-o fsid=abigdomain". Fix this by determining if volume key is stored in volume->key or volume->inline_key by checking volume->key_len, rather than volume_key_size (which is actually volume->key_len plus 1). Reported-by: Jia Zhu <[email protected]> Fixes: c838305 ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Jingbo Xu <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Acked-by: Joseph Qi <[email protected]> Link: https://gitee.com/anolis/cloud-kernel/pulls/881 Signed-off-by: Jia Zhu <[email protected]>
Previously, @ondemand_id field was used not only to identify ondemand state of the object, but also to represent the index of the xarray. This commit introduces @State field to decouple the role of @ondemand_id and adds helpers to access it. Signed-off-by: Jia Zhu <[email protected]> Reviewed-by: Xin Yin <[email protected]> Reviewed-by: Jingbo Xu <[email protected]>
…ject We'll introduce a @work_struct field for @object in subsequent patches, it will enlarge the size of @object. As the result of that, this commit extracts ondemand info field from @object. Signed-off-by: Jia Zhu <[email protected]> Reviewed-by: Jingbo Xu <[email protected]>
…bject is closed When an anonymous fd is closed by user daemon, if there is a new read request for this file comes up, the anonymous fd should be re-opened to handle that read request rather than fail it directly. 1. Introduce reopening state for objects that are closed but have inflight/subsequent read requests. 2. No longer flush READ requests but only CLOSE requests when anonymous fd is closed. 3. Enqueue the reopen work to workqueue, thus user daemon could get rid of daemon_read context and handle that request smoothly. Otherwise, the user daemon will send a reopen request and wait for itself to process the request. Signed-off-by: Jia Zhu <[email protected]> Reviewed-by: Xin Yin <[email protected]> Reviewed-by: Jingbo Xu <[email protected]>
…in ondemand mode Don't trigger EPOLLIN when there are only reopening read requests in xarray. Suggested-by: Xin Yin <[email protected]> Signed-off-by: Jia Zhu <[email protected]> Reviewed-by: Jingbo Xu <[email protected]>
…nd read requests Previously, in ondemand read scenario, if the anonymous fd was closed by user daemon, inflight and subsequent read requests would return EIO. As long as the device connection is not released, user daemon can hold and restore inflight requests by setting the request flag to CACHEFILES_REQ_NEW. Suggested-by: Gao Xiang <[email protected]> Signed-off-by: Jia Zhu <[email protected]> Signed-off-by: Xin Yin <[email protected]> Reviewed-by: Jingbo Xu <[email protected]>
CONFIG_CACHEFILES_ONDEMAND=y CONFIG_EROFS_FS=m CONFIG_EROFS_FS_ONDEMAND=y Signed-off-by: Jia Zhu <[email protected]>
Func kthread_create returns ERR_PTR(-ENOMEM) or ERR_PTR(-EINTR) when it failed. Those values are not null. Fixes: a7f33d7e ("nbd: Don't use workqueue to create recv threads") Signed-off-by: Sheng Zhao <[email protected]>
There are some peripheral deivces of Intel and Qualcomm are working on sanqingshan 2nd board, such as I2C bus/uart/usb etc. We need some kernel modules to support them. Signed-off-by: Duan Yayong <[email protected]>
ssif_info->client is NULL, so there is a NULL pointer dereference. Fix it. Fixes: 4fff7b116321 ("bytedance: ipmi: ssif: add timeout for I2C client") Signed-off-by: Muchun Song <[email protected]>
Signed-off-by: huteng.ht <[email protected]>
enable selinux and apparmor configs, set kernel boot param selinux=0 and apparmor=0 by default. Signed-off-by: huteng.ht <[email protected]>
PvsNarasimha
pushed a commit
to PvsNarasimha/kernel
that referenced
this pull request
Dec 27, 2024
commit 99d4850 upstream Found by leak sanitizer: ``` ==1632594==ERROR: LeakSanitizer: detected memory leaks Direct leak of 21 byte(s) in 1 object(s) allocated from: #0 0x7f2953a7077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 openvelinux#1 0x556701d6fbbf in perf_env__read_cpuid util/env.c:369 openvelinux#2 0x556701d70589 in perf_env__cpuid util/env.c:465 openvelinux#3 0x55670204bba2 in x86__is_amd_cpu arch/x86/util/env.c:14 #4 0x5567020487a2 in arch__post_evsel_config arch/x86/util/evsel.c:83 #5 0x556701d8f78b in evsel__config util/evsel.c:1366 openvelinux#6 0x556701ef5872 in evlist__config util/record.c:108 openvelinux#7 0x556701cd6bcd in test__PERF_RECORD tests/perf-record.c:112 openvelinux#8 0x556701cacd07 in run_test tests/builtin-test.c:236 openvelinux#9 0x556701cacfac in test_and_print tests/builtin-test.c:265 openvelinux#10 0x556701cadddb in __cmd_test tests/builtin-test.c:402 openvelinux#11 0x556701caf2aa in cmd_test tests/builtin-test.c:559 openvelinux#12 0x556701d3b557 in run_builtin tools/perf/perf.c:323 openvelinux#13 0x556701d3bac8 in handle_internal_command tools/perf/perf.c:377 openvelinux#14 0x556701d3be90 in run_argv tools/perf/perf.c:421 openvelinux#15 0x556701d3c3f8 in main tools/perf/perf.c:537 openvelinux#16 0x7f2952a46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s). ``` Fixes: f7b58cb ("perf mem/c2c: Add load store event mappings for AMD") Signed-off-by: Ian Rogers <[email protected]> Acked-by: Ravi Bangoria <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
guojinhui-liam
pushed a commit
that referenced
this pull request
Jan 6, 2025
…mory commit a7800aa80ea4d5356b8474c2302812e9d4926fa6 upstream Introduce an ioctl(), KVM_CREATE_GUEST_MEMFD, to allow creating file-based memory that is tied to a specific KVM virtual machine and whose primary purpose is to serve guest memory. A guest-first memory subsystem allows for optimizations and enhancements that are kludgy or outright infeasible to implement/support in a generic memory subsystem. With guest_memfd, guest protections and mapping sizes are fully decoupled from host userspace mappings. E.g. KVM currently doesn't support mapping memory as writable in the guest without it also being writable in host userspace, as KVM's ABI uses VMA protections to define the allow guest protection. Userspace can fudge this by establishing two mappings, a writable mapping for the guest and readable one for itself, but that’s suboptimal on multiple fronts. Similarly, KVM currently requires the guest mapping size to be a strict subset of the host userspace mapping size, e.g. KVM doesn’t support creating a 1GiB guest mapping unless userspace also has a 1GiB guest mapping. Decoupling the mappings sizes would allow userspace to precisely map only what is needed without impacting guest performance, e.g. to harden against unintentional accesses to guest memory. Decoupling guest and userspace mappings may also allow for a cleaner alternative to high-granularity mappings for HugeTLB, which has reached a bit of an impasse and is unlikely to ever be merged. A guest-first memory subsystem also provides clearer line of sight to things like a dedicated memory pool (for slice-of-hardware VMs) and elimination of "struct page" (for offload setups where userspace _never_ needs to mmap() guest memory). More immediately, being able to map memory into KVM guests without mapping said memory into the host is critical for Confidential VMs (CoCo VMs), the initial use case for guest_memfd. While AMD's SEV and Intel's TDX prevent untrusted software from reading guest private data by encrypting guest memory with a key that isn't usable by the untrusted host, projects such as Protected KVM (pKVM) provide confidentiality and integrity *without* relying on memory encryption. And with SEV-SNP and TDX, accessing guest private memory can be fatal to the host, i.e. KVM must be prevent host userspace from accessing guest memory irrespective of hardware behavior. Attempt #1 to support CoCo VMs was to add a VMA flag to mark memory as being mappable only by KVM (or a similarly enlightened kernel subsystem). That approach was abandoned largely due to it needing to play games with PROT_NONE to prevent userspace from accessing guest memory. Attempt #2 to was to usurp PG_hwpoison to prevent the host from mapping guest private memory into userspace, but that approach failed to meet several requirements for software-based CoCo VMs, e.g. pKVM, as the kernel wouldn't easily be able to enforce a 1:1 page:guest association, let alone a 1:1 pfn:gfn mapping. And using PG_hwpoison does not work for memory that isn't backed by 'struct page', e.g. if devices gain support for exposing encrypted memory regions to guests. Attempt #3 was to extend the memfd() syscall and wrap shmem to provide dedicated file-based guest memory. That approach made it as far as v10 before feedback from Hugh Dickins and Christian Brauner (and others) led to it demise. Hugh's objection was that piggybacking shmem made no sense for KVM's use case as KVM didn't actually *want* the features provided by shmem. I.e. KVM was using memfd() and shmem to avoid having to manage memory directly, not because memfd() and shmem were the optimal solution, e.g. things like read/write/mmap in shmem were dead weight. Christian pointed out flaws with implementing a partial overlay (wrapping only _some_ of shmem), e.g. poking at inode_operations or super_operations would show shmem stuff, but address_space_operations and file_operations would show KVM's overlay. Paraphrashing heavily, Christian suggested KVM stop being lazy and create a proper API. Link: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/all/20230418-anfallen-irdisch-6993a61be10b@brauner Link: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/linux-mm/20230306191944.GA15773@monkey Link: https://lore.kernel.org/linux-mm/[email protected] Cc: Fuad Tabba <[email protected]> Cc: Vishal Annapurve <[email protected]> Cc: Ackerley Tng <[email protected]> Cc: Jarkko Sakkinen <[email protected]> Cc: Maciej Szmigiero <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Quentin Perret <[email protected]> Cc: Michael Roth <[email protected]> Cc: Wang <[email protected]> Cc: Liam Merwick <[email protected]> Cc: Isaku Yamahata <[email protected]> Co-developed-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Co-developed-by: Yu Zhang <[email protected]> Signed-off-by: Yu Zhang <[email protected]> Co-developed-by: Chao Peng <[email protected]> Signed-off-by: Chao Peng <[email protected]> Co-developed-by: Ackerley Tng <[email protected]> Signed-off-by: Ackerley Tng <[email protected]> Co-developed-by: Isaku Yamahata <[email protected]> Signed-off-by: Isaku Yamahata <[email protected]> Co-developed-by: Paolo Bonzini <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Co-developed-by: Michael Roth <[email protected]> Signed-off-by: Michael Roth <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Reviewed-by: Fuad Tabba <[email protected]> Tested-by: Fuad Tabba <[email protected]> Reviewed-by: Xiaoyao Li <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Rui Qi <[email protected]> BP CAUTION: 1. Only changes to the kernel were merged! 2. Since we do not maintain the KVM module in the kernel tree, all fix patches involving KVM will not be merged. Here, only a mark is made to allow the CI search fix job to pass commit 2098acaf24455698c149b27f0347eb4ddc6d2058 upstream. commit c31745d2c508796a0996c88bf2e55f552d513f65 upstream. commit e563592224e02f87048edee3ce3f0da16cceee88 upstream.
PvsNarasimha
pushed a commit
to PvsNarasimha/kernel
that referenced
this pull request
Jan 7, 2025
commit 99d4850 upstream Found by leak sanitizer: ``` ==1632594==ERROR: LeakSanitizer: detected memory leaks Direct leak of 21 byte(s) in 1 object(s) allocated from: #0 0x7f2953a7077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 openvelinux#1 0x556701d6fbbf in perf_env__read_cpuid util/env.c:369 openvelinux#2 0x556701d70589 in perf_env__cpuid util/env.c:465 openvelinux#3 0x55670204bba2 in x86__is_amd_cpu arch/x86/util/env.c:14 #4 0x5567020487a2 in arch__post_evsel_config arch/x86/util/evsel.c:83 #5 0x556701d8f78b in evsel__config util/evsel.c:1366 openvelinux#6 0x556701ef5872 in evlist__config util/record.c:108 openvelinux#7 0x556701cd6bcd in test__PERF_RECORD tests/perf-record.c:112 openvelinux#8 0x556701cacd07 in run_test tests/builtin-test.c:236 openvelinux#9 0x556701cacfac in test_and_print tests/builtin-test.c:265 openvelinux#10 0x556701cadddb in __cmd_test tests/builtin-test.c:402 openvelinux#11 0x556701caf2aa in cmd_test tests/builtin-test.c:559 openvelinux#12 0x556701d3b557 in run_builtin tools/perf/perf.c:323 openvelinux#13 0x556701d3bac8 in handle_internal_command tools/perf/perf.c:377 openvelinux#14 0x556701d3be90 in run_argv tools/perf/perf.c:421 openvelinux#15 0x556701d3c3f8 in main tools/perf/perf.c:537 openvelinux#16 0x7f2952a46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s). ``` Fixes: f7b58cb ("perf mem/c2c: Add load store event mappings for AMD") Signed-off-by: Ian Rogers <[email protected]> Acked-by: Ravi Bangoria <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
PvsNarasimha
pushed a commit
to PvsNarasimha/kernel
that referenced
this pull request
Jan 7, 2025
commit 99d4850 upstream Found by leak sanitizer: ``` ==1632594==ERROR: LeakSanitizer: detected memory leaks Direct leak of 21 byte(s) in 1 object(s) allocated from: #0 0x7f2953a7077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 openvelinux#1 0x556701d6fbbf in perf_env__read_cpuid util/env.c:369 openvelinux#2 0x556701d70589 in perf_env__cpuid util/env.c:465 openvelinux#3 0x55670204bba2 in x86__is_amd_cpu arch/x86/util/env.c:14 #4 0x5567020487a2 in arch__post_evsel_config arch/x86/util/evsel.c:83 #5 0x556701d8f78b in evsel__config util/evsel.c:1366 openvelinux#6 0x556701ef5872 in evlist__config util/record.c:108 openvelinux#7 0x556701cd6bcd in test__PERF_RECORD tests/perf-record.c:112 openvelinux#8 0x556701cacd07 in run_test tests/builtin-test.c:236 openvelinux#9 0x556701cacfac in test_and_print tests/builtin-test.c:265 openvelinux#10 0x556701cadddb in __cmd_test tests/builtin-test.c:402 openvelinux#11 0x556701caf2aa in cmd_test tests/builtin-test.c:559 openvelinux#12 0x556701d3b557 in run_builtin tools/perf/perf.c:323 openvelinux#13 0x556701d3bac8 in handle_internal_command tools/perf/perf.c:377 openvelinux#14 0x556701d3be90 in run_argv tools/perf/perf.c:421 openvelinux#15 0x556701d3c3f8 in main tools/perf/perf.c:537 openvelinux#16 0x7f2952a46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s). ``` Fixes: f7b58cb ("perf mem/c2c: Add load store event mappings for AMD") Signed-off-by: Ian Rogers <[email protected]> Acked-by: Ravi Bangoria <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
PvsNarasimha
pushed a commit
to PvsNarasimha/kernel
that referenced
this pull request
Jan 7, 2025
commit 99d4850 upstream Found by leak sanitizer: ``` ==1632594==ERROR: LeakSanitizer: detected memory leaks Direct leak of 21 byte(s) in 1 object(s) allocated from: #0 0x7f2953a7077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 openvelinux#1 0x556701d6fbbf in perf_env__read_cpuid util/env.c:369 openvelinux#2 0x556701d70589 in perf_env__cpuid util/env.c:465 openvelinux#3 0x55670204bba2 in x86__is_amd_cpu arch/x86/util/env.c:14 #4 0x5567020487a2 in arch__post_evsel_config arch/x86/util/evsel.c:83 #5 0x556701d8f78b in evsel__config util/evsel.c:1366 openvelinux#6 0x556701ef5872 in evlist__config util/record.c:108 openvelinux#7 0x556701cd6bcd in test__PERF_RECORD tests/perf-record.c:112 openvelinux#8 0x556701cacd07 in run_test tests/builtin-test.c:236 openvelinux#9 0x556701cacfac in test_and_print tests/builtin-test.c:265 openvelinux#10 0x556701cadddb in __cmd_test tests/builtin-test.c:402 openvelinux#11 0x556701caf2aa in cmd_test tests/builtin-test.c:559 openvelinux#12 0x556701d3b557 in run_builtin tools/perf/perf.c:323 openvelinux#13 0x556701d3bac8 in handle_internal_command tools/perf/perf.c:377 openvelinux#14 0x556701d3be90 in run_argv tools/perf/perf.c:421 openvelinux#15 0x556701d3c3f8 in main tools/perf/perf.c:537 openvelinux#16 0x7f2952a46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s). ``` Fixes: f7b58cb ("perf mem/c2c: Add load store event mappings for AMD") Signed-off-by: Ian Rogers <[email protected]> Acked-by: Ravi Bangoria <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: PvsNarasimha <[email protected]>
PvsNarasimha
pushed a commit
to PvsNarasimha/kernel
that referenced
this pull request
Jan 30, 2025
commit 99d4850 upstream Found by leak sanitizer: ``` ==1632594==ERROR: LeakSanitizer: detected memory leaks Direct leak of 21 byte(s) in 1 object(s) allocated from: #0 0x7f2953a7077b in __interceptor_strdup ../../../../src/libsanitizer/asan/asan_interceptors.cpp:439 openvelinux#1 0x556701d6fbbf in perf_env__read_cpuid util/env.c:369 openvelinux#2 0x556701d70589 in perf_env__cpuid util/env.c:465 openvelinux#3 0x55670204bba2 in x86__is_amd_cpu arch/x86/util/env.c:14 #4 0x5567020487a2 in arch__post_evsel_config arch/x86/util/evsel.c:83 #5 0x556701d8f78b in evsel__config util/evsel.c:1366 openvelinux#6 0x556701ef5872 in evlist__config util/record.c:108 openvelinux#7 0x556701cd6bcd in test__PERF_RECORD tests/perf-record.c:112 openvelinux#8 0x556701cacd07 in run_test tests/builtin-test.c:236 openvelinux#9 0x556701cacfac in test_and_print tests/builtin-test.c:265 openvelinux#10 0x556701cadddb in __cmd_test tests/builtin-test.c:402 openvelinux#11 0x556701caf2aa in cmd_test tests/builtin-test.c:559 openvelinux#12 0x556701d3b557 in run_builtin tools/perf/perf.c:323 openvelinux#13 0x556701d3bac8 in handle_internal_command tools/perf/perf.c:377 openvelinux#14 0x556701d3be90 in run_argv tools/perf/perf.c:421 openvelinux#15 0x556701d3c3f8 in main tools/perf/perf.c:537 openvelinux#16 0x7f2952a46189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 SUMMARY: AddressSanitizer: 21 byte(s) leaked in 1 allocation(s). ``` Fixes: f7b58cb ("perf mem/c2c: Add load store event mappings for AMD") Signed-off-by: Ian Rogers <[email protected]> Acked-by: Ravi Bangoria <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: PvsNarasimha <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.