
kola: 36.20220906.20.0: mm/page_alloc.c kernel warning on aarch64/openstack for 5.19 kernels #1292

Closed
dustymabe opened this issue Sep 6, 2022 · 7 comments

Comments

@dustymabe (Member)

We're seeing this in our testing-devel and branched streams.

Here's what the warning looks like:

[    5.135660] ------------[ cut here ]------------                                     
[    5.139852] WARNING: CPU: 0 PID: 18 at mm/page_alloc.c:5402 __alloc_pages+0x1a0/0x290
[    5.146972] Modules linked in:                                                       
[    5.149828] CPU: 0 PID: 18 Comm: cpuhp/0 Not tainted 5.19.6-200.fc36.aarch64 #1      
[    5.156667] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015           
[    5.163162] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)          
[    5.169897] pc : __alloc_pages+0x1a0/0x290                                           
[    5.173875] lr : alloc_pages+0xb8/0x16c                                              
[    5.177528] sp : ffff80000810bb90                                                    
[    5.180671] x29: ffff80000810bb90 x28: 0000000000000000 x27: ffff30fe849c1000        
[    5.187576] x26: 0000000000000000 x25: ffff0001fef07940 x24: 000000000000001e        
[    5.195276] x23: 0000000000000dc0 x22: 0000000000000000 x21: 000000000000001e        
[    5.202055] x20: 000000000000001e x19: 0000000000040dc0 x18: ffffffffffffffff        
[    5.208899] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000        
[    5.215898] x14: 0000000000000001 x13: 0000000000000000 x12: 0000000000000001        
[    5.222655] x11: ffffcf037aaef6d0 x10: 0000000000001d90 x9 : ffffcf03789296ec        
[    5.229443] x8 : ffff0000c03b61f0 x7 : 7fffffffffffffff x6 : 000000036312036f        
[    5.236172] x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000        
[    5.242961] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffcf037ae76009        
[    5.249890] Call trace:                                                              
[    5.252240]  __alloc_pages+0x1a0/0x290                                               
[    5.255798]  alloc_pages+0xb8/0x16c                                                  
[    5.259083]  kmalloc_order+0x3c/0xc0                                                 
[    5.262558]  kmalloc_order_trace+0x38/0x144                                          
[    5.266504]  __kmalloc+0x308/0x370                                                   
[    5.269709]  cacheinfo_cpu_online+0x68/0x1d0                                         
[    5.273747]  cpuhp_invoke_callback+0x128/0x4e4                                       
[    5.278107]  cpuhp_thread_fun+0xe0/0x184                                             
[    5.281942]  smpboot_thread_fn+0x1e8/0x220                                           
[    5.285824]  kthread+0xf0/0x100                                                      
[    5.288818]  ret_from_fork+0x10/0x20                                                 
[    5.292210] ---[ end trace 0000000000000000 ]---                                     

It happens consistently in aarch64/openstack in our provider (VexxHost). It may be something specific to their infrastructure (e.g., hypervisor software versions) that is causing us to see this.

It also happens across a large number of tests, which makes me think it isn't specific to any one test but is a general issue.

One full console log:
console.txt

@travier (Member) commented Sep 6, 2022

Do we have a contact at Vexxhost who could take a look?
I don't think this should block us from releasing if it only affects aarch64 OpenStack.
Reporting it upstream / to the provider would be good.

@dustymabe (Member, Author)

This doesn't appear to be happening with current Rawhide (kernel-6.0.0-0.rc4.31.fc38.aarch64), so I assume there is a fix upstream somewhere.

@bgilbert (Contributor) commented Sep 6, 2022

This is fixed by torvalds/linux@e75d18c, which is in 5.19.7. At a quick glance, I think the consequence of the bug is that some CPU cache information doesn't get populated into sysfs.
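To check whether a given machine is affected, a quick sketch like the one below could help: it compares the running kernel version against 5.19.7 (the first release carrying the fix) and lists the sysfs cache directories whose population the bug affects. The `version_lt` helper is defined here for illustration; it is not part of any standard tool.

```shell
#!/bin/sh
# Sketch: does the running kernel predate the cacheinfo fix in 5.19.7?

# version_lt A B: true (exit 0) if A is strictly older than B.
# Relies on GNU sort's -V (version sort) option.
version_lt() {
    [ "$1" != "$2" ] && \
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

# Strip the distro suffix: "5.19.6-200.fc36.aarch64" -> "5.19.6"
kver=$(uname -r | cut -d- -f1)

if version_lt "$kver" "5.19.7"; then
    echo "kernel $kver predates the cacheinfo fix (needs >= 5.19.7)"
else
    echo "kernel $kver should include the fix"
fi

# The visible symptom on affected kernels: cache topology entries
# (index0, index1, ...) missing or incomplete under sysfs.
ls /sys/devices/system/cpu/cpu0/cache/ 2>/dev/null
```

On an affected 5.19.6 guest the `ls` output would be missing some or all of the `index*` directories that normally describe the per-level CPU caches.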

@dustymabe (Member, Author)

Override proposed in coreos/fedora-coreos-config#1960.

@dustymabe (Member, Author)

The fix for this went into testing stream release 36.20220918.2.2. Please try out the new release and report any issues.

@dustymabe (Member, Author)

The fix for this went into next stream release 37.20220910.1.0. Please try out the new release and report any issues.

@dustymabe added the status/pending-stable-release label ("Fixed upstream and in testing. Waiting on stable release.") and removed the status/pending-testing-release label ("Fixed upstream. Waiting on a testing release.") on Oct 3, 2022
@dustymabe (Member, Author) commented Oct 18, 2022

The fix for this went into stable stream release 36.20220918.3.0.

@dustymabe removed the status/pending-stable-release label ("Fixed upstream and in testing. Waiting on stable release.") on Oct 18, 2022