vc4_bo_create -> Failed to allocate from CMA. #1247

Open
DeviceIoControl opened this issue Sep 7, 2019 · 44 comments

Comments

@DeviceIoControl

DeviceIoControl commented Sep 7, 2019

Bug Description:
Using any hardware-accelerated application (VLC media player, Kodi, Google Chrome, etc.) for an extended period (between 30 minutes and 1 hour) causes CMA to fail to allocate memory (out of memory) for the vc4-fkms-v3d driver. All hardware-accelerated applications running at that point then freeze (turn black) or show serious visual artifacts.

The only fix that I have found is to disable the vc4-fkms-v3d driver in the /boot/config.txt file (or to switch the GL Driver to "Legacy" mode in raspi-config), but unfortunately this results in the loss of hardware acceleration.

To reproduce:
This typically occurs when watching a video for more than 30 mins in the VLC media player application. Eventually the video will freeze and any attempt to interact with the application will cause the application to freeze or produce visual artifacting.

Expected behaviour:
Hardware-accelerated applications should be usable for an extended period of time without freezing or producing visual artifacts.

Actual behaviour:
See "Bug Description" section above.

System:
Device Model -> Raspberry Pi 4 Model B (4GB)
OS -> Raspbian GNU/Linux 10 (buster) armv7l
Firmware version -> a51b488198a8c0360b93351682e7432d89d70411
Kernel version -> 4.19.66-v7l+

Logs:
[3912.566190] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[3912.573209] [drm] kernel: 5120kb BOs (1)
[3912.579505] [drm] V3D: 15116kb BOs (15)
[3912.585862] [drm] V3D shader: 120kb BOs (29)
[3912.592241] [drm] dumb: 48kb BOs (3)

@popcornmix
Contributor

popcornmix commented Sep 9, 2019

Does increasing cma help? Add cma=512M to config.txt and test again.
EDIT: cmdline.txt not config.txt

@DeviceIoControl
Author

OK, I've adjusted my settings, and I am testing now. Thanks.
I'll comment again if the problem still persists!

@DeviceIoControl
Author

> Does increasing cma help? Add cma=512M to config.txt and test again.

Well, it seems like the problem is still there, but it's a different message this time (it still seems to be a CMA allocation error, though).

Here are the logs from dmesg.

dmesg:
[ 1374.350936] cma: cma_alloc: alloc failed, req-size: 765 pages, ret: -16
[ 1374.350943] [vc_sm_cma_ioctl_alloc]: dma_alloc_coherent alloc of 3133440 bytes failed
[ 1374.350947] [vc_sm_cma_ioctl_alloc]: something failed - cleanup. ret -12
[ 1375.019739] cma: cma_alloc: alloc failed, req-size: 765 pages, ret: -16
[ 1375.019750] [vc_sm_cma_ioctl_alloc]: dma_alloc_coherent alloc of 3133440 bytes failed
[ 1375.019754] [vc_sm_cma_ioctl_alloc]: something failed - cleanup. ret -12
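
As a quick sanity check on the numbers in this log, here is a minimal sketch assuming the usual 4 KiB page size: the 765-page request works out to exactly the 3133440 bytes reported on the following line. The return codes are standard Linux errno values (-16 is EBUSY, -12 is ENOMEM). The helper names are illustrative only and are not part of any tool mentioned in this thread.

```shell
#!/bin/sh
# Convert a CMA request size in pages to bytes.
# Assumption: 4096-byte pages (pass a second argument to override).
pages_to_bytes() {
    pages=$1
    page_size=${2:-4096}
    echo $((pages * page_size))
}

# Map the two negative return codes seen in the log to their errno names.
errno_name() {
    case "$1" in
        -12) echo ENOMEM ;;   # out of memory
        -16) echo EBUSY ;;    # device or resource busy
        *)   echo unknown ;;
    esac
}

pages_to_bytes 765    # prints 3133440
errno_name -16        # prints EBUSY
```

Read this way, the log shows cma_alloc failing with EBUSY, after which vc_sm_cma_ioctl_alloc reports the failure to its caller as ENOMEM.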

@popcornmix
Contributor

What's the simplest way to reproduce?
You say you can see the problem in 30 minutes in VLC.
Is that just playing a single file?

e.g. with a clean raspbian buster image if you run vlc (and nothing else) and play a single video longer than 30m would you see this issue? Is there a freely available video file that exhibits this problem?

(this doesn't seem to be my experience - I've left VLC or chrome playing youtube videos overnight a number of times and it still seems happy in the morning - but perhaps there is something related to the format of the videos, or some other difference in our setups).

@DeviceIoControl
Author

DeviceIoControl commented Sep 9, 2019

In my case, it occurs more often when I've switched in and out of Fullscreen mode a couple of times in VLC Media player. (3 - 5 times).

And Google chrome seems to have this problem when viewing websites that use a lot of transitions and effects like: apple.com. (But most times Google chrome seems to "Aww, Snap" (crash) when using Hardware "intensive" websites).

Here is my config if needed.

**/boot/config.txt:**
# For more options and information see
# http://rpf.io/configtxt
# Some settings may impact device functionality. See link above for details

# uncomment if you get no picture on HDMI for a default "safe" mode
#hdmi_safe=1

# uncomment this if your display has a black border of unused pixels visible
# and your display can output without overscan
disable_overscan=1

# uncomment the following to adjust overscan. Use positive numbers if console
# goes off screen, and negative if there is too much border
#overscan_left=16
#overscan_right=16
#overscan_top=16
#overscan_bottom=16

# uncomment to force a console size. By default it will be display's size minus
# overscan.
#framebuffer_width=1280
#framebuffer_height=720

# uncomment if hdmi display is not detected and composite is being output
hdmi_force_hotplug=1

# uncomment to force a specific HDMI mode (this will force VGA)
hdmi_group=1
hdmi_mode=16

# uncomment to force a HDMI mode rather than DVI. This can make audio work in
# DMT (computer monitor) modes
#hdmi_drive=2

# uncomment to increase signal to HDMI, if you have interference, blanking, or
# no display
#config_hdmi_boost=4

# uncomment for composite PAL
#sdtv_mode=2

#uncomment to overclock the arm. 700 MHz is the default.
#arm_freq=800

# Uncomment some or all of these to enable the optional hardware interfaces
dtparam=i2c_arm=on
#dtparam=i2s=on
dtparam=spi=on

# Uncomment this to enable the lirc-rpi module
#dtoverlay=lirc-rpi

# Additional overlays and parameters are documented /boot/overlays/README

# Enable audio (loads snd_bcm2835)
dtparam=audio=on

[pi4]
# Enable DRM VC4 V3D driver on top of the dispmanx display stack
dtoverlay=vc4-fkms-v3d
max_framebuffers=1
cma=512m

[all]
# NOOBS Auto-generated Settings:
hdmi_force_hotplug=1
gpu_mem=256
#hdmi_enable_4kp60=1
start_x=1

@popcornmix
Contributor

Ah sorry - cma=512m should be added to end of cmdline.txt, not config.txt.
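
To confirm that the setting actually reached the kernel after a reboot, something like the following could be used. It is only a sketch: cma_param is a hypothetical helper name, and it simply looks for a cma= word in whatever command line string it is given.

```shell
#!/bin/sh
# Print the value of the cma= parameter found in a kernel command line
# string, or print nothing if the parameter is absent.
cma_param() {
    for word in $1; do
        case "$word" in
            cma=*) echo "${word#cma=}" ;;
        esac
    done
}

# Usage on a running system, after rebooting with the edited cmdline.txt:
#   cma_param "$(cat /proc/cmdline)"
#   grep CmaTotal /proc/meminfo
```

If the first command prints nothing, the parameter was most likely placed in the wrong file or on a separate line of cmdline.txt.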

@DeviceIoControl
Author

Alright, I've adjusted my settings again and I am testing now. Thanks.
I'll report back if the issues persist.

@DeviceIoControl
Author

> Ah sorry - cma=512m should be added to end of cmdline.txt, not config.txt.

The issue still persists (though it did take much longer to occur this time), with the same CMA allocation error as mentioned in the previous comment.

It happened in VLC Media player, while watching a 1080p60 video (mp4). Specifically when trying to rewind 30 seconds back in the video.

@JamesH65
Contributor

JamesH65 commented Sep 9, 2019

So, if you don't wind back, then the problem does not occur? Quite an important bit of information that.

@popcornmix
Contributor

Let us know if you find a sequence of operations that makes the issue happen repeatably and ideally quickly.

We can get the vlc/chromium guy to investigate this, but specific instructions to reproduce make it a lot more likely he'll be able to find and fix the problem.

What is your display resolution? Default skin on VLC?

@DeviceIoControl
Author

DeviceIoControl commented Sep 9, 2019

> So, if you don't wind back, then the problem does not occur? Quite an important bit of information that.

It can still freeze even if you don't rewind VLC Media player.

> Let us know if you find a sequence of operations that makes the issue happen repeatably and ideally quickly.

Will do! I will make sure to let you know as soon as possible, when I can do so.

EDIT: LOL, Found it! So, literally scrubbing the time-line on VLC Media player while the video is playing can cause this error. I am currently playing a 1920x1080p 60fps video (mp4). (My config has not changed since the last time I mentioned it btw.)

EDIT 2: So the "after-shock" of this error seems to cause other applications to crash when continuing to use them after that error has been produced.

In my case Chromium seems to "Aww, Snap" (crash) the tabs that are using Hardware accelerated rendering such as:

  • Tabs that are playing video (YouTube).
  • Tabs that use high-fidelity graphics / images (Apple.com)

NOTE: It doesn't crash it instantly, it only crashes when you try to interact with the website (after the error has been produced.) and will only stop crashing when you reboot the system (Restarting Chromium won't fix it).

EDIT 3: Making the Chromium tabs crash seems to push more of the same error into the kernel log (as seen with "dmesg").

And I forgot to mention that VLC Media player becomes unresponsive when trying to close it.
Closing the window will make it "minimise" to the system tray, but attempts to close it completely by right-clicking the VLC Media player icon and clicking "Quit" don't work. (Even attempting to close it using the "kill -9" command in the terminal doesn't work.)

> What is your display resolution? Default skin on VLC?

Raspberry Pi is running @ 1920x1080p 60fps

VLC is using the "default skin" but, I am running the LXDE Desktop environment not the default RPD desktop environment. (but this issue has occurred multiple times before I changed my Desktop environment).

@sahib

sahib commented Oct 1, 2019

I seem to have a similar issue, but the context is a bit different. If this counts as hijacking: Sorry, please tell me to open a new issue in this case.

Context

We have a custom-built UI that runs under wayland, or to be more exact under weston in fullscreen. The display only has a resolution of 720x480. The user interface renders its content using cairo (using the cairo-gl backend), pango (for font rendering) and librsvg (for displaying some vector graphics).

Issue

On some screens the UI suddenly freezes with very similar issues described by @DeviceIoControl.
The application log just tells me "Draw call returned invalid argument. expect corruption." The screen will simply not update anymore; even a restart of weston won't fix the problem. The only thing that helps is a hard reboot. It is also not 100% clear when the issue happens; screens that are more memory intensive (with some SVG graphics) seem to trigger it more often. Here are two dmesg logs of the problem:

Both logs also contain the kernel command line, plenty of stack traces, and lines similar to:

Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory

and:

[drm:validate_tile_binning_config [vc4]] *ERROR* Failed to allocate binner memory: -12

System

  • OS: Custom Linux built with yocto (Linux raspberrypi-cm3 4.14.112 #1 SMP Fri Aug 9 13:13:52 UTC 2019 armv7l GNU/Linux)
  • Device Model: Compute Module 3+

Please tell me what other information I can provide to help to debug this.

@6by9

6by9 commented Oct 1, 2019

@sahib Your issue is very different.
You would appear to be using the full vc4 3D driver (vc4-kms-v3d). The Pi4 has a different 3D block (v3d), and currently only supports the hybrid driver (vc4-fkms-v3d - note the "f").
Both cases would appear to be out of memory scenarios, but are going to be for different reasons.

4.14 is now very out of date, particularly with regard to 3D and DRM/KMS driver changes. I would strongly recommend you update to 4.19, if not later.
3D also makes use of a userside library called Mesa, and I'd suggest you ensure that is relatively up to date (19.2.0 is now released and being used by Raspbian, although that is more for v3d fixes than vc4).

@sahib

sahib commented Oct 1, 2019

Thank you very much @6by9, I'm still trying to wrap my head around all of this.

> Your issue is very different.

Sorry then about hijacking this issue. I will open a new issue once I have some new information.

> Both cases would appear to be out of memory scenarios, but are going to be for different reasons.

Is increasing the amount of space for the CMA supposed to help? Also, can this be an issue in the application itself (i.e. using insane amounts of memory for unknown reasons) or is this more of a driver issue? I suppose the answer is "both"...

> 4.14 is now very out of date, particularly with regard to 3D and DRM/KMS driver changes. I would strongly recommend you update to 4.19, if not later.

I can try and update once I get to it. Should be easily possible.

> 3D also makes use of a userside library called Mesa, and I'd suggest you ensure that is relatively up to date (19.2.0 is now released and being used by Raspbian, although that is more for v3d fixes than vc4).

This seems to be at 19.0.8. I can also try to update this.

@sahib

sahib commented Oct 8, 2019

Update on this: I updated the kernel to 4.19.71 and set the CMA memory to 256M (might try lower later). I did not update mesa. So far I have not been able to reproduce the bug. Thanks @6by9 👍

If this crops up again, I will open a new issue.

@DanielGibson

DanielGibson commented Jan 4, 2020

Hi,
I'm running into the same problem when running a Doom3 sourceport (dhewm3 or d3wasm) on a RPi 3B, even with cma=512M in cmdline.txt (I also tried 256M).
The problem is easily/quickly reproducible by just starting a new game.
I'm running latest raspbian buster with Kernel 4.19.75-v7+.

To reproduce, first enable OpenGL in raspi-config and reboot as prompted (fake vs full KMS didn't make a difference). It might also make sense to enable the SSH server so you can log into the half-frozen RPi later.

Then build latest dhewm3 git:

  1. sudo apt install build-essential git cmake libsdl2-dev libopenal-dev libjpeg8-dev libvorbis-dev
  2. git clone https://github.com/dhewm/dhewm3.git
  3. cd dhewm3 && mkdir build && cd build
  4. cmake ../neo/
  5. make -j4

Now you should have a dhewm3 executable in dhewm3/build/.
Now get the (free demo) game data:

  1. In the same directory your dhewm3/ directory is in, create a doom3data/ directory
  2. cd doom3data
  3. wget https://files.holarse-linuxgaming.de/native/Spiele/Doom%203/Demo/doom3-linux-1.1.1286-demo.x86.run (you can also download the same file from another mirror if you like, it's the official Doom3 Linux x86 demo from back in the day)
  4. sh doom3-linux-1.1.1286-demo.x86.run --tar xf demo/ (this unpacks the game data that's in the .run file somewhere - you should get a demo/ directory that contains just one demo00.pk4 file)
  5. mv demo base (rename the directory to base)

So now your original top dir (probably /home/raspberrypi/ by default) should contain the following stuff:

dhewm3/
dhewm3/build/
dhewm3/build/dhewm3
dhewm3/build/.... (more files, incl. base.so)
dhewm3/neo/
dhewm3/README.md
dhewm3/.... (more stuff)
doom3data/
doom3data/base/
doom3data/base/demo00.pk4
.... (whatever else was already there)

Now run dhewm3, you should get to see the main menu, where we'll do some configuration before restarting the game:

  1. cd dhewm3/build
  2. ./dhewm3 +set fs_basepath ../../doom3data/ starts dhewm3 and tells it where to find the game data
  3. game should start and you should get to see the main menu (if it's super slow it's not using the proper OpenGL driver but llvmpipe => enable OpenGL in raspi-config!)
  4. Click Options -> System -> Low Quality
  5. In the same menu, select "Advanced Options" and set everything to "No" or "Off"
  6. Click "Close Advanced Options", then click "Apply Changes", then "Exit" (on lower right corner)

Now Doom3/dhewm3 is configured to run in the lowest possible settings, and it should have written its config so it remembers those settings after a later crash.

Now you can finally run the game:

  1. Again, ./dhewm3 +set fs_basepath ../../doom3data/ starts dhewm3
  2. In the main menu, select New Game -> Recruit and wait
  3. This will take a bit because the RPi is pretty slow, but eventually the progress bar will stop moving and all of X11 will freeze, though you'll still be able to move the mouse pointer (but not do anything with it). Yes, this is a bit confusing
  4. If you ssh into the machine, you'll see a lot of "Failed to allocate from CMA" messages in the syslog and dmesg.
  5. I think that only a full reboot will make OpenGL usable again (just restarting X didn't work for me)

An excerpt from my dmesg:

[   20.911620] fuse init (API version 7.27) # the last "normal", old line from boot
[  253.900107] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[  253.900122] [drm]                            V3D: 488460kb BOs (3503)
[  253.900126] [drm]                     V3D shader:    260kb BOs (64)
[  253.900130] [drm]                           dumb:   9016kb BOs (2)
[  253.900136] [drm]                total purged BO:    712kb BOs (8)
[  253.900149] vc4_v3d 3fc00000.v3d: Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory.
[  254.917700] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[  254.917715] [drm]                            V3D: 498188kb BOs (3603)
[  254.917719] [drm]                     V3D shader:    260kb BOs (64)
[  254.917723] [drm]                           dumb:   9016kb BOs (2)
[  254.917727] [drm]                total purged BO:   1636kb BOs (22)
[  254.918211] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[  254.918215] [drm]                            V3D: 497892kb BOs (3598)
[  254.918219] [drm]                     V3D shader:    260kb BOs (64)
[  254.918222] [drm]                           dumb:   9016kb BOs (2)
[  254.918226] [drm]                total purged BO:   1636kb BOs (22)
[  254.918685] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[  254.918689] [drm]                            V3D: 497892kb BOs (3598)
[  254.918693] [drm]                     V3D shader:    260kb BOs (64)
[  254.918696] [drm]                           dumb:   9016kb BOs (2)
[  254.918700] [drm]                total purged BO:   1636kb BOs (22)
... (it goes on like this forever)
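
To see how large the V3D buffer-object pool had grown at each failure, the dumps can be reduced to one number per report. This is a rough sketch only: v3d_bo_mib is a hypothetical helper, and it assumes the line layout shown in the excerpt above (a "V3D:" label followed by a size ending in "kb").

```shell
#!/bin/sh
# Read dmesg-style text on standard input and print the "V3D:" BO total
# of every allocation-failure dump, converted to MiB and rounded down.
# The "V3D shader:" lines are intentionally not matched.
v3d_bo_mib() {
    awk '/\[drm\]/ && / V3D: / {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+kb$/) {
                sub(/kb$/, "", $i)
                print int($i / 1024)
            }
    }'
}

# Usage: dmesg | v3d_bo_mib
```

For the first dump above (488460kb) this gives 477, so if the CMA region was 512M at the time, V3D buffer objects alone had consumed nearly all of it.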

Basically the same happens when using d3wasm (https://github.com/gabrielcuvillier/d3wasm/), which is based on dhewm3 but has a new renderer, instead of dhewm3 (https://github.com/dhewm/dhewm3/).
Even though it's not obvious, it's possible to build d3wasm as a normal Linux binary that will use OpenGL ES 2.0 (Vanilla Doom3 and dhewm3 use OpenGL 1.x with ARB shaders).
If you wanna try that as well, https://github.com/gabrielcuvillier/d3wasm/blob/master/BUILD.md#6-enjoy describes how the native d3wasm build works; you can start it just like dhewm3 with +set fs_basepath ../../doom3data/ to tell it where to find the game data.

@fierevere

Happens with Kodi as well, after leaving its media player on pause for ~30 min, or just inactive for a long time (anywhere, including the main interface).
Raspbian buster.

@audetto

audetto commented Feb 8, 2020

Got the same issue twice today.
The Pi 3 was updated today (before the issues).

[ 4191.445808] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[ 4191.445852] [drm] V3D: 126112kb BOs (389)
[ 4191.445860] [drm] V3D shader: 476kb BOs (117)
[ 4191.445867] [drm] dumb: 8116kb BOs (2)
[ 4191.445879] [drm] total purged BO: 1544kb BOs (7)
[ 4191.445897] vc4_v3d 3fc00000.v3d: Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory.

Did not do anything particularly strange, other than some browsing, terminal, cmake + emacs
After the 1st occurrence I updated the gpu memory to 128 and my config is as follows

dtparam=audio=on

[pi4]
dtoverlay=vc4-fkms-v3d
max_framebuffers=2

[all]
dtoverlay=vc4-fkms-v3d
gpu_mem=128

@fierevere

fierevere commented Feb 8, 2020

Set this in /boot/config.txt, seems to help, so far no problems for a day.
plus yesterday's Raspbian update. (kernel 4.19.97-v7+)

max_framebuffers=1
gpu_mem=128
cma_lwm=16
cma_hwm=256

@audetto

audetto commented Feb 8, 2020

Not sure what max_framebuffers does, but I think on a pi3 it defaults to 1???
Will try and report back.

@fierevere

default is 2
possible values 1 or 2

@JamesH65
Contributor

JamesH65 commented Feb 9, 2020

> Not sure what max_framebuffers does, but I think on a pi3 it defaults to 1???
> Will try and report back.

Simply limits the number of displays that will be instantiated. So if you set it to one on a Pi4 you only get one HDMI port, which can save some memory. TBH, setting to 2 is fine for almost all use cases, even on devices prior to the 4, pre-KMS, since frame buffers are only created if displays are found.

@popcornmix
Contributor

> Set this in /boot/config.txt, seems to help, so far no problems for a day.
> max_framebuffers=1
> gpu_mem=128
> cma_lwm=16
> cma_hwm=256

cma_lwm/cma_hwm were removed from firmware over two years ago.
max_framebuffers=1 is the default if otherwise not specified in config.txt. Did you add it as a new entry or edit an existing entry of max_framebuffers=2?
I can't imagine setting gpu_mem=128 (higher than the default) will help this issue (as effectively the arm will have less memory available).

Identifying the exact line you think helped would be useful (note: it is definitely not cma_lwm/cma_hwm which don't exist).

@fierevere

Haven't "crashed" yet since that change.
It was happening several times daily.
Maybe this is because of the changes, maybe because of the Raspbian kernel update (I wonder what they changed; I haven't seen a changelog).

@popcornmix
Contributor

If you want to help narrow down this issue, then remove the lines one at a time and see if things are still stable after a day or two. Report back if removing any line had an obvious effect on stability.
If it's still stable after the lines are removed, then that is also useful info (presumably the issue has been resolved in a kernel update).

@audetto

audetto commented Feb 11, 2020

It is definitely not fixed.
I was just using LibreOffice + Chrome and it happened (pi 3 fully updated)

I will try to revert back to the legacy driver and see if it still happens.

@audetto

audetto commented Feb 18, 2020

Is there anything else that can be done to track down / solve this?
Some more logging? debug info enabled?

The legacy driver clearly does not crash, but it has the bad habit of not sending my monitor to sleep for instance, so I really miss vc4.
I can't believe no-one else is seeing this.

@j123b567

j123b567 commented Apr 2, 2020

Isn't this a workaround for this issue? https://www.raspberrypi.org/forums/viewtopic.php?t=223363#p1614476

Our tests are currently running, so I don't know yet whether it really solves this problem, but it seems promising.

@Yamagi

Yamagi commented Apr 5, 2020

I can confirm that the workaround works for dhewm3. With /sys/devices/platform/soc/*.v3d/power/control set to on I can start the game and load the first level.
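
For anyone who wants to try the same thing, here is a minimal sketch of setting that sysfs file to "on". The function name v3d_power_on and its optional root-directory argument are illustrative additions (the argument exists only so the function can be tried against a scratch directory). Writes to sysfs do not survive a reboot, so this would have to be re-applied at boot.

```shell
#!/bin/sh
# Write "on" to every matching power/control file below a sysfs root.
# The root defaults to /sys; pass another directory to try this safely.
v3d_power_on() {
    root=${1:-/sys}
    for f in "$root"/devices/platform/soc/*.v3d/power/control; do
        [ -e "$f" ] || continue       # the glob did not match anything
        echo on > "$f" && echo "set $f"
    done
}

# Usage (must run as root on the real /sys):
#   v3d_power_on
# Check the result with:
#   cat /sys/devices/platform/soc/*.v3d/power/control
```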

@audetto

audetto commented Apr 5, 2020

Should this be applied automatically by raspi-config when the OpenGL module is selected?

@filip-hejsek

This seems to be the same issue as anholt/linux#135

@cagnulein

/sys/devices/platform/soc/

the workaround doesn't work for me:

pi@raspi:~ $ cat /sys/devices/platform/soc/3fc00000.v3d/power/control
on
Jun 12 18:51:21 raspi lightdm[2064]: Error getting user list from org.freedesktop.Accounts: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.Accounts was not provided by any .service files
Jun 12 18:51:23 raspi lightdm[2127]: Error getting user list from org.freedesktop.Accounts: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.freedesktop.Accounts was not provided by any .service files
Jun 12 22:00:14 raspi kernel: [96643.452290] NMI backtrace for cpu 0
Jun 12 22:00:14 raspi kernel: [96643.452300] CPU: 0 PID: 2069 Comm: Xorg Tainted: G         C        4.19.66-v7+ #1253
Jun 12 22:00:14 raspi kernel: [96643.452303] Hardware name: BCM2835
Jun 12 22:00:14 raspi kernel: [96643.452329] [<80111f38>] (unwind_backtrace) from [<8010d4b0>] (show_stack+0x20/0x24)
Jun 12 22:00:14 raspi kernel: [96643.452342] [<8010d4b0>] (show_stack) from [<808191e0>] (dump_stack+0xd4/0x118)
Jun 12 22:00:14 raspi kernel: [96643.452351] [<808191e0>] (dump_stack) from [<8081fc08>] (nmi_cpu_backtrace+0xc8/0xcc)
Jun 12 22:00:14 raspi kernel: [96643.452360] [<8081fc08>] (nmi_cpu_backtrace) from [<8081fd04>] (nmi_trigger_cpumask_backtrace+0xf8/0x134)
Jun 12 22:00:14 raspi kernel: [96643.452368] [<8081fd04>] (nmi_trigger_cpumask_backtrace) from [<8011050c>] (arch_trigger_cpumask_backtrace+0x20/0x24)
Jun 12 22:00:14 raspi kernel: [96643.452380] [<8011050c>] (arch_trigger_cpumask_backtrace) from [<80191ab4>] (rcu_dump_cpu_stacks+0xac/0xdc)
Jun 12 22:00:14 raspi kernel: [96643.452390] [<80191ab4>] (rcu_dump_cpu_stacks) from [<8019133c>] (rcu_check_callbacks+0x8ec/0x968)
Jun 12 22:00:14 raspi kernel: [96643.452398] [<8019133c>] (rcu_check_callbacks) from [<80199440>] (update_process_times+0x40/0x6c)
Jun 12 22:00:14 raspi kernel: [96643.452410] [<80199440>] (update_process_times) from [<801abd70>] (tick_sched_handle+0x64/0x70)
Jun 12 22:00:14 raspi kernel: [96643.452419] [<801abd70>] (tick_sched_handle) from [<801abfe4>] (tick_sched_timer+0x5c/0xb8)
Jun 12 22:00:14 raspi kernel: [96643.452427] [<801abfe4>] (tick_sched_timer) from [<80199fc8>] (__hrtimer_run_queues+0x164/0x320)
Jun 12 22:00:14 raspi kernel: [96643.452435] [<80199fc8>] (__hrtimer_run_queues) from [<8019abe8>] (hrtimer_interrupt+0x130/0x2a4)
Jun 12 22:00:14 raspi kernel: [96643.452447] [<8019abe8>] (hrtimer_interrupt) from [<806abe20>] (arch_timer_handler_phys+0x40/0x48)
Jun 12 22:00:14 raspi kernel: [96643.452458] [<806abe20>] (arch_timer_handler_phys) from [<80184958>] (handle_percpu_devid_irq+0x88/0x23c)
Jun 12 22:00:14 raspi kernel: [96643.452470] [<80184958>] (handle_percpu_devid_irq) from [<8017ea5c>] (generic_handle_irq+0x34/0x44)
Jun 12 22:00:14 raspi kernel: [96643.452480] [<8017ea5c>] (generic_handle_irq) from [<8017f198>] (__handle_domain_irq+0x6c/0xc4)
Jun 12 22:00:14 raspi kernel: [96643.452492] [<8017f198>] (__handle_domain_irq) from [<801021b4>] (bcm2836_arm_irqchip_handle_irq+0x60/0xa4)
Jun 12 22:00:14 raspi kernel: [96643.452500] [<801021b4>] (bcm2836_arm_irqchip_handle_irq) from [<801019bc>] (__irq_svc+0x5c/0x7c)
Jun 12 22:00:14 raspi kernel: [96643.452504] Exception stack(0x9c0efc08 to 0x9c0efc50)
Jun 12 22:00:14 raspi kernel: [96643.452511] fc00:                   808367e4 00000000 40000093 40000093 60000013 973b4e40
Jun 12 22:00:14 raspi kernel: [96643.452518] fc20: ffffffff ffffffff 973b4f20 9d199400 973b4f10 9c0efc6c 80d0517c 9c0efc58
Jun 12 22:00:14 raspi kernel: [96643.452522] fc40: 00000000 808367f8 40000013 ffffffff
Jun 12 22:00:14 raspi kernel: [96643.452531] [<801019bc>] (__irq_svc) from [<808367f8>] (_raw_spin_unlock_irqrestore+0x50/0x70)
Jun 12 22:00:14 raspi kernel: [96643.452582] [<808367f8>] (_raw_spin_unlock_irqrestore) from [<7f731140>] (vc4_v3d_get_bin_slot+0x58/0xec [vc4])
Jun 12 22:00:14 raspi kernel: [96643.452668] [<7f731140>] (vc4_v3d_get_bin_slot [vc4]) from [<7f731434>] (validate_tile_binning_config+0x78/0x174 [vc4])
Jun 12 22:00:14 raspi kernel: [96643.452735] [<7f731434>] (validate_tile_binning_config [vc4]) from [<7f731aa8>] (vc4_validate_bin_cl+0xd4/0x2ac [vc4])
Jun 12 22:00:14 raspi kernel: [96643.452799] [<7f731aa8>] (vc4_validate_bin_cl [vc4]) from [<7f729938>] (vc4_submit_cl_ioctl+0x79c/0xd18 [vc4])
Jun 12 22:00:14 raspi kernel: [96643.453002] [<7f729938>] (vc4_submit_cl_ioctl [vc4]) from [<7f57caec>] (drm_ioctl_kernel+0xb4/0xf0 [drm])
Jun 12 22:00:14 raspi kernel: [96643.453262] [<7f57caec>] (drm_ioctl_kernel [drm]) from [<7f57ced4>] (drm_ioctl+0x230/0x3cc [drm])
Jun 12 22:00:14 raspi kernel: [96643.453386] [<7f57ced4>] (drm_ioctl [drm]) from [<802bf870>] (do_vfs_ioctl+0xbc/0x804)
Jun 12 22:00:14 raspi kernel: [96643.453396] [<802bf870>] (do_vfs_ioctl) from [<802bfffc>] (ksys_ioctl+0x44/0x6c)
Jun 12 22:00:14 raspi kernel: [96643.453404] [<802bfffc>] (ksys_ioctl) from [<802c003c>] (sys_ioctl+0x18/0x1c)
Jun 12 22:00:14 raspi kernel: [96643.453414] [<802c003c>] (sys_ioctl) from [<80101000>] (ret_fast_syscall+0x0/0x28)
Jun 12 22:00:14 raspi kernel: [96643.453418] Exception stack(0x9c0effa8 to 0x9c0efff0)
Jun 12 22:00:14 raspi kernel: [96643.453423] ffa0:                   010007a0 7eb99e28 0000000d c0b06440 7eb99e28 00000000
Jun 12 22:00:14 raspi kernel: [96643.453430] ffc0: 010007a0 7eb99e28 c0b06440 00000036 76f24968 76f24968 7611f498 7eb99e28
Jun 12 22:00:14 raspi kernel: [96643.453434] ffe0: 76d1208c 7eb99dec 76cf8788 7699551c

@j123b567

@cagnulein your issue seems to be unrelated. This issue and the workaround cover only the failure to allocate a buffer object from CMA, which is not your case. (There are no symptoms like "[drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA" in your log.)

@cagnulein

@j123b567 I had them before setting /sys/devices/platform/soc/*.v3d/power/control to "on".

@audetto

audetto commented Jun 15, 2020

I still do not understand why this "workaround" is not released properly.
Without this, the great effort to support OpenGL on PI < 4 will go to waste.

@schauveau

schauveau commented Nov 14, 2020

I recently experienced CMA allocation problems on my RPi4 8GB + Ubuntu 20.10 + Sway (a wayland compositor based on wlroots). I added cma=512M@128M to cmdline.txt and so far that seems to work fine (but more tests are needed).
This is probably not new information but I noticed that /proc/meminfo provides some CMA values:

# grep Cma /proc/meminfo 
CmaTotal:         524288 kB
CmaFree:          433592 kB

which means that it is possible to monitor CmaFree with a command such as

while sleep 0.5 ; do grep CmaFree /proc/meminfo ; done
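
A slightly extended variant of that loop, sketched under the assumption that only new low-water marks are interesting, prints a line each time CmaFree drops below the smallest value seen so far. Both function names are hypothetical.

```shell
#!/bin/sh
# Print the CmaFree value (in kB) from a meminfo-style file.
# Defaults to /proc/meminfo when no file is given.
cma_free_kb() {
    awk '$1 == "CmaFree:" { print $2 }' "${1:-/proc/meminfo}"
}

# Poll CmaFree twice a second and report only new minimums.
# Runs until interrupted with Ctrl-C.
watch_cma_min() {
    min=
    while sleep 0.5; do
        cur=$(cma_free_kb)
        [ -n "$cur" ] || continue
        if [ -z "$min" ] || [ "$cur" -lt "$min" ]; then
            min=$cur
            echo "$(date +%T) new CmaFree minimum: ${min} kB"
        fi
    done
}

# Usage: watch_cma_min
```

Leaving this running in a terminal while resizing windows or scrubbing a video makes it easier to see which action drains the pool.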

I noticed that most applications allocate a few MB from CMA. I assume that this is to store their window content in one or two buffers (reminder: a typical 1920x1080 screen consumes 1920 * 1080 * 4 bytes = 8 MB). However, I also noticed that CmaFree can decrease by hundreds of MB while resizing some windows (all XWayland windows, imv-wayland, Firefox Wayland with layers.acceleration.force-enabled=true, ...). Other windows can be resized without consuming any CMA (wev, Firefox Wayland with layers.acceleration.force-enabled=false, ...).
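
The buffer-size reminder can be reproduced with a one-line calculation. This sketch assumes 4 bytes per pixel and no row padding; buffer_bytes is an illustrative name.

```shell
#!/bin/sh
# Bytes needed for one uncompressed buffer of the given width and height.
# Assumption: 4 bytes per pixel unless a third argument says otherwise,
# and no padding between rows.
buffer_bytes() {
    echo $(( $1 * $2 * ${3:-4} ))
}

buffer_bytes 1920 1080    # prints 8294400 (about 7.9 MiB)
```

At that size, a drop of 100 MB in CmaFree corresponds to roughly a dozen full-screen buffers being held at once.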

This is consistent with the fact that I experienced a lot of crashes while resizing windows. So I suspect that the problem is that a lot of applications (including XWayland) do not immediately free their old buffers when they receive a resize event. The resource is probably marked as unused until it is released by a garbage collector. A Wayland compositor (or an X11 server?) can send several resize events per second, thus causing a large number of old buffers to remain alive.

If I am right then there are 2 ways to solve the problem:

  1. Rewrite all applications to speed up the release of the CMA buffers during a window resize
  2. Limit the number of resize events in the Wayland compositor (or X11 server?).

@DeviceIoControl
Author

This issue has been open for long enough; I think it is time for it to go. Thanks to everyone for contributing and helping.

@DanielGibson

DanielGibson commented Sep 9, 2022

Has it been fixed? No? Then please don't close it

@Newbytee

Newbytee commented Nov 4, 2022

I still experience this on Debian 11.

@DeviceIoControl
Author

DeviceIoControl commented Nov 4, 2022

Hi everyone,

Just wanted to clarify that I closed this issue because it had been open for more than 2 years with no mention of a fix in sight. If people still feel strongly about this issue, then I can reopen it, but for the time being it will remain closed.

@Newbytee

Newbytee commented Nov 4, 2022

I don't think there's any reason to close this given that it hasn't been resolved as far as I can tell. Why do you think it should be closed?

@DeviceIoControl
Author

I don't use my Raspberry Pi anymore, the comments in this issue have become cluttered, and I have pretty much given up on it at this point, since I can't see any resolution after 2 years of it being open.

Unless there is a fix in the works that I am unaware of?

@Newbytee

Newbytee commented Nov 4, 2022

You can unsubscribe from this issue if you don't care about it. Whether you use your RPi or not doesn't change that this issue still affects people. I don't know if a fix is in the works.

@DanielGibson

DanielGibson commented Nov 4, 2022

Yes, please reopen it, otherwise someone else would have to create a new issue and all the information collected here is lost and the people subscribed to this post (that still care) are disconnected.

Update: Thank you very much! :)
