Raspberry Pi 4 model B #4

Open
geerlingguy opened this issue Jan 26, 2023 · 6 comments

geerlingguy commented Jan 26, 2023


Basic information

Linux/system information

# output of `neofetch`
       _,met$$$$$gg.          pi@hqcam 
    ,g$$$$$$$$$$$$$$$P.       -------- 
  ,g$$P"     """Y$$.".        OS: Debian GNU/Linux 11 (bullseye) aarch64 
 ,$$P'              `$$$.     Host: Raspberry Pi 4 Model B Rev 1.4 
',$$P       ,ggs.     `$$b:   Kernel: 5.15.84-v8+ 
`d$$'     ,$P"'   .    $$$    Uptime: 16 secs 
 $$P      d$'     ,    $$P    Packages: 597 (dpkg) 
 $$:      $$.   -    ,d$$'    Shell: bash 5.1.4 
 $$;      Y$b._   _,d$P'      Terminal: /dev/pts/0 
 Y$$.    `.`"Y$$$$P"'         CPU: BCM2835 (4) @ 1.800GHz 
 `$$b      "-.__              Memory: 94MiB / 7812MiB 
  `Y$$
   `Y$$.                                              
     `$$b.                                            
       `Y$$b.
          `"Y$b._
              `"""

# output of `uname -a`
Linux hqcam 5.15.84-v8+ #1613 SMP PREEMPT Thu Jan 5 12:03:08 GMT 2023 aarch64 GNU/Linux

Benchmark results

CPU

Power

  • Idle power draw (at wall): 1.6 W
  • Maximum simulated power draw (stress-ng --matrix 0): 5.0 W
  • During Geekbench multicore benchmark: 5.2 W
  • During top500 HPL benchmark: 7.2 W (1.64 Gflops/W)
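The efficiency figure in the last bullet can be turned back into an approximate HPL throughput. The sketch below only multiplies the two quoted numbers; the resulting Gflops value is derived here, not stated in the post, and inherits the rounding of both inputs.

```python
# Figures quoted in the Power list above.
idle_watts = 1.6          # idle draw at the wall
hpl_power_watts = 7.2     # wall power during the top500 HPL run
hpl_efficiency = 1.64     # Gflops per watt, as quoted

# Implied HPL throughput (derived, not stated in the post).
implied_gflops = hpl_power_watts * hpl_efficiency

# How much more power the HPL run draws compared with idle.
load_over_idle = hpl_power_watts / idle_watts

print(f"Implied HPL result: ~{implied_gflops:.1f} Gflops")
print(f"HPL draw is {load_over_idle:.1f}x the idle draw")
```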

Disk

SanDisk Extreme 32GB A1

Benchmark                 Result
fio 1M sequential read    46.0 MB/s
iozone 1M random read     42.56 MB/s
iozone 1M random write    36.16 MB/s
iozone 4K random read     10.24 MB/s
iozone 4K random write    5.01 MB/s

curl https://raw.githubusercontent.com/geerlingguy/pi-cluster/master/benchmarks/disk-benchmark.sh | sudo bash

Run the benchmark on any attached storage device (e.g. eMMC, microSD, NVMe, SATA) and add the results under an additional heading. Download the script with curl -o disk-benchmark.sh [URL_HERE] and run sudo DEVICE_UNDER_TEST=/dev/sda DEVICE_MOUNT_PATH=/mnt/sda1 ./disk-benchmark.sh (assuming the device is sda).

Also consider running the PiBenchmarks.com script.

Network

iperf3 results:

Ethernet

  • iperf3 -c $SERVER_IP: 939 Mbps
  • iperf3 --reverse -c $SERVER_IP: 938 Mbps
  • iperf3 --bidir -c $SERVER_IP: 895 Mbps up / 854 Mbps down

WiFi

  • iperf3 -c $SERVER_IP: 108 Mbps
  • iperf3 --reverse -c $SERVER_IP: 103 Mbps
  • iperf3 --bidir -c $SERVER_IP: 1 Mbps up / 105 Mbps down
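To put the 939 Mbps Ethernet figure in context, here is a rough sketch of the best-case TCP goodput on a gigabit link. It assumes a standard 1500-byte MTU, no VLAN tag, IPv4, and the TCP timestamps option enabled; none of these settings are stated in the post, so treat the result as an estimate.

```python
# Assumptions (not stated in the post): gigabit link, 1500-byte MTU,
# IPv4, no VLAN tag, TCP timestamps option enabled.
LINK_RATE_MBPS = 1000
MTU = 1500
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, header, FCS, inter-frame gap
IP_HEADER = 20
TCP_HEADER = 20
TCP_TIMESTAMPS = 12

payload = MTU - IP_HEADER - TCP_HEADER - TCP_TIMESTAMPS   # bytes of data per frame
wire_bytes = MTU + ETH_OVERHEAD                           # bytes occupied on the wire

max_goodput_mbps = LINK_RATE_MBPS * payload / wire_bytes

measured_mbps = 939   # `iperf3 -c $SERVER_IP` result from the Ethernet list
efficiency = measured_mbps / max_goodput_mbps

print(f"Best-case TCP goodput: {max_goodput_mbps:.1f} Mbps")
print(f"Measured is {efficiency:.1%} of that")
```

Under those assumptions the wired result sits very close to the practical ceiling of the link.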

GPU

glmark2-es2 result:

=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      Broadcom
    GL_RENDERER:    V3D 4.2
    GL_VERSION:     OpenGL ES 3.1 Mesa 23.2.1-1~bpo12+rpt3
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 845 FrameTime: 1.184 ms
[build] use-vbo=true: FPS: 1335 FrameTime: 0.749 ms
[texture] texture-filter=nearest: FPS: 1130 FrameTime: 0.886 ms
[texture] texture-filter=linear: FPS: 1108 FrameTime: 0.903 ms
[texture] texture-filter=mipmap: FPS: 1099 FrameTime: 0.911 ms
[shading] shading=gouraud: FPS: 1081 FrameTime: 0.925 ms
[shading] shading=blinn-phong-inf: FPS: 885 FrameTime: 1.131 ms
[shading] shading=phong: FPS: 699 FrameTime: 1.432 ms
[shading] shading=cel: FPS: 664 FrameTime: 1.507 ms
[bump] bump-render=high-poly: FPS: 575 FrameTime: 1.740 ms
[bump] bump-render=normals: FPS: 1140 FrameTime: 0.878 ms
[bump] bump-render=height: FPS: 1061 FrameTime: 0.943 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 422 FrameTime: 2.374 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 225 FrameTime: 4.459 ms
[pulsar] light=false:quads=5:texture=false: FPS: 1177 FrameTime: 0.850 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 125 FrameTime: 8.011 ms
[desktop] effect=shadow:windows=4: FPS: 435 FrameTime: 2.302 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 156 FrameTime: 6.419 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 162 FrameTime: 6.207 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 199 FrameTime: 5.028 ms
[ideas] speed=duration: FPS: 744 FrameTime: 1.345 ms
[jellyfish] <default>: FPS: 419 FrameTime: 2.390 ms
[terrain] <default>: FPS: 28 FrameTime: 36.164 ms
[shadow] <default>: FPS: 112 FrameTime: 8.937 ms
[refract] <default>: FPS: 43 FrameTime: 23.685 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 1254 FrameTime: 0.798 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 692 FrameTime: 1.445 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 1205 FrameTime: 0.830 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 965 FrameTime: 1.037 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 611 FrameTime: 1.638 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 922 FrameTime: 1.085 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 927 FrameTime: 1.079 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 589 FrameTime: 1.700 ms
=======================================================
                                  glmark2 Score: 697 
=======================================================
  • TODO: Haven't determined standardized benchmark yet. See Issue #2.
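If you want to compare glmark2 runs between boards, the per-scene lines are easy to pull apart. This is a minimal sketch, not part of the benchmark scripts: the regular expression and the `parse_glmark2` helper are made up here, and the sample lines are copied from the output above. It also shows that the reported FPS is only roughly the reciprocal of the reported frame time.

```python
import re

# A few lines copied from the glmark2 output above.
SAMPLE = """\
[build] use-vbo=false: FPS: 845 FrameTime: 1.184 ms
[build] use-vbo=true: FPS: 1335 FrameTime: 0.749 ms
[terrain] <default>: FPS: 28 FrameTime: 36.164 ms
[refract] <default>: FPS: 43 FrameTime: 23.685 ms
"""

# Hypothetical pattern: [scene] options: FPS: n FrameTime: x ms
LINE_RE = re.compile(
    r"^\[(?P<scene>[^\]]+)\]\s+(?P<options>.*?):\s+FPS:\s+(?P<fps>\d+)\s+"
    r"FrameTime:\s+(?P<ms>[\d.]+)\s+ms$"
)

def parse_glmark2(text):
    """Return (scene, options, fps, frame_time_ms) for each result line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            rows.append((m["scene"], m["options"], int(m["fps"]), float(m["ms"])))
    return rows

rows = parse_glmark2(SAMPLE)
for scene, options, fps, ms in rows:
    derived = 1000.0 / ms   # FPS implied by the frame time
    print(f"{scene:8s} {options:16s} reported={fps:5d} derived={derived:8.1f}")
```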

Memory

tinymembench results:

tinymembench v0.4.10 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :   2747.5 MB/s (1.7%)
 C copy backwards (32 byte blocks)                    :   2757.0 MB/s (0.1%)
 C copy backwards (64 byte blocks)                    :   2749.6 MB/s
 C copy                                               :   2731.0 MB/s
 C copy prefetched (32 bytes step)                    :   2726.8 MB/s
 C copy prefetched (64 bytes step)                    :   2727.8 MB/s
 C 2-pass copy                                        :   2189.6 MB/s (0.4%)
 C 2-pass copy prefetched (32 bytes step)             :   2307.0 MB/s
 C 2-pass copy prefetched (64 bytes step)             :   2292.3 MB/s (0.3%)
 C fill                                               :   3126.3 MB/s (1.3%)
 C fill (shuffle within 16 byte blocks)               :   3122.2 MB/s (0.9%)
 C fill (shuffle within 32 byte blocks)               :   3105.8 MB/s (0.9%)
 C fill (shuffle within 64 byte blocks)               :   3110.4 MB/s (0.9%)
 NEON 64x2 COPY                                       :   2735.7 MB/s
 NEON 64x2x4 COPY                                     :   2734.0 MB/s
 NEON 64x1x4_x2 COPY                                  :   1099.1 MB/s (0.2%)
 NEON 64x2 COPY prefetch x2                           :   2728.2 MB/s
 NEON 64x2x4 COPY prefetch x1                         :   2725.5 MB/s
 NEON 64x2 COPY prefetch x1                           :   2726.2 MB/s
 NEON 64x2x4 COPY prefetch x1                         :   2728.5 MB/s
 ---
 standard memcpy                                      :   2737.5 MB/s
 standard memset                                      :   3102.7 MB/s (0.9%)
 ---
 NEON LDP/STP copy                                    :   2731.7 MB/s
 NEON LDP/STP copy pldl2strm (32 bytes step)          :   2717.2 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)          :   2718.5 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :   2728.9 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :   2731.1 MB/s
 NEON LD1/ST1 copy                                    :   2733.4 MB/s
 NEON STP fill                                        :   3111.4 MB/s (1.1%)
 NEON STNP fill                                       :   2701.2 MB/s (0.9%)
 ARM LDP/STP copy                                     :   2735.1 MB/s
 ARM STP fill                                         :   3084.1 MB/s (0.9%)
 ARM STNP fill                                        :   2640.1 MB/s (1.3%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    4.7 ns          /     7.4 ns 
    131072 :    7.2 ns          /     9.9 ns 
    262144 :   10.3 ns          /    13.2 ns 
    524288 :   11.9 ns          /    15.1 ns 
   1048576 :   22.7 ns          /    34.8 ns 
   2097152 :   80.9 ns          /   117.8 ns 
   4194304 :  108.9 ns          /   140.9 ns 
   8388608 :  129.4 ns          /   161.1 ns 
  16777216 :  139.8 ns          /   170.3 ns 
  33554432 :  145.1 ns          /   175.4 ns 
  67108864 :  156.5 ns          /   191.4 ns 
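The latency table hints at the cache sizes. A small sketch, using the single-random-read column copied from above, finds the largest buffer with no extra latency and the step with the biggest latency jump. The helper names are made up for illustration; the reading that these points match the 32 KiB L1 data cache and 1 MiB L2 of the Pi 4's Cortex-A72 cores is an interpretation, not something the benchmark reports.

```python
# (block size in bytes, single random read extra latency in ns),
# copied from the tinymembench table above.
LATENCY = [
    (1024, 0.0), (2048, 0.0), (4096, 0.0), (8192, 0.0),
    (16384, 0.0), (32768, 0.0), (65536, 4.7), (131072, 7.2),
    (262144, 10.3), (524288, 11.9), (1048576, 22.7), (2097152, 80.9),
    (4194304, 108.9), (8388608, 129.4), (16777216, 139.8),
    (33554432, 145.1), (67108864, 156.5),
]

def largest_zero_block(table):
    """Largest buffer size that still shows no extra latency over L1."""
    return max(size for size, ns in table if ns == 0.0)

def biggest_jump(table):
    """Pair of adjacent buffer sizes with the largest latency increase."""
    pairs = zip(table, table[1:])
    (lo, lo_ns), (hi, hi_ns) = max(pairs, key=lambda p: p[1][1] - p[0][1])
    return lo, hi

print("Fits in L1 up to:", largest_zero_block(LATENCY), "bytes")
print("Largest jump between:", biggest_jump(LATENCY))
```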

Phoronix Test Suite

Results of the pi-general-benchmark.sh:

  • pts/encode-mp3: 23.887 sec
  • pts/x264 4K: 1.75 fps
  • pts/x264 1080p: 7.61 fps
  • pts/phpbench: 187221
  • pts/build-linux-kernel (defconfig): 5066.188 sec

Other Data

Crypto performance as measured by OpenSSL (see sbc-bench ARMv8 Crypto Extensions):

pi@raspberrypi:~ $ openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 5145475 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 1378033 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 351656 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 88374 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 11062 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 5531 aes-256-cbc's in 3.00s
OpenSSL 1.1.1n  15 Mar 2022
built on: Wed Feb  8 14:21:54 2023 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) blowfish(ptr) 
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/openssl-ysjt2m/openssl-1.1.1n=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DBSAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc      27442.53k    29398.04k    30007.98k    30164.99k    30206.63k    30206.63k
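The summary row can be reproduced from the "Doing aes-256-cbc" lines: each figure is block size times operation count, divided by the elapsed time and by 1000. A minimal sketch with the counts copied from the output above (the `throughput_k` helper is a name made up here):

```python
# (block size in bytes, operations completed), copied from the output above.
RUNS = [
    (16, 5145475), (64, 1378033), (256, 351656),
    (1024, 88374), (8192, 11062), (16384, 5531),
]
ELAPSED_SECONDS = 3.00

def throughput_k(block_size, operations, seconds):
    """Thousands of bytes per second, the unit used in the summary row."""
    return block_size * operations / seconds / 1000

for block_size, operations in RUNS:
    k = throughput_k(block_size, operations, ELAPSED_SECONDS)
    print(f"{block_size:6d} bytes: {k:10.2f}k  (~{k / 1000:.1f} MB/s)")
```

The throughput levels off at roughly 30 MB/s from 256-byte blocks upward.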
@geerlingguy

Benchmark of SD card on PiBenchmarks.com: https://pibenchmarks.com/benchmark/67793/

@github-actions

This issue has been marked 'stale' due to lack of recent activity. If there is no further activity, the issue will be closed in another 30 days. Thank you for your contribution!

Please read this blog post to see the reasons why I mark issues as stale.

@github-actions github-actions bot added the stale label Jul 13, 2023
@geerlingguy geerlingguy removed the stale label Aug 31, 2023
This was referenced Nov 27, 2023

arekm commented Jan 31, 2024

I wonder about one thing. You seem to have never experienced Raspberry Pi 4 WiFi instability (a WiFi-only lockup that requires a hard reset to get WiFi working again). It's a daily (or rather hourly) experience here on every Pi 4 board (though I only have 4GB, rev 1.5 boards, all on the official PSU). Leaving iperf -t0 running is enough to trigger it.

@k-koeberl

Maybe this is the wrong place for this question, but some of you may have seen the same effect I have.
I have been using SanDisk Ultra Fit drives from 32GB up to 512GB for a long time on my RPi 3 and RPi 4 boards, and they work fine. I recently ordered "new" 256GB Ultra Fits, and they are unusably slow.

These are the parameters of the "old" ones that work well:
SanDisk Ultra Fit 256GB (made in China)

Bus 002 Device 008: ID 0781:5583 SanDisk Corp. Ultra Fit
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 3.00
bDeviceClass 0
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 9
idVendor 0x0781 SanDisk Corp.
idProduct 0x5583 Ultra Fit
bcdDevice 1.00
iManufacturer 1 SanDisk
iProduct 2 Ultra Fit
iSerial 3 0401a026bcbb31d7dc3a43a4fd6a281e69edd0bfafede1b68eed300273ea9b981c960000000000000000000090719afaff806e18835581077c271bb5
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 0x002c
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0x80
(Bus Powered)
MaxPower 896mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 8 Mass Storage
bInterfaceSubClass 6 SCSI
bInterfaceProtocol 80 Bulk-Only
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 1
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 15
Binary Object Store Descriptor:
bLength 5
bDescriptorType 15
wTotalLength 0x0016
bNumDeviceCaps 2
USB 2.0 Extension Device Capability:
bLength 7
bDescriptorType 16
bDevCapabilityType 2
bmAttributes 0x00000002
HIRD Link Power Management (LPM) Supported
SuperSpeed USB Device Capability:
bLength 10
bDescriptorType 16
bDevCapabilityType 3
bmAttributes 0x00
wSpeedsSupported 0x000e
Device can operate at Full Speed (12Mbps)
Device can operate at High Speed (480Mbps)
Device can operate at SuperSpeed (5Gbps)
bFunctionalitySupport 1
Lowest fully-functional device speed is Full Speed (12Mbps)
bU1DevExitLat 10 micro seconds
bU2DevExitLat 256 micro seconds
Device Status: 0x0000
(Bus Powered)

These are the "new" ones that are unusable under Linux:
SanDisk Ultra Fit 256GB (made in Taiwan)

Bus 002 Device 007: ID 0781:55b1 SanDisk Corp. SanDisk 3.2 Gen1
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 3.20
bDeviceClass 0
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 9
idVendor 0x0781 SanDisk Corp.
idProduct 0x55b1
bcdDevice 1.10
iManufacturer 1 SanDisk
iProduct 2 SanDisk 3.2 Gen1
iSerial 3 A2003921444C4129
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 0x002c
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0x80
(Bus Powered)
MaxPower 896mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 8 Mass Storage
bInterfaceSubClass 6 SCSI
bInterfaceProtocol 80 Bulk-Only
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 3
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0400 1x 1024 bytes
bInterval 0
bMaxBurst 3
Binary Object Store Descriptor:
bLength 5
bDescriptorType 15
wTotalLength 0x0016
bNumDeviceCaps 2
USB 2.0 Extension Device Capability:
bLength 7
bDescriptorType 16
bDevCapabilityType 2
bmAttributes 0x00000006
BESL Link Power Management (LPM) Supported
SuperSpeed USB Device Capability:
bLength 10
bDescriptorType 16
bDevCapabilityType 3
bmAttributes 0x00
wSpeedsSupported 0x000e
Device can operate at Full Speed (12Mbps)
Device can operate at High Speed (480Mbps)
Device can operate at SuperSpeed (5Gbps)
bFunctionalitySupport 2
Lowest fully-functional device speed is High Speed (480Mbps)
bU1DevExitLat 10 micro seconds
bU2DevExitLat 2047 micro seconds
Device Status: 0x000c
(Bus Powered)
U1 Enabled
U2 Enabled

Under Windows, both work at the same speed and without any problems. I also tried to get further information
from the vendor, but their only comment was that the USB flash drives are only certified for Windows.
Has anyone seen similar effects? Are there any parameters that could make the new SanDisk Ultra Fits work under Linux?
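For anyone comparing the two descriptor dumps by eye, here is a minimal sketch that lists only the fields that differ. The values are copied from the dumps above; the dictionary keys such as `EP1 IN bMaxBurst` are hypothetical labels made up here to tell the two endpoints apart, and only a subset of fields is included. It does not diagnose the slowdown, it just narrows down where the drives report different things.

```python
# Selected fields copied from the "old" (made in China) dump.
OLD = {
    "idProduct": "0x5583",
    "bcdUSB": "3.00",
    "bcdDevice": "1.00",
    "MaxPower": "896mA",
    "EP1 IN bMaxBurst": 1,
    "EP2 OUT bMaxBurst": 15,
    "USB 2.0 ext bmAttributes": "0x00000002",   # HIRD LPM
    "bFunctionalitySupport": 1,
    "bU1DevExitLat": 10,
    "bU2DevExitLat": 256,
    "Device Status": "0x0000",
}

# The same fields from the "new" (made in Taiwan) dump.
NEW = {
    "idProduct": "0x55b1",
    "bcdUSB": "3.20",
    "bcdDevice": "1.10",
    "MaxPower": "896mA",
    "EP1 IN bMaxBurst": 3,
    "EP2 OUT bMaxBurst": 3,
    "USB 2.0 ext bmAttributes": "0x00000006",   # BESL LPM
    "bFunctionalitySupport": 2,
    "bU1DevExitLat": 10,
    "bU2DevExitLat": 2047,
    "Device Status": "0x000c",                  # U1 and U2 enabled
}

def descriptor_diff(old, new):
    """Return {field: (old_value, new_value)} for fields that differ."""
    return {k: (old[k], new[k]) for k in old if old[k] != new[k]}

changes = descriptor_diff(OLD, NEW)
for field, (before, after) in changes.items():
    print(f"{field:26s} {before!s:>12} -> {after!s}")
```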

@L4wnmower

Hi Jeff,
If you could manage a bit of extra testing time, it would be interesting to amend the above results for the Pi 4B and Pi 400, using the latest Bookworm with the NUMA patches. The same applies to the Pi 5 and Pi 500 (although these may already have the performance bump included?).
Thanks.
Merry Xmas and A Wonderful New Year

@geerlingguy

@L4wnmower - I have re-tested the Pi 5 and Pi 500 (and the CM5), at least on many of the CPU-related tests, and added those results inline in their posts (for comparison with the original software tests). I will try to do the Pi 4 at some point too.
