Orange Pi 5 Max #49

Open
geerlingguy opened this issue Aug 17, 2024 · 4 comments

geerlingguy commented Aug 17, 2024

Basic information

  • Board URL (official): Orange Pi 5 Max
  • Board purchased from: Amazon
  • Board purchase date: August 17, 2024
  • Board specs (as tested): 8GB RAM
  • Board price (as tested): $113

Linux/system information

# output of `screenfetch`
orangepi@orangepi5max:~$ screenfetch
         _,met$$$$$gg.           orangepi@orangepi5max
      ,g$$$$$$$$$$$$$$$P.        OS: Debian 11 bullseye
    ,g$$P""       """Y$$.".      Kernel: aarch64 Linux 6.1.43-rockchip-rk3588
   ,$$P'              `$$$.      Uptime: 3m
  ',$$P       ,ggs.     `$$b:    Packages: 1541
  `d$$'     ,$P"'   .    $$$     Shell: bash 5.1.4
   $$P      d$'     ,    $$P     Resolution: 1920x1080
   $$:      $$.   -    ,d$$'     DE: Xfce
   $$\;      Y$b._   _,d$P'      WM: Xfwm4
   Y$$.    `.`"Y$$$$P"'          WM Theme: Numix
   `$$b      "-.__               GTK Theme: Materia [GTK2]
    `Y$$                         Icon Theme: LoginIcons
     `Y$$.                       Font: Sans 10
       `$$b.                     Disk: 4.4G / 37G (13%)
         `Y$$b.                  CPU: Unknown @ 8x 1.8GHz
            `"Y$b._              GPU: llvmpipe (LLVM 11.0.1, 128 bits)
                `""""            RAM: 726MiB / 7934MiB

# output of `uname -a`
Linux orangepi5max 6.1.43-rockchip-rk3588 #1.0.0 SMP Mon Jul  8 11:54:40 CST 2024 aarch64 GNU/Linux

Benchmark results

CPU

Power

  • Idle power draw (at wall): 1.4 W
  • Maximum simulated power draw (stress-ng --matrix 0): 10.9 W
  • During Geekbench multicore benchmark: 9.9 W
  • During top500 HPL benchmark: 12.8 W

Disk

SanDisk Ultra 32GB A1 microSD card

Benchmark                     Result
iozone 4K random read         10.01 MB/s
iozone 4K random write        2.20 MB/s
iozone 1M random read         60.16 MB/s
iozone 1M random write        23.09 MB/s
iozone 1M sequential read     60.18 MB/s
iozone 1M sequential write    20.51 MB/s

MakerDisk NVMe 2280 M.2 512GB SSD

Benchmark                     Result
iozone 4K random read         47.50 MB/s
iozone 4K random write        135.29 MB/s
iozone 1M random read         1028.30 MB/s
iozone 1M random write        1460.79 MB/s
iozone 1M sequential read     1038.04 MB/s
iozone 1M sequential write    2256.93 MB/s

wget https://raw.githubusercontent.com/geerlingguy/pi-cluster/master/benchmarks/disk-benchmark.sh
chmod +x disk-benchmark.sh
sudo MOUNT_PATH=/ TEST_SIZE=1g ./disk-benchmark.sh

Run benchmark on any attached storage device (e.g. eMMC, microSD, NVMe, SATA) and add results under an additional heading.

Also consider running PiBenchmarks.com script.

Network

iperf3 results:

Built-in 2.5 Gbps Ethernet (Realtek RTL8125 rev 05)

  • iperf3 -c $SERVER_IP: 2.35 Gbps
  • iperf3 -c $SERVER_IP --reverse: 773 Mbps
  • iperf3 -c $SERVER_IP --bidir: 2.33 Gbps up, 362 Mbps down

Built-in WiFi (Synaptics AP6611S)

  • iperf3 -c $SERVER_IP: 297 Mbps
  • iperf3 -c $SERVER_IP --reverse: 176 Mbps
  • iperf3 -c $SERVER_IP --bidir: 237 Mbps up, 38 Mbps down

GPU

glmark2-es2 / glmark2-es2-wayland results:

arm_release_ver: g13p0-01eac0, rk_so_ver: 10
=======================================================
    glmark2 2023.01
=======================================================
    OpenGL Information
    GL_VENDOR:      ARM
    GL_RENDERER:    Mali-G610
    GL_VERSION:     OpenGL ES 3.2 v1.g13p0-01eac0.68603db295fbf2c59ac6b927fdfb1c32
    Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
    Surface Size:   800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 943 FrameTime: 1.061 ms
[build] use-vbo=true: FPS: 906 FrameTime: 1.104 ms
[texture] texture-filter=nearest: FPS: 896 FrameTime: 1.117 ms
[texture] texture-filter=linear: FPS: 893 FrameTime: 1.120 ms
[texture] texture-filter=mipmap: FPS: 917 FrameTime: 1.091 ms
[shading] shading=gouraud: FPS: 894 FrameTime: 1.120 ms
[shading] shading=blinn-phong-inf: FPS: 868 FrameTime: 1.153 ms
[shading] shading=phong: FPS: 930 FrameTime: 1.076 ms
[shading] shading=cel: FPS: 843 FrameTime: 1.187 ms
[bump] bump-render=high-poly: FPS: 628 FrameTime: 1.593 ms
[bump] bump-render=normals: FPS: 981 FrameTime: 1.020 ms
[bump] bump-render=height: FPS: 930 FrameTime: 1.076 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 964 FrameTime: 1.038 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 827 FrameTime: 1.210 ms
[pulsar] light=false:quads=5:texture=false: FPS: 909 FrameTime: 1.101 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 459 FrameTime: 2.183 ms
[desktop] effect=shadow:windows=4: FPS: 817 FrameTime: 1.224 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 232 FrameTime: 4.320 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 240 FrameTime: 4.172 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 363 FrameTime: 2.761 ms
[ideas] speed=duration: FPS: 571 FrameTime: 1.753 ms
[jellyfish] <default>: FPS: 750 FrameTime: 1.334 ms
[terrain] <default>: FPS: 190 FrameTime: 5.285 ms
[shadow] <default>: FPS: 829 FrameTime: 1.207 ms
[refract] <default>: FPS: 274 FrameTime: 3.663 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 931 FrameTime: 1.075 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 951 FrameTime: 1.052 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 940 FrameTime: 1.064 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 940 FrameTime: 1.064 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 913 FrameTime: 1.096 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 931 FrameTime: 1.075 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 904 FrameTime: 1.107 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 885 FrameTime: 1.131 ms
=======================================================
                                  glmark2 Score: 770 
=======================================================

Note: This benchmark requires an active display on the device. Not all devices may be able to run glmark2-es2, so in that case, make a note and move on!

TODO: See this issue for discussion about a full suite of standardized GPU benchmarks.

Memory

tinymembench results:

tinymembench v0.4.10 (simple benchmark for memory throughput and latency)

==========================================================================
== Memory bandwidth tests                                               ==
==                                                                      ==
== Note 1: 1MB = 1000000 bytes                                          ==
== Note 2: Results for 'copy' tests show how many bytes can be          ==
==         copied per second (adding together read and writen           ==
==         bytes would have provided twice higher numbers)              ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
==         to first fetch data into it, and only then write it to the   ==
==         destination (source -> L1 cache, L1 cache -> destination)    ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in    ==
==         brackets                                                     ==
==========================================================================

 C copy backwards                                     :  11592.7 MB/s (5.9%)
 C copy backwards (32 byte blocks)                    :  11590.3 MB/s (0.2%)
 C copy backwards (64 byte blocks)                    :  11599.0 MB/s (0.5%)
 C copy                                               :  11734.1 MB/s (0.2%)
 C copy prefetched (32 bytes step)                    :  12359.8 MB/s (0.2%)
 C copy prefetched (64 bytes step)                    :  12244.9 MB/s (0.3%)
 C 2-pass copy                                        :   4635.4 MB/s
 C 2-pass copy prefetched (32 bytes step)             :   7781.0 MB/s
 C 2-pass copy prefetched (64 bytes step)             :   8196.3 MB/s
 C fill                                               :  27981.7 MB/s (0.4%)
 C fill (shuffle within 16 byte blocks)               :  28023.4 MB/s (0.3%)
 C fill (shuffle within 32 byte blocks)               :  28033.7 MB/s (0.3%)
 C fill (shuffle within 64 byte blocks)               :  28004.2 MB/s (0.3%)
 NEON 64x2 COPY                                       :  12093.1 MB/s
 NEON 64x2x4 COPY                                     :  12004.0 MB/s (0.1%)
 NEON 64x1x4_x2 COPY                                  :   4554.4 MB/s (1.5%)
 NEON 64x2 COPY prefetch x2                           :  10658.2 MB/s (0.2%)
 NEON 64x2x4 COPY prefetch x1                         :  11332.3 MB/s (0.2%)
 NEON 64x2 COPY prefetch x1                           :  10819.3 MB/s
 NEON 64x2x4 COPY prefetch x1                         :  11340.1 MB/s (0.2%)
 ---
 standard memcpy                                      :  12093.2 MB/s
 standard memset                                      :  28018.2 MB/s (0.3%)
 ---
 NEON LDP/STP copy                                    :  12069.4 MB/s
 NEON LDP/STP copy pldl2strm (32 bytes step)          :  12648.6 MB/s
 NEON LDP/STP copy pldl2strm (64 bytes step)          :  12627.3 MB/s
 NEON LDP/STP copy pldl1keep (32 bytes step)          :  12264.5 MB/s
 NEON LDP/STP copy pldl1keep (64 bytes step)          :  12252.8 MB/s
 NEON LD1/ST1 copy                                    :  12005.7 MB/s (0.2%)
 NEON STP fill                                        :  27941.7 MB/s
 NEON STNP fill                                       :  27952.9 MB/s
 ARM LDP/STP copy                                     :  12078.6 MB/s (0.1%)
 ARM STP fill                                         :  28011.7 MB/s (0.3%)
 ARM STNP fill                                        :  28026.2 MB/s (0.2%)

==========================================================================
== Framebuffer read tests.                                              ==
==                                                                      ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled.       ==
== Writes to such framebuffers are quite fast, but reads are much       ==
== slower and very sensitive to the alignment and the selection of      ==
== CPU instructions which are used for accessing memory.                ==
==                                                                      ==
== Many x86 systems allocate the framebuffer in the GPU memory,         ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover,    ==
== PCI-E is asymmetric and handles reads a lot worse than writes.       ==
==                                                                      ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer    ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall    ==
== performance improvement. For example, the xf86-video-fbturbo DDX     ==
== uses this trick.                                                     ==
==========================================================================

 NEON LDP/STP copy (from framebuffer)                 :   1553.6 MB/s (18.1%)
 NEON LDP/STP 2-pass copy (from framebuffer)          :    657.2 MB/s
 NEON LD1/ST1 copy (from framebuffer)                 :    785.7 MB/s
 NEON LD1/ST1 2-pass copy (from framebuffer)          :    672.2 MB/s
 ARM LDP/STP copy (from framebuffer)                  :    771.4 MB/s (0.1%)
 ARM LDP/STP 2-pass copy (from framebuffer)           :    669.8 MB/s (0.3%)

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read
      1024 :    0.0 ns          /     0.0 ns 
      2048 :    0.0 ns          /     0.0 ns 
      4096 :    0.0 ns          /     0.0 ns 
      8192 :    0.0 ns          /     0.0 ns 
     16384 :    0.0 ns          /     0.0 ns 
     32768 :    0.0 ns          /     0.0 ns 
     65536 :    0.0 ns          /     0.0 ns 
    131072 :    1.1 ns          /     1.5 ns 
    262144 :    2.3 ns          /     2.9 ns 
    524288 :    3.5 ns          /     4.0 ns 
   1048576 :   10.0 ns          /    13.1 ns 
   2097152 :   13.9 ns          /    15.7 ns 
   4194304 :   62.3 ns          /   100.7 ns 
   8388608 :  157.9 ns          /   221.3 ns 
  16777216 :  210.2 ns          /   259.7 ns 
  33554432 :  236.5 ns          /   272.7 ns 
  67108864 :  249.7 ns          /   278.7 ns 

sbc-bench results

Run sbc-bench and paste a link to the results here: https://0x0.st/XyIw.bin (ThomasKaiser/sbc-bench#100)

Phoronix Test Suite

Results from pi-general-benchmark.sh:

  • pts/encode-mp3: 12.077 sec
  • pts/x264 4K: 3.83 fps
  • pts/x264 1080p: 23.40 fps
  • pts/phpbench: 371184
  • pts/build-linux-kernel (defconfig): 1310.609 sec
@geerlingguy

Thanks to @[email protected] on Mastodon for making me aware of the Max's existence. Will be interesting to see if this board is more of the goldilocks that Radxa was trying to thread with the X4...

@martinbone

@geerlingguy Very interesting - I've been looking for an OP5 Max benchmark like this. The main thing that stands out to me is the speed of NVMe compared with the first OP5 (which isn't surprising considering the PCIe / lanes on the Max). This is what I got on my OP5:

Benchmark                     Result
iozone 4K random read         50.70 MB/s
iozone 4K random write        123.35 MB/s
iozone 1M random read         363.86 MB/s
iozone 1M random write        366.33 MB/s
iozone 1M sequential read     368.94 MB/s
iozone 1M sequential write    372.99 MB/s
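
To put numbers on that comparison, here is a small sketch that divides the Orange Pi 5 Max NVMe figures from the issue by the Orange Pi 5 figures quoted above. The function name `speedup` is hypothetical; the inputs are copied from the two tables in this thread.

```shell
#!/bin/sh
# Print how many times larger the first throughput figure is than the second.
# Both arguments are plain numbers in MB/s.
speedup() {
    awk -v max="$1" -v op5="$2" 'BEGIN { printf "%.2fx\n", max / op5 }'
}

# Orange Pi 5 Max result first, Orange Pi 5 result second.
echo "1M sequential read:  $(speedup 1038.04 368.94)"
echo "1M sequential write: $(speedup 2256.93 372.99)"
echo "4K random read:      $(speedup 47.50 50.70)"
```

On these figures the 1M sequential transfers come out roughly 2.8 times (read) and 6 times (write) faster on the Max, while 4K random reads are about the same on both boards.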

Your glmark2-es2-wayland result looks quite low ... but I imagine this is due to the OS and drivers installed. On my OP5 I've got a score of 4800 (Joshua Riek's Ubuntu 22.04, running CPU and GPU at Performance and PAN_MESA_DEBUG=gofaster).

Many thanks for the data on the NVMe speed!!
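
For anyone wanting to try the "Performance" setting mentioned above, the following is a rough sketch only. It assumes the standard cpufreq sysfs layout and assumes the GPU appears as a devfreq node with `gpu` in its name; neither path is confirmed in this thread. The helper name `set_governor` is hypothetical. `PAN_MESA_DEBUG=gofaster` is quoted from the comment above; it applies to Mesa's Panfrost-family drivers, so it may have no effect with the ARM driver shown in the original glmark2 output.

```shell
#!/bin/sh
# Write a governor name to every CPU cpufreq policy and every GPU devfreq
# node found under the given sysfs root (defaults to /sys, which needs root).
# The "*gpu*" devfreq pattern is an assumption about how the GPU is named.
set_governor() {
    gov=$1
    root=${2:-/sys}
    for f in "$root"/devices/system/cpu/cpufreq/policy*/scaling_governor \
             "$root"/class/devfreq/*gpu*/governor; do
        # Unmatched globs stay literal, so skip anything that does not exist.
        [ -e "$f" ] && echo "$gov" > "$f"
    done
    return 0
}

# Usage sketch (as root), followed by a benchmark run:
#   set_governor performance
#   PAN_MESA_DEBUG=gofaster glmark2-es2-wayland
```

The optional second argument exists only so the function can be tried against a scratch directory before touching the real `/sys`.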

@jimmyhon

jimmyhon commented Sep 13, 2024

I also experienced the slow RX bandwidth on RTL8125 by default.

I was able to get full speed (2.35 Gbps) by moving the IRQ handler to the big A76 cores:

ls /proc/irq/125/enP3p49s0
# mask f0 selects CPUs 4-7 (the A76 cores)
echo f0 > /proc/irq/125/smp_affinity

So it's not a limitation of the hardware.
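
The IRQ number (125) and the mask can differ between boots and boards, so here is a minimal sketch of deriving both rather than hard-coding them. It assumes the interface name from the `ls` output above shows up in `/proc/interrupts` and that CPUs 4-7 are the A76 cores; the helper names `cpu_mask` and `irq_for` are hypothetical.

```shell
#!/bin/sh
# Build a hex CPU affinity mask from a list of CPU numbers.
cpu_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

# Print the IRQ numbers whose /proc/interrupts line mentions the given name.
# The optional second argument is the file to read (default /proc/interrupts).
irq_for() {
    awk -v name="$1" 'index($0, name) { sub(":", "", $1); print $1 }' \
        "${2:-/proc/interrupts}"
}

# Usage sketch (run as root; the interface name is taken from the comment above):
#   for irq in $(irq_for enP3p49s0); do
#       cpu_mask 4 5 6 7 > "/proc/irq/$irq/smp_affinity"
#   done
```

The setting is not persistent across reboots, and a running `irqbalance` daemon may overwrite it, so it is worth re-checking `smp_affinity` after applying it.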

@butterdori

butterdori commented Dec 14, 2024

Hi Jeff!
Thanks for the great, detailed reviews, both here and on your YouTube channel. I've been looking to build a portable NAS to complement my main home NAS, and it looks like beefed-up Pis are a great solution. I came across this board but also noticed that Radxa released the 5B+, which has dual M.2 slots (albeit at half the speed, x2) and built-in WiFi at a lower cost than the original 5B (around $100 for 8GB RAM and 64GB of onboard eMMC). For my needs, it seems that Radxa has the more capable hardware, but I've also been hearing that Orange Pi has better software support. Given the finicky nature of Rockchip SoCs on Linux anyway, the difference between the two may not be great. After all, if I can get Debian running, it won't take much to run a NAS server (OMV etc.), so my software support needs won't be extensive, especially if I'm not looking to do extreme stuff like PCIe GPUs. So, I'm torn. Should I get the Orange Pi 5 Max or the Radxa Rock 5B+? Am I too worried about Radxa's software support? I would very much appreciate your advice.
Thanks in advance.
