ARM Board Benchmarks

Overview

  1. NAND speed
  2. Power consumption
  3. Memory benchmark
  4. General information

NAND speed

# time dd if=/dev/mtdblock4 of=/dev/null bs=128K count=1000
1000+0 records in
1000+0 records out
real    0m 25.06s
user    0m 0.14s
sys     0m 12.89s
Reading 1000 blocks of 128 KiB (131,072,000 bytes) in 25.06 s works out to roughly 5.2 MB/s.
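The throughput figure can be reproduced from the `time` output above. The following is a minimal sketch, assuming that `bs=128K` means 128 × 1024 bytes and that the `real` line has the format shown; the function names are illustrative and not part of any benchmark tool.

```python
import re


def parse_real_seconds(time_output: str) -> float:
    """Return the wall-clock ('real') time in seconds from `time` output."""
    match = re.search(r"^real\s+(\d+)m\s*([\d.]+)s", time_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'real' line found in time output")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def throughput_mb_per_s(block_bytes: int, count: int, seconds: float) -> float:
    """Bytes transferred per second, expressed in MB/s (1 MB = 10**6 bytes)."""
    return block_bytes * count / seconds / 1e6


# Timing lines copied from the dd run above.
sample = """real    0m 25.06s
user    0m 0.14s
sys     0m 12.89s"""

elapsed = parse_real_seconds(sample)
# Assumption: bs=128K is 128 * 1024 bytes per block, count=1000 blocks.
rate = throughput_mb_per_s(128 * 1024, 1000, elapsed)
print(f"{elapsed:.2f} s, {rate:.2f} MB/s")
```

Expressed in MiB/s (1 MiB = 2^20 bytes) the same run comes out slightly lower, so it helps to state which unit a table uses.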

Board                      NAND read speed
DM3730 Module (pre)        5.2 MB/s
DM3730 Module (rev 1.0)    4.2 MB/s
SheevaPlug                 3.7 MB/s
BeagleBoard                9.2 MB/s
Atmel SAMA5D4              7.0 MB/s

Power Consumption

DM3730 Module (pre) BeagleBoard-XM PandaBoard BeagleBone Unit
idle 0.04 A
nbench 0.11 A
cpuburn 0.14 A
cpuburn-neona8 0.17 A

idle means the board sits at the Linux command prompt (see kernel config). The load tests use statically linked Linux ELF executables: nbench, cpuburn, cpuburn-neona8.
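To compare the load cases, it can help to express each reading relative to idle. The sketch below uses only the current values listed above. The `power_w` helper and its `supply_volts` parameter are hypothetical, since the supply voltage is not stated here.

```python
# Current draw in amperes, as listed in the table above.
current_a = {
    "idle": 0.04,
    "nbench": 0.11,
    "cpuburn": 0.14,
    "cpuburn-neona8": 0.17,
}


def load_summary(currents, baseline="idle"):
    """Map each load case to (extra amperes over baseline, ratio to baseline)."""
    base = currents[baseline]
    return {
        name: (round(amps - base, 2), round(amps / base, 2))
        for name, amps in currents.items()
        if name != baseline
    }


def power_w(amps, supply_volts):
    """Electrical power in watts; supply_volts must be measured or looked up."""
    return amps * supply_volts


for name, (extra, ratio) in load_summary(current_a).items():
    print(f"{name}: +{extra:.2f} A over idle, {ratio:.2f}x idle")
```

Reporting the difference to idle separates the cost of the workload from the baseline draw of the board.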

Memory Benchmark

Benchmarking memory performance of several ARM boards under Linux.
Results in a nutshell: no crashes (good); the DM3730 module (pre) seems to have fast reads but slow writes;
the DM3730 module (rev 1.0) has faster memory than both the previous board and the BeagleBoard-XM.

Columns (left to right): DM3730 Module (pre), DM3730 Module (rev 1.0), BeagleBoard-XM, BeagleBoard, Olimex A13, Olimex A10 Lime, PandaBoard, BeagleBone, BeagleBone Black, Wandboard i.MX6 Solo, Wandboard i.MX6 Dual, Banana Pi, Raspberry 2 B, Raspberry 3 B, Atmel SAMA5D4, IDS i.MX6DL, IDS i.MX6DL (timings), i.MX6UL EVK, core i7-870. The last field of each row is the unit.
CoreMark 1765 1745 1758 1307 2232 1562 1108 2194 2605 2607 2180 1971 1445 1056 1201 2076 1909 1150 11589 Score

nbench
  numeric sort 3.48 3.45 3.23 2.51 4.42 4.32 2.83 2.20 4.31 4.29 4.26 4.09 3.78 2.98 1.79 2.02 3.43 3.42 2.22 13.42 Rel. to AMD K6/233
  string sort 2.9 2.86 2.88 2.14 3.66 3.67 2.70 1.82 3.60 3.92 3.99 2.67 2.33 2.27 1.18 1.33 3.17 3.16 1.42 64.83 Rel. to AMD K6/233
  bitfield 4.74 4.70 4.72 3.50 6.01 5.99 4.60 2.99 5.91 7.11 7.12 5.08 4.57 3.60 2.36 2.68 5.41 5.66 2.68 25.56 Rel. to AMD K6/233
  Fourier 0.95 0.93 0.88 0.69 1.18 1.18 2.77 0.59 1.16 4.45 4.45 3.50 3.18 2.76 1.61 1.84 3.55 3.55 1.84 27.04 Rel. to AMD K6/233

stream
  copy 549.8 606.1 561.6 465.0 462.6 544.4 333.1 329.9 443.1 570.7 606.4 1715.9 1868.4 1907.8 513.6 566.0 666.3 536.3 1245.5 10041.0 MB/s
  scale 230.9 234.3 230.0 173.9 218.9 239.2 281.8 134.0 229.1 556.5 575.9 789.6 736.2 846.5 448.1 501.1 626.2 515.6 458.2 9845.6 MB/s
  add 250.3 263.0 257.1 193.7 296.6 284.8 429.0 186.6 310.9 744.6 767.8 516.3 637.8 994.5 375.2 424.3 710.6 590.6 472.7 10664.6 MB/s

ramspeed
  float copy 425.5 523.5 464.5 162.7 373.5 431.6 252.0 299.4 365.2 500.8 567.4 1543.7 1711.5 1730.7 414.4 457.7 529.2 460.9 1130.1 9422.5 MB/s
  float scale 324.5 355.8 352.8 240.5 338.4 356.1 246.9 245.9 380.7 548.0 634.0 458.9 432.6 1796.4 404.4 446.0 569.5 439.1 356.6 9443.9 MB/s
  float add 330.8 380.5 327.4 184.3 464.1 397.1 311.4 324.3 442.6 719.3 758.7 516.6 548.9 1049.8 335.4 366.8 881.8 669.5 444.7 10243.2 MB/s

cachebench
  set 766.2 1390.86 1337.9 496.9 1040.8 1114.3 3383.8 940.3 1428.9 5598.4 5620.3 2451.7 2224.9 1479.9 982.1 1115.6 4133.7 4419.3 1299.6 52507.5 MB/s
  copy 1517.8 2782.5 2680.6 997.2 1990.5 2234.5 3889.1 1889.5 2868.3 6492.6 6494.1 5312.2 4818.7 3496.5 1647.0 1879.4 4816.4 5165.7 2790.8 105798.8 MB/s
  read 356.0 352.2 356.3 264.2 451.3 450.2 907.1 223.9 444.7 1517.3 1517.7 1254.7 1138.6 757.8 501.0 569.5 1106.7 1207.7 665.2 9102.7 MB/s
  write 302.9 299.0 302.4 224.2 373.3 360.3 1503.0 190.1 377.2 2505.4 2506.0 1504.3 1365.2 1508.6 667.2 758.4 1859.1 1993.7 797.4 13137.2 MB/s

memspeed
  copy libc 285 333 239 204 195 226 254 200 232 713 912 205 224 293 216 561 MB/s
  read vld1 544 646 319 541 615 227 418 478 603 775 1439 211 231 945 865 601 MB/s
  write libc 758 1333 1225 506 987 1517 935 1445 1144 1091 510 576 367 274 1256 MB/s

ssvb-membench
  C copy backwards 209.35 257.64 239.01 201.22 213.57 217.67 124.77 146.39 202.61 140.00 146.12 253.36 248.21 945.52 107.70 117.62 181.41 170.36 235.35 4690.79 MB/s
  C copy 252.00 283.72 264.26 230.07 225.19 255.87 136.89 154.99 206.72 324.84 342.09 793.93 994.30 1013.01 137.89 151.29 342.38 278.22 620.43 4617.55 MB/s
  C copy prefetched (step 32) 319.08 472.64 245.83 280.51 412.98 323.19 242.28 317.83 465.01 365.41 428.82 797.85 624.59 1055.35 146.19 160.67 390.27 300.27 601.72 4660.04 MB/s
  C fill 780.57 1384.72 1167.93 521.59 1016.12 1108.96 1576.54 980.51 1494.86 2205.50 2271.77 2655.22 1198.66 1132.72 533.74 602.62 382.91 286.54 1510.69 7437.59 MB/s
  memcpy 338.05 389.88 273.25 285.83 252.06 364.17 296.84 265.21 325.19 452.84 558.03 846.48 875.38 1019.17 259.04 285.50 370.54 290.21 638.58 7089.39 MB/s
  memset 780.49 1384.72 1167.98 521.52 1016.12 1096.98 1579.76 978.05 1497.05 2206.27 2272.88 2482.86 1198.75 1124.68 533.13 602.02 382.80 286.22 1319.53 12699.08 MB/s
  latency (2^17, single) 10.1 10.3 138.4 15.7 8.3 7.6 32.8 18.3 9.4 20.8 21.0 10.1 9.9 16.4 52.0 46.1 25.8 25.8 21.9 4.4 ns
  latency (2^17, dual) 20.5 20.8 289.8 55.6 15.7 14.7 44.7 37.1 18.8 28.5 28.5 15.1 16.7 26.1 75.5 66.7 35.6 35.4 35.4 6.1 ns
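The claim that the DM3730 module (rev 1.0) has faster memory than the earlier board and the BeagleBoard-XM can be checked against a few rows of the table. The sketch below rests on one assumption: that the first three values of each row belong, in order, to DM3730 Module (pre), DM3730 Module (rev 1.0) and BeagleBoard-XM. Only the stream copy, memcpy and memset rows are used, and `speedup_percent` is an illustrative helper.

```python
# Assumption: the first three values of each row map to these boards in order.
boards = ["DM3730 Module (pre)", "DM3730 Module (rev 1.0)", "BeagleBoard-XM"]

# Throughput in MB/s, copied from the first three fields of each row.
rows_mb_s = {
    "stream copy": [549.8, 606.1, 561.6],
    "memcpy": [338.05, 389.88, 273.25],
    "memset": [780.49, 1384.72, 1167.98],
}


def speedup_percent(baseline: float, value: float) -> float:
    """Percentage by which value exceeds baseline (negative if slower)."""
    return (value / baseline - 1.0) * 100.0


for test, (pre, rev10, xm) in rows_mb_s.items():
    print(
        f"{test}: rev 1.0 is {speedup_percent(pre, rev10):+.1f}% vs pre, "
        f"{speedup_percent(xm, rev10):+.1f}% vs BeagleBoard-XM"
    )
```

Under that assumption, rev 1.0 is ahead in all three rows, with the largest gap over the earlier module in the write-only memset test.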

All ARM boards run the same statically-linked Linux ELF executables available here: coremark, cachebench, memspeed, nbench, ramspeed, stream, ssvb-membench.

General Information

How to compile

ARM compiler: gcc 4.6.3 / Linaro 2012.06 with -O2 -march=armv7-a -ffast-math -fPIC -mfloat-abi=softfp -mfpu=neon
x86-64 compiler: gcc 4.6.1 / Ubuntu with -O2 -march=native
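As an illustration of how these flags fit into a build, the sketch below assembles a compiler command line without running it. The compiler name `arm-linux-gnueabi-gcc` and the file names `stream.c` and `stream` are hypothetical placeholders. The `-static` option is added on the assumption that it is how the statically linked executables mentioned above were produced.

```python
import shlex

# Flags as listed above for each target.
ARM_FLAGS = [
    "-O2", "-march=armv7-a", "-ffast-math", "-fPIC",
    "-mfloat-abi=softfp", "-mfpu=neon",
]
X86_64_FLAGS = ["-O2", "-march=native"]


def build_command(compiler, flags, source, output, static=True):
    """Return the compiler invocation as an argument list (nothing is executed)."""
    cmd = [compiler, *flags]
    if static:
        # Assumption: static linking via -static, matching the static ELF files.
        cmd.append("-static")
    cmd += ["-o", output, source]
    return cmd


# Hypothetical cross-compiler name and source file, for illustration only.
arm_cmd = build_command("arm-linux-gnueabi-gcc", ARM_FLAGS, "stream.c", "stream")
x86_cmd = build_command("gcc", X86_64_FLAGS, "stream.c", "stream", static=False)
print(shlex.join(arm_cmd))
print(shlex.join(x86_cmd))
```

Keeping the flags in one list per target makes it easy to confirm that every benchmark binary for a platform was built with identical options.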

System information

How to run

Peter Meerwald