orly going thirty: October 2019

There is a large number of synthetic CPU benchmarks available, for example Geekbench, JetStream, and SPEC. The utility of these benchmarks for whole-system performance is debatable. Then we have benchmarks that attempt to measure whole-system performance, for example the time-honored Linux kernel compilation, and elaborate benchmarks such as SAP Sales and Distribution (SAP SD), which produces the famous "SAPS rating."

Here I am attempting to measure some degree of whole-system performance by using ffmpeg to transcode Big Buck Bunny. This is a CPU-bound (more correctly, FPU-bound) benchmark with some memory and I/O load due to the very large size of the movie. I've used a statically linked binary that is not specifically optimized for particular processor features or for GPUs (ffmpeg can greatly speed up transcoding on Nvidia GPUs).

Here are the steps needed to replicate my results (these are for Linux; on macOS, I used the ffmpeg package from Homebrew, but the steps are otherwise identical):

wget https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz

tar xf ffmpeg-release-amd64-static.tar.xz

wget http://distribution.bbb3d.renderfarming.net/video/mp4/bbb_sunflower_1080p_60fps_normal.mp4

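for i in 1 2 3; do
  rm -f output.mp4
  # 'time' is a bash keyword that reports on the shell's stderr, so group the command to capture the timings in out.txt
  { time ffmpeg-*-amd64-static/ffmpeg -threads 2 -loglevel panic -i bbb_sunflower_1080p_60fps_normal.mp4 -vcodec h264 -acodec aac -strict -2 -crf 26 output.mp4; } 2>>out.txt
done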
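
To compare machines it helps to reduce the three runs to a single number. The following is a minimal sketch, assuming the timings are in bash's default format (lines such as "real 1m2.345s", tab-separated) and have been captured in a file such as out.txt, for example by grouping the timed command as { time ...; } 2>>out.txt. The function name mean_real_seconds is made up for illustration.

```shell
# mean_real_seconds FILE
# Average the wall-clock ("real") times found in FILE, in seconds.
# Assumes bash's default time format, e.g. "real<TAB>1m2.345s",
# with '.' as the decimal separator.
mean_real_seconds() {
  awk '$1 == "real" {
         # "1m2.345s" splits into t[1] = minutes, t[2] = seconds
         split($2, t, /[ms]/)
         total += t[1] * 60 + t[2]
         runs++
       }
       END { if (runs) printf "%.3f\n", total / runs }' "$1"
}

# Usage:
#   mean_real_seconds out.txt
```

Lines that do not start with "real" (the user and sys times, blank lines) are ignored, so the file can be passed in unmodified.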

Note that we are limiting the number of threads that ffmpeg can use to 2, so that it keeps at most 2 cores busy. On a machine with 4 or more cores, the encoding results are much better, but since many of my data points are from 2-core machines, we have to limit the number of threads to 2 in order to have an apples-to-apples comparison.

Note that on a 2-core hyper-threaded system, 4 threads is ideal "in theory"; however, hyper-threading mainly helps when threads spend time stalled (for example on I/O), and since ffmpeg is CPU-bound here, a thread limit of 2 is more appropriate.
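
Before picking a thread limit it is worth checking how many physical cores (as opposed to hyper-threads) a machine actually has. A minimal sketch for Linux, assuming the lscpu tool from util-linux is installed; the helper name physical_cores is made up for illustration:

```shell
# physical_cores
# Count unique (core, socket) pairs in the output of
# `lscpu -p=CORE,SOCKET`, read from stdin. Lines starting
# with '#' are header comments and are skipped.
physical_cores() {
  awk '!/^#/ && NF && !seen[$0]++ { n++ } END { print n + 0 }'
}

# Usage on Linux:
#   lscpu -p=CORE,SOCKET | physical_cores   # physical cores
#   nproc                                   # logical CPUs, hyper-threads included
# On macOS, `sysctl -n hw.physicalcpu` and `sysctl -n hw.logicalcpu`
# report the corresponding values.
```

If the two numbers differ, the machine exposes hyper-threads, and the physical count is the one that matches the -threads 2 reasoning above.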

For the macOS trials, this simple test shows:

37% improvement from Sandy Bridge to Broadwell (3 generations)
17% improvement from Broadwell to Kaby Lake (2 generations)

Over 5 generations there is a cumulative improvement of 48%.

For the AWS M instance family:

13% from Sandy Bridge (m1) to Ivy Bridge (m3) (1 generation)
14% from Ivy Bridge (m3) to Broadwell (m4) (2 generations)
13% from Broadwell (m4) to Skylake (m5) (1 generation)

Over 4 generations there is a cumulative improvement of 35%.

For the AWS C instance family:

28% from Ivy Bridge EP to Haswell (1 generation)
12% from Haswell to Skylake (2 generations)

Over 3 generations there is a cumulative improvement of 37%, although this is also partially due to differing clock speeds.
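
The cumulative figures above are not simple sums of the individual steps. They are consistent with reading each percentage as a reduction in transcode time and compounding the remaining fractions, for example (1 - 0.37) x (1 - 0.17) = 0.52 for the macOS trials, which is 48% less time. That reading is an assumption inferred from the arithmetic, and the helper name cumulative_reduction is made up for illustration:

```shell
# cumulative_reduction PCT [PCT ...]
# Compound a series of percentage reductions in elapsed time:
# each step leaves (1 - PCT/100) of the previous time.
cumulative_reduction() {
  printf '%s\n' "$@" | awk '
    BEGIN { remaining = 1 }
    { remaining *= 1 - $1 / 100 }
    END { printf "%.1f\n", (1 - remaining) * 100 }'
}

# Usage:
#   cumulative_reduction 37 17      # macOS trials   -> 47.7
#   cumulative_reduction 13 14 13   # AWS M family   -> 34.9
#   cumulative_reduction 28 12      # AWS C family   -> 36.6
```

The three results round to the 48%, 35%, and 37% quoted above.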

We normally would consider a benchmark such as SAPS to be a rigorous, whole-system benchmark because SAPS measures order line items per hour (an application metric) across infrastructure (CPU, memory, I/O), operating system, Java virtual machine, database, and ERP application. But it very much seems that SAPS is essentially a CPU benchmark.

Consider the following:

SAP certification #2015005 from 2015-03-10 (AWS c4.4xlarge, 8 cores / 16 threads) - 19,030 SAPS or 2,379 SAPS/core
SAP certification #2015006 from 2015-03-10 (AWS c4.8xlarge, 18 cores / 36 threads) - 37,950 SAPS or 2,108 SAPS/core

Here we observe close to linear scaling: as the number of cores is increased from 8 to 18 (2.25x), the SAPS increases from 19,030 to 37,950 (1.99x), which is about 89% of perfectly linear scaling.
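
The per-core figures and the scaling ratio can be reproduced from the certification numbers. A minimal sketch; the helper names saps_per_core and scaling_efficiency are made up for illustration, and "efficiency" here simply means throughput ratio divided by core-count ratio:

```shell
# saps_per_core SAPS CORES
# SAPS divided by the number of physical cores, rounded.
saps_per_core() {
  awk -v saps="$1" -v cores="$2" 'BEGIN { printf "%.0f\n", saps / cores }'
}

# scaling_efficiency SAPS_SMALL CORES_SMALL SAPS_LARGE CORES_LARGE
# Throughput ratio divided by core-count ratio, as a percentage
# (100 would be perfectly linear scaling).
scaling_efficiency() {
  awk -v s1="$1" -v c1="$2" -v s2="$3" -v c2="$4" \
    'BEGIN { printf "%.1f\n", (s2 / s1) / (c2 / c1) * 100 }'
}

# Usage:
#   saps_per_core 19030 8                   # c4.4xlarge -> 2379
#   saps_per_core 37950 18                  # c4.8xlarge -> 2108
#   scaling_efficiency 19030 8 37950 18     # -> 88.6
```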

If we consider the SAPS results for the previous-generation AWS C3 instance family:

SAP certification #2014041 from 2014-10-27 (AWS c3.8xlarge, 16 cores / 32 threads) - 31,830 SAPS or 1,989 SAPS/core

The C3 result is about 6% lower than the c4.8xlarge on a per-core basis. Recall that in the naive Big Buck Bunny transcoding benchmark, the C4 is about 12% faster than the C3. Thus it appears that SAPS is not purely a CPU benchmark (which is as it should be) but is strongly CPU-dominated: at least half of the SAPS result is directly attributable to CPU performance.
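
The generation-to-generation comparison is a simple ratio of the per-core figures. A minimal sketch, with a made-up helper name pct_gain; setting the result against the 12% transcoding figure is only a rough ratio, not a rigorous attribution:

```shell
# pct_gain NEW OLD
# How much larger NEW is than OLD, in percent.
pct_gain() {
  awk -v new="$1" -v old="$2" 'BEGIN { printf "%.1f\n", (new / old - 1) * 100 }'
}

# Usage:
#   pct_gain 2108 1989   # c4.8xlarge vs c3.8xlarge SAPS/core -> 6.0
```

A per-core SAPS gain of about 6% against a transcoding gain of about 12% is where the "half" in the paragraph above comes from.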

Naively, there appears to be (on average) around a 10% performance improvement per Intel CPU generation (across tick and tock). Assuming roughly one generation per year, this means CPU performance doubles in about 7.3 years (87 months), a far cry from Moore's Law, which optimistically predicted 18 months.
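
The doubling estimate follows from compound growth: at a gain of g per generation, performance doubles after ln(2) / ln(1 + g) generations. A minimal sketch, assuming one generation per year (a simplification for illustration, since real release cadences vary); the helper names are made up:

```shell
# doubling_periods GAIN_PCT
# Number of periods needed to double at GAIN_PCT percent per period.
doubling_periods() {
  awk -v g="$1" 'BEGIN { printf "%.1f\n", log(2) / log(1 + g / 100) }'
}

# doubling_months GAIN_PCT
# Same, in months, ASSUMING one period (generation) per year.
doubling_months() {
  awk -v g="$1" 'BEGIN { printf "%.0f\n", 12 * log(2) / log(1 + g / 100) }'
}

# Usage:
#   doubling_periods 10   # -> 7.3 generations
#   doubling_months 10    # -> 87 months
```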

The PIC3002WN is a discontinued IP camera from ProLink (https://prolink2u.com/product/pic-3002wn/). There is another review here, but otherwise not much additional information is available.

I bought four of these for home surveillance, but have discovered a number of shortcomings that you should consider before buying them:

the iOS client randomly hard-resets my iPhone 7 Plus, although it is "fairly" stable on my wife's iPhone 8
only SD card and Dropbox recording work; when recording to a NAS (Windows share), the recording randomly stops. Dropbox recording requires a paid Dropbox account since the number of files is quite large, and you would also need a lot of bandwidth; a day of recording is about 3GB
there is no FTP recording, contradicting the review linked above
the cameras sometimes randomly lose their recording settings
there are days and days with no available recordings, because the cameras stop recording after a couple of days; you therefore have to power-cycle them every few days

These cameras are cheap, and in principle have a lot of features. The video quality is reasonable and the IR mode works fine, but unless you use the SD card or Dropbox recording, the "added" features are unreliable. And even with SD card or Dropbox recording, you have to reboot the cameras every couple of days, otherwise they stop recording entirely.

Intel and AMD Processor Micro-Benchmarking

ProLink PIC3002WN Review