SpeedStep® Limiting Single-Threaded Tasks
Written by John Miller
This is something that will need to be further fleshed out by benchmarks and research, but I wanted to start writing about it now, before I forget. More is certainly to come.
First, a little backstory for context. The company I work for, Esri, just released version 10.6.1 of their ArcGIS Enterprise product. Something I found very appealing about this release is that it includes a new interface and set of tools for processing imagery from UAVs into orthophotos, DSMs, and DEMs, called Ortho Maker. Esri has a desktop product that does this as well, Drone2Map, which is based on the Pix4D Mapper engine. Ortho Maker is unique in comparison because it is intended to run on server infrastructure rather than workstations or HEDTs, and anyone who knows me will know that I find that incredibly appealing.
So, when 10.6.1 was formally released, I wiped the TS140 'goonie' to a clean slate and configured it as follows: 2x8GB RAM (I moved the other 2 to TS140#1), a 256GB Samsung 840 Pro as the boot/primary data drive, and a 2TB WD Green as a backup/scratch drive - simply because it was lying around and I figured I could use some extra storage that didn't need performance or redundancy. Finally, a fresh copy of Server 2016 - which took forever to update - then a base deployment of ArcGIS Enterprise 10.6.1. Now, this also required an Image Server license, because raster processing tools are used to generate the final products. In a production environment the server with the Image Server license would not be the same as the server running the portal and hosting server (you can read more about ArcGIS Enterprise architecture here), but since I am the only person who will be accessing this server, and I know that nothing will be putting load on the hosting server or portal while the Image Server components are working, a single machine is fine for my purposes.
I have a few different sample UAV imagery collections that I use to test this software, both my own and others that are available publicly on the internet, so I began with processing a few to get a better feel for the capabilities of the software. As a solution engineer, part of my job is to thoroughly test all the components of our software that I could find myself recommending to a customer, and this is no exception.
Where SpeedStep comes in...
After watching Process Explorer and the Windows Task Manager while the various stages of Ortho Maker's processing ran, it became clear that these are single-threaded tools. I imagine this will change in future releases, or that an option to make them multi-threaded will be added, but as it stands each task only runs one thread at a time - no process ever takes more than 25% of the total CPU time (the E3-1225v3 is 4c/4t, so one fully busy thread is 25%).
Windows Task Manager shows the current clock speed of the CPU, which is very useful in cases like these, and what I was seeing was clock speeds much lower than expected. The E3-1225v3's base clock is 3.2GHz, with boost up to 3.6GHz, but Task Manager was showing that the system was rarely, if ever, going above 1.2GHz. Now, we all know what SpeedStep is - it's that fabulous technology that changes CPU speed on the fly, keeping power consumption and temperatures down. The dark underbelly of SpeedStep, apparently, is that the decision to crank up the speed is based on total CPU usage rather than per-core usage.
In this case, that's a problem. If the server is running two Ortho Maker tasks, and is otherwise idle, the CPU will only ever be at 50% plus a small amount of system overhead, maybe 15%. What that means is that the system will never determine that it's necessary to step up, and the performance of those single threads will be far lower than it could be. I suppose if I were concerned about temperatures and fan speeds then maybe the trade-off would be worth it, but this is a server we're talking about here.
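To make the arithmetic above concrete, here is a minimal sketch in Python. It is only a toy model: the 80% step-up threshold and the function names are made up for illustration, and they are not documented Windows or Intel values. It compares a policy that looks at total CPU usage with one that looks at the busiest core.

```python
# Toy model only. STEP_UP_THRESHOLD is a hypothetical number chosen for
# illustration, not a documented Windows or Intel value.
LOGICAL_CPUS = 4           # E3-1225v3: 4 cores, 4 threads
STEP_UP_THRESHOLD = 0.80   # hypothetical "busy enough to clock up" level


def total_utilization(busy_threads, overhead=0.15, logical_cpus=LOGICAL_CPUS):
    """Fraction of total CPU time used by fully busy threads plus overhead."""
    return min(1.0, busy_threads / logical_cpus + overhead)


def steps_up_on_total(busy_threads, threshold=STEP_UP_THRESHOLD):
    """Would a policy that only sees total usage decide to clock up?"""
    return total_utilization(busy_threads) >= threshold


def steps_up_on_busiest_core(busy_threads, threshold=STEP_UP_THRESHOLD):
    """Would a policy that sees the busiest single core decide to clock up?"""
    busiest = 1.0 if busy_threads > 0 else 0.0
    return busiest >= threshold


if __name__ == "__main__":
    for n in range(LOGICAL_CPUS + 1):
        print(
            f"{n} busy thread(s): total {total_utilization(n):.0%}, "
            f"total-based step-up: {steps_up_on_total(n)}, "
            f"per-core step-up: {steps_up_on_busiest_core(n)}"
        )
```

Under these assumptions, two busy threads come to about 65% total usage, which stays under the made-up threshold, even though two cores are completely saturated.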
How to resolve it
Thankfully this is an easy problem to solve. Effectively disable SpeedStep. This can even be done from within Windows, without rebooting. Launch powercfg.cpl and change the power plan from "Balanced" to "High Performance". If you want to see what this does, click "Change plan settings" then "Change advanced power settings" - under "Processor power management" both the minimum processor state and maximum processor state will be set to 100% - meaning that in my case the CPU will always be at 3.2GHz, and will boost when possible.
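The same power plan switch can be scripted, which is handy on a remote server. Below is a minimal sketch in Python that wraps the powercfg command-line tool. It assumes the built-in "High performance" plan is present and reachable through the SCHEME_MIN alias, and that the script runs from an elevated prompt. The helper names are my own.

```python
import platform
import subprocess


def high_performance_commands():
    """Build the powercfg calls: switch plans, then show the result.

    SCHEME_MIN is powercfg's alias for the built-in "High performance" plan
    (minimum power savings). This assumes that plan exists on the machine.
    """
    return [
        ["powercfg", "/setactive", "SCHEME_MIN"],
        ["powercfg", "/getactivescheme"],
        # List the processor power management settings of the active plan,
        # including the minimum and maximum processor state.
        ["powercfg", "/query", "SCHEME_CURRENT", "SUB_PROCESSOR"],
    ]


def apply_high_performance():
    """Run each command in order and print whatever powercfg reports."""
    for command in high_performance_commands():
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        print(result.stdout.strip())


if __name__ == "__main__":
    if platform.system() == "Windows":
        apply_high_performance()
    else:
        print("powercfg is only available on Windows; nothing was changed.")
```

To go back, swap SCHEME_MIN for SCHEME_BALANCED in the first command.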
SpeedStep can also (usually) be disabled from the BIOS, and this would be my recommendation. I'm currently working on this server remotely so that's not an option for me, but I will be addressing it soon!
What other people are saying
Not surprisingly, quite a few people have discovered this quirk, and Microsoft even has some KBs that address it when it comes to using Hyper-V. Basically what it comes down to is "know your workloads, and know your hardware". In many cases, leaving SpeedStep enabled will be best.
What I need to do next
More research, and some single-threaded benchmarks of my own. I also want to test this theory across multiple machines - especially the C220M3 (currently in storage...).
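As a starting point for those single-threaded benchmarks, something as simple as the Python sketch below would do. It is an assumption about how such a test could look, not a result or a proper benchmark suite. The idea is to run it once under "Balanced" and once under "High Performance" on the same machine and compare the scores.

```python
import time


def single_thread_score(iterations=2_000_000):
    """Time a CPU-bound loop on one thread; return iterations per second."""
    start = time.perf_counter()
    total = 0
    for i in range(iterations):
        total += i * i  # pure integer work, no I/O, stays on a single thread
    elapsed = time.perf_counter() - start
    return iterations / elapsed


def best_of(runs=5, iterations=2_000_000):
    """Take the best of several runs to reduce noise from other processes."""
    return max(single_thread_score(iterations) for _ in range(runs))


if __name__ == "__main__":
    print(f"Best single-thread score: {best_of():,.0f} iterations/second")
```

Each run is short, so the clock may not have time to ramp up within a run. Raising iterations gives a steadier picture of sustained single-thread speed.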
- https://support.citrix.com/article/CTX221770 This one has a suggestion for artificially forcing the CPU to step up: "Try to generate CPU demand on a Unidesk Windows 7 desktop by opening two or three Command Prompt windows and entering dir /s at the root of the C: drive."
- Many more...