Beware the Wimpy CPU
We are engaged with a customer where we are delivering System Platform on standalone skids. It’s a pretty new experience for us. We’re used to stacking up a bunch of Dual Xeon servers with tons of RAM and thousands and thousands of IO. This one was quite different. We are running everything on a single, fanless PC mounted in a stainless steel enclosure with a few hundred IO.
Early on in the process we were given a pretty tight power requirement, so we ended up with an Intel Atom D525 processor in our unit. It booted quickly. We didn’t have any obvious issues during development. Programs opened and closed quickly, and everything seemed ok. To be fair, we did most of our development on our VSphere infrastructure, so we only had a small amount of time before FAT for full-blown testing.
Testing started off ok enough. InTouch opened quickly, and navigation worked ok. We’re used to seeing small delays as you change screens and IO is subscribed. Everything fell apart, however, when we started doing some apparently CPU-intensive activities. In a previous article we discussed how to handle mega super huge arrays, and we were using that approach on this system.
http://www.avidsolutionsinc.com/blog/archestranaut/2011/10/async-scripts-without-using-async/
We went from a particular formula download process taking approximately 15 seconds tip to tail to almost 3 minutes! Holy cow… that’s not going to work. With necessity being the mother of invention we cracked open our algorithms and squeezed out some more efficiencies to the point we got this down to about 1.25 minutes. Much better but still not great.
First thing we looked at was memory usage. In this unit we had 4 GB of RAM and we were using less than half. No issues there. Ok, it must be the disk. This unit had a junker 5400 RPM drive, so obviously that was the problem. Easy to fix, we thought. I just so happened to have an Intel SSD at the house (building a VSphere server at home… yes, dorkdom personified), so I brought this in to run testing. First off, the unit now booted like a demon. So fast you almost didn’t see the startup screens. Sweet, we’ve got this one licked. Run our test… absolutely no improvement. Hrmmm. After sitting around and thinking about it, we realized our slowdown didn’t really have anything to do with checkpointing, which would have been most directly related to disk performance. We should have figured this out when we moved our checkpoint from our spinning disk to a slower compact flash card installed in the unit and saw no change in speed. All that was left was CPU.
So, after discussions with the customer, we ordered a new unit with a Core i7 processor, the best desktop CPU you can purchase right now. After a little magic with Acronis transferring the system image, we were back up and running. First test, down under 30 seconds! Success, but not perfect. We were hoping to get back to our 15 second time frame. The best we can guess is that our VSphere platform had so much CPU horsepower that we probably wouldn’t be able to match it in a single fanless unit. A big consideration is the heat load. Because this unit was in a sealed enclosure, we had to be very cautious about how much heat the CPU generated.
The key learning here is that even though a particular setup seems fine you need to make sure you test end to end with all of your code before declaring a particular hardware platform good enough.
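As a rough illustration of that kind of check, here is a minimal Python sketch of a timing harness you could run on each candidate box. It is not what we ran on the project; the function names, the stand-in workload, and the 15 second budget used in the example are all assumptions for illustration.

```python
import time
from statistics import median


def time_workload(workload, runs=5):
    """Return the median wall-clock seconds over several runs of workload()."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return median(samples)


def meets_budget(elapsed_s, budget_s):
    """True when the measured time fits inside the acceptance budget."""
    return elapsed_s <= budget_s


def sample_workload():
    # Hypothetical stand-in for the real end-to-end job
    # (for us that would be the full formula download).
    total = 0
    for i in range(200_000):
        total += i * i
    return total


if __name__ == "__main__":
    elapsed = time_workload(sample_workload)
    ok = meets_budget(elapsed, budget_s=15.0)
    print(f"median: {elapsed:.3f} s, within budget: {ok}")
```

The point is to time the whole job on the actual target hardware, not a snappy-feeling subset of it on a development server.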
Something we found during this process was a great site to help you look at relative CPU speeds before you purchase. People used to worry about clock speeds and assume 2.5 GHz was always better than 2.0 GHz. What if the 2.0 GHz unit has 8 cores and the 2.5 GHz unit is a dual core? For any workload that can spread across cores, the 2.0 GHz unit is obviously much more of a workhorse. The site we found is referenced below:
Of interest are the relative speeds of the CPUs we played with. The Atom D525, the original low power unit, had a relative speed number of 772 units. This is an aggregation of a lot of tests, so it’s a general reference, not a guarantee of how your application will perform. Our new CPU was an Intel Core i7-2655LE. The relative performance for this one was 2674 units, roughly a 3.5x improvement. As you can see from the discussion above, we didn’t speed things up by 3.5x, but we did make a substantial improvement.
One last caveat. Two processors that both carry the i7 name are not necessarily close in performance. For example, the Core i7 2600K @ 3.4 GHz (vs. 2.2 GHz for our chosen unit) had a relative performance index of 8652, more than a 3x difference.
Does anyone else have a similar experience to discuss? Part of being a system integrator is making lots of decisions based on experience and a hunch, without all the information you need, when you’re putting together hardware for a project. If you’re good, most of the time everything works out ok. Sometimes things don’t turn out perfect and you have to scurry to find a solution. That scurry and effort is usually the difference between a one-time job and a repeat customer.
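To put numbers on those comparisons, here is a small Python sketch that works out the ratios from the figures quoted in this post. The 75 second and 30 second timings are the 1.25 minute and "under 30 seconds" results from our tests, so the observed speedup is a lower bound; the variable and function names are just for illustration.

```python
# Relative benchmark index values quoted in this post.
SCORES = {
    "Atom D525": 772,
    "Core i7-2655LE": 2674,
    "Core i7-2600K": 8652,
}


def ratio(new, old):
    """How many times larger 'new' is than 'old'."""
    return new / old


# Benchmark ratio between the original CPU and its replacement.
atom_to_i7le = ratio(SCORES["Core i7-2655LE"], SCORES["Atom D525"])

# Benchmark ratio between two CPUs that both carry the i7 name.
i7le_to_2600k = ratio(SCORES["Core i7-2600K"], SCORES["Core i7-2655LE"])

# Observed speedup: about 75 s on the Atom vs. under 30 s on the i7,
# so the real figure is at least this large.
observed = ratio(75.0, 30.0)

if __name__ == "__main__":
    print(f"Atom D525 -> i7-2655LE (benchmark): {atom_to_i7le:.2f}x")
    print(f"i7-2655LE -> i7-2600K (benchmark): {i7le_to_2600k:.2f}x")
    print(f"Atom D525 -> i7-2655LE (our test):  at least {observed:.2f}x")
```

The gap between the benchmark ratio and what we measured is a reminder that an aggregate score predicts the direction of a change better than its size.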
- Andy


Great article! Thanks for sharing the specifics and the site.
Andy,
One thing I have run into lately is C states and P states of processors. The virtualization technology often cannot control these properly. Because of green and laptop-optimized OSes, the BIOS generally ships with these enabled, and there have recently been discussions all over the IT space about disabling them. For the novice: P states are associated with SpeedStep technologies and let the processor scale its voltage and frequency with load, while C states let idle cores drop into low-power states, which is what allows cores to be parked. Today I did a simple test: I had a VM with 4 cores and ran Prime95 on it (which maxes out the cores), and the host machine was still happily putting 4-6 cores in park. So now I am about to switch those off in the BIOS and run some tests again. It might be nothing, but it’s worth a look. Since gamers mess with this stuff all the time, here is an explanation, and yes, most serious gamers turn all the control stuff off: http://www.overclock.net/t/1058894/intel-acpi-guide-c-g-s-p-states-and-ocs
Did you consider creating an external program to handle the computationally intensive operations? I believe ArchestrA will let you link to a DLL or the like, and I know Wonderware will let you embed an ActiveX control, so you could let machine code handle the intensive stuff, then let the interpreted language stuff handle light stuff. Alternatively, you could dump your data to a text file and have a program running that occasionally interprets the text file and spits out another text file with the post-manipulation data.
I guess the downside of using another piece as a “black box” like that is it’s more difficult to modify or troubleshoot, mind you.