We are engaged with a customer where we are delivering System Platform on standalone skids. It’s a pretty new experience for us. We’re used to stacking up a bunch of Dual Xeon servers with tons of RAM with thousands and thousands of IO. This one was quite different. We are running everything on a single, fanless PC mounted in a stainless steel enclosure with a few hundred IO.
Early on in the process we were given a pretty tight power requirement, so we ended up with an Intel Atom D525 processor in our unit. It booted quickly. We didn’t have any obvious issues during development. Programs opened and closed quickly, and everything seemed ok. To be fair, we did most of our development on our vSphere infrastructure, so we only had a small amount of time before FAT for full-blown testing.
Testing started off ok enough. InTouch opened quickly and navigation worked ok. We’re used to seeing small delays as you change screens and IO is subscribed. Everything fell apart, however, when we started doing some apparently CPU-intensive activities. In a previous article we discussed how to handle mega super huge arrays, and we were using that approach on this system.
We went from a particular formula download process taking approximately 15 seconds tip to tail to almost 3 minutes! Holy cow… that’s not going to work. With necessity being the mother of invention we cracked open our algorithms and squeezed out some more efficiencies to the point we got this down to about 1.25 minutes. Much better but still not great.
First thing we looked at was memory usage. In this unit we had 4 GB of RAM and we were using less than half. No issues there. Ok, it must be the disk. This unit had a junker 5400 RPM drive, so obviously that was the problem. Easy to fix, we thought. I just so happened to have an Intel SSD at the house (building a vSphere server at home… yes, dorkdom personified) so I brought this in to run testing. First off, the unit now booted like a demon. So fast you almost didn’t see the startup screens. Sweet, we’ve got this one licked. Run our test… absolutely no improvement. Hrmmm. After sitting around and thinking about it, we realized our slowdown didn’t really have anything to do with checkpointing, which would have been most directly related to disk performance. We should have figured this out when we moved our checkpoint from our spinning disk to a slower compact flash card installed in the unit and saw no change in speed. All that was left was CPU.
So, after discussions with the customer, we ordered a new unit with a Core i7 processor, the best desktop CPU you can purchase right now. After a little magic with Acronis transferring the system image we were back up and running. First test, down under 30 seconds! Success, but not perfect. We were hoping to get back to our 15 second time frame. The best we can guess is that our vSphere platform had so much CPU horsepower we probably wouldn’t be able to match that in a single fanless unit. A big consideration is the heat load. Because this unit was in a sealed enclosure, we had to be very cautious about how much heat the CPU generated.
The key learning here is that even though a particular setup seems fine, you need to make sure you test end to end, with all of your code, on the actual target hardware before declaring a particular hardware platform good enough.
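One way to make that end-to-end check repeatable is to time the same CPU-heavy step on every candidate platform and compare the numbers. The sketch below is only an illustration in Python: `time_runs` and `sample_workload` are hypothetical names, and the loop is a stand-in for a real operation such as our formula download, which in practice you would trigger and time inside the actual application.

```python
import statistics
import time
from typing import Callable, List


def time_runs(workload: Callable[[], None], runs: int = 5) -> List[float]:
    """Run the workload several times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def sample_workload() -> None:
    """Hypothetical stand-in for a CPU-heavy step, e.g. looping over a large array."""
    total = 0
    for i in range(200_000):
        total += i * i


if __name__ == "__main__":
    results = time_runs(sample_workload, runs=5)
    print(f"median: {statistics.median(results):.3f} s, worst: {max(results):.3f} s")
```

Running the same script on the development VM and on the target unit gives a quick ratio between the two, which is the number we wish we had looked at before FAT.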
Something we found during this process was a great site to help you look at relative CPU speeds before you purchase. People used to worry about clock speeds and assume 2.5 GHz was always better than 2.0 GHz. What if the 2.0 GHz unit has 8 cores and the 2.5 GHz unit is a dual core? Well, obviously the 2.0 GHz unit is much more of a workhorse. The site we found is referenced below:
Of interest are the relative speeds of the CPUs we played with. The Atom D525, the original low power unit, had a relative speed number of 772 units. This is an aggregation of a lot of tests, so it’s a general reference, not a guarantee of how your application will perform. Our new CPU was an Intel Core i7-2655LE. The relative performance for this one was 2674 units, roughly a 3.5x improvement. As you can see from the discussion above, we didn’t speed things up by that much, but we did make a substantial improvement.
One last caveat. Two processors that share the i7 name can still be far apart in performance. For example, the Core i7-2600K at 3.4 GHz, versus 2.2 GHz for our chosen unit, had a relative performance index of 8652, about 3.2x higher.
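To keep the arithmetic honest, here is a small Python sketch that works out the ratios from the numbers quoted in this post. The scores and timings come straight from the text above; the only assumption is that "under 30 seconds" is treated as exactly 30, so the observed speedup is a lower bound.

```python
# Aggregate benchmark scores quoted in this post (unitless).
SCORES = {
    "Atom D525": 772,
    "Core i7-2655LE": 2674,
    "Core i7-2600K": 8652,
}


def ratio(faster: str, slower: str) -> float:
    """Return how many times higher the first CPU's score is than the second's."""
    return SCORES[faster] / SCORES[slower]


# Observed formula download times, in seconds.
ATOM_SECONDS = 1.25 * 60  # 1.25 minutes on the Atom after tuning the algorithms
I7_SECONDS = 30.0         # assumption: "under 30 seconds" taken as 30

predicted = ratio("Core i7-2655LE", "Atom D525")
observed = ATOM_SECONDS / I7_SECONDS
desktop_gap = ratio("Core i7-2600K", "Core i7-2655LE")

print(f"Benchmark ratio, i7-2655LE vs Atom D525: {predicted:.2f}x")    # 3.46x
print(f"Observed speedup, at least:              {observed:.2f}x")     # 2.50x
print(f"Benchmark ratio, i7-2600K vs i7-2655LE:  {desktop_gap:.2f}x")  # 3.24x
```

The gap between the benchmark ratio and what we actually measured is a good reminder that the aggregate score is only a rough guide to how one specific workload will behave.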
Does anyone else have a similar experience to discuss? Part of being a system integrator is making lots of decisions based on experience and a hunch, without all the information you need, when you’re putting together hardware for a project. If you’re good, most of the time everything works out ok. Sometimes things don’t turn out perfect and you have to scurry to find a solution. That scurry and effort is usually the difference between a one-time job and a repeat customer.