June 8th, 2011
We ran into a bootstrap issue on a live system this week. This customer can’t lose data. On the plus side, it’s a redundant setup & everything is running fine on one machine. The problem is that we need to wipe the platform that isn’t running the objects (use Platform Killer).
Redundancy, System Platform
Posted by David Goodman
May 13th, 2011
So I hope you enjoyed my previous article on different ways to achieve redundancy in your environments. However, one really tough lesson hit home with us this past week.
We purchased a NetGear ReadyNAS PRO network storage array back in 2008. For the longest time we used it as the primary storage backend for our VMware ESX environment. However, sometime last year we installed a newer, bigger, faster array and relegated this guy to just storing a bunch of install media and standalone virtual machines we needed to offload from our laptops.
Anyway, on with the story. Last Friday we started having problems accessing the data on this unit. One of our engineers walked into our server room and noticed an X above our #2 disk on the LCD display. Well, that’s not good. I got on the horn with CDW and had a new drive in on Monday morning. We replaced the drive, but the array wouldn’t start a rebuild. Since we couldn’t get to the admin console, we had to reboot. In the process of rebooting we lost a second drive. This was a RAID-5 array across 6 disks. If you know anything about RAID groups, you know that at this point we’re screwed: a RAID-5 array can only survive the loss of one drive. Long story shorter, we lost >3TB worth of virtual machines and install media… and we weren’t backing this unit up… OUCH. Luckily we are smart enough to store all of our project files in a location that is backed up nightly.
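For anyone wondering why a second dead drive is fatal, here is a rough Python sketch of the idea behind RAID-5 parity. It is only an illustration, not how the ReadyNAS firmware actually works: the function names (`xor_blocks`, `make_stripe`, `rebuild`) are made up for this example, and it keeps parity on the last "disk" for simplicity, whereas real RAID-5 rotates parity across all the disks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return one stripe: the data blocks plus a single XOR parity block.

    Simplification (assumption): parity always lives on the last disk.
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]

def rebuild(stripe):
    """Rebuild the one missing block (marked None) from the survivors.

    Returns (index_of_failed_disk, recovered_block).
    Raises ValueError unless exactly one block is missing, because a single
    parity block can only stand in for a single lost disk.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError(f"cannot rebuild with {len(missing)} missing disks")
    survivors = [block for block in stripe if block is not None]
    return missing[0], xor_blocks(survivors)

# Six "disks": five data blocks plus one parity block.
stripe = make_stripe([b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE"])

# Lose one disk: the XOR of the remaining five gives the block back.
one_failure = list(stripe)
one_failure[1] = None
index, recovered = rebuild(one_failure)

# Lose a second disk before the rebuild finishes: nothing left to solve with.
two_failures = list(one_failure)
two_failures[4] = None
try:
    rebuild(two_failures)
    second_failure_recoverable = True
except ValueError:
    second_failure_recoverable = False
```

With one block missing, the XOR of the five survivors reproduces it. With two missing there are two unknowns and only one parity equation, which is exactly the hole we found ourselves in.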
Redundancy
Posted by Andy Robinson
April 26th, 2011
Depending on your requirements, the concept of system redundancy and resiliency might never cross your mind. For many facilities, however, having a system that keeps outages and hiccups to an absolute minimum is a must.
When you talk about redundancy you really should look at two different aspects: redundancy and resiliency. What’s the difference between the two? For me, redundancy is having functions duplicated across multiple components so that if a single component fails, the system continues operating without anyone even noticing. Resiliency is the ability of a system to recover easily from a failure.
General, Hardware, Redundancy
Posted by Andy Robinson
October 29th, 2010
Andy & I have always been fairly frustrated with the Wonderware Alarm Logger service. It almost seems like an afterthought for System Platform (just look at the service name: New_AlarmLogger). Perhaps the most frustrating part is trying to integrate it into a system with redundant App Servers. To the developers: please integrate this into the System Platform in a future release.
Wonderware Alarm Logger
Posted by David Goodman