August 31st, 2011
Seeing all the activity at VMWorld via the Twittersphere got me to thinking about some current technologies and how what we do in the “Galaxy” might carry over.
Then it hit me: Archestra is a lot like the famed “Cloud”. Don’t get me wrong, I’m not speaking of the Cloud in a condescending manner. I think the technologies that make up the collective Cloud are set to transform the way we work on a scale similar to the integration of the Internet into our daily workflow. Answer this question honestly… do you feel stranded now when you have to work on your computer without an Internet connection? Kinda like leaving the house to drive more than 5 miles away without a cell phone. Oh the horror. What will I do if I don’t see that email 1 minute after it’s sent? I don’t think it’s too far-fetched to think that a handful of years from now we won’t be installing local applications for editing and composition. Note that I am intentionally leaving out runtime applications, as I think that’s a tougher sell to move off site.
Anyway, back to my original thought, how is Archestra like the Cloud and what can we learn from these similarities?
3 Comments |
Archestra, General, Redundancy, Security, System Platform |
Posted by Andy Robinson
July 11th, 2011
I think any good App Server engineer needs to learn about checkpointing. It’s such a critical piece of how App Server does what it does that you can’t simply be ignorant of it. Fortunately it works so well that most people never have to troubleshoot it… as opposed to troubleshooting why you can’t get communications with a platform to deploy something.
Here’s an article that came out a couple of days ago that discusses some instances where the checkpointing system may not act appropriately, or at least not the way you think it should.
Mash Here
The second article discusses the success of using App Server on the High Speed Chinese Rail project. I haven’t had a chance to read it yet but I expect it will be really good. I think anytime someone tries to tell you App Server can’t scale you can point to this as published evidence of the fact that it can.
Mash Here
6 Comments |
Archestra, Articles, General, Redundancy |
Posted by Andy Robinson
June 8th, 2011
We ran into a bootstrap issue on a live system this week. This customer can’t lose data. On the plus side, it’s a redundant setup & everything is running fine on one machine. The problem is that we need to wipe the platform that isn’t running the objects (use Platform Killer).
6 Comments |
Redundancy, System Platform |
Posted by David Goodman
May 31st, 2011
Sometime during the middle of last week one of my customers had a network card fail on them. No big deal. We’ve got redundancy using teaming on the network cards so we kept humming along without an issue until we got a chance to work on it, which happened to be the same day.
As an aside, it happened to be a terminal server, which in this particular environment is a who-cares type of thing. I say this to force you to think about your own environments. In this particular facility, if an operator can’t see something for 10 or 15 minutes it might be an irritation, but it’s not cause for a riot. However, losing process data is a totally different story. This data is what substantiates the fact that conditions in the facility were and are under control and within limits. If we don’t have this data we might not be releasing product… and that’s a bad thing for everyone. I contrast this with a previous stint in a major chemical plant where, if the operators lost visibility, we went into full-blown meltdown mode. No, it wasn’t a nuclear facility, but we panicked. Depending on the plant, you could lose everything in a matter of minutes if an abnormal condition came up and you didn’t deal with it in a timely manner. There was a particular incident where we lost power to the control room (the UPS didn’t work either) but the control cabinets stayed up and kept the process running. After about 5 minutes the decision was made to drop the plant using the big red button on the emergency shutdown system. It was much safer to bring the plant down using the ESD than to allow some condition to run away while we were blind. All of this just to say there is no one-size-fits-all answer for where you should have your redundancy. It should be driven by the specific requirements of your facility.
4 Comments |
Archestra, Redundancy, System Platform |
Posted by Andy Robinson
May 13th, 2011
So I hope you enjoyed my previous article on different ways to achieve redundancy in your environments. However, one really tough lesson hit home with us this past week.
We purchased a NetGear ReadyNAS PRO network storage array back in 2008. For the longest time we used it as the primary storage backend for our VMWare ESX environment. However, sometime last year we installed a newer, bigger, faster array and relegated this guy to just storing a bunch of install media and standalone virtual machines we needed to offload from our laptops.
Anyway, on with the story. Last Friday we started having problems accessing the data on this unit. One of our engineers walked into our server room and noticed an X above our #2 disk on the LCD display. Well, that’s not good. I get on the horn with CDW and get a new drive in on Monday morning. We replace the drive but the array won’t start a rebuild. Well, since we can’t get to the admin console we have to reboot. In the process of rebooting we lost a second drive. This was a RAID-5 array across 6 disks. If you know anything about RAID groups you know that at this point we’re screwed: a RAID-5 can only lose one drive. Long story short, we lost >3TB worth of virtual machines and install media… and we weren’t backing this unit up… OUCH. Luckily we are smart enough to store all of our project files in a location that is backed up nightly.
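For anyone who hasn't looked at why a RAID-5 stops at one lost drive, here is a minimal Python sketch of the XOR parity idea. It is a toy model and not how the ReadyNAS firmware works: the function names and block contents are made up for illustration, and a real array rotates parity across the disks instead of keeping it in a fixed position.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (a single toy stripe)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Rebuild a stripe in which failed disks are marked as None.

    Raises ValueError when more than one block is missing, because a single
    parity block only carries enough information to recover one loss.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} disks lost; single parity covers only one")
    if missing:
        survivors = [block for block in stripe if block is not None]
        stripe = list(stripe)
        # XOR of every surviving block (data and parity) yields the lost block.
        stripe[missing[0]] = xor_blocks(survivors)
    return stripe


# Six disks: five data blocks plus one parity block (hypothetical contents).
data = [bytes([i] * 4) for i in range(1, 6)]
stripe = make_stripe(data)

one_lost = list(stripe)
one_lost[1] = None  # one disk fails: still recoverable
assert rebuild(one_lost) == stripe

two_lost = list(one_lost)
two_lost[4] = None  # a second disk fails before the rebuild completes
try:
    rebuild(two_lost)
except ValueError as err:
    print(err)
```

With one block missing, XORing the survivors gives it back. With two missing there are two unknowns and only one parity equation, so the data is gone unless a backup exists somewhere else.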
4 Comments |
Redundancy |
Posted by Andy Robinson
April 26th, 2011
Depending on your requirements, the concept of system redundancy and resiliency might never cross your mind. For many facilities, however, having a system that keeps outages and hiccups to an absolute minimum is a must.
When you talk about redundancy you really should look at two different aspects: redundancy and resiliency. What’s the difference between the two? For me, redundancy is having functions duplicated across multiple components so that if a single component fails the system continues in operation without anyone even noticing. Resiliency is the ability of a system to easily recover from a failure.
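One rough way to put numbers on the two ideas is the standard availability arithmetic, sketched below in Python. This is my own framing and not something taken from a vendor document: redundancy shows up as the number of duplicated components, resiliency shows up as the repair time (MTTR), and the parallel formula assumes the components fail independently, which shared power or a shared switch can easily break. All figures are hypothetical.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one component: uptime / (uptime + repair time)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single, copies):
    """Availability of `copies` duplicated components where any one is enough.

    Assumes failures are independent (an idealization).
    """
    return 1 - (1 - single) ** copies


# Hypothetical component: fails about every 990 hours, takes 10 hours to repair.
base = availability(mtbf_hours=990, mttr_hours=10)

# Resiliency: same failure rate, but recovery takes 1 hour instead of 10.
faster_recovery = availability(mtbf_hours=990, mttr_hours=1)

# Redundancy: two of the original components side by side.
duplicated = redundant_availability(base, copies=2)

print(f"single component:   {base:.4%}")
print(f"faster recovery:    {faster_recovery:.4%}")
print(f"duplicated (2x):    {duplicated:.4%}")
```

Both levers raise the number, but they cost different things: duplication means more hardware, faster recovery means better procedures, spares, and backups.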
2 Comments |
General, Hardware, Redundancy |
Posted by Andy Robinson