Journal - Gary L Kelley

Disaster Recovery vs. SHTF planning

Monday, October 22, 2012 at 8:00AM

Maybe it is the economy, or maybe the election. Increasingly I am seeing people talking about “prepping” for when we descend into anarchy.

From SHTFPLAN - 8 Reasons Why The Great Depression Is The Best Case Scenario

The number of websites for this is impressive…here is a sample:

SHTF Plan - When It Hits The Fan, Don’t Say We Didn’t Warn You
Homestead Survival
Cultivating Home

There are also people analyzing where to go and publishing books on it.

Far be it from me to criticize these people. Heck, they might be right. That said, if you really believe the end is coming, you’d buy a piece of land and go “off the grid.”

It’s hard to fathom what off the grid would be like. While it may be fun to flirt with the idea of living simply, the truth is the vast majority of us would struggle to do so.

Where is this rambling post going?

It’s about what we can do to make sure we are not thrown back into the dark ages.

So many companies still believe Disaster Recovery for Systems is optional. That’s just irresponsible.

DR plans need to be put together, and tested, so outages are minimized. Plans need to meet the needs of the business in terms of a Recovery Point Objective (RPO, how much data you can afford to lose) and a Recovery Time Objective (RTO, how long you can afford to be down). This all starts with a Business Impact Analysis…understanding the impacts of outages.
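To make the RPO/RTO idea concrete, here is a minimal sketch of how a tested plan could be checked against its objectives. The class name, function name, and all the numbers are hypothetical and only for illustration; real objectives come out of your own Business Impact Analysis.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryObjectives:
    rpo: timedelta  # maximum tolerable data loss, expressed as time
    rto: timedelta  # maximum tolerable downtime


def meets_objectives(objectives, backup_interval, tested_recovery_time):
    """Return (rpo_ok, rto_ok) for one system."""
    # Worst case, the outage hits just before the next backup,
    # so the data lost equals the full backup interval.
    rpo_ok = backup_interval <= objectives.rpo
    # Compare against the recovery time measured in a DR test, not an estimate.
    rto_ok = tested_recovery_time <= objectives.rto
    return rpo_ok, rto_ok


# Hypothetical example: hourly backups, but the last DR test took six hours.
objectives = RecoveryObjectives(rpo=timedelta(hours=24), rto=timedelta(hours=4))
result = meets_objectives(
    objectives,
    backup_interval=timedelta(hours=1),
    tested_recovery_time=timedelta(hours=6),
)
print(result)  # (True, False)
```

In this made-up case the backups are frequent enough, but the recovery itself is too slow, which is exactly the kind of gap only a tested plan reveals.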

Depending upon your business, the recovery plan can be very simple or very elaborate. As a consultant, I live on my laptop and so have a backup (old) laptop. My documents are stored in the cloud and are backed up. Simple stuff.

If you run a trading business, your needs are for real-time replication and activation. More elaborate and certainly costly…and “cheap” compared to being out of the market during a market swing.

Do I have some MREs (Meals Ready to Eat) at home? Yes, enough for a couple weeks as a simple storm can take me personally off the grid. I’d be fine for a couple weeks. So for me DR preparedness is about common sense, spending wisely, and knowing you can recover.

If the end of the world does come, let me apologize now for asking you to prepare for an event. You would have been better off digging a hole for your bunker.

What are your thoughts?

Gary L Kelley

When to Update Production and DR

Monday, March 26, 2012 at 8:00AM

Some companies run a “production only” environment. Think of a restaurant, which can use paper as a backup system. Chances are excellent it buys packaged software and looks to the software provider for proper systems management.

Other companies take environments to the other extreme…having multiple development, certification, integration, performance, staging, production, and disaster recovery environments, steadfastly promoting code through each environment. Many organizations take “release management” to levels comparable to large software firms.

Of course, the age-old debate of few formal releases vs. quick regular releases is always in play.

This post tackles a simple question. Assuming you have both a production and a disaster recovery environment, what is the upgrade order?

Obviously the circumstances in your environment will drive what you do. One might argue in a truly active:active environment, there is no such concept as disaster recovery. For purposes of this post, production and DR are separate, failover is possible, and failover is tested regularly.

Some people argue the natural upgrade approach is to upgrade disaster recovery first, then complete the upgrade process by touching production. In this approach, every stage of the promotion process is viewed as a test, preserving production for the “final” change.

We posit Disaster Recovery should be upgraded after ensuring Production is stable.

It’s not that we don’t think Production is important. To the contrary, we revere the Production environment.

By upgrading DR after Production, you ensure the business can fail over if the upgrade proves untenable in production. There is always a known working copy available.

IT professionals following this approach have to determine when Production is stable. Is it an hour? A day? A cycle?

We suggest it is after a day’s stable processing. A day is arbitrary; as a practical matter, once changes are in place there is a point of no return, after which any fixes will be made in the new environment rather than by failing back.
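The “stable for a day” rule can be sketched as a simple gate on the DR upgrade. This is only an illustration: the function name and the idea of restarting the clock after an incident are my assumptions, and the one-day window is the arbitrary figure from this post.

```python
from datetime import datetime, timedelta

# The one-day window suggested above; adjust to your own processing cycle.
STABILITY_WINDOW = timedelta(days=1)


def dr_upgrade_allowed(prod_upgraded_at, last_prod_incident_at, now):
    """DR may be upgraded once production has run a full window without incident."""
    # The stable period starts at the upgrade, or at the latest incident after it.
    stable_since = prod_upgraded_at
    if last_prod_incident_at is not None and last_prod_incident_at > prod_upgraded_at:
        stable_since = last_prod_incident_at
    return now - stable_since >= STABILITY_WINDOW


upgraded = datetime(2012, 3, 26, 8, 0)

# Twelve hours after the upgrade: too early, DR stays on the old version.
print(dr_upgrade_allowed(upgraded, None, datetime(2012, 3, 26, 20, 0)))  # False

# A full day of clean processing: DR can follow production.
print(dr_upgrade_allowed(upgraded, None, datetime(2012, 3, 27, 8, 0)))  # True
```

Until the gate opens, DR remains the known working copy you can fail over to.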

IT professionals must remember a promotion cycle is not complete until DR is upgraded. When organizations neglect to upgrade disaster recovery, they lose their failover ability.
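One way to catch a forgotten DR upgrade is to compare deployed versions across the two environments. The sketch below assumes you can already collect a version string per system from each environment; the system names and version numbers are made up.

```python
def stale_dr_systems(deployed):
    """Return the systems whose DR version does not match production.

    `deployed` maps a system name to a (production_version, dr_version) pair.
    """
    return sorted(
        name
        for name, (prod_version, dr_version) in deployed.items()
        if prod_version != dr_version
    )


# Hypothetical inventory: "orders" was promoted to production but DR was skipped.
deployed = {
    "billing": ("2.4.1", "2.4.1"),
    "orders": ("3.0.0", "2.9.7"),
}
print(stale_dr_systems(deployed))  # ['orders']
```

An empty list means every promotion cycle has been closed out. Anything listed is either still inside its stability window or has been forgotten.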

Is this a one-size-fits-all recommendation? No, you need to look at your environment and the changes underway. Database schema changes and core functionality changes may preclude a phased approach. Since we fundamentally don’t subscribe to big bang, we suggest always trying to maintain a failover ability.

How does your organization deal with upgrades/migrations to minimize risk?

Gary L Kelley

Every Problem has a Solution

Monday, January 9, 2012 at 8:00AM

It’s Sunday as I write this. Years of early morning alarms had me wide awake at 5AM. The house was still quiet, and it seemed natural to catch up on some TV.

Years ago, my daughter and I would watch Grey’s Anatomy together. She’s since moved off the show, and I still enjoy it. My DVR had the most recent episode…what better thing to catch at 5AM?

Grey’s Anatomy, courtesy ABC.GO.COM

So, yes world, I watch Grey’s. And I found myself in full tears during this episode (if you watch online, I was in tears at the 28:50 mark, as an eighteen-year-old makes life and death decisions for her father).

In IT, we generally don’t have life and death situations. In healthcare IT, at an extreme our systems may affect patient care…and yet as IT staff we are not faced with it directly.

Certainly many IT types have uttered the words, “I’m going to get killed if…. <fill in the blank>”

This system doesn’t come back up
This project is late
This runs over budget

The truth is, worst case is someone losing their job…and that only happens in extreme cases.

Recently I was having a conversation with the VP of Operations for a major hosting company and commenting on how calm he (always) is. He laughed…and shared there are nights he doesn’t get sleep. And that in IT, every problem has a solution.

At first I wanted to challenge the point, and as I thought about it more…I agreed with him. By the time something is through the development cycle and into production, the “unsolvable problems” have been addressed.

Sure, hardware may break and can be repaired, software may fail and need to be patched…but IT always rises to the challenge.

Management might suggest issues sometimes take too long to address…and that’s where management’s commitment to real, tested, and usable disaster recovery (DR) often comes into play. With “real” DR, business impacts are often minimized.

Has your business ever died due to IT?

Gary L Kelley