The Insanity Must Stop
Tuesday, February 2, 2010 at 9:22AM
Gary L Kelley in IT, Process, Production

“The insanity must stop,” the newly minted IT director shouted in sheer frustration.
After numerous outages caused by a combination of human error and equipment malfunctions, the IT team was simply beaten down.


Too many long days, combined with late-night problem-resolution conference calls, prompted an outburst that voiced what everyone was thinking. Saying it out loud, informally, made it OK to discuss the situation and even to chuckle about it.


“Being lucky” often means covering the contingencies so that when things go awry, the organizational and computerized systems can recover gracefully. When you find yourself in a “bad patch,” don’t invoke “Murphy’s Law” as the culprit. The culprit is…you.

You are positioned and expected to lay out the processes and procedures that enable stability. A “Production Stabilization Program” is often called for when the organization has grown very quickly, has implemented more change than its people and systems can tolerate, or both.

Understanding what is going on in a “bad patch” is critical. Analyze logs and incident reports to identify common themes and root causes. Talk to your staff about what they are feeling and seeing, and what solutions they might offer.

It is often useful to categorize the situation using a cause-and-effect diagram (Fishbone/Ishikawa) or by borrowing from a SWOT analysis (SWOT is an acronym for Strengths, Weaknesses, Opportunities, and Threats, categorized by internal vs. external factors). Such a categorization lets you visually identify the opportunity areas for the organization. The choice of tool is far less important than the capturing and categorizing itself.

With knowledge of the contributing factors, plans can be put in place to address them. Some fixes may be quick (e.g., we need to reprovision some storage to address the failures), while others may become longer-term, strategic initiatives (e.g., we need to implement a disaster recovery strategy we can use “daily” to mitigate outages). In this case, having a plan and implementing it becomes a very sensitive issue, but also one that teams can rally behind.

We often hear that it’s hard to respond to issues on a daily basis while also working on what is essentially a separate project to identify impacts and develop plans to address them. Managers need to be sensitive to when the long hours are taking a toll: when staff become “snappy” or “grumpy,” it’s not the best time to add work. Often a fresh set of eyes and an external perspective are invaluable.

Anyone brought in to work on a Production Stabilization Program needs the professional maturity and sensitivity to be seen as neither out to “shoot the messenger” nor trying to solve all the problems of the world. They are working through a separate process, and their role is about process. If the staff misunderstand this, morale will suffer substantially!

The root cause identification and articulation (at a high, process level) and the subsequent planning effort are best put together by a team external to the issues, with input from the affected team, and with review and vetting by that team before the results go to management.

So when you are tempted to blame Murphy’s Law, remember what Abe Lincoln said: “When you’re being run out of town on a rail, get in front of the crowd and make it look like you’re leading the parade.” Take the bull by the horns and lead the team to stability!

Article originally appeared on Gary L Kelley (http://garylkelley.com/).
See website for complete article licensing information.