Saturday
Oct 24, 2009

Data Center Disciplines

I have been teased my entire career about my nearly obsessive behavior around keeping data center rooms neat and tidy. While I’d love to blame my Mother for my neatness, the truth is keeping a data center clean is about one word: discipline.


Having a neat and tidy data center environment sends a reinforcing message to everyone entering about the gravity of the work performed by the systems in the area. This is important for staff, vendors, and clients.

The observational characteristics I look at when I walk in a data center are:


  • Life Safety – Are the aisles generally clear? Are there Emergency Power Off switches and fire extinguishers by the main doors? Is there a fire suppression system in place? Is the lighting all working?

  • Cleanliness – Forty years ago data centers were kept spotless to prevent disk failures. A speck of dust might make a hard drive disk head fail. These days, disks are generally sealed, and can operate in pretty rough environments (consider the abuse of a laptop disk drive.)
    While disk drives are generally sealed, why should data centers be dirty? Look for dust, dirty floors, and filthy areas under raised flooring. One data center I went in had pallets of equipment stored in the space…was the data center for computing or warehousing?

  • Underfloor areas – are the underfloor areas, assuming use as an HVAC plenum, generally unobstructed? More than one data center I’ve been in had so much cable (much of it abandoned in place) under the floor that the floor tiles wouldn’t lie flat. This impedes airflow and makes maintenance a challenge.
    I also like to see if the floor tiles are all in place, and if some mechanism is used to prevent cold air escaping through any penetrations. 30% of the cost of running a data center is in the cooling, and making sure the cooling is getting where it needs to be is key. (At the opposite end of the space, I also like to see all ceiling tiles in place. Why cool the area above the ceiling?)

  • HVAC – are the HVAC units working properly? Go in enough data centers, and you’ll learn how to hear if a bearing is failing, or observe if the HVAC filters are not in place. As you walk the room, you can simply feel whether there are hot spots or cold spots. Many units have on board temperature and humidity gauges – are the units running in an acceptable range?

  • Power Distribution Units – are the PDUs filled to the brim, or is space available? Are blanks inserted into removed breaker positions, or are there “open holes” to the power? When on-board metering is available, are the different phases running within a small tolerance of each other? If not, outages can occur when hot legs trip.

  • Hot Aisle/Cold Aisle – Years ago all equipment in data centers was lined up like soldiers. This led to all equipment in the front of the room being cool, and all the heat cascading to the rear of the room. Most servers today will operate at temperatures as high as 90 degrees before they shut themselves down or fry. With a hot aisle/cold aisle orientation, including blanks in empty server shelves, cooling is delivered most effectively. Some organizations have moved to in-rack cooling as a designed alternative.

  • Cable plant – the power and communications cable plants are always an interesting telltale sign of data center disciplines. Cables should always be run with 90 degree turns (no transcontinental cable runs, no need for “cable stretching”). Different layers of cables under a raised floor are common (power near the floor, followed by copper communications, then fiber). (A pet peeve of mine in looking at the cable plant is how much of the data center space is occupied by cables. Cables need to get to the equipment, but the cable plant can be outside the cooled footprint of the data center. Taking up valuable data center space for patch panels seems wasteful. One data center devoted 25% of the raised floor space to cable patch panels. All this could have been in unconditioned space.)

  • Error lights – As you walk around the data center, look to see what error lights are illuminated. Servers are often monitored electronically, so one can argue the utility of error lights is lessened. That said, error lights on servers, disk units, communications units, HVAC, Power Distribution Units and the like are just that: errors. The root cause of the error should be eliminated.

  • Leave Behinds – what’s left in the data center is often an interesting archeological study. While most documentation is available online, manuals from systems long since retired are often found in the high priced air and humidity controlled data center environment. Tools from completed projects lying around are a sign the technicians aren’t being thoughtful (I’ll bet their own tools are where they belong).

  • Security – data centers should be locked, and the doors should be kept closed. Access should be severely limited to individuals with Change or Incident tickets. This helps eliminate the honest mistakes.
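The phase question under Power Distribution Units lends itself to a quick calculation once you have read the amps per phase off the on-board meter. Here is a minimal sketch, not a standard tool: the function names are mine, the imbalance measure (largest deviation from the average, as a percent of the average) is one common convention I am assuming, and the 10% tolerance is a placeholder you would replace with whatever your electrician or PDU vendor recommends.

```python
def phase_imbalance_pct(amps):
    """Largest deviation from the average phase load, as a percent of the average.

    `amps` is a list of per-phase current readings (hypothetical input,
    e.g. read by hand from the PDU's on-board meter).
    """
    mean = sum(amps) / len(amps)
    if mean == 0:
        return 0.0  # nothing is drawing power, so nothing is out of balance
    return max(abs(a - mean) for a in amps) / mean * 100


def is_balanced(amps, tolerance_pct=10.0):
    """True if the phases are within the tolerance (10% is an assumed placeholder)."""
    return phase_imbalance_pct(amps) <= tolerance_pct


# Example walk-through readings for a three-phase PDU
print(is_balanced([41.0, 40.0, 39.0]))  # phases close together
print(is_balanced([52.0, 40.0, 28.0]))  # one hot leg, one nearly idle
```

Jotting the three readings down for each PDU during a walk-through and running them through something like this is an easy way to turn an observation into the follow-on reporting.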

While far from an all-inclusive list, this article is meant to help silence my lifelong critics about my data center obsessions. These are simple things anyone can do to form a point of view on data center disciplines. Obviously, follow-ons with reporting, staff discussions, etc. are appropriate.

Wednesday
Oct 21, 2009

Alcohol as a Truth Serum

Alcohol has a strange way of impacting people. For some, it is a truth serum reducing inhibitions around what NOT to say. Others end up weaving great stories when imbibing spirits.

Such was not the case when Cindy’s husband approached me during a holiday party. “I don’t like it when Cindy gets paged. She doesn’t get paid for it and it interrupts our activities.”

This was NOT what I wanted to hear.

First, I really dislike Holiday Parties. Tending toward the introverted, like so many IT types, I’ve conditioned myself to get out of the corner and make a pass through the entire event. Once I complete my tour, I reward myself with a “get out of party” card. Second, having a discussion with a concerned spouse during an event can be an opportunity or a disaster.

Cindy, the Production Control Scheduler, looked on in horror. Her husband is a big burly man, intimidating at first sight. He was raising a concern, one I suspect Cindy had heard about privately on other occasions.

“I don’t like it when Cindy gets paged, either. It’s a bother for our Operators, too. What it often means is something is wrong in the schedule Cindy produced. By giving her direct insight into the issues, we place her in a position to address them permanently, so she won’t get paged again in the future.” Hubby suddenly began to see the light around the accountability Cindy had, and her direct power to impact the results.

It’s important to note we both worked in a manufacturing company having the rules of a Union without the Union…this played into my continuation. “Cindy is entitled to 4 hours pay every time she is paged. I know Cindy feels awkward about putting in for 4 hours if it was a quick call from the operators. I believe she only records “serious” time spent on issues, and on Monday I can sit with Cindy and review our reimbursement processes.”

At this point, Hubby seemed to be more interested in Cindy’s time recording practices than anything else, and I suspect Cindy had further conversations with Hubby in the coming days.

There are a couple of areas where conversation on the subject of on-call is good.

Companies need to be clear on their on-call approaches and whether any financial or other remuneration is received. My own sense is an occasional quick call comes with the territory. If logging on and researching is needed, we need to acknowledge the impact. Personally, I’m not a fan of rigorous time reporting on these kinds of interruptions. I’d rather be more lenient with someone leaving early to play golf or see the kids play (or whatever the passion is), or taking an occasional day off. (As a manager, I believe such accommodations should be within a week or two of a significant on-call event. I really dislike being confronted with 42 days off accumulated over the last two years.)

Staffs need to make their friends and family aware of what on-call means for them. Few industries have professionals with the 24x7 requirements of IT. While “smart hands” may be in a rotation, managers are often called into every major issue and are on conference calls a large part of the time.

Staffs also need to think about the root cause of the issues. Breathing a collective sigh of relief after a remediation is just the first step. Thoughtful analysis of the root cause of the issue needs to be performed with an eye towards learning how to prevent issues in the first place. This effort must be focused on improvement, not on a witch hunt. “Post Action Review” is an important concept, where individuals and their managers can present findings to senior IT managers and a lively, thoughtful discussion can take place.

Through thoughtful, reasoned communications, improvements can be made that reduce issues and outages.

Tuesday
Oct 20, 2009

Wanted: Technology to Drive Process

“Technology driving process? It’s not supposed to work that way.”

Anyone with formal training in process engineering knows you start with defining and optimizing your processes, and then use technology to streamline those processes. In an ideal world, this works perfectly. Layer on organizational structures, system ownership and governance models, and personalities, and people naturally gravitate towards what they know best: technology.

A number of years ago I began moving my Infrastructure & Operations division to more of a process-based organization. I, and my management team, attended a series of seminars given by the late Dr. Michael Hammer. We received our certificates in “Process Mastery” and, with our newfound evangelical powers, were ready to transform the organization.

Barbara, having a reputation as an overachiever, volunteered her group, the Desktop Support, Engineering, and Help Desk department, as the first to make the move into the world of process. Over the period of a year, Barbara documented existing processes, designed new processes where they were missing, created a process map, developed Service Level Agreements with users and other IT groups, and conducted training sessions with her team. The results were better delegation of tasks to the right individuals, managing to metrics, happier employees, and, most importantly, improved service levels.

As Barbara’s manager, I pushed for more improvement. Barbara responded by identifying Help Desk requests that could not be resolved on the first call and required assistance from others in the IT organization. In reviewing the list, Barbara realized 30% of the tasks could be shifted to the Help Desk, driving down costs while dramatically improving resolution time for the user. Requests such as resetting passwords, granting access to network file shares, provisioning user logins, email accounts, and printers, distributing remote access security tokens and instructions, and creating new user profiles were all being performed by senior systems administrators. The management team thought Barbara’s idea was great, and she was empowered to make it happen.

Barbara thought this would be simple. She had achieved what she thought was buy-in from the entire infrastructure and operations organization. What she didn’t expect was resistance around what people believed gave them power. Employees in other groups were fine giving over these mundane tasks so long as they still had control over approving each transaction. They felt this authority (and the trust that went along with it) is what made them special to me and others in the IT organization. What they failed to see was the erosion in the level of respect they received when they complained about not having enough resources, but failed to seize this opportunity.

The solution came through implementing technologies enabling Help Desk personnel to grant user privileges without being server administrators. Once the systems administrators saw they had not lost any control, were able to delegate tasks to the Help Desk, and were able to focus on more high value projects, they began to think about methods to offload other processes to the Help Desk. Success was achieved.

The lesson from this story is the need to discover what drives people before reengineering their processes. In this situation, the sense of control and the ability to manage the technology were the drivers for the Systems Administrators. Empowering the Systems Administrators to use technology to enable the transfer of some of their processes made them supporters and eventually advocates.

In the case of the Systems Administrators, they defined their processes by technology. Therefore, technology became their driver for re-engineering their processes.

Some of the “squishier” skills, such as process, project management, client management, and budget management, can be difficult for administrators and operators to understand or appreciate, and can be in conflict with their priorities. In a production IT shop, keeping systems up and running and eliminating any user downtime is the top priority. Asking people to take time away from their priorities, particularly in these challenging times, may be counter-productive and may produce defensive behavior. Take care in understanding your audience.

Tuesday
Oct 20, 2009

Fear Factor

It’s 4:00AM and your primary storage array just failed. That’s particularly concerning in a financial services company. Fixed Income traders start working at 7:00AM. You convene your incident management team and the decision is to fail over. You inform your system and database administrators and the answer is that it might be riskier to fail over than to fix the problem.

Argh!


We invest millions of dollars in redundant systems to achieve high availability or disaster recovery and in some companies we are too scared to use them. We spend years perfecting software configurations and hardware clusters for failover but never feel comfortable enough to initiate a failover. Hardware and Software companies sell us new technologies with the promise of “5-nines” reliability, but nobody has factored in human emotion and fear of failure.

I was always very proud of my messaging (email, instant messaging, unified communication, etc.) team with regard to their use of clusters and failover. They convinced me the email system could be made highly available with minimal time to failover, data loss, and time to failback, but what really impressed me was how they used these capabilities on a daily basis. They changed their release management processes to include failover of the active server, thus eliminating all downtime. They practiced failing over under different scenarios and always knew exactly how much time it would take (3 minutes) to bring up the passive node. Every year, they found ways to enhance those capabilities and leverage what they had learned towards improving our disaster recovery capabilities. When we decided to distribute our centralized email system to our global offices, they gave management the option of local failover, failover to our primary data center, or both. In short, they got it!
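To put “5-nines” and that 3-minute failover side by side, here is a back-of-the-envelope sketch. The function names are mine, and it makes a deliberately pessimistic assumption the messaging team might dispute: that the whole failover window counts as downtime.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, leap years included


def downtime_budget_minutes(availability_pct):
    """Minutes of downtime per year allowed at a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def failovers_within_budget(availability_pct, failover_minutes):
    """Whole failovers that fit in the yearly budget, ASSUMING each one is pure downtime."""
    return int(downtime_budget_minutes(availability_pct) // failover_minutes)


print(round(downtime_budget_minutes(99.999), 2))  # 5.26 minutes per year
print(failovers_within_budget(99.999, 3))         # 1
```

Five nines leaves a little over five minutes a year, which is why knowing your failover time to the minute matters: a team that has never timed a failover has no idea whether using it fits the promise they were sold.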

As a manager, you want to believe in your people, vendors, and the technology choices you have made for your infrastructure. The truth is, you can believe, so long as you help people get over their fear. The messaging team used the technologies as part of their operational routines. Things matured to the point where they might fail over a server or virtual server at their own discretion. They earned the trust of the management team. I remember the day the entire IT management team was on a conference call counting down the minutes for the first production failover of the server. After that, it was no longer an event.

Years ago, we ran our investment business on two large (for their time) super mini-computers. All data, programs, and systems management tools (schedulers, backup, monitoring) were fully replicated on each system, and our computer operators load balanced between the systems. Even though it required manual intervention, it was our earliest form of high availability, and it worked.

High availability architectures are beneficial, so long as they are used frequently. Allowing them to sit idle deepens the level of fear within the support staff. Using them gives your staff the opportunity for improvement. Making failover part of your operational procedures reduces the need for Disaster Recovery testing, improves uptime, and can ease the pain of release management.

The goal of all Infrastructure & Operations organizations is 100% uptime. Exercising your high-availability architectures will bring your systems close to this goal and improve the performance of your staff.

Saturday
Oct 17, 2009

The Implications of Being “On-Call”

Dear family and friends,

I have been paged by the Operators at work because there’s an issue requiring attention and I’m on-call.

Being on call is important. Something is broken and needs to be fixed. Since I work in a larger team, I have to cover every once in a while, although it seems my “on-call” periods fall when important things happen. When you ask me, “Isn’t there someone else they can call?” it’s simply that it is my turn.

You need to know I see the disappointment on your face when you hear the paging tone on my smart phone.

The truth is I feel the same way; the page is an intrusion into our lives and often it comes at inopportune times.

You see, part of my job is fixing issues, and the other part is making sure we don’t have issues in the first place. That said, things happen.

Yes, I remember being on a conference call Christmas Eve. I haven’t forgotten leaving the concert so I could get to the closest PC (I guess the wireless card improves that!) Looking at your eighth grade “graduation” pictures, snapped while I was in the hallway talking someone through an issue, makes me sad. That special weekend in Nantucket was ruined with me on the phone Saturday night.

Carrying a laptop around every few weeks isn’t my idea of a good time either. It’s heavy, and I can’t have the freedom to ride the rides, go down the water slide, or just be playful with you.

Some people say, “it’s my job, it pays the bills, get used to it.” While true at some level, a “job” pales by comparison to the times I’ve missed. Systems people get paged, and have to fix things.

Other professions use on-call rotations, too. When you are ill, and want to talk to your Doctor, they get a call. Stock traders watch the markets around the world, some even changing their sleep pattern to be “up” for other markets. The plumber was with his family, too, on Thanksgiving when the drain backed up.

When I get paged, there’s often emptiness in my heart. If fixing the problem takes a long time, I really do miss you and often hear you continuing the fun on the other side of the door. And while I’m happy to solve an issue, I also feel really bad when we can’t just pick up where we left off. You see, to me our time together “freezes” when I go into problem solving mode, while you move on to the next thing.

While I’m away, take extra pictures and save me dessert. I’m not being rude; on the contrary, I am very torn.

As soon as I get back, let’s try picking up where we left off.
