The art of high-availability (HA) planning — that is, keeping your IT system up and running as much as 99.999 percent of the time — is to balance potential business cost with the definite cost of insuring against system downtime.
META Group research shows that most users’ time, money, and effort for HA go to addressing unplanned downtime. For example, what happens if a disk drive fails or someone trips on the server’s power cord? The flaw in this approach is that as much as 70 percent to 90 percent of downtime may be directly associated with planned activities.
Through 2003, database management system vendors will continue to improve and promote HA options — including online utilities, improved replication, and scale-out clusters — to meet expected user demand. Through 2005, organizations will struggle to implement and enforce processes, such as change management, designed to mitigate the need for planned downtime and the impact that planned changes have on unplanned downtime. This will be due to an overreliance on redundant infrastructures and change management tools without devoting proper attention to internal people, processes, and responsibilities.
Preventing unplanned downtime is about eliminating single points of failure. This goal is a priority among infrastructure planners, and certainly redundant application infrastructures are a prerequisite of any HA strategy. However, a recent outage at one of the major stock exchanges shows that simply having two of everything cannot eliminate all downtime (such as that caused by a change made to application code during a planned maintenance window). Failover may occur, but if the failover node is also running the corrupted software, it too will fail.
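The stock-exchange scenario can be made concrete with a small simulation. The sketch below (Python, hypothetical names throughout) models a two-node cluster in which a defect introduced during a planned change is present on every node, so failover merely moves the fault rather than routing around it.

```python
# Illustrative sketch: redundancy does not help when every node runs
# the same defective software. All names here are hypothetical.

def serve(nodes, handler, request):
    """Try each node in order, failing over to the next on any error."""
    last_error = None
    for node in nodes:
        try:
            return node, handler(request)
        except Exception as exc:  # any fault triggers failover
            last_error = exc
    raise RuntimeError(f"all nodes failed: {last_error}")

def buggy_handler(request):
    # A defect introduced during a planned maintenance window: it was
    # deployed to both nodes, so the standby fails for the same reason.
    raise ValueError("corrupted application code")

nodes = ["primary", "standby"]
try:
    serve(nodes, buggy_handler, {"op": "quote"})
except RuntimeError as err:
    print(err)
```

The hardware redundancy is working exactly as designed; the outage persists because the fault is in the change, not the infrastructure.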
While this appears to the outside world as a technology glitch, the truth is that software (unlike hardware) never wears out, so when there is a problem, it is likely a human or process issue, not a technology one. Organizations must recognize that people, process, and infrastructure are all interdependent facets of an HA solution. In fact, people and process issues make up at least 80 percent of the solution, with infrastructure (i.e., redundancy) accounting for the remainder.
Most IT organizations need look no further than their own mainframe environment for inspiration. Although it is universally accepted as more reliable and available than distributed platforms, the mainframe is not inherently more reliable. It simply has more defined processes, procedures, roles, and responsibilities (e.g., scheduling groups, change management groups, or security groups) than the distributed platform. Centralized infrastructure support models are a first step that every IT organization must consider when developing an HA strategy.
Planned downtime is reserved for performing necessary jobs, such as batch jobs, system performance tuning, or application or system code fixes and upgrades; the common denominator is typically a change of some kind. Change, while unavoidable, is by far the number-one cause of downtime. How, then, do best-practice IT organizations address HA and planned change?
1. Invest in Change Infrastructure.
Organizations must invest in infrastructure to support change. On the pure hardware and software side of infrastructure, this means a well-planned and well-managed quality assurance (QA) test environment.
On the people and process side, a change management group should be established, with responsibility to coordinate, serialize, document, communicate, and approve change promotion (e.g., test to QA, QA to production) for all groups effecting change to an application. This group should require a detailed process flow for applying the change and detailed plans to fall back or recover (including expected time frames). The plan should also list the users, applications, and jobs potentially affected by the proposed change.
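The record such a group might require before approving promotion can be sketched as a simple data structure. This is an illustrative schema only; every field name below is an assumption, not a prescribed format.

```python
# Illustrative sketch of a change record a change management group might
# require before approving promotion (test -> QA -> production).
# All field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ChangeRequest:
    summary: str
    promotion_path: list            # e.g. ["test", "QA", "production"]
    process_flow: list              # ordered steps for applying the change
    fallback_plan: list             # ordered steps to back the change out
    expected_apply_minutes: int     # expected time frame to apply
    expected_fallback_minutes: int  # expected time frame to recover
    affected_users: list = field(default_factory=list)
    affected_applications: list = field(default_factory=list)
    affected_jobs: list = field(default_factory=list)

    def is_reviewable(self):
        """Reviewable only if both the flow and the fallback are documented."""
        return bool(self.process_flow) and bool(self.fallback_plan)

req = ChangeRequest(
    summary="Apply database patch",
    promotion_path=["test", "QA", "production"],
    process_flow=["quiesce batch jobs", "apply patch", "run smoke tests"],
    fallback_plan=["restore previous binaries", "restart services"],
    expected_apply_minutes=45,
    expected_fallback_minutes=20,
    affected_applications=["order entry", "reporting"],
)
print(req.is_reviewable())
```

The point of the structure is the gate: a request with no documented fallback plan never reaches the approval step.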
2. Communicate.
Plans should be communicated to all interested parties within a reasonable time before any scheduled change. This enables colleagues to evaluate any related or conflicting changes they may be contemplating, or to request a delay if the impact of the change is ill-timed based on application usage. It is also important that communication take place early and often in the event of failure after the change is applied. This has proven critical in reducing the actual user impact of a failure, as well as end-user frustration.
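What counts as "a reasonable time" can be enforced mechanically by rejecting notices that give interested parties too little warning. A minimal sketch, assuming a five-day minimum notice period (the threshold is an assumption, not a META Group figure):

```python
# Illustrative sketch: validate that a change notice gives colleagues
# enough lead time to evaluate conflicts or request a delay.
from datetime import date, timedelta

MIN_NOTICE_DAYS = 5  # assumed policy threshold

def notice_is_timely(sent_on, scheduled_for, min_days=MIN_NOTICE_DAYS):
    """True if the notice precedes the change window by at least min_days."""
    return scheduled_for - sent_on >= timedelta(days=min_days)

print(notice_is_timely(date(2002, 6, 3), date(2002, 6, 10)))  # ample warning
print(notice_is_timely(date(2002, 6, 8), date(2002, 6, 10)))  # too late
```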
3. Reduce Change Mass.
The first rule in problem determination is to understand what has changed in the system. Organizations should introduce change in small, related batches if possible. Introducing too much change at one time will inevitably lengthen the application’s mean time to recovery by lengthening problem determination.
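The effect of batch size on problem determination can be sketched numerically. Under a simple assumed model where one of the changes in a batch has introduced a fault, isolating it by backing changes out one at a time takes, in the worst case, as many trials as there are changes in the batch; even a binary search over the change set still grows with batch size.

```python
# Illustrative model of problem-determination effort versus change mass.
# One faulty change is assumed to be hiding somewhere in the batch.
import math

def worst_case_trials(batch_size):
    """Back out one change at a time: one trial per candidate change."""
    return batch_size

def bisection_trials(batch_size):
    """Binary search over the change set: trials still grow with size."""
    return max(1, math.ceil(math.log2(batch_size)))

print(worst_case_trials(3), bisection_trials(3))    # a small, related batch
print(worst_case_trials(30), bisection_trials(30))  # a large combined release
```

Either way, the smaller the batch, the shorter the search, and the shorter the application's mean time to recovery.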
4. Work With Vendors.
Managing all the potential vendor software patches (code that fixes reported bugs) and all the different ways vendors package, deliver, and upgrade patches can be intimidating. Organizations should contact vendors or review their Web sites for available fixes on a monthly basis.
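A monthly review can be as simple as tracking when each vendor's fix list was last checked and flagging those that are overdue. The sketch below uses hypothetical vendor names and a 30-day threshold.

```python
# Illustrative sketch: flag vendors whose available-fix lists have not
# been reviewed within roughly a month. Vendor names are hypothetical.
from datetime import date, timedelta

last_reviewed = {
    "DBMS vendor": date(2002, 5, 2),
    "OS vendor": date(2002, 3, 15),
    "Storage vendor": date(2002, 4, 28),
}

def overdue(reviews, today, max_age_days=30):
    """Return vendors whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(v for v, seen in reviews.items() if seen < cutoff)

print(overdue(last_reviewed, date(2002, 5, 10)))
```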
5. Protect Personnel.
Organizations must keep happy the people who make the change process work. Personnel should not be overmanaged and should be given the authority and responsibility to do their jobs. Organizations should motivate and reward them for meeting service-level agreements by paying bonuses and giving additional pay for on-call duties. Organizations should keep personnel educated and current by sending them to a related conference or course once a year.
Business Impact: Application availability greatly affects an organization’s bottom line, brand equity, and valuation. Budget dollars should be spent on the people who plan and execute change processes in the high-availability infrastructure.
Bottom Line: Planned downtime is about managing change, and change accounts for as much as 90 percent of all application downtime. Organizations must recognize this and focus high-availability planning resources on people and process.
Charlie Garry is a consultant for META Group, an IT consulting firm based in Stamford, Conn.