When we look at how important IT systems have become to organizations and
society as a whole, we need to factor in resiliency when designing them.
Resiliency pertains to the system’s ability to return to its original
state after encountering trouble. In other words, if a risk event knocks
a system offline, a highly resilient system will return to work and
function as planned as soon as possible.
While many may take this process for granted, not all systems recover
cleanly. Sometimes IT staff isn’t even involved. But other times a lot of
staff members are involved and they’re dealing with a great deal of
stress getting that system back online. And whether all the data is there
or not is another story altogether.
From now on, we cannot afford fragile systems or systems that require an
unmanageable amount of time and effort to recover. We must take
resiliency into account.
How many systems do you have that will come back online if the power is
cut and the UPS runs to the point of exhaustion, causing a hard crash?
Stand-alone PCs and network devices are usually pretty good about coming
back. However, as the level of complexity and interdependency increases,
simply coming back online after a hard crash may mean corrupted storage,
split clusters and the failure of dependent services.
That means you shouldn’t bet on highly complex systems simply coming back
up after whatever negative event you experience — be it hardware or
software failure or some form of security incident.
From an organizational perspective, if power is lost at a plant for two
days, can it recover? If a key service is lost because a database becomes
corrupt, can the business recover? Organizations that can bounce back are
resilient; the ones that can’t may be in for troubled times.
Making your system resilient takes a lot of planning.
To build resilient systems, you need a holistic mentality. Prioritize
every foreseeable risk and then determine not only how to reduce the risk
in the first place, but also how to minimize its impact on the system and
the organization. Those are two different issues.
Granted, recovery controls, also known as corrective controls, are
risk-mitigating controls. However, we need to make sure that teams
managing systems take into account not just controls that reduce the
probability of a risk event, but also controls that reduce the impact of
the event. They must plan for failure rather than assume the best.
Resiliency directly targets minimizing the impact by bringing people,
processes and technology either back to their original state or to a
modified state until the risk has been reasonably addressed.
Any system has three dimensions that must be considered — people,
processes and technology. To build in resiliency, all must be taken into
account because if one of them fails, then the likelihood of poor
resiliency and overall system failure increases.
In addressing the ‘people’ dimension, there must be identified backups
and cross-training to ensure that if anyone is sick, on vacation or
incapacitated, there is another person, if not an entire other team, who
can fill in. For example, if a data center is damaged by a natural
disaster and the staff there are dealing with their own family crises,
is there another group that can do the work from another site and take
some of the pressure off?
When addressing process issues, IT administrators must spend some time
assessing the processes themselves. Are the current processes so rigid
that they break under any variation? Or are there logical emergency
processes that can be triggered in the event of a problem?
Technology is interesting. If you have the right people and the right
processes supporting it, then magic happens. If either element is weak,
the technology is standing on a bad foundation.
With that said, resiliency can apply directly to the technology. To
illustrate, some systems are very sensitive to power or temperature
fluctuations. If there is a high risk that those elements cannot be
reasonably safeguarded, then you have a fragile system, one that will be
prone to break during regular operations, let alone in an emergency.
Be sure to consider environmental and other risks when evaluating systems
and subsystems. Be sure to factor in resiliency during the evaluation.
What if the power spikes? What if there is a brownout? What if the room
temperature goes to 100 degrees Fahrenheit for 48 hours? How will any of
these risks affect the system’s ability to recover afterward?
Again, the key is to identify risks, prioritize them, and then figure out
how to mitigate the most likely ones in an effective and cost-efficient
manner.
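As a rough illustration of that identify, prioritize and mitigate loop, here is a minimal Python sketch of a risk register. Everything in it is an assumption made for illustration: the `Risk` class, its field names, the sample figures, and the choice to rank by exposure (likelihood multiplied by impact), which is a common convention rather than a method prescribed here. Teams that prefer to rank purely by likelihood can change the sort key.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One entry in a hypothetical risk register (all fields are estimates)."""
    name: str
    likelihood: float       # estimated chance of the event per year, 0.0 to 1.0
    impact: float           # estimated loss if the event occurs, in cost units
    mitigation_cost: float  # estimated cost of the proposed control
    residual_factor: float  # share of exposure left after the control, 0.0 to 1.0

    @property
    def exposure(self) -> float:
        # Expected yearly loss with no control in place.
        return self.likelihood * self.impact

    @property
    def benefit_per_cost(self) -> float:
        # Exposure removed per unit of money spent on the control.
        if self.mitigation_cost <= 0:
            raise ValueError("mitigation_cost must be positive")
        reduction = self.exposure * (1.0 - self.residual_factor)
        return reduction / self.mitigation_cost


def prioritize(risks):
    """Return the risks ordered from highest to lowest exposure."""
    return sorted(risks, key=lambda r: r.exposure, reverse=True)


def most_cost_efficient(risks):
    """Return the risks ordered by how much exposure each unit of spend removes."""
    return sorted(risks, key=lambda r: r.benefit_per_cost, reverse=True)


# Illustrative numbers only; real values come from your own assessment.
REGISTER = [
    Risk("power spike", 0.5, 20_000, 1_000, 0.25),
    Risk("brownout", 0.25, 16_000, 2_000, 0.5),
    Risk("sustained overheating", 0.125, 200_000, 5_000, 0.5),
]

if __name__ == "__main__":
    for risk in prioritize(REGISTER):
        print(f"{risk.name}: exposure={risk.exposure:.0f}, "
              f"benefit per cost={risk.benefit_per_cost:.2f}")
```

The two orderings can disagree: the risk with the largest exposure is not always the one where a control buys the most reduction per unit of spend, which is why both views are worth looking at before committing budget.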