It seems like just about every day brings with it a new cloud storage product announcement from vendors big and small, but the reality is that beyond enterprise firewalls, cloud storage’s potential is limited.
There are two reasons for this: bandwidth limitations and the data integrity issues posed by the commodity drives that are typically used in cloud services. Together those two issues will limit what enterprise data storage users can do with external clouds.
The ideal for cloud storage is to be self-managed, self-balanced and self-replicated, with regular data checksums to account for the undetectable or mis-corrected error rates of various storage technologies. Cloud storage depends on being able have multiples copies of files managed and checksummed and verified regularly, distributed across the storage cloud for safekeeping.
It’s a great idea, but it faces more than a few challenges, such as reliability, security, data integrity, power, and replication time and costs. But one of the biggest issues is simply that hardware is going to break. There are two ways disk and tape drives break:
The most common type of failure is known as the vendor’s bit error rate. The bit error rate is the expected failures per number of bits moved. The following is generally what is published by vendors:
|
These seem like good values, but it is important to note that they haven’t improved much in the last 10 years, maybe by an order of magnitude, while densities have soared and performance has increased moderately. This will begin to cause problems as the gaps get worse in the future (see RAID’s Days May Be Numbered). So using vendors’ best-case number, how many errors will we see from moving data around, which is needed for replication in clouds?
|
Clearly, moving 100PB on even 1TB enterprise drives can potentially cause significant loss of data, especially as many clouds I am familiar with do not use RAID and maintain data protection via mirroring. Remember, this is a perfect world and does not include channel failures, memory corruptions and all the other types of hardware failures and silent corruption. What happens if the world is not perfect and failure rates are an order of magnitude worse?
|
With current technology, you could lose 900TB of data, which is not trivial and would take some time to replicate.
Now let’s look at the time to replication with various Internet connection speeds and data volumes.
|
Clearly, no one has an OC-768 connection, nor are they going to get one anytime soon, and very few have 100PB of data to replicate into a cloud, but the point is that data densities are growing faster than network speeds. There are already people talking about 100PB archives, but they don’t talk about OC-384 networks. It would take 10 months to replicate 100PB with OC-384 in the event of a disaster, and who can afford OC-384? That’s why, at least for the biggest enterprise storage environments, a centralized disaster recovery site that you can move operations to until everything is restored will be a requirement for the foreseeable future.
The bandwidth problem isn’t limited to enterprises. In the next 12 to 24 months, most of us will have 10Gbit/sec network connections at work (see Falling 10GbE Prices Spell Doom for Fibre Channel), while at home the fastest connect available as the current backbone of the Internet is OC-768, and each of us internally is going to have a connection that is 6.5 percent of OC-768. That will be limited, of course, by our DSL and cable connections, but their performance is going to grow and use up the backbone bandwidth. This is pretty shocking when you consider how much data we create and how long it takes to move it around. I have a home Internet backup service and about 1TB of data at home. It took me about three months to get all of the data copied off site via my cable connection, which was the bottleneck. If I had a crash before the off-site copy was created, I would have lost data.
Efforts like Internet2 may help ease the problem, but I worry that we are creating data faster than we can move it around. The issue becomes critical when data is lost — through human error, natural disaster or something more sinister — and must be re-replicated. You’d have all this data in a cloud, with two copies in two different places, and if one copy goes poof for whatever reason, you’d need to restore it from the other copy or copies, and as you can see that is going to take a very long time. During that time, you might only have one copy, and given hard error rates, you’re at risk. You could have more than two copies — and that’s probably a good idea with mission-critical data — but it starts getting very costly.
Google, Yahoo and other search engines for the most part use the cloud method for their data, but what about all the archival sites that already have 10, 20, 40 or more petabytes of storage data that is not used very often? There are already a lot of these sites, whether it is medical data, medical images or genetic data, or sites that have large images such as climate sites or the seismic data that is used for oil and gas exploration, and, of course, all the Sarbanes-Oxley data that is required to be kept in corporate America. Does it make sense to have all of this data online? Probably not, and the size of the data and the cost of power will likely be the overriding issues.
Henry Newman, CTO of Instrumental Inc. and a regular Enterprise Storage Forum contributor, is an industry consultant with 28 years experience in high-performance computing and storage.
Article courtesy of Enterprise Storage Forum.
Ethics and Artificial Intelligence: Driving Greater Equality
FEATURE | By James Maguire,
December 16, 2020
AI vs. Machine Learning vs. Deep Learning
FEATURE | By Cynthia Harvey,
December 11, 2020
Huawei’s AI Update: Things Are Moving Faster Than We Think
FEATURE | By Rob Enderle,
December 04, 2020
Keeping Machine Learning Algorithms Honest in the ‘Ethics-First’ Era
ARTIFICIAL INTELLIGENCE | By Guest Author,
November 18, 2020
Key Trends in Chatbots and RPA
FEATURE | By Guest Author,
November 10, 2020
FEATURE | By Samuel Greengard,
November 05, 2020
ARTIFICIAL INTELLIGENCE | By Guest Author,
November 02, 2020
How Intel’s Work With Autonomous Cars Could Redefine General Purpose AI
ARTIFICIAL INTELLIGENCE | By Rob Enderle,
October 29, 2020
Dell Technologies World: Weaving Together Human And Machine Interaction For AI And Robotics
ARTIFICIAL INTELLIGENCE | By Rob Enderle,
October 23, 2020
The Super Moderator, or How IBM Project Debater Could Save Social Media
FEATURE | By Rob Enderle,
October 16, 2020
FEATURE | By Cynthia Harvey,
October 07, 2020
ARTIFICIAL INTELLIGENCE | By Guest Author,
October 05, 2020
CIOs Discuss the Promise of AI and Data Science
FEATURE | By Guest Author,
September 25, 2020
Microsoft Is Building An AI Product That Could Predict The Future
FEATURE | By Rob Enderle,
September 25, 2020
Top 10 Machine Learning Companies 2021
FEATURE | By Cynthia Harvey,
September 22, 2020
NVIDIA and ARM: Massively Changing The AI Landscape
ARTIFICIAL INTELLIGENCE | By Rob Enderle,
September 18, 2020
Continuous Intelligence: Expert Discussion [Video and Podcast]
ARTIFICIAL INTELLIGENCE | By James Maguire,
September 14, 2020
Artificial Intelligence: Governance and Ethics [Video]
ARTIFICIAL INTELLIGENCE | By James Maguire,
September 13, 2020
IBM Watson At The US Open: Showcasing The Power Of A Mature Enterprise-Class AI
FEATURE | By Rob Enderle,
September 11, 2020
Artificial Intelligence: Perception vs. Reality
FEATURE | By James Maguire,
September 09, 2020
Datamation is the leading industry resource for B2B data professionals and technology buyers. Datamation's focus is on providing insight into the latest trends and innovation in AI, data security, big data, and more, along with in-depth product recommendations and comparisons. More than 1.7M users gain insight and guidance from Datamation every year.
Advertise with TechnologyAdvice on Datamation and our other data and technology-focused platforms.
Advertise with Us
Property of TechnologyAdvice.
© 2025 TechnologyAdvice. All Rights Reserved
Advertiser Disclosure: Some of the products that appear on this
site are from companies from which TechnologyAdvice receives
compensation. This compensation may impact how and where products
appear on this site including, for example, the order in which
they appear. TechnologyAdvice does not include all companies
or all types of products available in the marketplace.