A few weeks ago I had a dream that I was the COB, CEO, and CTO of a major storage company, with the opportunity to architect and develop any product I wanted. Basically, I got to be the storage king in this dream, but of course, with being a king comes responsibility for your subjects (the company stockholders and employees) and your lineage (ensuring that you are successful in the market so your company has a future). Also, as a king you are periodically required to take over other lands (buy companies), make treaties with others (joint marketing and/or development agreements), or declare war and eliminate the enemy (beat them in the market to render them a non-factor).
As a mere storage consultant, I figured dreams could not get any better than this, and the best part was that I remembered the dream in the morning. The next morning, I began thinking about what is actually missing in the market and which requirements are not being met by the major vendors’ current product offerings.
The old adage “build it and they will come” may well apply to mundane evolutionary products, but what about revolutionary products? What market requirements are not currently being met, and if the market truly is ready for something revolutionary in terms of large storage configurations, what would the product look like and why would customers consider buying it?
What Market Requirements Aren’t Being Met
Again, if I were king, I would first have my marketing requirements people confirm my speculation, but I personally believe there are three very important factors currently missing from the market. First, though, let me define the market.
I like to differentiate between storage and data. Storage for the most part has become a commodity market. RAID, for example, is now sold by dollars per gigabyte ($10 to $20 per gigabyte is often quoted), while back in 1996 I remember RAID costing over $1 per megabyte, which works out to more than $1,000 per gigabyte, a price drop of roughly two orders of magnitude.
With storage now delivered and marketed as a commodity, what about the critical information you put on the storage — your data? To me, that’s where the real value is. People in general really do not care all that much about storage, but data is a completely different story. I believe that in the future data will become a more important requirement of the storage architecture, and the focus might even change from that of storage architecture to data architecture. (Well, that’s my hope at least, both as a consultant and as storage king in my dream.)
So the bottom line is that as king I want to define the market for my company as data: not data as storage, but how you access, protect, maintain, and migrate what appears as files on the computer systems you use. This includes, in most cases, the file system(s) used on top of the storage. And while raw devices are sometimes used for databases, for all intents and purposes the database manages the raw device the same way a file system does, which is why I contend the database is a file system.
With all of this in mind, let’s take a closer look at which requirements specifically are missing from the market today:
High Performance and Predictive Scaling
Some newer NAS products do scale reasonably well, but you are currently limited to 1 Gbit connections (some new 10 Gbit host cards are out, but even at PCI-X 133 they cannot be used efficiently). Most sites requiring multiple gigabytes per second of performance solve the problem by using Fibre Channel-attached storage. Given the overhead of TCP/IP and NFS, that is not possible with NAS, as even 100 MB/sec from a single host is nearly impossible.
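To put rough numbers on this, here is a minimal back-of-envelope sketch; the overhead fractions are illustrative assumptions on my part, not measured values:

```python
# Back-of-envelope throughput estimates. The overhead fractions are
# illustrative assumptions, not benchmark results.

def effective_mb_per_sec(link_gbit: float, overhead_fraction: float) -> float:
    """Usable payload rate for a link after protocol overhead."""
    raw_mb_per_sec = link_gbit * 1000 / 8  # 1 Gbit/s is ~125 MB/s raw
    return raw_mb_per_sec * (1 - overhead_fraction)

# 1 Gbit NAS link: assume ~30% lost to TCP/IP and NFS processing.
print(f"1 Gbit NAS: {effective_mb_per_sec(1.0, 0.30):.0f} MB/s")
# 2 Gbit Fibre Channel: assume ~10% framing and protocol overhead.
print(f"2 Gbit FC : {effective_mb_per_sec(2.0, 0.10):.0f} MB/s")
```

Even with a generous overhead figure, a single 1 Gbit NAS link tops out well under the 100 MB/sec mark, while a single 2 Gbit Fibre Channel port comfortably exceeds it.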
For the most part, file systems do not scale linearly. There are many reasons for this lack of scaling, most of them rooted in how the file system allocates, caches, and locks data.
Each of these areas can be mitigated by tuning the file system and the applications, but what about the RAID? The RAID device is a block device (at least for now) that reads ahead based on sequential block addresses and writes behind based on sequential block addresses. The RAID and the file system have no communication about the topology of the data you are using; all the RAID knows are simple block counts.
If the file system does not place data in sequential block order on the RAID, the RAID cannot operate efficiently. The SCSI protocol provides no way of passing the data topology to the RAID, so if the data is neither allocated nor read sequentially, the RAID operates inefficiently, which means that scaling with the hardware is not really possible.
Even when the addresses are not allocated sequentially, most RAID devices still try to read ahead, but this adds overhead, as you are reading data that you will not use, which of course reduces RAID performance. A new device access method based on objects is being developed and is now in the process of being standardized. This development should help, but the file system will still need to communicate with the object device, and work on that end is far in the future at best.
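A tiny simulation makes the readahead point concrete. The following is a minimal sketch of a sequential-detection heuristic of my own devising, not any vendor’s actual firmware logic:

```python
# Minimal sketch of a sequential-detection readahead heuristic.
# An illustration of the idea only, not real RAID firmware.
import random

class BlockCache:
    def __init__(self, readahead_blocks=8):
        self.readahead_blocks = readahead_blocks
        self.last_block = None
        self.prefetched = set()
        self.hits = self.misses = 0

    def read(self, block):
        if block in self.prefetched:
            self.hits += 1
        else:
            self.misses += 1
        # Readahead fires only when addresses arrive in sequence; the
        # device sees block numbers, never the file's actual topology.
        if self.last_block is not None and block == self.last_block + 1:
            self.prefetched.update(
                range(block + 1, block + 1 + self.readahead_blocks))
        self.last_block = block

# The same 100-block file, allocated sequentially versus scattered.
seq = BlockCache()
for b in range(100):
    seq.read(b)

random.seed(1)
scattered = BlockCache()
for b in random.sample(range(10_000), 100):
    scattered.read(b)

print(f"sequential: {seq.hits} hits, {seq.misses} misses")
print(f"scattered:  {scattered.hits} hits, {scattered.misses} misses")
```

Allocate the file sequentially and nearly every read is a prefetch hit; scatter the same blocks and the heuristic never fires, because the device sees only block addresses.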
End-to-End Security
Most local file systems provide standard security such as ACLs (access control lists), UNIX groups, and permissions. Some file systems, Microsoft NTFS for example, support encryption on a per-file or per-folder basis, but encryption is very CPU intensive, and key management gets more difficult as we all get older and forget our many passwords more and more often. Nor has the issue of end-to-end local file system security from the host to the RAID been efficiently solved. (Please review this article for a closer look at this issue.)
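To make “standard security” concrete, here is a minimal sketch of the classic UNIX owner/group/other check a local file system performs (a simplification that ignores ACLs, which layer finer-grained rules on top of the mode bits):

```python
# Minimal sketch of the classic UNIX permission check (owner/group/other).
# Real file systems also evaluate ACLs on top of these mode bits.
import os
import stat

def may_read(path, uid, gids):
    st = os.stat(path)
    if st.st_uid == uid:                    # owner bits apply
        return bool(st.st_mode & stat.S_IRUSR)
    if st.st_gid in gids:                   # group bits apply
        return bool(st.st_mode & stat.S_IRGRP)
    return bool(st.st_mode & stat.S_IROTH)  # everyone else

# Example (UNIX only): can the current user read /etc/hosts?
print(may_read("/etc/hosts", os.getuid(), set(os.getgroups())))
```

The point is that this check happens entirely on the host; nothing equivalent is enforced between the host and the RAID, which is why end-to-end security keeps coming up short.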
Now, add to this the requirements for multi-level security, or MLS, that many vendors are moving toward for authentication and tracking file access. The U.S. Government has some new requirements in this area that are interesting for both operating system security and encryption, but even with these requirements, true end-to-end security still comes up short.
In addition, as you may have read in past articles, I have been involved with shared file systems for a long time, and enforcing a common security policy across multiple vendors’ operating systems with shared file systems is virtually impossible. Part of the problem is that file systems distributed across heterogeneous operating systems have no common, and often no public, interface for security; issues like HBA, switch, RAID, tape, and SAN/WAN encryption have not been adequately addressed either.
Simplified Management
Wouldn’t it be nice to have a single tool that could configure, monitor, and secure data all the way from the application down to the storage hardware?
I’m sure I’m missing a few things, but even that much would be the Holy Grail for management. Unfortunately, though, we’re nowhere close to having a tool that does all of this. A number of vendors are working on tools that will help somewhat (VERITAS, McDATA, and EMC, just to name a few), but we won’t be arriving at the Holy Grail anytime soon, I’m afraid.
What This Product Would Solve
Assuming that the market analysis is valid and that the pain points customers are suffering from are real enough for them to consider purchasing a solution, the product I would create would be a SAN/NAS hybrid that combines the best of both worlds and adds significant new features.
Many NAS limitations are based on TCP/IP overhead, and NAS does not allow for centralized control. The only way to centrally control a heterogeneous shared file system is to move most of the functionality to a single unit, as you cannot control an end-to-end security policy from one host in a pool of heterogeneous machines.
So, for the data-centric world I think is coming, the only way to manage the data is to create a single machine with a new DMA-based protocol that looks like NFS from the application’s point of view (no changes to user applications) but scales more like locally attached RAID by communicating without TCP/IP. This new protocol would have to support the performance, security, and management requirements outlined above; a sketch of what such a request might carry follows.
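As a thought experiment, here is a minimal sketch of what a request in such a protocol might look like. Every name and field below is hypothetical, invented purely for illustration:

```python
# Hypothetical request format for the DMA-based, NFS-like protocol
# described above. All names here are invented for illustration.
from dataclasses import dataclass
from enum import Enum

class AccessHint(Enum):
    SEQUENTIAL = "sequential"
    RANDOM = "random"
    REVERSE = "reverse"

@dataclass
class DataRequest:
    file_id: int             # stable file handle, as in NFS
    offset: int              # byte offset within the file
    length: int              # bytes to transfer via DMA
    access_hint: AccessHint  # the topology hint block protocols lack
    credential: bytes        # security token, checked centrally

# Example: read 1 MB of a file that is being scanned backward, so the
# device can read ahead in the right direction despite the layout.
req = DataRequest(file_id=42, offset=64 * 2**20, length=2**20,
                  access_hint=AccessHint.REVERSE,
                  credential=b"signed-token")
print(req)
```

The essential differences from today’s block protocols are the access hint and the credential: the device learns the file’s topology and can enforce security centrally rather than trusting every host.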
Since my data is now centralized, security, replication, data encryption, HSM, backup, and disaster recovery policies can be implemented more easily. Another advantage is that I would be free from having to write and maintain tools for each OS, OS release, and vendor.
The new box would have a tight coupling between the file system and the reliable storage. I might have RAID 1-like functions for small, randomly accessed files and RAID 5-like functions for larger, sequentially accessed files. The file system could understand the topology of the file in question and read ahead based on access patterns, such as reading the file backward, even though the file might not be sequentially allocated. Tight coupling between the cache and the data would improve scaling and reduce latency and costs.
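Here is a minimal sketch of the kind of placement policy such a tightly coupled box might apply; the 1 MB threshold and the policy names are assumptions for illustration only:

```python
# Minimal sketch of a file-placement policy for the hypothetical box:
# mirror small or random-access files, parity-protect large sequential
# ones. The 1 MB threshold is an illustrative assumption.

SMALL_FILE_BYTES = 1 * 2**20  # 1 MB

def placement_policy(size_bytes, access_pattern):
    """Choose a protection layout from what the file system knows."""
    if size_bytes < SMALL_FILE_BYTES or access_pattern == "random":
        return "mirror"        # RAID 1-like: good for small/random I/O
    return "parity-stripe"     # RAID 5-like: efficient big sequential I/O

print(placement_policy(32 * 2**10, "random"))       # -> mirror
print(placement_policy(512 * 2**20, "sequential"))  # -> parity-stripe
```

Because the box sees both the file system metadata and the devices, it can make this decision per file instead of per LUN.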
Ah, cost — that’s the key. What would the return on investment (ROI) be for this new data-centric device? Well, that’s where my dream ended. We may never know if this box would work, what the ROI would be, and whether or not people would actually buy it, but I do believe it meets the requirements of the market.
Can it be built? I think it can. Will it be built? I don’t know, but it sure would solve a bunch of problems if done correctly.
Please feel free to send any comments, feedback, and/or suggestions for future articles to Henry Newman.