With any new hot trend comes a truckload of missteps, bad ideas and outright failures. I should probably create a template for this sort of article, one in which I could pull out a term like “cloud” or “BYOD” and simply plug in “social media” or “Big Data.”
When the trend in question either falls by the wayside or passes into the mainstream, it seems like we all forget the lessons faster than PR firms create new buzzwords.
Of course, vendors in trendy new spaces also tend to think they’re in uncharted waters. In fact, there’s plenty of history to learn from. Cloud concepts have been around at least since the 1960s (check out Douglas Parkhill’s 1966 book, The Challenge of the Computer Utility, if you don’t believe me), but plenty of cloud startups ignored history in favor of buzz.
And it’s not like gaining insights from piles of data is some new thing that was previously as rare as detecting neutrinos from deep space.
Here are five history lessons we should have already learned, but seem to be doomed to keep repeating:
It wasn’t that long ago that every time a cloud project or company failed, some tech prognosticator would sift through the tea leaves and claim that the cloud concept itself was dead.
The same thing is happening with Big Data. According to a recent survey, 55 percent of Big Data projects are never even completed. It’s hard to achieve success if you don’t even finish what you started, yet many mistakenly believe that this means Big Data is bunk.
Not true. Plenty of companies are reaping the rewards of Big Data, analyzing piles of data to improve everything from marketing and sales to fraud detection.
“It reminds me of the Moneyball craze during the early 2000’s, when Major League Baseball teams started to figure out that statistics could be used to build a winning ball club, rather than relying on a scout’s stopwatch and gut,” noted Matt Fates, a partner with Ascent Venture Partners. “There was initial backlash against the ‘stat geeks,’ but today every team has an advanced statistics department that helps general managers make better decisions. This was bringing data, and insights, to bear on decisions in a way that turned conventional wisdom on its head. It was not ‘big data’, but it led to big changes. It never would have started had one GM not been open-minded about statistics. His success forced others to follow.”
Of course, some of the confusion stems from how indiscriminately the term Big Data is thrown around. Most of us don’t need Big Data per se, but rather just data analytics, which leads us to the second history lesson everyone is failing to recall:
People mean many different things when they use terms such as “cloud” and “Big Data.” Are you talking about virtualized infrastructures when you say cloud? Private clouds? AWS? Similarly, Big Data can refer to existing pools of data, data analytics, machine learning, and on and on.
The Big Mistake with the term Big Data is that many use the term to mask vague objectives, fuzzy strategies and ill-defined goals.
Often, people use these terms loosely because they don’t really know what the terms mean in general, let alone what they mean for their particular business problems. As a result, vendors are asked for proposals that are a poor fit for an organization’s cloud or Big Data challenges.
If your CEO or CIO orders you to start investigating Big Data, your first question needs to be the most basic one: Why, specifically?
If you can’t answer that question concisely, you’re in trouble.
If you’re the person tasked with building out a Big Data architecture, then it’s fine to focus on details that won’t matter to anyone who isn’t a data scientist.
If you’re a business user or non-data scientist, it’s best to just ignore all this noise. It’ll sort itself out soon enough. I’ve seen this phenomenon repeat with everything from CDNs to storage to cloud computing and now Big Data. Engineers and product developers often fall prey to “if we build it, they will come” syndrome, ignoring the real-world pain points of potential customers in favor of hyping their technical chops.
When they fail to find real-world customers for the resulting products, they then set their sights on technical minutiae, since it couldn’t possibly be a flawed go-to-market strategy that was the problem in the first place.
Take the recent news that Facebook is making its query analysis software, Presto, open source. Is this a win for Hadoop or for SQL? Does it mark the end of Hive?
Who cares?
Okay, if you’re reading this, you’re probably an early adopter or you’ve already placed some Big Data bets, so it matters to you. But for the rest of the world, it’s not even on their radar – nor should it be.
Ryan Betts, CTO of VoltDB, a NewSQL database vendor, does care, but even he, as deeply engrossed in the minute details as anyone, recognizes that the real point of Big Data is far less granular: “Data is only valuable when you can interact with it. Data you can’t interact with? That’s just overhead. Access and interactivity need to come first.”
For every rule, there’s the exception that proves the rule, and here’s one: SQL vs. NoSQL is a fight that will have real-world ramifications. NoSQL startups have been getting a lot of attention lately, with the likes of Cloudera, 10Gen and Datameer raising significant VC funding.
However, tech giants seem to be betting against NoSQL. “As SQL relational systems first came to market, many years ago, they competed with navigation and document oriented solutions. SQL won,” Betts pointed out. “The expressiveness and the flexibility to interact with data is why SQL matters. SQL is fast. SQL scales (witness Impala, BigQuery and Facebook’s announcement today). SQL matters to the marketplace – ask any ODBC-compliant BI vendor. To date first Google, then Apache Impala, and now Facebook have announced SQL interfaces to their large volume data stores. It’s nice to see ‘NoSQL’ learning the lessons of 30 years.”
Of course, the NoSQL camp has its own arguments for why their approach is better, but the smart money looks like it’s heading in the opposite direction – for now.
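To make Betts’ point about expressiveness a little more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The `orders` table, its columns and the sample rows are all made up for illustration; nothing here comes from VoltDB, Impala, BigQuery or Presto. The point is only that one declarative statement answers a business question without any custom traversal code.

```python
import sqlite3

# Hypothetical sample data: the table name, columns and rows are
# invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0), ("west", 45.5)],
)


def revenue_by_region(connection):
    """Return (region, total) pairs, largest total first."""
    query = (
        "SELECT region, SUM(amount) AS total "
        "FROM orders GROUP BY region ORDER BY total DESC"
    )
    return connection.execute(query).fetchall()


if __name__ == "__main__":
    for region, total in revenue_by_region(conn):
        print(region, total)
```

Changing the question (say, averages instead of totals, or a filter on amount) means editing the query string rather than rewriting application logic, which is roughly the flexibility the quote is getting at.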
I recently attended a panel on Big Data where one of the panelists made some sarcastic comments about Big Data not being real, since it’s typically either capitalized or in quotes – or both.
I get the joke (although it’s not a terribly funny one), but many people take these jokes seriously.
Not too long ago, there was plenty of cloud skepticism, even from people who should have known better (such as Larry Ellison, until he saw the light and hastily directed Oracle to play cloud catch-up). Now, I hear plenty of Big Data skepticism, most of which stems either from ignorance or from an urge to protect the status quo.
Granted, some of the skepticism is well-earned, since vendors in hot spaces tend to hype the crap out of even something as trivial as a UI upgrade, but Big Data is here to stay, and it’s making an impact already.
Recently, Cryptolocker, which is arguably the most effective and sophisticated piece of ransomware released to date, was kept in check through Big Data analytics.
Cryptolocker’s creators built a Domain Generation Algorithm that produces thousands of different rendezvous domains for the malware to try until it finally finds a command-and-control server. This tactic helps the malware evade static blacklists and reputation systems. Upon infecting a device, Cryptolocker must establish a connection with a command and control server to obtain an infection-specific encryption key, which is used to help the attacker receive the ransom payment later.
Using traditional detection and prevention methods, “it took about 30 days for security vendors to capture malware samples and reverse engineer them to come up with a way to contain it,” said Dan Hubbard, CTO of cyber-security service provider OpenDNS.
OpenDNS took a different approach to fend off Cryptolocker. Using Big Data analytics and predictive algorithms, OpenDNS’ Umbrella security service was able to block Cryptolocker from day one of the outbreak. The service identifies the patterns used by Cryptolocker’s Domain Generation Algorithm and predicts the malicious sites it tries to connect with. Since the OpenDNS Umbrella service monitors inbound and outbound Internet traffic, it can block outbound Cryptolocker traffic and prevent machines that are infected from having their data encrypted.
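For readers who want to see why a Domain Generation Algorithm defeats static blacklists, and why predicting its output helps, here is a toy sketch in Python. It is not Cryptolocker’s actual algorithm and not how Umbrella works internally (neither is detailed here); the seed, the TLD list, the label length and the function names are all assumptions made up for illustration. It also assumes the simplest possible case, in which the defender already knows the algorithm and seed, whereas the approach described above relies on spotting patterns in traffic.

```python
import hashlib
from datetime import date

# Hypothetical TLD list, chosen only for illustration.
TLDS = ["com", "net", "org", "biz", "info"]


def toy_dga(seed: str, day: date, count: int = 1000) -> list[str]:
    """Derive `count` pseudo-random domains from a shared seed and a date."""
    domains = []
    for i in range(count):
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 14 hex digits to the letters a-p to form a label.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:14])
        domains.append(f"{label}.{TLDS[i % len(TLDS)]}")
    return domains


def build_blocklist(seed: str, day: date, count: int = 1000) -> set[str]:
    """Defender side: precompute the same domains the malware would try."""
    return set(toy_dga(seed, day, count))


def is_blocked(domain: str, blocklist: set[str]) -> bool:
    """Normalize a queried name and check it against the predictions."""
    return domain.lower().rstrip(".") in blocklist


if __name__ == "__main__":
    today = date.today()
    blocklist = build_blocklist("demo-seed", today)
    candidate = toy_dga("demo-seed", today)[0]
    print(candidate, is_blocked(candidate, blocklist))
```

Because the list changes every day, a blacklist built from yesterday’s samples is already stale, while a resolver that can anticipate today’s candidates can refuse to answer for them before the malware reaches its command-and-control server.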
“With Big Data-powered predictive security we are able to cut off the head of Cryptolocker and then pinpoint infected machines for disinfection,” Hubbard said.
That’s one security lesson that will, hopefully, not be lost to history.
Jeff Vance is a regular contributor to many high-tech and business-focused publications, including Forbes.com, Wired, Network World, CIO, Datamation and many others. Connect with him on LinkedIn (jeffvanceatsandstorm), follow him on Twitter @JWVance, or add him to your circles on Google Plus (+jeffvance).