Machine learning has been around for decades, primarily in academia. Along the way it has taken various forms and gone by various names, including pattern recognition, artificial intelligence, knowledge management and computational statistics.
Regardless of terminology, machine learning enables computers to learn on their own without being explicitly programmed for specific tasks. Using algorithms, computers read sample input data, build models, and make predictions and decisions based on new data. The approach is particularly powerful when the input data is so variable that static programming instructions cannot handle it.
In recent years, the proliferation of digital information through social media, the Internet of Things (IoT) and e-commerce, combined with access to economical compute power, has pushed machine learning into the mainstream. Machine learning is now commonly used across industries including finance, retail, healthcare and automotive. Inefficient tasks once performed with human input or static programs are now handled by machine learning algorithms.
Here are a few examples:
Prior to machine learning, fraud detection meant following a set of complex rules and a checklist of risk factors to spot potential security threats. But as the volume of transactions and the number of threats grew, this method of fraud detection did not scale. The finance industry now uses machine learning to identify unusual activity and anomalies and report them to security teams. PayPal, for example, uses machine learning to compare millions of transactions and identify fraudulent and money-laundering activity.
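To make the idea concrete, here is a minimal sketch of transaction anomaly detection using an off-the-shelf algorithm (scikit-learn's Isolation Forest); the transaction features and numbers are hypothetical, not a description of how PayPal or any bank actually does it:

```python
# Illustrative sketch only: flag unusual transactions with an
# off-the-shelf anomaly detector. Feature choices are hypothetical.
import numpy as np
from sklearn.ensemble import IsolationForest

# Each row: [amount, hour_of_day, transactions_in_last_24h]
transactions = np.array([
    [25.00, 14, 3],
    [18.50, 10, 2],
    [9400.00, 3, 41],   # unusually large, off-hours, high velocity
    [32.75, 16, 4],
])

model = IsolationForest(contamination=0.1, random_state=0)
model.fit(transactions)

# predict() returns -1 for suspected anomalies, 1 for normal activity
for row, label in zip(transactions, model.predict(transactions)):
    if label == -1:
        print("Flag for review:", row)
```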
Without machine learning, recommendations for products to buy or movies to watch came mainly by word of mouth. Companies like Amazon and Netflix changed that by adopting machine learning to make recommendations to their customers based on data collected from other, similar users. Using machine learning to recommend movies and products is now fairly common: intelligent algorithms analyze your profile and activity against the millions of other users in their databases and suggest products you are likely to buy or movies you may be interested in watching.
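A toy sketch of the underlying idea, user-based collaborative filtering over a handful of made-up ratings, looks like this (production recommenders at Amazon or Netflix are far more sophisticated):

```python
# Minimal user-based collaborative filtering sketch; ratings are made up.
import numpy as np

# Rows = users, columns = movies; 0 means "not yet watched/rated"
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 0, 1],
    [1, 0, 5, 4],
], dtype=float)

def cosine(a, b):
    return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

target = 0  # recommend for the first user
scores = np.zeros(ratings.shape[1])
for other in range(ratings.shape[0]):
    if other == target:
        continue
    # Weight each other user's ratings by their similarity to the target
    scores += cosine(ratings[target], ratings[other]) * ratings[other]

# Only consider movies the target user has not rated yet
unseen = ratings[target] == 0
best = int(np.argmax(np.where(unseen, scores, -np.inf)))
print("Recommend movie index:", best)
```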
For all its increased popularity and use, machine learning still hasn't made its way into any part of data protection, and that gap is acutely felt in big data. Specifically, backup and recovery for NoSQL databases (Cassandra, Couchbase, etc.), Hadoop, and emerging data warehouse technologies (HPE Vertica, Impala, Tez, etc.) remains a very manual process requiring extensive human interaction and input. It is quite a paradox that these big data platforms are used for machine learning while the underlying data protection processes supporting them rely on human intervention and input.
For example, an organization may have a defined recovery point objective (RPO) and recovery time objective (RTO) for a big data application. Based on those objectives, an IT or DevOps engineer determines the schedule and frequency for backing up application data. If the RPO is 24 hours, the engineer may decide to perform backups once per day starting at 11:00 p.m.
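In static terms, that decision is simple arithmetic: divide the day by the RPO and pick a start time. The sketch below mirrors the 11:00 p.m. example above; the cron entry is purely illustrative:

```python
# Naive mapping from a recovery point objective to a backup schedule.
# The 11 p.m. start time echoes the article's example, not a recommendation.
RPO_HOURS = 24

backups_per_day = max(1, 24 // RPO_HOURS)
interval_hours = 24 // backups_per_day
print(f"Schedule {backups_per_day} backup(s) per day, every {interval_hours}h")
print("Example cron entry: 0 23 * * *   # daily at 11:00 p.m.")
```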
While this logically makes sense, the answer is rarely that simple, because big data environments are often dynamic and unpredictable. These systems may be unusually busy at 11:00 p.m., loading new data or running nightly reports, making that a poor time to schedule a backup.
Why can’t the data protection application recommend the best time to schedule a backup task to meet the recovery point objective?
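One can imagine a modest first step: mine historical cluster-load metrics and recommend the quietest window that still satisfies the RPO. The sketch below assumes hourly utilization data is available from the platform's monitoring system; a real recommender would also need to learn seasonality, job calendars and data-change rates:

```python
# Hypothetical sketch: recommend a backup window from historical
# cluster-load metrics instead of a fixed 11 p.m. start time.
import numpy as np

# avg_load[h] = average cluster utilization (0-1) observed at hour h,
# assumed to come from the platform's monitoring data
avg_load = np.array([
    0.35, 0.30, 0.28, 0.25, 0.27, 0.40, 0.55, 0.70,
    0.80, 0.85, 0.88, 0.90, 0.87, 0.85, 0.82, 0.78,
    0.75, 0.72, 0.70, 0.68, 0.72, 0.80, 0.92, 0.95,  # 11 p.m. is busy here
])

RPO_HOURS = 24
best_hour = int(np.argmin(avg_load))  # quietest hour of the day
print(f"Recommended backup start: {best_hour:02d}:00 "
      f"(avg load {avg_load[best_hour]:.0%}), repeated every {RPO_HOURS}h")
```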
Another common example of inefficiency in data protection relates to storing backup data. Typically, techniques such as compression and de-duplication are applied to backup data to reduce the backup storage footprint. The algorithms used for these techniques are static and follow the same mechanism independent of the type of data being dealt with. Given that big data platforms use many different compressed and uncompressed file formats (Record Columnar (RC), Optimized Row Columnar (ORC), Parquet, Avro, etc.), a static algorithm for deduplication and compression does not yield the best results.
Why can’t the data management application learn and adopt the best deduplication and compression techniques for each of the file formats?
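As a rough sketch of what such a data management application might do, the example below measures how well a few standard codecs compress a sample of a file and picks the winner for that format; the codec list and sample data are stand-ins, not a statement about how any particular backup product works:

```python
# Illustrative sketch: pick a compression codec per file format by
# measuring compression ratio on a small sample, rather than applying
# one static algorithm to everything.
import bz2, gzip, lzma

def best_codec(sample: bytes) -> str:
    """Return the codec that shrinks this sample the most."""
    codecs = {"gzip": gzip.compress, "bz2": bz2.compress, "lzma": lzma.compress}
    sizes = {name: len(compress(sample)) for name, compress in codecs.items()}
    return min(sizes, key=sizes.get)

# In a real backup tool the sample would be the first block of, say, an
# ORC, Parquet, or Avro file; here it is stand-in data.
sample = b"col1,col2,col3\n" + b"1,alpha,2017-01-01\n" * 5000
print("Best codec for this sample:", best_codec(sample))
```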
Machine learning certainly could aid in optimizing a company’s data protection processes for big data. All pertinent data needs to be collected and analyzed dynamically using machine learning algorithms. Only then will we be able to do efficient, machine-driven data protection for big data. The question is not if but when!
By Jay Desai, VP, product management, Talena, Inc.