Dr. Usama Fayyad is a data mining pioneer who began working in the field in 1989. He got his start at NASA’s Jet Propulsion Laboratory, compiling data on astronomical phenomena such as volcanoes and star systems. From there, he went on to work for Microsoft Research. Then, frustrated by problems he was seeing in the data mining industry, he left Microsoft and started digiMine to deal with the issues of data mining and data warehousing. In this article, he shares his thoughts about the industry and how to get the most out of your data.
“There are two sides to data mining, descriptive and predictive,” says Dr. Fayyad. “Descriptive data mining reorganizes the data, digging deeper into it and pulling out patterns, such as customer similarity, which allows you to create a short description about that group of customers.
“Predictive data mining looks for the best prediction, such as the best product to pitch to a customer. You won’t get much insight, but it increases the performance, the ROI. Using both techniques will give you the best results.
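The distinction between the two sides can be illustrated with a small sketch. The data, the segment labels, and the function names below are hypothetical and chosen only for illustration; a real descriptive method would discover the groups rather than receive them as labels, and a real predictive model would be far richer than a frequency count.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (customer_id, segment, product_bought).
# The segment label is assumed to be known here for simplicity.
purchases = [
    ("c1", "heavy_user", "unlimited_plan"),
    ("c2", "heavy_user", "unlimited_plan"),
    ("c3", "heavy_user", "family_plan"),
    ("c4", "light_user", "prepaid_plan"),
    ("c5", "light_user", "prepaid_plan"),
]


def describe_segments(rows):
    """Descriptive side: build a short summary of each group of customers."""
    groups = defaultdict(list)
    for _customer, segment, product in rows:
        groups[segment].append(product)
    return {
        segment: {"customers": len(products), "products": dict(Counter(products))}
        for segment, products in groups.items()
    }


def predict_best_product(rows, segment):
    """Predictive side: pick the product bought most often in a segment."""
    counts = Counter(product for _c, seg, product in rows if seg == segment)
    if not counts:
        return None  # nothing known about this segment
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(describe_segments(purchases))
    print(predict_best_product(purchases, "heavy_user"))
```

The first function yields insight into who the customers are; the second yields only a recommendation, which matches the trade-off described in the quote.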
“An important issue today is SQL, the standard interface for databases, which has proven to be the wrong interface,” Fayyad says. “As an example, let’s say you worked for a telecommunications company, and you want to find records about cell phone fraud. Well, guess what? These naturally asked questions cannot be answered by today’s databases, because the interface was designed to address problems where you know the target and you want the database to quickly retrieve the result. If you don’t have an exact description of the target, you’re lost with a database today. This is why data mining is seeing a lot of demand.
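A minimal sketch of that gap, using Python's built-in sqlite3 module: the table, the column names, and the threshold are assumptions made for illustration, and the deviation score is only a stand-in for whatever a real fraud model would compute. A query with a known target is easy to express in SQL, while "which records look unusual" has no column to filter on and has to be derived from the data.

```python
import sqlite3
import statistics

# Hypothetical call records; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, minutes REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("a", 5), ("b", 7), ("c", 6), ("d", 4), ("e", 8), ("f", 240)],
)

# Known target: the database retrieves it directly.
known = conn.execute(
    "SELECT minutes FROM calls WHERE caller = ?", ("b",)
).fetchall()


def unusual_callers(connection, threshold=2.0):
    """Flag callers whose minutes deviate strongly from the mean.

    There is no 'is_fraud' column to query, so the pattern is derived
    from the data itself. The threshold is an arbitrary assumption.
    """
    rows = connection.execute("SELECT caller, minutes FROM calls").fetchall()
    values = [minutes for _caller, minutes in rows]
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)
    if spread == 0:
        return []  # all records identical, nothing stands out
    return [
        caller for caller, minutes in rows
        if abs(minutes - mean) / spread > threshold
    ]


if __name__ == "__main__":
    print(known)
    print(unusual_callers(conn))
```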
“When I started in this field back in 1989, there were many people in large corporations struggling with large data sets. And even though there’s a lot of data out there, it’s not necessarily the right kind. Also, there’s a big difference between the ability to store data and the ability to access it in a useful way.
“In response, companies began building data warehouses, which convert transactional database data into a format that allows for more analytically oriented queries to go against it. In theory, they contained all the details for data mining. But in reality, it was a huge challenge. Today, industry analysts are recording an 85% failure rate on data warehouses.”
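The conversion a warehouse performs can be pictured as a roll-up of raw transactions into a summary that analytical questions can be asked against. The sketch below is a simplification under assumed names and data; it is not digiMine's process, and a real warehouse involves far more (cleaning, integration, history).

```python
from collections import defaultdict

# Hypothetical transactional rows: (customer, month, amount).
transactions = [
    ("c1", "jan", 20.0),
    ("c1", "jan", 35.0),
    ("c1", "feb", 10.0),
    ("c2", "jan", 50.0),
]


def build_summary(rows):
    """Roll individual transactions up into per-customer, per-month totals."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for customer, month, amount in rows:
        cell = totals[(customer, month)]
        cell["orders"] += 1
        cell["revenue"] += amount
    return dict(totals)


summary = build_summary(transactions)

if __name__ == "__main__":
    for key, cell in summary.items():
        print(key, cell)
```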
Data Warehousing Woes
“The big question today is: Where’s the data? If there was a data warehouse, most likely it failed or it’s not working. I saw this so consistently that I started digging into it and discovered three problems:
At this point, Fayyad left Microsoft to create digiMine. He says he realized that “you cannot mine if you can’t have access to the data. And you can’t have the right data in the right format unless you ensure that there’s a successful data warehouse. At digiMine, we build and host data warehouses for companies; then we embed data mining on top as a solution.”
Here are some guidelines for collecting data:
Fayyad says digiMine begins from the other end, asking the client what data needs to be mined and how to apply the algorithms. From there, digiMine sets up the data warehouse and the technology to grab the data from a variety of formats. The customer installs their software, which they maintain and run from their data center.
Fees depend on the subscription, ranging from $7,000 to $10,000 per month for a customer who wants pure data mining and not much warehousing, to $30,000 to $40,000 per month for a customer who wants data warehouse hosting, maintenance and enterprise solutions.
Then there is the issue of purchasing software. According to Fayyad, “There are many tools available from companies such as SAS or IBM, but in order to use them properly, you had better be an expert, preferably a Ph.D. in the area of data mining or statistics. If you’re not, you just bought a bunch of shelfware.
“For most users, data mining tools offer the wrong interface. You need data mining solutions. If you have a large staff of experts who know data mining very well, data mining tools will do the job,” he says. “However, this department of experts is now acting as the interface between the tools and the ultimate user.”
If you’re considering purchasing data mining software, Fayyad recommends you look at applications that package the data mining inside the software, or that you purchase a service option.
Datamation is the leading industry resource for B2B data professionals and technology buyers. Datamation's focus is on providing insight into the latest trends and innovation in AI, data security, big data, and more, along with in-depth product recommendations and comparisons. More than 1.7M users gain insight and guidance from Datamation every year.