Datamining poised to go mainstream
With e-commerce and CRM propelling the market forward and Microsoft on the bandwagon, datamining has finally arrived.
October 1999
In this article:
Who’s who in datamining
Techniques used in datamining
Datamining: How it’s done
It used to be that datamining was limited to high-end database marketing firms and Global 100 firms–the kind whose online transaction processing (OLTP) systems generated millions of rows of data daily. There’s always been an aura of mystery, even magic, associated with datamining. It was a science practiced on powerful UNIX systems overseen by unsmiling statisticians and brilliant mathematicians.
Today that’s changing. Many Web sites are generating log files and e-commerce transaction files that are eminently mineable. Last month, for instance, online retail giant Amazon.com made headlines with its “purchase circles,” based on the fundamental datamining technique of affinity grouping (clustering). When retail sites suggest specific items to customers based on their past purchases, the sites are using a combination of customer relationship management (CRM) and datamining to increase their revenues.
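The idea behind suggesting items from past purchases can be sketched in a few lines. The following is only an illustration of co-purchase counting, not a description of how Amazon.com or any other retailer implements it; the basket contents and the function names `pair_counts` and `suggest` are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets; item names are invented for illustration.
baskets = [
    {"running shoes", "socks", "insoles"},
    {"running shoes", "socks"},
    {"basketball shoes", "socks"},
    {"running shoes", "insoles"},
]


def pair_counts(baskets):
    """Count how often each unordered pair of items shares a basket."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a single canonical (a, b) ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def suggest(item, baskets, top_n=2):
    """Return up to top_n items most often bought together with `item`."""
    scores = Counter()
    for (a, b), n in pair_counts(baskets).items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]


print(suggest("running shoes", baskets))
```

A production system would also weigh how popular each item is on its own, so that best sellers do not dominate every suggestion; raw co-occurrence counts are the simplest possible starting point.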
Datamining is part of a process called knowledge discovery, where the goal is to better understand the organization’s data in order to resolve business problems or capitalize on opportunities.
Sizing things up
Consider retail shoe vendor Just for Feet Inc. (www.feet.com) of Birmingham, Ala. The company has approximately 160 superstores, in addition to 170 Athletic Attic, Athletic Lady, and Imperial Sports stores. Each store carries from 3,000 to 6,000 different shoe styles. Multiply the styles by all the different sizes, and you’ll start to appreciate what the shoe industry refers to as the “size explosion.” And what better way to take advantage of all that data than with a data warehouse/datamining initiative?
Each Just for Feet store functions as its own distribution center. With the “in” styles changing so fast, and with regions–even neighborhoods–having different hot styles, it’s not hard to realize how important it is for Just for Feet to have the right kind of shoes in stock at the right location. As a result, it made sense for the company to focus its initial datamining efforts on product rather than customer data. “You can be item-centric or customer-centric,” says David Meany, CIO, referring to alternative approaches to designing and mining Just for Feet’s terabyte-scale data warehouse. But you can’t do both at once.
Datamining purists might say that the exception reports Just for Feet generates for its buyers aren’t genuine datamining. But the company’s buyers are thrilled with these weekly and monthly sales reports, which free them to spend more time on the more creative aspects of their jobs–predicting fashion trends and future demand. Meany explains that Just for Feet also does “real” datamining to answer specific business questions. For example, the company analyzes distribution practices to see how they affect product sell-through.
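An exception report of the kind described here is, at its simplest, a filter that surfaces only the rows that deviate from plan. The sketch below assumes a hypothetical record layout (`ItemSales`) and an arbitrary 25 percent tolerance; neither comes from Just for Feet.

```python
from dataclasses import dataclass


@dataclass
class ItemSales:
    style: str    # hypothetical style identifier
    store: str    # hypothetical store identifier
    planned: int  # units the buyer planned to sell in the period
    actual: int   # units actually sold in the period


def exception_report(rows, tolerance=0.25):
    """Return (row, deviation) pairs where sales miss plan by more than tolerance."""
    flagged = []
    for row in rows:
        if row.planned == 0:
            # Assumption: any sale against a zero plan counts as an exception.
            if row.actual > 0:
                flagged.append((row, float("inf")))
            continue
        deviation = (row.actual - row.planned) / row.planned
        if abs(deviation) > tolerance:
            flagged.append((row, deviation))
    return flagged


rows = [
    ItemSales("style-A", "store-1", planned=100, actual=104),
    ItemSales("style-B", "store-1", planned=100, actual=160),
    ItemSales("style-C", "store-2", planned=80, actual=40),
]

for row, deviation in exception_report(rows):
    print(row.style, row.store, f"{deviation:+.0%}")
```

Only the over-performing and under-performing styles are printed; the style that sold close to plan never reaches the buyer, which is the point of reporting by exception.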
The first two phases of the company’s multiphase data warehousing/datamining initiative are now in production, built with the help of ICL Plc (www.icl.com), a global IT services company based in London. Just for Feet used ICL’s Fast Track Development Toolkit to generate the schema for an Informix Corp. Dynamic Server release 8.0 database and perform the initial data population. Currently, Meany keeps only about a year’s worth of transaction-level data in Just for Feet’s data warehouse, which runs on a Sun Microsystems Inc. Enterprise E6500 server. The system maintains aggregate data for 1997 and 1998.
Although the first stages of Just for Feet’s implementation have been inventory-focused, plans are already underway to expand the company’s analysis capabilities and better leverage the customer component of the data warehouse. Keeping up with the “in” styles is only part of the lure of customer data. Consumers can join the Just for Feet club, with the enticement of special savings. Membership is easy: all you have to do is enter a telephone number, and the system does a reverse lookup to determine the address. Is Meany looking forward to mining all of this customer data? You’d better believe it.
And then there are companies like Fingerhut Companies Inc. (fingerhut.com), the $2 billion firm known for its catalog, direct marketing, and telemarketing ventures, that have spent years honing the process of datamining. The Minnetonka, Minn.-based company’s marketing analytics group maintains several hundred generic models that are used to build targeted segmentation models that generate mailing lists for catalogs.
Typically, the datamining team combines four models: a response model (will the customer respond?), a purchase model (how much will the customer buy?), a return model (is the customer likely to return merchandise?), and a payment model (is the customer a credit risk?). The company maintains data (almost 1,400 variables per customer) on more than 30 million customer households in a data warehouse that tops 7 terabytes.
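One plausible way to combine four such models is to fold their outputs into a single expected-profit figure per household and mail only where that figure is positive. The sketch below is an assumption made for illustration, not Fingerhut’s actual method; the `Scores` fields, the formula, and the one-dollar mailing cost are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    p_response: float    # response model: probability the household orders
    order_value: float   # purchase model: expected order size in dollars
    p_return: float      # return model: probability the order is returned
    p_nonpayment: float  # payment model: probability the order is never paid


def expected_value(s, mailing_cost=1.0):
    """Expected dollars collected from one mailing, minus the cost of mailing."""
    kept = s.order_value * (1 - s.p_return)
    collected = kept * (1 - s.p_nonpayment)
    return s.p_response * collected - mailing_cost


def mailing_list(households, mailing_cost=1.0):
    """Return household ids worth mailing, best expected value first."""
    ranked = sorted(
        households.items(),
        key=lambda kv: expected_value(kv[1], mailing_cost),
        reverse=True,
    )
    return [hid for hid, s in ranked if expected_value(s, mailing_cost) > 0]


# Hypothetical households and scores.
households = {
    "H1": Scores(0.10, 80.0, 0.10, 0.05),
    "H2": Scores(0.02, 40.0, 0.20, 0.10),
    "H3": Scores(0.05, 100.0, 0.50, 0.50),
}

print(mailing_list(households))
```

Treating the four scores as independent factors is a simplification; in practice the models may be correlated, and a team could just as well use each one as a separate cutoff.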
The players, new and old
Although datamining isn’t new technology, it has only recently emerged from academia, research labs, and several dozen vendors. The availability of data warehouses and cheap storage has certainly contributed to the trend, but today’s keen interest in datamining is largely driven by the explosive growth of e-commerce. Sales and marketing departments want to leverage the data gleaned from Web traffic patterns to do one-to-one marketing.
If the prospect of mining customer data to increase revenues, reduce risk, or detect fraud isn’t enough to propel datamining into the mainstream, there’s always the Microsoft factor. Microsoft Corp. ventured into datamining when the Redmond, Wash., software maker announced work on the OLE DB Extensions for Data Mining specification in May 1999. The project is a joint effort between the Microsoft SQL Server group and Microsoft Research’s Data Mining & Exploration group led by Usama Fayyad in consultation with a select group of vendors (see “Who’s who in datamining”). OLE DB is a specification for a set of data access interfaces designed to enable access to heterogeneous data sources. It’s considered the successor to open database connectivity (ODBC) and has already been “extended” for online analytical processing (OLAP) and a variety of vertical markets.
The Microsoft OLE DB for DM endeavor will likely spawn compliant datamining products sometime in 2000. But that doesn’t mean you can’t do datamining against SQL Server (or any other database) today. In fact, Microsoft’s Site Server 3.0 already includes features such as an intelligent “cross-sell” based on historical sales baskets in stores, the contents of the current shopper basket, and the browsing behavior of the shopper. Site Server ranks products that are likely to be most interesting to the shopper.
Microsoft isn’t the only firm with interdependent products. IBM Corp.’s SurfAid Analytics (surfaid.dfw.ibm.com) relies on the company’s own Intelligent Miner for Data to deliver sophisticated Web site analytics for a fixed monthly fee that ranges from under $1,000 to about $30,000. SurfAid is a small, entrepreneurial e-business within IBM Global Services, which is based in Somers, N.Y. Clients upload daily Web log files to the SurfAid FTP site. RS/6000 AIX scripts handle preprocessing, which includes “stitching back together” navigation paths of individual Web visitors. Then, one of SurfAid’s RS/6000s runs the IBM Intelligent Miner datamining tool kit against the customer file, which may contain over 150 million hits per day. The result is a daily report that customers can access at a private URL. Because IBM DB2 for OLAP is running behind the scenes, users can “slice and dice” the data starting with almost a dozen different reports.
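The “stitching” step mentioned above is commonly called sessionization: hits from one visitor are grouped, ordered by time, and split into separate visits wherever there is a long pause. The sketch below is a generic illustration in Python, not SurfAid’s AIX scripts; the visitor key (an IP address here), the tuple layout, and the 30-minute gap are assumptions.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity cutoff between visits


def stitch_sessions(hits, gap=SESSION_GAP_SECONDS):
    """Group (visitor_key, unix_timestamp, url) hits into per-visit URL paths."""
    by_visitor = defaultdict(list)
    for visitor, ts, url in hits:
        by_visitor[visitor].append((ts, url))

    sessions = []
    for visitor, events in by_visitor.items():
        events.sort()  # log lines may arrive out of order
        path, last_ts = [], None
        for ts, url in events:
            # A long pause closes the current visit and starts a new one.
            if last_ts is not None and ts - last_ts > gap:
                sessions.append((visitor, path))
                path = []
            path.append(url)
            last_ts = ts
        sessions.append((visitor, path))
    return sessions


# Hypothetical log hits, interleaved as they would be in a raw log file.
hits = [
    ("10.0.0.1", 1000, "/"),
    ("10.0.0.2", 1005, "/sale"),
    ("10.0.0.1", 1060, "/shoes"),
    ("10.0.0.2", 1100, "/cart"),
    ("10.0.0.1", 4660, "/"),
]

for visitor, path in stitch_sessions(hits):
    print(visitor, " -> ".join(path))
```

Identifying a visitor by IP address alone is unreliable when many users share a proxy, so real preprocessing typically combines it with cookies or other identifiers.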
IBM, by the way, shipped its first datamining tool kit in 1995. Today, the company’s Intelligent Miner for Data and Intelligent Miner for Text are used by customers with large DB2 databases. IBM has also developed a graphical query language, query by image content (QBIC), which lets users make queries of large image databases based on visual image content–properties such as color percentages, color layout, and textures occurring in the images. It is used with Digital Library to do graphical datamining.
Shortly after Microsoft parted the curtains on its datamining spec, Oracle Corp. announced its purchase of leading datamining vendor Thinking Machines Corp. and its Darwin product family. The Redwood City, Calif.-based company hasn’t made any announcements about how Darwin will be integrated into its product line. Although Oracle already has its own text mining product called Oracle ConText, it’s likely that the company will weave Darwin into its marketing campaign and Oracle Applications product line. In another significant move toward consolidation, SPSS Inc. (www.spss.com) acquired Integral Solutions Ltd. (ISL) and its popular Clementine product.
Darwin and Clementine are two of six datamining tool suites that Stamford, Conn.-based Gartner Group, in an August 1999 report on datamining, identified as key players in the generic datamining market. The other four are Angoss’ Knowledge Suite, IBM’s Intelligent Miner for Data, SAS’s Enterprise Miner, and SGI’s MineSet.
In the audio mining field, speech vendors such as Dragon Systems (http://dragonsystems.com) and Virage Inc. (http://www.virage.com) are working with all the major database vendors–including IBM–to support the technique, which is scheduled to be available later this year. Audio mining might be used to monitor call center traffic, customer service calls, or company voice mail (privacy issues aside) looking for anything from profanity to recurring customer service complaints to suspected industrial espionage.
E-commerce, CRM, and data warehousing will all help propel the datamining market forward. Standards such as extensible markup language (XML), the predictive model markup language (PMML), the cross-industry standard process for datamining (CRISP-DM), as well as Microsoft’s OLE DB for DM, will help, too. The evolving technology combined with such success stories as Just for Feet and Fingerhut will certainly drive the market into the mainstream.
Karen Watterson is an independent San Diego-based consultant who specializes in database and data warehouse design. She’s an editor of industry newsletters (www.pinpub.com) and has just completed a book on SQL Server, “10 Projects you can do with Microsoft SQL Server.” She can be reached at Karen_Watterson@email.msn.com.