This article is based on an upcoming book, Principles of Database Management: The Practical Guide to Storing, Managing and Analyzing Big and Small Data.
Data management entails the proper management of data as well as the corresponding data definitions or metadata. It aims at ensuring that (meta-) data is of good quality and thus a key resource for effective and efficient managerial decision making. Data quality (DQ) is often defined as ‘fitness for use,’ which implies the relative nature of the concept.
Data that is of acceptable quality in one decision context may be perceived to be of poor quality in another decision context, even by the same business user. For instance, the degree of completeness that is required for accounting tasks may not be required for analytical sales prediction tasks.
Data quality determines the intrinsic value of the data to the business. Information technology only serves as a magnifier for this intrinsic value. Hence, high-quality data combined with effective technology is a great asset, but poor-quality data combined with effective technology is an equally great liability. This is sometimes also referred to as the GIGO, or Garbage In, Garbage Out, principle, stating that bad data results in bad decisions, even with the best technology available.
Decisions made based on useless data have cost companies billions of dollars. A popular example of this is the address of a customer. It is estimated that approximately 10% of customers change their address on a yearly basis. Obsolete customer addresses can have substantial consequences for mail order companies, package delivery providers or government services.
Poor DQ impacts organizations in many ways. At the operational level, it lowers customer satisfaction, increases operational expenses, and reduces employee job satisfaction. Similarly, at the strategic level, it affects the quality of the decision-making process. The magnitude of DQ problems is continuously being exacerbated by the exponential increase in the size of databases. This certainly qualifies data quality management as one of the most important business challenges in today’s data-based economy.
Organizations are hiring various data management related job profiles to ensure high data quality and to transform data into actual business value. In what follows, we review the information architect, database designer, data owner, data steward, database administrator and data scientist. Depending upon the size of the database and the company, multiple profiles may be merged into one job description.
The information architect (also called information analyst) is responsible for designing the conceptual data model, preferably in dialogue with the business users. He/she bridges the gap between the business processes and the IT environment and closely collaborates with the database designer, who may assist in choosing the type of conceptual data model (e.g., EER or UML) and the database modeling tool. A good conceptual data model is a key requirement for storing high-quality data in terms of data accuracy and data completeness.
The database designer translates the conceptual data model into a logical and internal data model. He/she also assists the application developers in defining the views of the external data model, thereby contributing to data security. To facilitate future maintenance of the database applications, the database designer should define company-wide uniform naming conventions when creating the various data models, which helps to enforce data consistency.
Every data field in every database in the organization should be owned by a data owner, who has the authority to ultimately decide on the access to, and usage of, the data. The data owner could be the original producer of the data, one of its consumers, or a third party. The data owner should be able to fill in or update the value of the field, which implies that the data owner knows the meaning of the field and has access to the current correct value (e.g., by contacting a customer, by looking into a file, etc.). Data owners can be requested by data stewards (see below) to check or complete the value of a field, thereby correcting a data quality issue.
Data stewards are the DQ experts in charge of ensuring the quality of both the actual business data and the corresponding metadata. They assess DQ by performing extensive and regular data quality checks. These checks involve, amongst other evaluation steps, the application or calculation of data quality indicators and metrics for the most relevant DQ dimensions.
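As an illustration of such a data quality metric, the following minimal Python sketch computes completeness for a single field, i.e., the fraction of records in which that field is filled in. The function name `completeness`, the sample `customers` records, and the choice to treat `None` and the empty string as missing are assumptions made for this example; they do not come from the book.

```python
from typing import Any, Iterable, Mapping


def completeness(records: Iterable[Mapping[str, Any]], field: str) -> float:
    """Return the fraction of records in which `field` holds a non-empty value."""
    records = list(records)
    if not records:
        # Assumption: an empty data set is treated as fully complete.
        return 1.0
    filled = sum(
        1 for record in records
        # Assumption: None and the empty string both count as missing.
        if record.get(field) not in (None, "")
    )
    return filled / len(records)


# Hypothetical sample data for illustration only.
customers = [
    {"name": "Alice", "address": "12 Main St"},
    {"name": "Bob", "address": ""},
    {"name": "Carol", "address": None},
    {"name": "Dave", "address": "7 Elm Rd"},
]

print(completeness(customers, "address"))  # 0.5
print(completeness(customers, "name"))     # 1.0
```

Whether a completeness of 0.5 on the address field is acceptable depends on the decision context, in line with the "fitness for use" definition of data quality.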
Data stewards are also in charge of taking the initiative and acting upon the results of these assessments. A first type of action to be taken is the application of corrective measures. However, data stewards are not in charge of correcting data themselves, as this is typically the responsibility of the data owner. The second type of action to be taken upon the results of the data quality assessment involves a deeper investigation into the root causes of the data quality issues that were detected.
Understanding these causes may allow designing preventive measures that aim at eradicating data quality problems. Preventive measures may include modifications to the operational information systems from which the data originate (e.g., making fields mandatory, providing drop-down lists of possible values, rationalizing the interface, etc.).
Also, values entered in the system may immediately be checked for validity against predefined integrity rules, and the user may be requested to correct the data if these rules are violated. For instance, a corporate tax portal may require employees to be identified based upon their social security number, which can be checked in real time by contacting the social security number database. Implementing such preventive measures obviously requires the close involvement of the IT department in charge of the application.
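A validity check of this kind can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the `RULES` table, the field names, the simplified e-mail pattern, and the list of allowed country codes are all hypothetical, and a real system would take its integrity rules from the data model and the business.

```python
import re
from typing import Callable, Dict, List, Mapping, Tuple

# Hypothetical integrity rules: field name -> (validity check, error message).
RULES: Dict[str, Tuple[Callable[[str], bool], str]] = {
    "email": (
        # Deliberately simplified pattern, not a full e-mail address validator.
        lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
        "email address is malformed",
    ),
    "country": (
        # Mimics a drop-down list of possible values.
        lambda v: v in {"BE", "NL", "UK"},
        "country is not in the list of allowed values",
    ),
}


def validate(entry: Mapping[str, str]) -> List[str]:
    """Return the list of rule violations for one data entry (empty if valid)."""
    errors = []
    for field, (is_valid, message) in RULES.items():
        value = entry.get(field, "")
        if not value:
            # Every field with a rule is treated as mandatory in this sketch.
            errors.append(f"{field}: value is required")
        elif not is_valid(value):
            errors.append(f"{field}: {message}")
    return errors


print(validate({"email": "jane@example.com", "country": "BE"}))
# []
print(validate({"email": "jane@example", "country": ""}))
# ['email: email address is malformed', 'country: value is required']
```

An application would call such a function before accepting the entry and ask the user to correct any reported violations.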
Overall, preventing erroneous data from entering the system is often more cost-efficient than correcting errors afterward. However, care should be taken not to slow down critical processes because of non-essential data quality issues in the input data.
The database administrator (DBA) is responsible for the implementation and monitoring of the database. Example activities include: installing and upgrading the DBMS software, backup and recovery management, performance tuning and monitoring, memory management, replication management, security and authorization, etc. A DBA closely collaborates with network and system managers.
He/she also interacts with database designers to reduce operational management costs and guarantee agreed upon service levels (e.g. response times and throughput rates). The DBA can contribute to data availability and accessibility, two other key data quality dimensions.
Data scientist is a relatively new job profile within the context of data management. He/she is responsible for analyzing data using state-of-the-art analytical techniques to provide new insights into e.g. customer behavior. A data scientist has a multidisciplinary profile combining ICT skills (e.g., programming) with quantitative modeling (e.g., statistics), business understanding, communication, and creativity.
A good data scientist should possess sound programming skills in such languages as Java, R, Python, SAS, etc. The programming language itself is not that important, as long as the data scientist is familiar with the basic concepts of programming and knows how to use these to automate repetitive tasks or perform specific routines.
Obviously, a data scientist should have a thorough background in statistics, machine learning and/or quantitative modeling. Essentially, data science is a technical exercise. There is often a huge gap between the analytical models and business users. To bridge this gap, communication and visualization facilities are key. A data scientist should know how to represent analytical models, accompanying statistics and reports in user-friendly ways by using traffic-light approaches, OLAP (on-line analytical processing) facilities, if-then business rules, etc.
A data scientist needs creativity on at least two levels. First, on a technical level, it is important to be creative with regard to data selection, data transformation and cleaning. The steps of the standard analytical process must be adapted to each specific application and the “right guess” could often make a big difference. Second, analytics is a fast-evolving field.
New problems, technologies and corresponding challenges pop up on an ongoing basis. It is important that a data scientist keeps up with these new evolutions and technologies and has enough creativity to see how they can yield new business opportunities. It is no surprise that such data scientists are hard to find in today’s job market. However, data scientists contribute to the generation of new data and/or insights, which could leverage new strategic business opportunities.
To conclude, ensuring high-quality data is a multidisciplinary exercise combining various skills. In this article we reviewed the following data management job profiles from a data quality perspective: information architect, database designer, data owner, data steward, database administrator and data scientist.
About the authors:
Wilfried Lemahieu is a professor at KU Leuven, Faculty of Economics and Business, where he also holds the position of Dean.
Bart Baesens is a professor of Big Data and Analytics at KU Leuven (Belgium) and a lecturer at the University of Southampton (United Kingdom).
Seppe vanden Broucke works as an assistant professor at the Faculty of Economics and Business, KU Leuven, Belgium.