Mr. Loshin has produced a hands-on, practical work on data quality improvement and management. If you need some theory but a nuts and bolts focus with a framework laid out for you, this is the book for you. You can do a thorough assessment of the quality of your data across several dimensions, and develop a roadmap for making a program of specific improvements.
Transactional data is distinguished here from informational data, especially in what quality means for each. They differ in nature and in function. Informational data is often mishandled when the same standards and principles are applied to it as to transactional data. Because transactional data is primary and in full view of your business operations, it can overshadow informational data. While quality is necessary and vital for transactional data, it is not sufficient for an optimally profitable process. You may be in a line of business where the race is won on the competitive advantage gained from complex, accurate and flexible informational data. Quality flaws here are not always obvious up front.
I use this book to:
- develop action plans at all levels from assessments to strategy to hard dollar reporting.
- teach myself, my staff and my colleagues (each needing different education)
On the technical level, this book is certainly not the last word; but it is detailed enough and thorough enough to get real work done while organizing your research and development efforts to plan an enterprise-level program of data quality improvement for both transactional and informational data. I have been able to define a stream of benefits at each stage of the program.
The vulnerabilities I can address using this book relate to regulatory non-compliance, audit failures for data integrity in reporting, and contractual liabilities. On the other hand, I use the data profiling techniques to improve revenue streams as allowed by contracts and related instruments, in particular contracts that provide for incentives (and penalties), premiums, bonuses and performance schedules. Data profiling, parsing and standardization are the big efforts in achieving profit results, especially when you are dealing with heterogeneous data sets, for example transactional portfolios from different customers or partners where you need to establish audit-worthy claims to monies due you. We leave a lot of money on the table that we either overlook, miscalculate or do not claim in accordance with agreed terms and conditions. All interested parties need to have confidence in the reliability and accuracy of the data.
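To make the profiling idea concrete, here is a minimal sketch of the kind of column profiling I mean: null rates, distinct counts, and character-pattern frequencies that expose when two partners send the "same" field in different shapes. This is my own illustration, not code from the book; the sample values and function name are made up.

```python
from collections import Counter
import re

def profile_column(values):
    """Summarize a column: null rate, distinct count, and top character patterns."""
    non_null = [v for v in values if v not in ("", None)]
    # Collapse every digit to '9' and every letter to 'A' so that values
    # with the same shape ("1200.00" vs "950.50") share one pattern.
    patterns = Counter(
        re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) for v in non_null
    )
    return {
        "null_rate": 1 - len(non_null) / len(values),
        "distinct": len(set(non_null)),
        "patterns": patterns.most_common(3),
    }

# Two partners send the same amount field in different formats.
amounts = ["1200.00", "1,200", "1200.00", "", "950.50"]
print(profile_column(amounts))
```

A profile like this is often the first audit-worthy evidence that a partner's feed deviates from the agreed format.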
Informationally, profit and performance do not align by themselves to create optimal processes and systems. Data quality tools are used to engineer that alignment. Of course, the really nasty side of improving data quality is the organizational. The graves are full of visionaries, pioneers, prophets or whatever buzzword you like or despise. Here I mean the doers, and not the happy-babble types that float in and out of our rather more difficult working lives. So Mr. Loshin has a good list of topics in the organizational area, so that you can see and understand the obstacles and any possible paths for progress. Maturity, readiness, will, governance, policy and leadership are all prerequisites for making quality improvements that can be measured in hard-dollar payoffs. He has working templates and descriptions for charting all the components of organizational maturity. He has all the worksheets and exhibits you will need to determine and to document your assessments and recommendations. I followed his roadmap without making a big political (confrontational) deal of getting agreement, winning one player at a time until the executive leadership decided what geniuses they were for coming up with it.
I would have been satisfied enough with just the first five chapters, just to get me launched. But the real meat is in the metrics. You will see the standard old warhorse of statistical process control. I have really come to like the granularity of daily performance at the shop-floor level, because you can find the weak and the broken events as they happen, especially as you start to deploy process changes. People really respond when you are able to show them, within a day of launch, whether they are still doing x or not yet doing y. By the way, all the mathematics here is early high school level, so you can expect to roll it out to all your analysts, leads and coordinators with just a little presentation, review and coaching.
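The statistical process control involved really is that simple. As a sketch of the daily shop-floor view, here are classic 3-sigma p-chart limits applied to a daily defect proportion; the numbers are invented for illustration, and this is textbook SPC rather than a formula quoted from the book.

```python
import math

def p_chart_limits(defectives, samples):
    """3-sigma control limits for a daily defect proportion (p-chart)."""
    p_bar = sum(defectives) / sum(samples)      # overall defect rate
    n_bar = sum(samples) / len(samples)         # average daily sample size
    sigma = math.sqrt(p_bar * (1 - p_bar) / n_bar)
    return p_bar, max(0.0, p_bar - 3 * sigma), p_bar + 3 * sigma

# One entry per day: records checked, and records failing a rule.
checked = [200, 180, 220, 210, 190]
failed  = [ 10,   9,  12,  33,   8]
p_bar, lcl, ucl = p_chart_limits(failed, checked)
for day, (n, d) in enumerate(zip(checked, failed), 1):
    flag = "OUT" if not (lcl <= d / n <= ucl) else "ok"
    print(f"day {day}: rate {d / n:.3f} {flag}")
```

Day 4's spike falls outside the upper limit, which is exactly the kind of broken event you want surfaced the day it happens.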
In highly federated organizations, your first steps will present themselves out of common ground, dependencies and other basic interfaces. Communities of interest and best practices follow especially if you can establish a single showcase.
Mr. Loshin lays out the dimensions of data quality, each requiring a method of measurement. He says that each line of business can pick these up on a dashboard that can eventually roll up to an enterprise view. Furthermore, he wants these dimensions to create a hierarchy of categories that lay out according to a pyramidal framework where the broad base is constructed from simple measurements and many rules, upward toward the enterprise view of few rules and complex metrics. This undertaking is ambitious, far beyond the statistical process control and continuous process improvement of the industrial beginnings.
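The pyramid idea can be sketched in a few lines: many simple rule pass-rates averaged into per-dimension scores, then weighted into a single enterprise number for the dashboard. The dimension names, rates and weights below are purely illustrative, not taken from the book.

```python
# Pass rate per simple rule, grouped under a quality dimension (illustrative).
rule_results = {
    "completeness": [0.98, 0.95, 0.99],
    "consistency":  [0.90, 0.97],
    "timeliness":   [0.99],
}
# Relative importance of each dimension to the enterprise view (illustrative).
weights = {"completeness": 0.5, "consistency": 0.3, "timeliness": 0.2}

# Base of the pyramid: average the many simple rules within each dimension.
dimension_scores = {
    dim: sum(rates) / len(rates) for dim, rates in rule_results.items()
}
# Apex: a few complex, weighted metrics roll up to one enterprise score.
enterprise_score = sum(weights[d] * s for d, s in dimension_scores.items())
print(dimension_scores, round(enterprise_score, 4))
```

Each line of business keeps its own base of simple rules; only the weighted scores travel upward.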
His dimensions are categorized between the intrinsic and the contextual. Both are problematic. Contextual dimensions are by definition formed in relation to one another, tested for consistency and coherence. There is no way of knowing if you have done a complete job at any point in time. All these systems and processes exist in a dynamic world anyway, so you are never truly done, but you get the point. The idea is that if they make relational sense, you are likely on the right track.
We are still far from some grand unification theory of business process and data quality. Indeed, just when you get to the critical detail of this schema, you hit a patch like this:
"Yet although information policies (such as those governing security or privacy) are a major source of data quality assertions, they imply the need for data governance, which is covered in chapter 7." [i.e., the previous chapter]
Huh? If you read it really fast you can pretend that it means something. But just try to parse it. In a tough and complicated exposition where you most need clarity, he goes all muddy. This is no time for guessing, which is precisely what you are left to do. In a moment of a Gödel-like nightmare, Mr. Loshin makes completeness and coherence two of the contextual dimensions. He does say consistency rather than coherence, but that is small comfort. Talk about asking for trouble, especially under the scrutiny of those who have not yet bought into this program...
Yet his failure to attain the elegance of mathematical theory or the rigor of applied science does not invalidate his approach. Seeing is believing, which is his ultimate aim. The intrinsic dimensions are more straightforward and serve a bit as guideposts in otherwise uncharted territory. Again it is his tables that provide you with handholds to his framework. Chances are you will not be able to farm out a lot of this work to your people, especially at the beginning. You must make the high-level customization to your organization, your business context. Then others can carry it forward into their own work and departments. A core team of architects, analysts, process mappers, project coordinators, data stewards and report producers will require a material operating expense: about six months just to get started, and about eighteen to be really firing on all cylinders. You will be integrated into the familiar world of budgeting and forecasting, capacity planning, performance measurement, and project/data governance.
His treatment of a Data Requirements Analysis Process shows you how to transform the way you manage data and processes. The BIG however is that it presumes a level of project and development life cycle management far in advance of what usually exists in operational business units.
A further complication is that informational data is by nature highly derived and constructed. Because strategic foresight and organizational cooperation (let alone external cooperation) are rare in our customary approach to work, whether business or government, these derivations and constructions are that much more divergent at their origins. You can build cooperation into your methodologies and governance in order to make data standards a priority. The best thing you can start with is a proper assessment of the state of standardization (and data quality, for that matter) and decide on ways to move toward your desired point of arrival. A common fallacy, stemming from shortsightedness, is to choose highly proprietary solutions, confusing obscurity with innovation and competitive advantage. When leadership imagines that it is way out ahead of the pack, it is only temporarily so, because you often motivate the seemingly disadvantaged to cooperate against the outlier. Think Betamax. By contrast, when contemplating the critical employment of metadata, Mr. Loshin does so via ISO 11179.
I cannot resist pointing out that, as he is illustrating the problem of data quality, he gives as an example all the different ways of representing California. He includes "06" as the number in alphabetical order. The table he references meanwhile clearly shows California in position five. I almost think he is doing this on purpose to prove his point in a self-validating example. That or he just wants to see if I am paying attention.
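The California example is also the simplest possible standardization exercise. A minimal sketch, assuming a hand-built variant table (the variants below are my own illustration; "06" is California's FIPS numeric state code, which is why it differs from the alphabetical position):

```python
CANONICAL = "CA"
# Observed spellings mapped to one canonical code (illustrative list).
VARIANTS = {
    "california": CANONICAL,
    "calif.": CANONICAL,
    "calif": CANONICAL,
    "cal": CANONICAL,
    "ca": CANONICAL,
    "06": CANONICAL,  # FIPS state code for California
}

def standardize_state(raw):
    """Return the canonical code, or flag the value for human review."""
    key = raw.strip().lower()
    return VARIANTS.get(key, "UNKNOWN:" + raw.strip())

print([standardize_state(v) for v in ["California", " CA ", "06", "Kalifornia"]])
```

The point of flagging unknowns rather than guessing is exactly the book's point: ambiguity is the enemy, so unrecognized values go to a steward, not into the warehouse.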
By the way, for those who are old cranks when it comes to grammar, this book may be the ultimate in showing why perfect grammar can be worth millions, just by making data representation and communication clear. Grammar is especially effective in defeating that old bugaboo - ambiguity.
Mr. Loshin covers remediation as well as future planning and design. He gives a substantial discussion of Service Level Agreements. Finally, I know of no better place to go for help in beginning data profiling and parsing; he gives each a chapter. He shows how it all comes together in a nice master data management schema. Thus he returns to the top of the pyramid, the enterprise view, well connected, powerful and clearly represented. This is the only business book I am likely to wear out before I am done with it.
Please vote accordingly if you found this review helpful. Thank you.