Mastering Data Warehouse Design: Relational and Dimensional Techniques (Computer Science) Paperback – 22 Aug 2003
From the Back Cover
At last, a balanced approach to data warehousing that leverages the techniques pioneered by Ralph Kimball and Bill Inmon
Since its groundbreaking inception, the approach to understanding data warehousing has been split into two mindsets: Ralph Kimball, who pioneered the use of dimensional modeling techniques for building the data warehouse, and Bill Inmon, who introduced the Corporate Information Factory and leads those who believe in using relational modeling techniques for the data warehouse. Mastering Data Warehouse Design successfully merges Inmon’s data warehouse design philosophies with Kimball’s data mart design philosophies to provide you with a compelling and complete overview of exactly what is involved in designing and building a sustainable and extensible data warehouse.
Most data warehouse managers, designers, and developers are familiar with the open letter written by Ralph Kimball in 2001 to the data warehouse community in which he challenged those in the Inmon camp to answer some tough questions about the effectiveness of the relational approach. Cowritten by one of the best-known experts of the Inmon approach, Claudia Imhoff, this team of authors addresses head-on the challenging questions raised by Kimball in his letter and offers a how-to guide on the appropriate use of both relational and dimensional modeling in a comprehensive business intelligence environment. In addition, you’ll learn the authors’ take on issues such as:
- Which approach has been found most successful in data warehouse environments at companies spanning virtually all major industrial sectors
- The pros and cons of relational vs. dimensional modeling techniques so developers can decide on the best approach for their projects
- Why the architecture should include a data warehouse built on relational data modeling concepts
- The construction and utilization of keys, the historical nature of the data warehouse, hierarchies, and transactional data
- Technical issues needed to ensure that the data warehouse design meets appropriate performance expectations
- Relational modeling techniques for ensuring optimum data warehouse performance and handling changes to data over time
About the Author
CLAUDIA IMHOFF (CImhoff@Intelsols.com) is President and Founder of Intelligent Solutions, a leading consultancy on analytic CRM and BI technologies and strategies. She is a popular speaker, an internationally recognized expert, and coauthor of five books.
NICHOLAS GALEMMO (firstname.lastname@example.org) was Information Architect at Nestlé USA. He has twenty-seven years' experience as a practitioner and consultant involved in all aspects of application systems design and development. He is currently an independent consultant.
JONATHAN G. GEIGER (JGeiger@IntelSols.com) is Executive Vice President at Intelligent Solutions, Inc. In his thirty years as a practitioner and consultant, he has managed or performed work in virtually every aspect of information management.
Top Customer Reviews
Most Helpful Customer Reviews on Amazon.com (beta)
Both Inmon and Imhoff on the other hand are rather self-aggrandizing (Inmon once waltzed into one of his keynote speeches dressed like a boxer to the theme from Rocky!), and both Inmon and Imhoff seem to have based their careers around bashing Kimball. In their desperation to present an alternative to Kimball's methodology and carve out their own niche, they've presented mostly incoherent, illogical and unusable ideas sometimes laced with anti-Kimball baggage. I get the feeling Inmon is kind of like James Martin was back in the 80's, churning out countless cookie-cutter style books of dubious quality.
I've designed a number of dimensional data warehouses and data marts that actually work years later using the Kimball approach, but honestly, every book I've read by Inmon and/or Imhoff has left me wondering who in the world actually uses their approach (if you can call it that) to build real-world data warehouses.
If you want to have a complete library and money is no object, by all means, read everyone's ideas on data warehousing and compare and contrast for yourself (I did - I must own fifty books on the subject - but I rely on only about 5-6 books in my day to day work as a DW architect - the rest are just taking up shelf space and reminding me how nice it is to be able to read reviews at places like Amazon before you buy). If money is an object and/or you are just starting out in the field and trying to learn the basics of DW design, do yourself a big favor and get the three excellent Kimball books (The Data Warehouse Toolkit, The Data Warehouse Lifecycle Toolkit and The Data Warehouse ETL Toolkit). The Adamson/Venerable book: Data Warehouse Design Solutions is a very useful adjunct for additional examples of real-world dimensional designs.
*The back cover says it "addresses head-on" the issues from Ralph's famous letter. I'm familiar with that letter. Either I skimmed over a couple pages too fast - and those pages had some "answer" buried in them, or, they did not really, fully, address many of the issues Ralph wrote about.
*I kept getting confused - sometimes the book acted like it loved a synergy and partnership between the normalized and the dimensional approaches. Other times it seemed to slam the dimensional approach as not working in many areas. In particular, I was shocked at the paragraph in the center of page 386. I've had no problem, using what may appear to be unrelated star schema data, in doing significant analysis and data mining.
*The paragraph on page 394, under "Flexibility", says I can't do sophisticated or advanced analytics from my star schemas. I have. What am I (or, the authors) missing?
*Chapter 6 - Modeling the Calendar... I feel for anyone new to this arena trying to decipher the information. I have no problems with my date or time dimensions and I can explain them to my students in a lot less time than it took me to read that chapter!
*Chapter 7 - Modeling Hierarchies... Seemed a little long. I should not comment on it - when I finished reading it, I realized I had been sleeping through most of it.
*I found the chart on page 100 a little scary - do they really mix the facts in a fact table? The chart shows sales and sales objectives in the same fact table. Is this just a "logical" star? Or, is their basic understanding of the dimensional model in need of an upgrade?
*Not enough real world "how-to" examples.
*Again, either I skimmed a few pages, or, they refer to "we'll address this in a later chapter" a few times and never did.
*I don't know the authors - did not have any pre-conceived opinions about them. Now, I felt like, as a team, they did not always agree on what to write, so they compromised - picked middle ground and sent inconsistent messages. I finished the book with a very unclear picture of what message they were sending.
*Too much extraneous data in many of the examples - tough to weed out the needed from the excess...
*I won't argue with the overall concept of a staging area/data warehouse/data mart philosophy. I do take exception to the inference that I cannot be successful if I don't follow it. I've implemented using that model and variations of the approach, as well as taking real-time transactional data directly into a star.
Final thought. In my experience, anyone taking this book as an "absolute" will spend more time on I/T "stuff" than the users I know will want to put up with. Fifty-some years ago, Aristotle Onassis said the secret of business is in knowing something that no one else knows. That is no longer a reality. This is called the "information age" for a reason. The winners are those that realize they know no more than their competition, but do more, faster, with what they have. In my version of the "real world", executives want results, NOW. I did not feel the authors ever had to deal with "urgency".
But that begs the question. Many a CIF or enterprise-wide project has been launched... yet most are cancelled long before reaching the finish line. This is reality. In the REAL world we have REAL deadlines and REAL budgets imposed by REAL business executives who have REAL problems to solve and it involves... oh by the way... REAL MONEY!
We have to deliver NOW! Well, ok, maybe not quite that fast, but you get the idea. The hard part is getting the data! Or is it? Using simple tools and a powerfully designed, highly detailed dimensional database, we have, for example, clients pulling their own data sets ready for import into statistical and mining packages. They think they have died and gone to heaven!
Foist a third normal form (3NF) design on them and their eyes roll... "Now, which of the available join paths is the right one for this business question?" and "Why is it taking so long for the query?" and "Will you pull the data for me?" Now we hear... "Instead of spending 80% or 90% of my time getting the data prepared, I spend 5% or 10% of my time doing that... so I have that much more time to actually think about the business." We have seen clients' ability to understand and drive their business expand beyond their own wildest imagination in very short order. It shows on their bottom line and they are very happy with that!
The whole point of BI - beyond all the data capture and cleaning and integrating and turning "data into knowledge", and making it easy for the user without dumbing it down, and all that stuff - the point of BI can be distilled down to one word: "Publish!" Booksellers don't hand you a photocopy of a handwritten manuscript. They do a lot of work with the "raw data" - typesetting and page numbers and table of contents and indexing and so on - and turn it into something accessible and useable... something we call a book. That's the point of BI. This book doesn't get it.
Too many CIF or "enterprise" projects have imploded under their own weight to slavishly duplicate the same mistakes. Too many dimensional systems have succeeded with huge return on investment to relegate the ideas to a dark corner.
If we stop the religious discussions (Mac vs. Windows, or the "Inmonites vs the Kimballites") and get to see how truly successful Business Intelligence (BI) systems work, we find the emphasis must be on using proper theory (not arguing it) and applying techniques that work NOW. More often than not, can you say "Dimensional!" Yes, CIF and all that has its place... but not nearly to the degree that this book would have you believe. The most successful clients have been the ones who bypassed all the "modeling wars" and used the data bus architecture of conformed dimensions. They didn't pick and choose a modeling idea or two; they actually studied Kimball and did it the right way. Dr. Codd, while addressing this question one day, asked me this question: "Would you run an OLTP system against a dimensional model?" My obvious answer was: "Of course not." "Why then," he asked, "do so many people try to do the opposite?"
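For readers new to the term, the conformed-dimension idea mentioned above can be sketched roughly as follows. This is a minimal illustration only, not anything taken from the book or from Kimball's texts: the table names, column names, and figures are all hypothetical, and SQLite is used purely for convenience. Two fact tables (sales, and sales objectives at a coarser monthly grain) share the same product and month dimensions, so each can be summarized to a common grain and then compared.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_month   (month_key   INTEGER PRIMARY KEY, month_label  TEXT);

-- Two separate fact tables share the same (conformed) dimensions.
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    month_key    INTEGER REFERENCES dim_month(month_key),
    sales_amount REAL
);
CREATE TABLE fact_sales_objective (
    product_key      INTEGER REFERENCES dim_product(product_key),
    month_key        INTEGER REFERENCES dim_month(month_key),
    objective_amount REAL
);

INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_month   VALUES (200301, '2003-01');
INSERT INTO fact_sales  VALUES (1, 200301, 60.0), (1, 200301, 50.0), (2, 200301, 80.0);
INSERT INTO fact_sales_objective VALUES (1, 200301, 100.0), (2, 200301, 90.0);
""")

# Summarize sales to the product/month grain first, then join to objectives
# on the shared dimension keys, so no fact row is double counted.
DRILL_ACROSS = """
SELECT p.product_name, m.month_label, s.total_sales, o.objective_amount
FROM (SELECT product_key, month_key, SUM(sales_amount) AS total_sales
      FROM fact_sales
      GROUP BY product_key, month_key) AS s
JOIN fact_sales_objective AS o
  ON o.product_key = s.product_key AND o.month_key = s.month_key
JOIN dim_product AS p ON p.product_key = s.product_key
JOIN dim_month   AS m ON m.month_key   = s.month_key
ORDER BY p.product_name
"""
rows = conn.execute(DRILL_ACROSS).fetchall()
for row in rows:
    print(row)
```

Keeping the two kinds of facts in separate tables and meeting at the shared dimensions is one common way to avoid mixing grains in a single fact table; whether it suits a given project is a design judgment.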
The biggest "problem" with the dimensional approach is that people who do not truly understand it try to pick and choose techniques from it and graft those into their current ways... and fail... and bash it. Or, they don't understand it at all. Uh, sorry, it isn't the technique that is the problem.
The book purports to "answer" a message reply that Ralph Kimball posted on a discussion board some time ago. It does not. One can be certain that Ralph Kimball did not give permission to use his name on or in the book, as is done. Instead, the book does a very poor job of showing how to design and use dimensionally designed databases as a part of a larger architecture, illustrates a complete lack of understanding of the underlying principles, and then criticizes and limits the technique and its application. This does a terrible disservice to the reader... especially a reader who is trying to decide how to meet a real business need and is new to BI. I dislike speaking impolitely like this, but the truth is more important in this context. Also, on the back cover, they state that Ralph Kimball's "letter" was a challenge. It was not. It was merely a listing of many of the crucial issues in a useful BI environment addressed to an individual who had asked legitimate questions about BI. As for addressing these issues "head-on", the book does not do this at all.
Does this matter?
Of course it does. Real people buy this book and are led down a path that rarely leads to success. I realize that much of this review is not directly about specific details of the book. The details in the book are inconsistent, often unfocused, and sometimes downright misleading. The larger issue, and thus the focus of this review, is that the entire book is based on a premise that the CIF is "The Way" and that dependent dimensional data marts are grudgingly "ok". This is not the reality that many of us see in the business and education worlds.
The main reason I bought a copy of this book, even before it arrived in bookstores, was that I was leading a team to figure out how to merge Inmon and Kimball views for data modelling standards.
We had already developed a DW architecture using Inmon's approach, with its associated relational/ERD method, but believed that it lacked rigour in the area of data marts. We also reviewed Kimball's books, and acknowledged the strengths of his dimensional modelling approaches, but were concerned that they lacked rigour for the diversity of analytical requirements in the manufacturing environment, e.g. data exploration/mining on a massive scale. We were struggling to figure out how to combine the best of both - and then we discovered the imminent release of "Mastering Data Warehouse Design". After checking the Table of Contents on the publisher's web site, we had the book couriered directly from the publisher's warehouse because it would not be available in local bookstores fast enough to meet our work schedule.
Chapter 1 has an impressive 'sound bite' version of Inmon's DW architecture thinking, but extended to include broader Business Intelligence concepts. Chapter 2 does a commendable job of explaining a tiered approach to data models, e.g. subject area model, business model, operational system model, DW model. At first, this chapter was confusing because we had just finished a rigorous definition of data modelling standards, using more conventional terminology, e.g. logical/entity model, physical/table model. So the book's terminology didn't seem to fit in with our thinking. But after re-reading it, we realized that it added value in forcing us to look at the whole issue of modelling from a deliverables or outcomes perspective, rather than a modelling process perspective.
Chapter 4 discusses how to develop a DW data model. The content outlines the sequence or steps involved in developing a DW data model, and it's rare that I've been able to find as good coverage of the topic as I found in this chapter. Chapters 5 - 11 cover topics like keys, modelling time/hierarchies/transactions, with some solid content on how to model for on-going business change and how to maintain the tiered models. However, I'm not fully conversant with some of these topics, so am not in a good position to evaluate their content.
Chapter 12 has a very good discussion on how to deal with a proliferation of legacy data marts, and strategies for migrating to a central DW that feeds a variety of data marts. It also introduces Chapter 13 which has a classic discussion on comparing the relational and dimensional modelling approaches - including the best discussion I've ever seen on the strengths and weaknesses of each approach. While our team didn't buy into all this chapter's points, the clear logical explanation of strengths and weaknesses helped facilitate a consensus agreement among two groups aligned with the Inmon/relational and Kimball/dimensional approaches. The consensus solution, mostly based on Chapter 13's content, would have been difficult to achieve without this book, i.e. chapter 13's content alone was worth much more than the price of the book.
So if you're struggling with the merits of the Inmon and Kimball architecture/modelling approaches, this book is a valuable resource to help take advantage of the best of both.