SQL Tuning Paperback – 29 Nov 2003
|New from||Used from|
- Choose from over 13,000 locations across the UK
- Prime members get unlimited deliveries at no additional cost
- Find your preferred location and add it to your address book
- Dispatch to this address when you check out
Customers Who Bought This Item Also Bought
Enter your mobile number below and we'll send you a link to download the free Kindle App. Then you can start reading Kindle books on your smartphone, tablet, or computer - no Kindle device required.
Getting the download link through email is temporarily not available. Please check back later.
To get the free app, enter your mobile phone number.
About the Author
Dan Tow is an independent consultant, operating under the banner SingingSQL (www.singingsql.com). His experience solving Oracle-related performance problems goes all the way back to his 1989 hire by Oracle Corporation. During most of his tenure at Oracle, Dan focused on the performance of Oracle Applications, managing the performance group for that division. In this role, he found a fertile testing ground for his SQL tuning method applied to the huge set of complex SQL included in those applications, including both online SQL and diverse batch processes. In 1998, Dan left Oracle to lead performance for TenFold Corporation, where he applied the same methods to tuning questions on DB2, and SQL Server, and Sybase, as well as on Oracle. In 2002, Dan started his own business, SingingSQL, through which he offers diverse database-related tuning services, including SQL tuning and systematically analyzing load to learn which SQL should be tuned. He has introduced his SQL tuning method to over 1,000 people in short lectures, and now offers in-depth courses in the material, using this book as textbook. Dan has a Ph.D. in chemical engineering from the University of Wisconsin at Madison. He lives in Palo Alto, California, and can be reached at firstname.lastname@example.org.
Excerpt. © Reprinted by permission. All rights reserved.
CHAPTER 1 Introduction
Well begun is half done.
Politics, Bk. V, Ch. 4
This book is for readers who already know SQL and have an opportunity to tune SQL or the database where the SQL executes. It includes specific techniques for tuning on Oracle, Microsoft SQL Server, and IBM DB2. However, the main problem of SQL tuning is finding the optimum path to the data. (The path to the data is known as the execution plan.) This optimum path is virtually independent of the database vendor, and most of this book covers a vendor-independent solution to that problem.
The least interesting, easiest parts of the SQL tuning problem are vendor-specific techniques for viewing and controlling execution plans. For completeness, this book covers these parts of SQL tuning as well, for Oracle, Microsoft SQL Server, and IBM DB2. Even on other databases, though (and on the original databases, as new releases bring change), the vendor-independent core of this book will still apply. As such, this book is fairly universal and timeless, as computer science texts go. I have used the method at the core of this book for 10 years, on four different vendors databases, and I expect it to apply for at least another 10 years. You can always use your own vendors current documentation (usually available online) to review the comparatively simple, release-dependent, vendor-specific techniques for viewing and controlling execution plans.
Why Tune SQL?
Lets begin with a basic question: should someone tune the SQL in an application, and is that someone you? Since you are reading this book, your answer is at least moderately inclined to the positive side. Since it took me several years to appreciate just how positive my own answer to this question should be, though, this chapter lays my own viewpoint on the table as an example.
Lets describe your application, sight-unseen, from an admittedly datacentric point of view: it exists to allow human beings or possibly another application to see, and possibly to enter and manipulate, in a more or less massaged form, data that your organization stores in a relational database. On the output data, it performs manipulations like addition, multiplication, counting, averaging, sorting, and formatting, operations such as those you would expect to see in a business spreadsheet. It does not solve differential equations or do any other operations in which you might perform billions of calculations even on a compact set of inputs. The work the application must do after it gets data out of the database, or before it puts data into the database, is modest by modern computing standards, because the data volumes handled outside of the database are modest, and the outside-the-database calculation load per datapoint is modest.
Online applications and applications that produce reports for human consumption should produce data volumes fit for human consumption, which are paltry for a computer to handle. Middleware, moving data from one system to another without human intervention, can handle higher data volumes, but even middleware usually performs some sort of aggregation function, reducing data volumes to comparatively
Even if the vast number of end users leads to high calculation loads outside the database, you can generally throw hardware at the application load (the load outside the database, that is), hanging as many application servers as necessary off the single central database. (This costs money, but I assume that a system to support, say, 50,000 simultaneous end users is supported by a substantial budget.)
On the other hand the database behind a business application often examines millions of rows in the database just to return the few rows that satisfy an application query, and this inefficiency can completely dominate the overall system load and performance. Furthermore, while you might easily add application servers, it is usually much harder to put multiple database servers to work on the same consistent set of business data for the same application, so throughput limits on the database server are much more critical. It is imperative to make your system fit your business volumes, not the other way around
Apart from these theoretical considerations, my own experience in over 13 years of performance and tuning, is that the databasemore specifically, the SQL from the applicationis the best place to look for performance and throughput improvements.
Improvements to SQL performance tend to be the safest changes you can make to an application, least likely to break the application somewhere else, and they help both performance and throughput, with no hardware cost or minimal cost at worst (in the case of added indexes, which require disk space). I hope that by the end of this book you will also be persuaded that the labor cost of tuning SQL is minimal, given expertise in the method this book describes. The benefit-to-cost ratio is so high that all significan database-based applications should have their high-load SQL tuned.
What Other Items Do Customers Buy After Viewing This Item?
Most Helpful Customer Reviews on Amazon.com (beta)
This book is refreshing. It doesn't waste time going over all of the stuff you learned years ago.
Knowing how to read an execution plan or when to pick a hash join over a nested loop join is not what this book is about. There are plenty of books on the market that cover basic, vendor specific, query tuning. I personally have about 20 of these books on my bookshelf here at home. (Over the years I've worked on Sybase, SQL Server, Informix XPS, & Oracle.)
Here's the deal...
Anyone who has worked with really big systems will eventually run into an optimization problem that seems to be unsolvable. You can try histograms, compressed key indexes, partitioning, pre-joined indexes, and materialized views, but you still can't get the performance that's being requested. For a DBA, it can be a very frustrating dilemma. This is especially true when you know from the data volume that you should be able to get there.
The truth is, optimizers can't always get the right solution, even with correct statistics. There are some good technical reasons why this is true, but that's out of scope for my review. In any case, that's where this book comes to the rescue. I feel that it gives you some insight into the optimization problem and tells you how to correct the problems that your optimizer can't figure out.
This book is NOT for use on 95% of your queries. Most optimizers will pick the correct access plan if the DBA does his/her job correctly and collects the appropriate statistics.
In my experience, I get two types of problems that I have trouble getting the optimizer to solve:
#1. Joining together a large number (8-14) of tables. At least 1 or 2 of the tables have over 30G of real data. By `real data', I mean that 30G of data is actually populated.
#2. Making high transaction queries read the fewest amount of buffers in order to get rid of latching problems.
After I read the first few diagramming chapters of this book, I thought I'd give it a try on a problem that was recently solved at work. I was surprised. It worked, and even with my clumsiness with the method, it only took me about 2 hours to get a solution. It took us about 3 days at work. Our trial and error solution was slightly better, than what I came up with using the author's method. However, it was so close that had I used his method, I most certainly would have gone on to another problem. (The author's solution was strange because I would never have solved the join order the way that he did. His solution actually had me pick the largest table, out of 10, to drive the query. I thought this was odd because my test query had some very good filters on smaller tables.)
These were the results per execution:
Optimizer - 100K buffers.
Trial & Error - 1700 buffers.
Book method (basic graphing) - 2000 buffers.
If I had known about this book, I could have solved this problem in a few hours. Instead it took several DBA's, a few day's time to come up with a solution that was only marginally better.
Since this method appeared to work so well for OLTP, I went back and looked at a very large OLAP query that I had worked on last year. I and several developers had spent days trying to get it to run faster. Even with the author's method, I still couldn't make it better, but I was happy to see that the join order picked by the optimizer was almost identical to what I had calculated using the book.
I plan on studying this guide over and over until I've memorized this method. I don't think it will solve every problem, but I think it gives you an edge over using experience alone.
Before you purchase this book...
Keep in mind, that SQL tuning is fundamentally *not a simple problem*, so readers should not buy the book expecting an easy list of simple tips and tricks. (If the problem was easy, the optimizer likely would have got it right in the first place, and you wouldn't be tuning!) The correct solution to the problem is fairly complex, as complex as it *needs* to be.
Also, it's not noted anywhere, but I gathered that the author assumes that the reader will have basic high school Algebra skills, and will have taken a basic statistics course in order to fully understand the reasoning behind the explanations. Although, Algebra and statistical knowledge are probably not necessary to learn the tuning method presented. I state this because I told a good friend about this book, and his first question was, "Does it have a lot of math to go through?" So, if statistics is not your thing, don't worry. The author only presents enough information to give you a level of comfort that his method is based on mathematics. He doesn't require that you know "graph theory" or anything like that.
By the way, our change from 100K buffers per transaction to 1700 buffers, dropped the CPU utilization for the website that was running the OLTP query by about 70%. That was on a 16 CPU, Sun/Solaris box. So proper SQL tuning can save you a lot of $$$.
This book is about tuning SQL queries in a systematic and scientific manner. It is above all about determining the optimum order in which the query engine should access the tables involved. My quibble was that join order is only a part of the whole tuning problem and most of the time the query optimizer got it right anyway. While this is true, I see now that it misses the point.
The fact that the optimizer gets it right so often allows us to take join order for granted. But how do we know the optimizer got it right unless we know what the optimum join order is? That is what this book teaches, a methodology and an elegant system of notation that allows us to determine the optimum join order of the most complex query. As the author points out, the number of possible join orders increases factorially with the number of tables involved. An 8 table join has 40,320 possible join orders. That rules out trial and error for all but the simplest queries.
It turns out that analyzing and diagramming according to Tow's method gives you a deep architectural understanding of the query and the problems that face you. It gives you a plan to which you can apply the tools of the trade, indexes, code optimization, etc.
This is a book that will be on the shelf of serious performance tuning professionals for as long as SQL is the language of data manipulation.
1) learning how to do query diagrams is great for people who have never done them and this book is the best (maybe even the only) book for learning this. In learning to do the diagrams, you will understand the basic logic and basic math that an optimizer must do in order to get you a good plan. Do not worry. It is not a deep math book and there are no complex calculations. Indeed Dan shows you how basic it all really is. This gives you a deeper foundation in optimization and brings you closer to having that INFORMATIONAL CRITICAL MASS which is required to do good tuning.
2) the book is outdated. As I said, I have been doing Oracle since 1984. For two decades, Dan's method of tuning and relying mostly on NESTED LOOP JOIN was the only real game in town. And Dan explains it well. Unfortunately about the time Dan published the book (2003), Oracle was entering a revolution in its design. Most notably, databases were increasing in size dramatically and this made the need for PARTITIONING strategies and PARTITION WISE HASH JOINS very high. Dan's techniques of query diagraming are still useful for understanding a query's needs, but the book is today outdated because of its lack of treatement for PARTITIONING and HASH JOIN techniques. Don't get me wrong, even outdated, this book has great value because it clarifies what you need to tell other people, why certain things happened.
3) Even outdated, I learned one neat join trick which alone is worth the 20 dollars I paid for this book. Page 146 has a gem of a trick that I never considered but which I can see as offering massive benefits in a few special situations which I have faced in the past but could not solve. Thanks Dan. It is even easier to do in today's 10g and 11g databases because you need no hints to make it happen.
This brings me to my last commentary. The reason I say the book is outdated is because Oracle has come so far in the last five years with its optimizer that the trick to getting Oracle to give you the right plan is to feed it good information. If you do this then it will do what Dan shows you in this book (only better than you could) and give you a great if not best plan. Thus you will rarely build a query diagram these days. Instead you will try to figure out where information was not provided to Oracle correctly and fix that. If this turns out not to be the case, then the problem is likely because the SQL is crappy and you need to rewrite it. I would say that given the right information (stats collected right, constraints created for PK,UK,FK, indexes built to support the constraints, datatypes correct for columns, not null defined when data is actually required), Oracle will produce a great or best plan 99% of the time. The other 1% of the time you will 99% of the time be doing SQL rewrites. That leaves 1/100th of 1% of queries to be a real problem. Many times your system won't be that complicated.
Not being an expert in DB2 or Sql Server I cannot say how up-to-date the material remains on these databases but since I doubt their Optimizer Technology is as advanced as Oracle's there may yet be significant value retained in these two spaces. I can say that it is true that Dan's diagramming methods are universal for any database and that reading the book will provide you with that boost towards critical mass I mentioned earlier regardless of what database you use.
If you are an Oracle Developer/DBA and you are new to tuning, or want to see what you know and don't know, you can't go wrong by getting this book. After reading the book, start reading about PARTITIONING and PARTITION WISE HASH JOINS. Understanding what is in this book and understanding the details of PARTITIONING and PARTITION WISE HASH JOINS in various scenarios will put you at the forefront of tuners. Maybe its time for Dan to do an updated version of the book.
One last comment. I don't want this to seem like a negative review. Remember I said I have been doing Oracle since 1984. As "expert" as I am with Oracle and in tuning it (I am a reasonably good tuner for most things), I learned stuff from Dan and his book. Twenty Five years of doing it and this book still taught me things. I am very happy to have purchased this book and you will be too.
The first four chapters of the book cover some of the subjects that are found in other SQL tuning books, and focus on the important points without delving too deeply into database internals. The really "good stuff" is covered in chapters 5 through 10, which includes how to "draw" a query diagram, and the procedures to follow to derive an optimal join order. The problems/examples presented and resolved in these chapters help the reader get a start on understanding the methodology, which can then be applied to actual tuning problems as they are encountered.
So far I have applied this technique to about 20 tuning problems I have encountered on the job using the Oracle RDBMS. I still consider myself a relative novice in using the methodology, but for each of the 20 problems I have been able to significantly reduce elapsed time and resource consumption (gets/reads). Also, these results were achieved in a matter or hours, rather than days. The query diagram has even facilitated finding "missing joins" in some of the queries without having a knowledge of the particular application.
This book is a "must have" for DBA's and developers and I highly recommend it.
I have been running high-volume performance tuning projects for the past four years, using both SQL Server and Oracle. Our products generate their SQL, so we cannot edit it; nevertheless, our job is to make that SQL run faster, normally with hints, indexes, and parameter changes.
I have observed Dan apply the method he describes in the book to help us resolve our performance problems, and it really does work as quickly and effectively as other reviewers describe. My teams and I have also tried tuning SQL ourselves, and _that_ is as difficult and ineffective as you might expect - on more than one occasion we've had to resort to hacks like denormalisation. The difference really is a few minutes versus several days.
After reading the book, I finally understand what Dan was doing that let him tune our SQL so effectively. The method is highly algorithmic - you follow a series of fairly simple steps to draw a graph and out pops a near-optimal query plan. Dan also describes how to influence your database to use the query plan you found with his method.
My team and I haven't had the chance to apply Dan's method yet, but it's very simple and I'm confident we will be able to do so. I plan to make the book required reading for the team before our next project.
My only complaint about the book - and it's quite a minor one - is that it doesn't cover open-source alternatives like mySQL or PostgreSQL. Applying the method to these alternative databases would be fairly simple, but coverage of the basics like identifying the query plan or influencing the optimiser would be nice to have under one cover.
Look for similar items by category
- Books > Computing & Internet > Computer Science > Information Systems
- Books > Computing & Internet > Databases > Applications
- Books > Computing & Internet > Databases > Data Storage & Management > Database Management Systems
- Books > Computing & Internet > Databases > SQL
- Books > Computing & Internet > Digital Lifestyle > Online Shopping > Amazon
- Books > Computing & Internet > Programming > Languages & Tools