"The CUDA Handbook" is the largest (480 pages) and latest (June 2013) of NVIDIA's series of GPU programming books. It is also the most comprehensive and useful GPU programming reference available to date. It's a tough world out there for programmers trying to keep up with changes in technology, and this reference makes the future a much more comfortable place to live. Learn about GPGPU programming and get ahead of the crowd.
For those programmers who haven't had the time to notice, GPU programming is a shift in program design that is sweeping the worlds of network VoIP management, parallel analysis and simulation, and even supercomputing in a single box. I have personally run a starfield simulation on a laptop with an i7 processor that ran 112 times faster once it used the internal NVIDIA GeForce GTX 570M. The starfield frame time dropped from about 2 seconds to about 0.015 seconds. Imagine what I could do with a GeForce GTX 690! Charts indicate that it might exceed 700 times the computing speed! This book not only tells me how to arrange the software to work with the NVIDIA SDK, it also shows me the important architectural differences among many of the NVIDIA cards so that I can get optimum performance from each.
The world of computing is still filled with 32-bit machines (or operating systems) that use most of their memory just to get their assigned tasks completed. Many of these machines do not even have four CPU cores, let alone more than 4 GB of memory. They fill production devices, the desktops of database support companies, and the racks of IT departments everywhere. The need for faster and greater computing does not slow down or stop for these hardware limits, and the cost of replacing the machines outright is prohibitive. Now a demand arrives to manage 5,000 computer domains, or a messaging system must mix 1,500 VoIP channels into a hundred groups, or a control simulation must manage six robotic arms on an assembly line. Short of clustering a dozen to a hundred other computers to carry the load, the only practical solution is to employ one or two GPUs. Projects that ignore this message are destined to fail, and with failure come damaged careers and lost jobs.
The way out of the trap of limited legacy hardware is to let GPUs take up the increased workload instead of overloading the limited memory and CPU cores. A single high-end GPU can add on the order of 2,300 CUDA cores, organized into streaming multiprocessors, to perform the work. Each GPU card can also add 4 GB of high-speed memory alongside the limited program memory on the motherboard, which may be only 2 GB.
The book introduces the GPU architecture, GPU device memory usage and loading, and kernel code design. Once you have mastered the terminology and run some of the examples, you will be able to start developing code for specific solutions. The first chapters introduce you to NVIDIA GPU devices. The meat of the book starts in Chapter 5 with proper memory handling procedures. Chapter 7 expands the material on blocks, threads, warps, and lanes; it will straighten out the terminology and get you headed toward constructive code for your upcoming design.
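For readers who have not met the block, thread, warp, and lane vocabulary before, here is a rough sketch of the index arithmetic behind it. It is plain Python running on the CPU, not code from the book, and the function names `locate_thread` and `blocks_needed` are my own, chosen only for illustration. It assumes a one-dimensional launch and the 32-thread warp size that NVIDIA GPUs have used to date.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs to date


def locate_thread(block_idx, thread_idx, block_dim):
    """Return (global_index, warp_in_block, lane) for one thread of a 1D launch.

    Mirrors the usual CUDA expression blockIdx.x * blockDim.x + threadIdx.x.
    """
    global_index = block_idx * block_dim + thread_idx
    warp_in_block = thread_idx // WARP_SIZE  # which warp inside the block
    lane = thread_idx % WARP_SIZE            # position inside that warp
    return global_index, warp_in_block, lane


def blocks_needed(n_items, block_dim):
    """Round up so that every item gets its own thread."""
    return (n_items + block_dim - 1) // block_dim


if __name__ == "__main__":
    # Thread 70 of block 2, with 256 threads per block.
    print(locate_thread(block_idx=2, thread_idx=70, block_dim=256))
    # Blocks required to cover 1000 items with 256 threads per block.
    print(blocks_needed(1000, 256))
```

In a real kernel the same arithmetic appears inside the device code, and the last block usually needs a bounds check because the rounded-up grid covers more threads than there are items.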
If your task goes beyond the capabilities of a single GPU, Chapter 9 introduces multiple-GPU programming. Some of the later consumer motherboards provide up to four PCIe slots, with the potential of holding four GPUs. That kind of supercomputing ability, at about $500 a GPU, can fit even a gamer's budget. Be aware, though, that added complexity requires added design refinement. Routines need to be optimized: Chapter 11 will help you reduce memory usage, and Chapter 12 will help you increase the efficiency of warp usage.
Three more chapters work through reductions for routines used in specialized applications. They may become of interest to you, and they also help cement the concepts needed to master GPU computing.
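If the word "reduction" is new to you, the idea is to combine many values into one (a sum, say) in a logarithmic number of parallel steps rather than one long serial loop. The following is only a CPU-side Python sketch of that halving pattern, under my own assumptions: the function name `tree_reduce_sum` is hypothetical, and the inner loop stands in for what would be one thread per element on a GPU, with a barrier between steps.

```python
def tree_reduce_sum(values):
    """Sum values using the halving-stride pattern of a GPU block reduction."""
    data = list(values)

    # Pad with zeros up to a power of two so every step pairs elements cleanly.
    n = 1
    while n < len(data):
        n *= 2
    data += [0] * (n - len(data))

    stride = n // 2
    while stride > 0:
        # On a GPU each index i would be handled by its own thread,
        # and a barrier would separate one step from the next.
        for i in range(stride):
            data[i] += data[i + stride]
        stride //= 2

    return data[0]


if __name__ == "__main__":
    print(tree_reduce_sum(range(1, 101)))
```

For 128 padded elements the loop finishes in 7 steps instead of 127 serial additions, which is where the parallel payoff comes from once each step runs across many threads at once.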
Personally, I have a financial program whose neural network prediction exceeded the capability of my i7 CPU: it took more than a full night to determine the ranking for 400,000 stocks. And I had thought the one-hour download time off the internet was onerous. Now I have an affordable solution that won't require me to build a shed in the backyard to hold all the computers that would normally be needed to add this feature to my design. All I have to pay for is a bigger power supply and a single GPU card. Happy computing!