Chandra et al. have put together a readable, helpful introduction to parallel programming with OpenMP. Unlike its major competitor, MPI, OpenMP assumes a shared-memory model, in which every processor has the same view of the same address space as every other. At least as a start, this cuts the intellectual load way down. The programmer adds just one concept to the problem, parallelism, without also taking on buffering, communication topologies, and the rest of the message-passing machinery.
After two introductory chapters, the authors introduce OpenMP in three stages: loop parallelism, general parallelism, and synchronization, roughly in order of increasing complexity. The authors present the necessary OpenMP pragmas and APIs at each step, showing how they address the immediate problems. An appendix summarizes the pragmas and APIs, in both their C/C++ and Fortran forms. OO C++ programmers may be dismayed by the amount of attention paid to an un-cool language like Fortran, but need to realize that it's still the lingua franca of performance programming. And, in fairness, the authors spend equal time on C++ idiosyncrasies, such as constructor invocations for variables that are silently replicated in each of the parallel threads.
If you've ever done performance programming, you're groaningly aware that getting the parallelism right is actually the easy part. The tricky parts lie in breaking dependencies, in scheduling, in ensuring spatial and temporal locality, and in dealing with the cache-coherency issues of multiprocessors. The authors give great introductions to all of the basics, including a patient description of how caches actually work, since there's a new crop of beginners every day. They describe performance-analysis tools, but only briefly: the tools differ so much from vendor to vendor and from one rev to the next that any detailed description would be useless to most readers immediately and obsolete for all readers very soon.
This won't turn a beginner into a guru of performance computing. It will, however, establish a working competence in one popular parallelization tool, OpenMP, and in the computing technologies that affect parallel performance.