Product Description
A comprehensive guide to understanding the standard approaches and most recent advances in the design of reliable computer systems. It is organized into three sections, beginning with an in-depth review of existing reliability techniques and evaluation criteria for both hardware and software. Also examined are the models for detecting faults and predicting failures, and the financial considerations inherent in the design, purchase, operation, and maintenance of a reliable system. Part two of the book analyzes case studies of systems designed to meet the challenges of fault tolerance. The studies are grouped into four application areas: general-purpose computing, high-availability systems, long-life systems, and critical computations. The final section provides a methodology for designing a reliable system, integrating the essential knowledge presented in part one with the experience gained from the case studies of part two.