Fault-Tolerance Techniques for High-Performance Computing
Computer Communications and Networks

    • US$84.99

Publisher Description

This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC).

The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models.

Topics and features:
    • Includes self-contained contributions from an international selection of preeminent experts
    • Provides a survey of resilience methods and performance models
    • Examines the various sources for errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction
    • Reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface
    • Investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach
    • Discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption

This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing.

Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the Ecole Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.

Genre: Computers & Internet
Released: July 1, 2015
Language: English (EN)
Length: 329 pages
Publisher: Springer International Publishing
Seller: Springer Nature B.V.
Size: 6.4 MB

Dependable Computing (2007)
Euro-Par 2011 Parallel Processing (2011)
Dependable Computer Systems (2011)
Architecture of Computing Systems (2021)
Reliable Software Technologies – Ada-Europe 2017 (2017)
Service Availability (2008)
Asynchronous Many-Task Systems and Applications (2025)
Recent Advances in Parallel Virtual Machine and Message Passing Interface (2007)
Software-Defined Cloud Centers (2018)
Guide to Computer Network Security (2017)
Communication Networks for Smart Grids (2014)
Guide to Voice and Video over IP (2013)
Cloud Computing (2013)
802.11 Wireless Networks (2010)