Iaso: an autonomous fault-tolerant management system for supercomputers
Author / Creator
LU, Kai, WANG, Xiaoping, LI, Gen, WANG, Ruibo, CHI, Wanqing, LIU, Yongpeng, TANG, Hongwei, FENG, Hua and GAO, Yinghui
Publisher
Heidelberg: Higher Education Press
Language
English
Scope and Contents
Contents
As system scale increases, the inherent reliability of supercomputers steadily declines. The cost of fault handling and task recovery is growing so rapidly that reliability will soon undermine the usability of supercomputers. This problem is known as the "reliability wall" and is regarded as a critical issue for current a...
Identifiers
Primary Identifiers
Record Identifier
TN_cdi_proquest_journals_2918717999
Permalink
https://devfeature-collection.sl.nsw.gov.au/record/TN_cdi_proquest_journals_2918717999
Other Identifiers
ISSN
2095-2228
E-ISSN
2095-2236
DOI
10.1007/s11704-014-3503-1