System reliability has drawn close attention in the development of exascale (E-class) supercomputers. This paper therefore introduces a data acquisition framework based on fault prediction and focuses on data acquisition methods for fault prediction on exascale supercomputers.

Introduction

As science and technology advance, the demands placed on computers keep rising, which has led to the emergence of supercomputers such as exascale systems, whose components number in the hundreds of thousands. To guard against failures, checkpointing is commonly used in practice, but its save and restore overhead is large, so it has not kept pace with practical needs. At present, fault tolerance for high-performance computing ...