An expressed sequence tag (EST) is a cDNA fragment that carries part of the genetic information of an expressed gene. EST clustering groups overlapping ESTs derived from the same gene into a single cluster and is a necessary step for subsequent gene expression data analysis. Traditional serial clustering methods have high computational complexity and large memory requirements, which makes them unsuitable for large-scale clustering. This paper introduces parallel processing approaches for EST clustering, the supporting hardware and software environments, and the parallel algorithms and software suited to large-scale EST clustering. It compares several existing software packages in terms of algorithm, computing speed, and memory requirements, and discusses the advantages and disadvantages of existing large-scale clustering algorithms.
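As an illustration only, the clustering step described above can be sketched as a small serial program. This is a minimal sketch under stated assumptions, not the method of any software surveyed in the paper: overlap is approximated by shared k-mers (exact substrings of length `k`), and the names `kmers`, `find`, `cluster_ests`, `k`, and `min_shared` are hypothetical. The all-pairs loop is quadratic in the number of ESTs, which hints at why a serial approach scales poorly.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def find(parent, i):
    """Union-find lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster_ests(ests, k=8, min_shared=1):
    """Group ESTs that share at least min_shared k-mers (single linkage).

    Assumption: shared k-mers stand in for a real overlap alignment.
    Returns a sorted list of clusters, each a list of EST indices.
    """
    parent = list(range(len(ests)))
    kmer_sets = [kmers(s, k) for s in ests]

    # Serial all-pairs comparison: O(n^2) pairs for n ESTs.
    for a, b in combinations(range(len(ests)), 2):
        if len(kmer_sets[a] & kmer_sets[b]) >= min_shared:
            ra, rb = find(parent, a), find(parent, b)
            if ra != rb:
                parent[rb] = ra  # merge the two clusters

    clusters = {}
    for i in range(len(ests)):
        clusters.setdefault(find(parent, i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Toy sequences: the first two overlap, the third is unrelated.
    reads = ["ACGTACGTTGCA", "CGTTGCAAGGCT", "TTTTTTTTCCCCCCCC"]
    print(cluster_ests(reads, k=6))
```

Because each pair comparison is independent of the others, the pair list is the natural unit of work to split across processors; only the merge step needs coordination.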