As the volume of information in digital libraries grows rapidly, a single Web server can no longer keep up with concurrent queries from a large number of users. This paper uses a cluster as the Web server. The cluster consists of multiple processing nodes that work together, which both increases system throughput and reduces the load on each node. How to distribute data across the processing nodes therefore becomes a key problem in digital library research. The paper proposes a new data distribution method that takes into account both the similarity of the data in the library and the workload of the processing nodes. The method lets a query complete on as few processing nodes as possible, which reduces network transfer time. When multiple users query concurrently, query efficiency is far higher than with traditional data distribution methods. The method has been used in PDoc, a parallel text data management system for digital libraries developed in-house by the author.
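The abstract does not spell out the distribution algorithm, so the following is only a minimal sketch of the general idea it describes: place similar documents on the same node while keeping node loads balanced. Everything in it is an assumption made for illustration and is not taken from the paper or from PDoc: the function names `cosine` and `distribute`, the use of term-frequency cosine similarity, the greedy placement order, and the per-node capacity of `ceil(len(docs) / num_nodes)` documents.

```python
from collections import Counter
from math import ceil, sqrt


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def distribute(docs, num_nodes):
    """Greedily assign documents to nodes (hypothetical helper, not the paper's algorithm).

    docs: dict mapping document id -> text.
    Returns a list with one list of document ids per node.
    """
    # Assumed load bound: no node holds more than its even share of documents.
    capacity = ceil(len(docs) / num_nodes)
    nodes = [[] for _ in range(num_nodes)]
    profiles = [Counter() for _ in range(num_nodes)]  # aggregated terms per node

    for doc_id, text in docs.items():
        terms = Counter(text.lower().split())
        # Only nodes that still have room are candidates (workload constraint).
        open_nodes = [i for i in range(num_nodes) if len(nodes[i]) < capacity]
        # Prefer the most similar node; break ties toward the least-loaded one.
        best = max(open_nodes, key=lambda i: (cosine(terms, profiles[i]), -len(nodes[i])))
        nodes[best].append(doc_id)
        profiles[best].update(terms)
    return nodes


if __name__ == "__main__":
    sample = {
        "a": "parallel database query",
        "b": "medieval poetry verse",
        "c": "parallel query optimization",
        "d": "poetry verse anthology",
    }
    print(distribute(sample, num_nodes=2))
```

With the sample above, the two database documents end up on one node and the two poetry documents on the other, so a query about either topic would need to touch only one node. A real system would also have to weigh query frequency and document size, which this sketch ignores.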