Relational database plug-ins can generate compound descriptors (fingerprints), build indexes, and perform substructure searches on compounds. Using Molfiles of organic compounds from PubChem as the data source, we built molecular structure databases on an Oracle relational database with two plug-ins installed separately: OrChem (Java) and Bingo (C++). We discuss the main differences between OrChem and Bingo in terms of fingerprint composition and indexing strategy, and test substructure search with 10 selected characteristic compounds. On a molecular structure database storing 400,000 compounds, OrChem delivered response times acceptable to users, while Bingo was faster. For a molecular structure database storing 26 million compounds, efficient retrieval that meets user needs was achieved with Bingo by tuning Oracle database memory management, the data table structure, and the substructure pre-screening parameters.
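The role of fingerprints in substructure pre-screening can be illustrated with a minimal sketch. It is not the OrChem or Bingo implementation: the 8-bit fingerprints, the compound names, and the helper functions `passes_prescreen` and `prescreen` are all hypothetical and chosen only for illustration. The sketch assumes the usual screening rule, namely that a compound can contain the query substructure only if every bit set in the query fingerprint is also set in the compound's fingerprint.

```python
from typing import Dict, List


def passes_prescreen(query_fp: int, candidate_fp: int) -> bool:
    """Return True if every bit set in the query is also set in the candidate."""
    return (query_fp & candidate_fp) == query_fp


def prescreen(query_fp: int, library: Dict[str, int]) -> List[str]:
    """Keep only the compounds whose fingerprints cover the query fingerprint."""
    return [name for name, fp in library.items() if passes_prescreen(query_fp, fp)]


# Toy 8-bit fingerprints (hypothetical values; real fingerprints are far longer).
LIBRARY = {
    "cmpd_A": 0b10110110,
    "cmpd_B": 0b00100100,
    "cmpd_C": 0b11111111,
}
QUERY = 0b00110100

# Survivors of the screen; each would still need an exact
# atom-by-atom substructure match to confirm the hit.
candidates = prescreen(QUERY, LIBRARY)
print(candidates)
```

Here `cmpd_B` is rejected because it lacks one of the bits set in the query, so the expensive graph-matching step would only be run on the remaining compounds. The screen can produce false positives but, under the stated assumption, no false negatives.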