Traditional methods for computing the structural similarity between digital documents (DSS) are based on tree edit distance or the Fourier transform. This paper proposes computing DSS through part-whole matching between the structured description tree Q of a query and the metadata description tree T of a document. We give a method for representing a directed labeled tree as a string, and reduce the similarity computation between these trees to a matching computation between the string representations of Q and T, which yields an efficient DSS algorithm. Experiments show that, for a given structured query, the proposed algorithm outperforms the tree edit distance algorithm in both recall and precision.
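
The abstract does not specify the string encoding or the matching rule, so the following Python sketch only illustrates the general idea of turning labeled trees into strings and scoring a part-whole match between Q and T by string matching. The bracket encoding, the path-based score, and the names `encode`, `paths`, and `part_whole_similarity` are assumptions made for illustration and are not the paper's algorithm.

```python
from typing import Iterator, List, Tuple

# A labeled tree node is (label, children). Hypothetical representation.
Tree = Tuple[str, List["Tree"]]


def encode(tree: Tree) -> str:
    """Preorder bracket encoding of a labeled tree, e.g. 'a(b,c)'."""
    label, children = tree
    if not children:
        return label
    return label + "(" + ",".join(encode(c) for c in children) + ")"


def paths(tree: Tree, prefix: str = "") -> Iterator[str]:
    """Yield every root-to-node label path as a '/'-delimited string."""
    label, children = tree
    here = prefix + "/" + label
    # The trailing '/' prevents 'tit' from matching inside 'title'.
    yield here + "/"
    for child in children:
        yield from paths(child, here)


def part_whole_similarity(q: Tree, t: Tree) -> float:
    """Fraction of Q's paths that occur as a contiguous segment of a path in T.

    This is an assumed scoring rule: 1.0 means every path of the query tree
    appears somewhere inside the document tree, 0.0 means none does.
    """
    t_paths = list(paths(t))
    q_paths = list(paths(q))
    hits = sum(1 for qp in q_paths if any(qp in tp for tp in t_paths))
    return hits / len(q_paths)


# Document metadata tree T and query tree Q (made-up labels).
T: Tree = ("doc", [("meta", [("title", []), ("author", [])]),
                   ("body", [("section", [])])])
Q: Tree = ("meta", [("title", []), ("year", [])])

print(encode(T))
print(encode(Q))
print(part_whole_similarity(Q, T))
```

In this sketch Q is matched against any part of T rather than against T as a whole, which is why a query rooted at `meta` can still score against a document rooted at `doc`. Two of the three query paths are found in T; the `year` path is not.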