Load Balance Strategy of Data Routing Algorithm Using Semantics for Deduplication Clusters

Source: Journal of Electronic Science and Technology
The backup requirements of data centres are tremendous, because the volume of data created by humans is massive and growing exponentially. Single-node deduplication cannot keep up with these growing backup requirements. A feasible solution is a deduplication cluster, which scales to meet them by adding storage nodes. The data routing strategy is the key component of a deduplication cluster. DRSS (data routing strategy using semantics) substantially improves storage utilization compared with the MCS (minimum chunk signature) data routing strategy. However, for large deduplication clusters, the load balance of DRSS is worse than that of MCS. To improve the load balance of DRSS, we propose a load balance strategy for DRSS, namely DRSSLB. When a node is overloaded, DRSSLB iteratively migrates that node's current smallest container to the smallest node in the deduplication cluster until the node is no longer overloaded. A container is the minimum unit of data migration. Similar files sharing the same features or file names are stored in the same container, which ensures that similar data groups remain on the same node after the nodes are rebalanced. We evaluate DRSSLB on a real-world dataset. Experimental results show that, for various numbers of nodes in the deduplication cluster, the data skew of DRSSLB stays below a predefined value while its storage utilization barely increases compared with DRSS, at a low penalty (the data migration rate is only 6.5% when the number of nodes is 64).
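The rebalancing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the overload test (a node's load exceeds (1 + threshold) times the cluster mean), reading "smallest node" as the least-loaded node, and the skew and migration-rate formulas. All names (`Node`, `rebalance_node`, `data_skew`, `migration_rate`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A storage node; each container maps an id to its stored size (arbitrary units)."""
    name: str
    containers: dict[str, int] = field(default_factory=dict)

    @property
    def load(self) -> int:
        return sum(self.containers.values())


def is_overloaded(node: Node, nodes: list[Node], threshold: float) -> bool:
    # Hypothetical overload test: load above (1 + threshold) times the cluster mean.
    mean = sum(n.load for n in nodes) / len(nodes)
    return node.load > (1 + threshold) * mean


def rebalance_node(node: Node, nodes: list[Node], threshold: float) -> int:
    """Move the node's smallest container to the least-loaded node until the node
    is no longer overloaded; returns the total size migrated."""
    migrated = 0
    while node.containers and is_overloaded(node, nodes, threshold):
        # "Smallest node" is read here as the node with the least stored data.
        target = min(nodes, key=lambda n: n.load)
        if target is node:
            break
        # Pick the current smallest container on the overloaded node.
        cid = min(node.containers, key=node.containers.get)
        size = node.containers.pop(cid)
        # Containers move whole, so similar files grouped in one container stay together.
        target.containers[cid] = size
        migrated += size
    return migrated


def data_skew(nodes: list[Node]) -> float:
    # Hypothetical skew metric: max node load divided by mean node load.
    loads = [n.load for n in nodes]
    return max(loads) / (sum(loads) / len(loads))


def migration_rate(migrated: int, nodes: list[Node]) -> float:
    # Hypothetical definition: migrated data as a fraction of all stored data.
    return migrated / sum(n.load for n in nodes)


if __name__ == "__main__":
    nodes = [
        Node("n0", {"a": 50, "b": 30, "c": 20}),
        Node("n1", {"d": 40}),
        Node("n2", {"e": 30}),
        Node("n3", {"f": 30}),
    ]
    moved = rebalance_node(nodes[0], nodes, threshold=0.25)
    print(moved, [n.load for n in nodes])  # 50 [50, 40, 50, 60]
    print(data_skew(nodes), migration_rate(moved, nodes))  # 1.2 0.25
```

In this run, node n0 (load 100 against a limit of 62.5) first sends container c to n2 and then container b to n3, stopping once its load drops to 50. Because the sketch handles one overloaded node at a time, as the abstract describes, it does not recheck the receiving nodes; a fuller version would need a policy for a target that itself becomes overloaded.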