TextGen: a realistic text data content generation method for modern storage system benchmarks

Source: Frontiers of Information Technology & Electronic Engineering | Citations: 0


【Authors】:

Long-xiang WANG, Xiao-she DONG, Xing-jun ZHANG, Yin-feng WANG, Tao JU, Guo-fu FENG

【Affiliation】:

School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China

【Source】:

Frontiers of Information Technology & Electronic Engineering

【Publication date】:

2016, Issue 10

【Keywords】:

Benchmark; Storage system; Word-based compression


【Abstract】:

Modern storage systems incorporate data compressors to improve their performance and capacity. As a result, data content can significantly influence the result of a storage system benchmark. Real-world proprietary datasets are too large to copy onto a test storage system, and most data cannot be shared because of privacy concerns, so a benchmark must generate data synthetically. For the result to be accurate, the data content must be generated from a characterization of the real-world data properties that influence storage system performance while the benchmark runs. The existing approach, SDGen, cannot guarantee accurate benchmark results on storage systems with built-in word-based compressors. This is because SDGen characterizes the properties that influence compression performance only at the byte level and characterizes none at the word level. To address this problem, we present TextGen, a realistic text data content generation method for modern storage system benchmarks. TextGen builds a word corpus by segmenting real-world text datasets and creates a word-frequency distribution by counting each word in the corpus. To improve data generation performance, the word-frequency distribution is fitted to a lognormal distribution by maximum likelihood estimation, and synthetic data is then generated with a Monte Carlo approach. The running time of TextGen depends only on the expected data size, so its time complexity is O(n). We evaluated TextGen experimentally on four real-world datasets. The results show that, compared with SDGen, TextGen produces datasets whose compression performance and compression ratio deviate less from those of the real-world datasets. The comparison used end-tagged dense code, a representative word-based compressor.
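The pipeline summarized in the abstract has four steps: segment the text, count words, fit a lognormal by maximum likelihood, and sample with Monte Carlo. It can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The following are all hypothetical choices made for this example:

- the `\w+` regex segmentation;
- the function names `build_word_counts`, `fit_lognormal_mle` and `generate_text`;
- the rule that assigns fitted lognormal weights to words by frequency rank.

The fit uses the standard closed-form lognormal MLE. μ is the mean of the log-frequencies, and σ is their population standard deviation.

```python
import itertools
import math
import random
import re
from collections import Counter


def build_word_counts(text: str) -> Counter:
    """Segment text into words and count each one.

    Assumption: a simple regex tokenizer stands in for TextGen's segmentation.
    """
    return Counter(re.findall(r"\w+", text))


def fit_lognormal_mle(counts: Counter) -> tuple[float, float]:
    """Closed-form MLE of a lognormal fitted to the observed word frequencies."""
    if not counts:
        raise ValueError("corpus contains no words")
    logs = [math.log(c) for c in counts.values()]
    mu = sum(logs) / len(logs)  # MLE of mu: mean of the log-frequencies
    # MLE of sigma: population (not sample) standard deviation of the logs
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma


def generate_text(counts: Counter, mu: float, sigma: float,
                  target_chars: int, seed: int = 0) -> str:
    """Monte Carlo generation: draw random words until the output has at least
    target_chars characters (equal to bytes for ASCII text)."""
    rng = random.Random(seed)
    # Vocabulary ranked by real frequency, most common word first.
    vocab = [word for word, _ in counts.most_common()]
    # Hypothetical mapping: draw one synthetic weight per word from the fitted
    # lognormal and sort descending, so more frequent real words get larger weights.
    weights = sorted((rng.lognormvariate(mu, sigma) for _ in vocab), reverse=True)
    cum_weights = list(itertools.accumulate(weights))  # computed once, reused per draw
    out: list[str] = []
    size = 0
    while size < target_chars:
        word = rng.choices(vocab, cum_weights=cum_weights, k=1)[0]
        size += len(word) + (1 if out else 0)  # include the separating space
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran"
    counts = build_word_counts(corpus)
    mu, sigma = fit_lognormal_mle(counts)
    print(generate_text(counts, mu, sigma, target_chars=80, seed=42))
```

The vocabulary has V words, and each draw is an O(log V) binary search over the precomputed cumulative weights. For a fixed vocabulary, generation time therefore grows linearly with `target_chars`, which matches the O(n) behavior described above. A realistic generator would also need to reproduce separators and punctuation, which this sketch reduces to single spaces.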
