Lossless Compression of Random Forests

Source: Journal of Computer Science and Technology (English edition)
Ensemble methods are among the state-of-the-art predictive modeling approaches. Applied to modern big data, these methods often require a large number of sub-learners, where the complexity of each learner typically grows with the size of the dataset. This phenomenon results in an increasing demand for storage space, which may be very costly. This problem mostly manifests in a subscriber-based environment, where a user-specific ensemble needs to be stored on a personal device with strict storage limitations (such as a cellular device). In this work we introduce a novel method for lossless compression of tree-based ensemble methods, focusing on random forests. Our suggested method is based on probabilistic modeling of the ensemble’s trees, followed by model clustering via Bregman divergence. This allows us to find a minimal set of models that provides an accurate description of the trees, and at the same time is small enough to store and maintain. Our compression scheme demonstrates high compression rates on a variety of modern datasets. Importantly, our scheme enables predictions from the compressed format and a perfect reconstruction of the original ensemble. In addition, we introduce a theoretically sound lossy compression scheme, which allows us to control the trade-off between the distortion and the coding rate.
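As an illustration of the clustering step mentioned in the abstract, the following is a minimal sketch of generic Bregman hard clustering with the Kullback-Leibler divergence (one member of the Bregman family). It is not the authors' algorithm, whose details are not given here. The function names, the toy distributions, and the first-k initialization are all assumptions made for the example. Each input stands in for a discrete probability model (for instance, a distribution over split features at some tree node), and each cluster centroid is the arithmetic mean of its members, which minimizes the total divergence for any Bregman divergence when the centroid is the second argument.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in bits for two discrete distributions of equal length."""
    return sum(
        pi * math.log2(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def mean_distribution(dists):
    """Component-wise arithmetic mean of a non-empty list of distributions."""
    n = len(dists)
    return [sum(column) / n for column in zip(*dists)]


def bregman_cluster(dists, k, iters=20):
    """Hard-cluster distributions into k groups using KL divergence.

    Hypothetical helper for illustration only: centroids are initialized
    to the first k inputs, then assignment and averaging alternate until
    the labels stop changing or `iters` rounds have run.
    """
    centroids = [list(d) for d in dists[:k]]
    labels = [0] * len(dists)
    for step in range(iters):
        # Assignment step: nearest centroid under KL(p || centroid).
        new_labels = [
            min(range(k), key=lambda j: kl_divergence(p, centroids[j]))
            for p in dists
        ]
        if step > 0 and new_labels == labels:
            break
        labels = new_labels
        # Update step: the mean of the members is the optimal centroid.
        for j in range(k):
            members = [p for p, lab in zip(dists, labels) if lab == j]
            if members:
                centroids[j] = mean_distribution(members)
    return labels, centroids


if __name__ == "__main__":
    # Toy models, assumed for the example: four two-symbol distributions.
    models = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
    labels, centroids = bregman_cluster(models, k=2)
    print(labels)
    print(centroids)
```

In a compression setting, KL(p || q) is the expected number of extra bits per symbol paid when data drawn from p is coded with a code designed for q. Storing fewer shared models therefore trades model-description cost against this coding overhead, which is the balance the abstract describes.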