【Institution】
:
College of Data Science, Taiyuan University of Technology, China
【Source】
:
The 6th China Computer Federation (CCF) Academic Conference on Big Data (第六届中国计算机学会大数据学术会议)
Excerpt from the paper (abstract)
Approximations based on random Fourier features have recently emerged as an efficient and elegant methodology for designing large-scale machine learning tasks. Unlike the Nyström method, whose basis functions are randomly sampled from the training examples and are therefore data dependent, we use random Fourier features, whose basis functions (i.e., cosine and sine functions) are sampled from a distribution independent of the training sample set, to cluster preference data, which appear extensively in recommender systems. The main idea of our method is to employ random Fourier features to represent preference data explicitly in feature space. This explicit mapping can significantly speed up eigenvector approximation and improve prediction speed in preference clustering. The advantage of the proposed two-stage method is that it saves computing time and memory. Compared with traditional preference clustering, our method solves the problem of insufficient memory and greatly improves computational efficiency. Finally, experiments on movie data sets containing 100,000 ratings show that the proposed method is more effective in clustering accuracy than the Nyström method and K-means, while also running faster than these clustering approaches.
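The two-stage idea described in the abstract (an explicit random Fourier feature map, followed by eigenvector approximation and clustering) can be illustrated with a minimal NumPy sketch. The abstract does not give the kernel, the feature dimension, or the clustering details, so everything below is an assumption made for illustration: a Gaussian kernel exp(-gamma * ||x - y||^2), the hypothetical helper names `rff_map`, `spectral_embed` and `kmeans`, a plain Lloyd's k-means, and a synthetic rating matrix in place of the movie data. It is not the authors' implementation.

```python
import numpy as np

def rff_map(X, n_features=256, gamma=1.0, seed=0):
    """Explicit feature map z(x) with z(x) . z(y) ~= exp(-gamma * ||x - y||^2).

    Assumption: Gaussian kernel; the abstract does not name the kernel.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies come from the kernel's spectral density N(0, 2*gamma*I),
    # which does not depend on the training samples.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    proj = X @ W
    # Cosine and sine basis functions, scaled so that z(x) . z(x) == 1.
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n_features)

def spectral_embed(Z, k):
    """Approximate the top-k kernel eigenvectors from the feature matrix Z.

    The left singular vectors of Z are eigenvectors of Z @ Z.T, so the
    n-by-n kernel matrix is never formed.
    """
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    E = U[:, :k]
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    return E / np.maximum(norms, 1e-12)  # row-normalise before clustering

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (illustrative, not tuned)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Synthetic stand-in for preference data: 200 users rating 20 items on a 1-5 scale.
rng = np.random.default_rng(42)
ratings = rng.integers(1, 6, size=(200, 20)).astype(float)

Z = rff_map(ratings, n_features=128, gamma=0.01)   # stage 1: explicit mapping
labels = kmeans(spectral_embed(Z, k=3), k=3)       # stage 2: embed and cluster
```

In this sketch the cost of the eigenvector step grows with the number of random features rather than with the square of the number of users, which is one plausible reading of the time and memory savings the abstract claims.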