Mining Device-Specific Apps Usage Patterns From Appstore Big Data

Source: The 6th China Computer Federation (CCF) Academic Conference on Big Data
Now that smartphones, applications (a.k.a. apps), and app stores have been adopted by billions of users, an interesting debate emerges: whether, and to what extent, device models influence the behaviors of their users. The answer to this question is critical to almost every stakeholder in the smartphone app ecosystem, including app store operators, developers, end users, and network providers. To approach this question, we collect a longitudinal data set of app usage through Wandoujia, a leading Android app store in China. The data set covers the detailed behavioral profiles of 0.7 million (761,262) unique users who use 500 popular types of Android devices and about 0.2 million (228,144) apps, including their app management activities, daily network access time, and the network traffic of apps. We present a comprehensive study of how the choice of device model affects user behaviors such as the adoption of app stores, app selection and abandonment, data plan usage, online time length, the tendency to use paid or free apps, and the preference among competing apps. We derive several significant correlations between device models and app usage from the app store big data, leading to important findings on various user behaviors. For example, users owning different device models show substantial diversity in selecting among competing apps, and users owning lower-end devices spend more money purchasing apps and spend more time on cellular networks.