Automatic music genre classification is a major component of music information retrieval systems. This paper introduces auditory images into automatic genre classification, using an auditory image model to simulate the structure of the human cochlea. On GTZAN, a database commonly used for music genre classification, one-dimensional audio signals are converted into two-dimensional auditory images, from which features are extracted with the scale-invariant feature transform (SIFT) and spatial pyramid matching. This captures the texture of the image from the whole down to its parts. Finally, a linear function is applied to classify the music genres automatically. Genre classification based on auditory images is 15% more accurate than genre classification based on Mel-frequency cepstral coefficients (MFCC).
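The whole-to-part texture description mentioned in the abstract can be illustrated with a minimal sketch of a spatial pyramid histogram followed by a linear decision function. This is not the paper's implementation: the auditory image model and the SIFT descriptors are not computed here, and a hypothetical grid of codeword indices (one quantized local descriptor per position) stands in for them. The function names, the vocabulary size, and the level weights (the weighting commonly used for spatial pyramid matching) are all assumptions made for illustration.

```python
def spatial_pyramid_histogram(words, vocab_size, levels=2):
    """Build a weighted, concatenated histogram over a spatial pyramid.

    words      -- 2-D list of integer codeword indices (hypothetical stand-in
                  for quantized local descriptors of an auditory image)
    vocab_size -- number of distinct codewords (assumed)
    levels     -- finest pyramid level L; level l splits the grid into
                  2**l x 2**l cells
    """
    rows, cols = len(words), len(words[0])
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Commonly used weighting: coarse levels count less than fine ones.
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)
        for cy in range(cells):
            for cx in range(cells):
                hist = [0.0] * vocab_size
                r0, r1 = cy * rows // cells, (cy + 1) * rows // cells
                c0, c1 = cx * cols // cells, (cx + 1) * cols // cells
                for r in range(r0, r1):
                    for c in range(c0, c1):
                        hist[words[r][c]] += weight
                feature.extend(hist)
    return feature


def linear_score(weights, bias, feature):
    """Linear decision function: dot(weights, feature) + bias."""
    return sum(w * x for w, x in zip(weights, feature)) + bias


# Toy 4x4 grid with a 2-word vocabulary (made-up data, not from GTZAN).
toy_words = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
toy_feature = spatial_pyramid_histogram(toy_words, vocab_size=2, levels=1)
```

With `levels=1`, the feature concatenates one histogram for the whole grid and one for each of the four quadrants, so the global histogram cannot tell this toy grid from a uniformly mixed one, while the quadrant histograms can. In a multi-class setting such as genre classification, one linear scoring function per genre would typically be evaluated and the highest score taken.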