Research and Application of Classification Algorithms Based on Logistics Information

Published: 2018-07-10 08:24

Topics: logistics + data mining; Source: Beijing University of Posts and Telecommunications, master's thesis, 2015

[Abstract]: In recent years, advances in information technology have driven the adoption of informatization in enterprise logistics management, and the volume of data stored by enterprises has grown explosively. Treating this data as a resource and making full, sensible use of data mining to deepen logistics management, with a focus on data mining techniques based on logistics information and their applications, can help enterprises improve operational efficiency, reduce costs, and make timely decisions; it has become an effective way to strengthen their competitiveness.

This thesis takes the K-nearest neighbor (KNN) algorithm, one of the classification algorithms in data mining, as its research object. After describing the core idea of classical KNN and the state of research on it, the thesis identifies two shortcomings: (1) the traditional algorithm assumes that all sample attributes are equally important for classification, so irrelevant attributes cause misclassification and lower accuracy; (2) to select the neighbors of a sample to be classified, the traditional algorithm must compute the distance from that sample to every training sample, which is computationally expensive and makes the result sensitive to noisy samples, hurting both efficiency and accuracy.

Two improvement strategies are proposed to address these shortcomings. (1) An improved algorithm based on attribute reduction, which accounts for the different influence that individual attributes have on the classification result. It uses information entropy to compute a correlation coefficient between each condition attribute and the decision attribute, uses these coefficients to distinguish the importance of the condition attributes during classification, and removes less relevant attributes by adjusting a threshold on the coefficient.
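Strategy (1) can be illustrated with a minimal Python sketch. The abstract does not give the exact entropy-based correlation coefficient, so this sketch assumes symmetric uncertainty, SU(X, D) = 2·I(X; D) / (H(X) + H(D)), which lies in [0, 1]. The function names, the overlap (Hamming) distance for categorical attributes, and the toy data are likewise illustrative assumptions, not the thesis's implementation. Attributes whose coefficient falls below the threshold are dropped before the usual KNN majority vote.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(attr, labels):
    """Assumed entropy-based correlation between a condition attribute and
    the decision attribute: 2 * I(X; D) / (H(X) + H(D)), in [0, 1]."""
    h_x, h_d = entropy(attr), entropy(labels)
    if h_x + h_d == 0:
        return 0.0
    mutual_info = h_x + h_d - entropy(list(zip(attr, labels)))
    return 2.0 * mutual_info / (h_x + h_d)


def reduce_attributes(samples, labels, threshold):
    """Indices of the condition attributes whose correlation with the
    decision attribute reaches `threshold`; the others are discarded."""
    kept = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        if symmetric_uncertainty(column, labels) >= threshold:
            kept.append(j)
    return kept


def knn_classify(query, samples, labels, k, attrs):
    """Majority vote among the k nearest training samples, measured by the
    overlap (Hamming) distance over the kept attributes only."""
    def distance(row):
        return sum(row[j] != query[j] for j in attrs)

    nearest = sorted(range(len(samples)), key=lambda i: distance(samples[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Hypothetical toy data: attribute 0 decides the class, attribute 1 is noise.
samples = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["A", "A", "B", "B"]
attrs = reduce_attributes(samples, labels, threshold=0.5)
print(attrs, knn_classify(("a", "y"), samples, labels, k=3, attrs=attrs))  # prints [0] A
```

Here the noisy attribute has a coefficient of 0 and is removed, so the distance is computed on attribute 0 alone. Raising or lowering `threshold` trades off how aggressively attributes are pruned.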
Numerical analysis shows that the attribute-reduction algorithm improves classification accuracy to some extent. (2) An improved sample-clipping algorithm based on clustering, which handles massive data sets effectively and reduces the algorithm's time complexity. It uses hierarchical clustering to determine the initial cluster centers of K-means, so that random initialization does not distort the clustering result, and in turn uses K-means to refine the hierarchical clustering result; a representative sample set is then selected from the clusters for classification testing. Simulation experiments show that, with this sample clipping, the improved algorithm effectively reduces the classifier's computation and improves classification efficiency while improving or maintaining classification accuracy.

Finally, building on this work, the thesis designs an improved K-nearest neighbor collaborative filtering recommendation model and applies it to rating data for logistics routes in Beijing to verify its effectiveness and feasibility on a practical problem. Experiments show that the improved algorithm produces significantly more accurate recommendations and that the model can help customers quickly find a suitable logistics company among a large amount of specialized information, demonstrating its practical value.
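Strategy (2) can be sketched in the same hedged way. Assumptions not stated in the abstract: centroid-linkage agglomerative clustering supplies the K-means seeds, Euclidean distance is used throughout, and the "representative" samples are taken to be the `per_cluster` training samples closest to each refined center. All function names are hypothetical. The naive agglomerative loop scales roughly cubically with the number of samples, so it only demonstrates the idea.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def agglomerative_centers(points, k):
    """Bottom-up clustering with centroid linkage until k clusters remain;
    the resulting centroids are used to seed K-means."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            ci = centroid(clusters[i])
            for j in range(i + 1, len(clusters)):
                d = math.dist(ci, centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return [centroid(c) for c in clusters]


def kmeans(points, centers, max_iter=100):
    """Lloyd iterations from the given seeds; returns (centers, assignment),
    where assignment[i] is the cluster index of points[i]."""
    centers = list(centers)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = centroid(members)
    return centers, assignment


def clip_samples(samples, labels, k, per_cluster):
    """Keep, from each cluster, the `per_cluster` samples nearest its center."""
    centers, assignment = kmeans(samples, agglomerative_centers(samples, k))
    kept = []
    for c, center in enumerate(centers):
        members = [i for i, a in enumerate(assignment) if a == c]
        members.sort(key=lambda i: math.dist(samples[i], center))
        kept.extend(members[:per_cluster])
    kept.sort()
    return [samples[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: two well-separated groups of three points each.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["A", "A", "A", "B", "B", "B"]
print(clip_samples(samples, labels, k=2, per_cluster=1))
# prints ([(0, 0), (10, 10)], ['A', 'B'])
```

The clipped set then replaces the full training set in the KNN search, so each query is compared against `k * per_cluster` samples instead of all of them.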
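The recommendation model builds on the general technique of user-based KNN collaborative filtering, which the following Python sketch shows as a plain baseline. It is not the thesis's improved model: the abstract does not say how that model differs, and the cosine similarity over co-rated items, the 1 to 5 rating scale, and the user and company names are hypothetical.

```python
import math


def cosine_similarity(ra, rb):
    """Cosine similarity between two users' rating dicts, computed only
    over the items both have rated (ratings assumed to be 1 to 5)."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in common)
    na = math.sqrt(sum(ra[i] ** 2 for i in common))
    nb = math.sqrt(sum(rb[i] ** 2 for i in common))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def predict_rating(ratings, user, item, k):
    """Similarity-weighted average of the ratings that the k most similar
    users gave to `item`; None if no positively similar user rated it."""
    candidates = [
        (cosine_similarity(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbours = [c for c in sorted(candidates, reverse=True)[:k] if c[0] > 0]
    weight = sum(s for s, _ in neighbours)
    if weight == 0:
        return None
    return sum(s * r for s, r in neighbours) / weight


def recommend(ratings, user, k, top_n):
    """Rank the items `user` has not yet rated by predicted rating."""
    unrated = {i for r in ratings.values() for i in r} - set(ratings[user])
    scored = []
    for item in unrated:
        p = predict_rating(ratings, user, item, k)
        if p is not None:
            scored.append((p, item))
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [item for _, item in scored[:top_n]]


# Hypothetical ratings of logistics companies by three customers.
ratings = {
    "u1": {"company_A": 5, "company_B": 3, "company_C": 4},
    "u2": {"company_A": 5, "company_B": 3, "company_C": 4, "company_D": 5},
    "u3": {"company_A": 1, "company_B": 5, "company_D": 2, "company_E": 4},
}
print(recommend(ratings, "u1", k=2, top_n=2))
```

In this toy data `company_E`, rated only by the weakly similar `u3`, outranks `company_D`, which the near-identical `u2` rated 5. Raw weighted averages have this weakness; mean-centering ratings, as in Pearson-based collaborative filtering, is a common refinement.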
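[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year conferred]: 2015
[Classification number]: TP311.13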

[References]