Multivariate Statistics and Data Mining is a core course in the Management Science and Big Data Management and Application programs. It lays the foundation for students to build a knowledge structure for data analysis in the era of big data and strengthens their professional skills. The course builds on advanced mathematics, probability, and statistical theory. Its core is collecting data for practical problems that arise in different real-world scenarios. Students then apply statistical methods and data mining techniques to carry out data preprocessing, data mining, and knowledge discovery, and use appropriate software to solve data analysis problems. The course sharpens students' sensitivity to data and their ability to explore it. It also substantially helps develop their ability to observe and analyze practical problems, analyze data, and build models. The course mainly covers the basic theory and concepts of data mining (DM), together with the branches of multivariate statistics and their applications. Its main parts are the definition and process of DM, data preprocessing, descriptive statistical analysis, data visualization, association analysis, clustering, classification, and outlier analysis.
Data preprocessing covers data standardization, missing-value handling, noisy-data handling, and data reduction; the data reduction section introduces principal component analysis and factor analysis. Association analysis covers association rules, the Apriori algorithm, and correlation analysis. The clustering section covers K-means and hierarchical clustering. The classification section covers algorithms such as discriminant analysis, decision trees, optimization-based classification, and multiple regression, together with their applications. The multivariate statistics section covers the multivariate normal distribution, discriminant analysis, and analysis of variance. Through case studies and computer lab exercises, the course develops students' hands-on ability to apply data mining and multivariate statistical methods to practical problems.
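The course description does not prescribe any particular software. The following is a minimal illustrative sketch of three of the listed steps chained together, written in Python using only the standard library:

- missing-value handling, shown here as mean imputation;
- data standardization, shown here as z-scores;
- K-means clustering, shown here as plain Lloyd iterations from fixed initial centers.

The function names (`impute_mean`, `zscore`, `kmeans`) and the six-row toy dataset are hypothetical and exist only for illustration. Mean imputation is just one of several possible missing-value strategies.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Missing-value handling: replace None with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in column]


def zscore(column):
    """Standardization: rescale to mean 0 and population standard deviation 1.

    Assumes the column is not constant (otherwise the deviation is 0).
    """
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]


def nearest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    d = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return d.index(min(d))


def kmeans(points, centers, iterations=10):
    """K-means (Lloyd's algorithm) starting from the given initial centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(mean(coord) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    labels = [nearest(p, centers) for p in points]
    return labels, centers


# Hypothetical toy data: two numeric attributes, None marks a missing value.
height = [1.0, 1.2, None, 5.0, 5.2, 4.8]
weight = [2.0, 2.1, 1.9, 8.0, None, 8.2]

cols = [zscore(impute_mean(c)) for c in (height, weight)]
points = list(zip(*cols))
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Standardizing before clustering keeps one attribute's scale from dominating the Euclidean distance that K-means relies on. In practice, course exercises would more likely use a statistics package than hand-written loops.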