A Study of Imbalanced Data in Credit Card Fraud Detection Based on Neighborhood Resampling and Classifier Ranking

Published: 2024-02-21 00:23

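The widespread use of credit cards has made credit card fraud an increasingly serious problem worldwide, causing losses of billions of dollars every year. Effective credit card fraud detection algorithms can substantially reduce financial risk. Such algorithms rely heavily on machine learning and data mining techniques, but because credit card transaction data are not evenly distributed, designing a fraud detection system is challenging. Under this non-stationary distribution, legitimate transactions far outnumber fraudulent ones, a situation generally referred to as imbalanced data. An imbalanced distribution typically causes the classifier to be overwhelmed by the majority class (legitimate transactions) and to lose its predictive value because it fails to predict the minority class (fraudulent transactions). One possible solution is to apply pre-processing techniques at the data level. Pre-processing is a key step in data mining tasks; the processed data are fed directly to classification techniques to build a predictive model. Pre-processing includes data cleaning, data integration, data transformation, and data resampling. This thesis focuses on two of these: data cleaning and data resampling. Data cleaning targets noisy data, that is, data containing abnormal variations or errors, which can severely degrade classification performance. Resampling is used to produce the training data from which the predictive model is built, and the quality of that model depends largely on which samples are used in training. Resampling techniques produce a balanced training set by reducing the majority class (under-sampling) or enlarging the minority class (over-sampling), and a better-performing predictive model can be built from such a balanced training set. ...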
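To make the two resampling directions concrete, here is a minimal sketch of plain random under-sampling and random over-sampling on a toy binary dataset. It is not the neighborhood-based method developed in the thesis; the function names, the toy data, and the label convention (0 = legitimate, 1 = fraudulent) are assumptions made only for illustration, and the sketch assumes the majority class is at least as large as the minority class.

```python
import random
from collections import Counter


def random_undersample(samples, labels, majority_label, seed=0):
    """Keep all minority rows and a random subset of majority rows of equal size."""
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    # Assumption: len(majority) >= len(minority); otherwise sample() raises.
    kept = sorted(rng.sample(majority, k=len(minority)) + minority)
    return [samples[i] for i in kept], [labels[i] for i in kept]


def random_oversample(samples, labels, majority_label, seed=0):
    """Keep every row and duplicate random minority rows until classes match."""
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    chosen = list(range(len(labels))) + extra
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


# Hypothetical toy data: 8 legitimate (0) and 2 fraudulent (1) transactions,
# each described by a single made-up feature value.
samples = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

_, under_labels = random_undersample(samples, labels, majority_label=0)
_, over_labels = random_oversample(samples, labels, majority_label=0)

under_counts = Counter(under_labels)
over_counts = Counter(over_labels)
print("under-sampled:", dict(under_counts))
print("over-sampled:", dict(over_counts))
```

Under-sampling discards majority-class information, while over-sampling by duplication adds no new information and can encourage overfitting; this trade-off is why more selective resampling schemes are studied.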

[Pages]: 137

[Degree level]: Doctoral

[Table of contents]:
Abstract (in Chinese)
Abstract (in English)
Chapter 1 Introduction
    1.1 Introduction
        1.1.1 Credit Card Fraud
        1.1.2 Types of Credit Card Fraud
            1.1.2.1 Bankruptcy Fraud
            1.1.2.2 Theft Fraud/Counterfeit Fraud
            1.1.2.3 Application Fraud
            1.1.2.4 Behavioral Fraud
        1.1.3 Losses Generated by Credit Card Fraud
    1.2 Fraud Analytics and Predictive Analytics
    1.3 Predictive Analytics for Credit Card Fraud
    1.4 Pre-processing Techniques for Class Imbalance
    1.5 Research Motivation and Problem Statement
    1.6 Contribution
    1.7 Software Implementation for Experimentation
    1.8 Layout of Thesis
Chapter 2 Literature Review
    2.1 Machine Learning
        2.1.1 Unsupervised Learning
        2.1.2 Supervised Learning
            2.1.2.1 Supervised Learning for Credit Card Fraud Detection
        2.1.3 Classification Techniques for Credit Card Fraud
            2.1.3.1 Decision Tree
            2.1.3.2 Support Vector Machine (SVM)
            2.1.3.3 IBK
            2.1.3.4 Voted Perceptron
            2.1.3.5 Linear Logistic
            2.1.3.6 Naïve Bayes
            2.1.3.7 Bayesian Network
    2.2 Single & Multi-algorithm Classification Techniques used for CCFD
    2.3 General Framework of Credit Card Fraud Detection
    2.4 Techniques for Handling Class Imbalanced Datasets
        2.4.1 Algorithm Level Techniques
        2.4.2 Data Level Techniques
            2.4.2.1 Under-sampling Techniques
            2.4.2.2 Over-sampling Techniques
            2.4.2.3 Ensemble Techniques
            2.4.2.4 Cost Based Techniques
    2.5 Related Work
        2.5.1 Literature Survey for Resampling Techniques and Limitations
        2.5.2 Literature Survey for Ranking Classification Algorithms using MCDM
Chapter 3 A Novel Resampling Approach for Credit Card Fraud
    3.1 Motivation for the Novel Resampling Approach
    3.2 Locally Centered Mahalanobis Distance
    3.3 Algorithm for Noisy and Borderline Samples
        3.3.1 Algorithm for Noisy and Borderline Samples
    3.4 Novel Resampling Approach
        3.4.1 Novel Under-sampling Approach
        3.4.2 Over-sampling Approach
            3.4.2.1 Over-sampling Algorithm
    3.5 Experimentation
        3.5.1 Credit Card Data Sets
            3.5.1.1 Australian Credit Approval (ACA)
            3.5.1.2 German Credit Data (GCD)
            3.5.1.3 Give Me Some Credit (GMSC)
            3.5.1.4 PAKDD 2010
            3.5.1.5 Indonesian Credit Card Dataset (ICCD)
        3.5.2 Dataset Preparation for Supervised Classification
            3.5.2.1 Training and Cross-validation Sets
            3.5.2.2 Testing Set
        3.5.3 Evaluation Criteria for Credit Card Datasets
            3.5.3.1 Performance Measures
        3.5.4 Experimental Procedure
    3.6 Results and Discussion
        3.6.1 Under-sampling Results
        3.6.2 Over-sampling Results
Chapter 4 Impact of Class Imbalance in Ranking Classifiers
    4.1 A Comparative Study of Decision Tree Algorithms for Credit Card Fraud
        4.1.1 Experimental Design
        4.1.2 Resampling the Datasets
        4.1.3 Feature Selection and Classification
        4.1.4 Parameter Tuning of Classifiers
        4.1.5 Results & Discussion
    4.2 Ranking Classifiers Using MCDM for Imbalanced CCFD
        4.2.1 Proposed Scheme
            4.2.1.1 Pre-Processing Phase
            4.2.1.2 Data Mining Phase
            4.2.1.3 Ranking Phase
        4.2.2 Experimental Design
        4.2.3 Results and Discussion
            4.2.3.1 MCDM Phase
    4.3 Comparison of Different Ranking Approaches for Classifiers
Chapter 5 Conclusion
    5.1 Contributions and Conclusions
    5.2 Future Work
Acknowledgement
References
Research Results Obtained During the Study for Doctoral Degree

Article ID: 3904750

Article link: https://www.wllwen.com/shoufeilunwen/jjglbs/3904750.html