https://sugatagh.github.io/dsml/projects/credit-card-fraud-detection/
-
Detecting a fraudulent credit card transaction can be aided by factors such as the time and amount of the transaction.
-
In this project, we build classification models to predict whether a credit card transaction is authentic or fraudulent, based on the time, the amount, and a set of PCA-transformed features recorded for a large number of transactions.
-
A detailed exploratory data analysis on the dataset is carried out.
-
We observe that the data is imbalanced with respect to the target variable. After splitting the data into a training set and a test set, we consider three undersampling and three oversampling techniques to balance the training set.
-
We scale the features using a modified version of min-max normalization.
-
We employ a number of classifiers, namely logistic regression, k-nearest neighbors classifier, decision tree, support vector machine with linear kernel, naive Bayes classifier, random forest, linear discriminant analysis, stochastic gradient descent, and ridge classifier.
-
The performance of these classifiers, trained separately on the unaltered training set and on the training set obtained from each of the six resampling approaches, is evaluated using several metrics. Considering the nature of the problem, we use the $F_2$-score as the primary metric to evaluate the models.
-
The random forest algorithm applied to the training set obtained after oversampling the minority class (fraudulent transactions) via the synthetic minority over-sampling technique (SMOTE) appears to perform best in terms of the $F_2$-score on the test set. It achieves a test $F_2$-score of $0.880783$.
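
To make the choice of primary metric concrete: the $F_\beta$-score is $(1+\beta^2)\,PR / (\beta^2 P + R)$ for precision $P$ and recall $R$, so $\beta = 2$ weights recall more heavily than precision. The sketch below computes it from confusion-matrix counts. The function name `fbeta_from_counts` and all counts are hypothetical, chosen only for illustration; they are not results from this project.

```python
def fbeta_from_counts(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from true positives, false positives and false negatives."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    if denom == 0:
        return 0.0  # no positives predicted or present
    return (1 + b2) * tp / denom


# Made-up counts for two hypothetical classifiers (not project results).
# Classifier A: precision 0.75, recall 0.90.
f2_high_recall = fbeta_from_counts(tp=90, fp=30, fn=10)
f1_high_recall = fbeta_from_counts(tp=90, fp=30, fn=10, beta=1.0)

# Classifier B: precision about 0.90, recall 0.75.
f2_high_precision = fbeta_from_counts(tp=75, fp=8, fn=25)
f1_high_precision = fbeta_from_counts(tp=75, fp=8, fn=25, beta=1.0)

print(f"A: F1={f1_high_recall:.3f}  F2={f2_high_recall:.3f}")
print(f"B: F1={f1_high_precision:.3f}  F2={f2_high_precision:.3f}")
```

With these illustrative counts, the two classifiers have nearly the same $F_1$-score, but the $F_2$-score clearly prefers the one that misses fewer fraudulent transactions.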
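
The idea behind SMOTE can be sketched in a few lines: each synthetic minority sample is placed at a random point on the segment between an existing minority sample and one of its nearest minority-class neighbours. The code below is a minimal standard-library illustration of that idea, not the project's code; the function name `smote_sketch`, its parameters, and the toy data are assumptions made here. In practice a library implementation (for example `SMOTE` in imbalanced-learn) would normally be used, and, as in the project, resampling would be applied to the training set only.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their nearest minority-class neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest other minority samples (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from x to nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Toy training data: 3 "fraud" points versus 8 "authentic" ones (made up).
fraud = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
n_authentic = 8
new_points = smote_sketch(fraud, n_new=n_authentic - len(fraud), k=2)
balanced_fraud = fraud + new_points
print(len(balanced_fraud), "fraud samples after oversampling")
```

Because every synthetic point lies between two real minority samples, the oversampled class fills in the region the minority class already occupies rather than duplicating existing rows.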