CIND820-Bank-Account-Fraud

I used Base.csv, the base variant of the Bank Account Fraud dataset suite. It is a tabular dataset with 1 million instances and 31 features, available at https://www.kaggle.com/datasets/sgpjesus/bank-account-fraud-dataset-neurips-2022?select=Base.csv

First, during data understanding, I found that the column “device_fraud_count” has the same value for every instance, so I dropped it.
Next, I checked for attributes whose values are mostly missing. “prev_address_months_count” and “intended_balcon_amount” were, so I dropped both.
I then imputed the remaining attributes that have missing values. Some attributes use -1 to represent a missing value, and others use any negative value. I imputed numerical attributes with the median and categorical attributes with the mode.
After imputation, I split the data on the “month” attribute: months 0 to 5 for training and months 6 to 7 for testing.
Because the classes are imbalanced, I applied SMOTE oversampling so that the two labels have equal counts.
I then selected features using domain knowledge and correlation analysis.
After that, I drew a 1-in-100 systematic sample.
After sampling, I used time-series validation.
For modeling, I applied three techniques: Decision Tree, Random Forest, and Logistic Regression.
To compare effectiveness, I used the confusion matrix, precision, recall, F1-score, ROC AUC, and the Matthews correlation coefficient.
To compare efficiency, I measured each model’s execution time.
To assess stability, I changed the random seed to 10, 500, and 5000 and checked how the metric results changed.
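
As a rough illustration of three of the steps above (imputation, the month-based split, and 1-in-100 systematic sampling), here is a minimal sketch in plain Python. It is not the code from this repository: the helper functions, the toy rows, and the column names “num_feature” and “cat_feature” are all hypothetical, and SMOTE, feature selection, and modeling are left out. It assumes a missing numerical value is encoded as a negative number and a missing categorical value as None.

```python
from statistics import median, mode


def impute(rows, column, is_missing, fill_with):
    """Fill missing entries of `column` in place; return the fill value."""
    observed = [r[column] for r in rows if not is_missing(r[column])]
    fill = fill_with(observed)
    for r in rows:
        if is_missing(r[column]):
            r[column] = fill
    return fill


def split_by_month(rows, train_months, test_months):
    """Split rows on the "month" attribute instead of shuffling."""
    train = [r for r in rows if r["month"] in train_months]
    test = [r for r in rows if r["month"] in test_months]
    return train, test


def systematic_sample(rows, step=100, start=0):
    """Keep every `step`-th row, beginning at index `start`."""
    return rows[start::step]


# Made-up toy data: 100 rows for each of months 0..7.
# "num_feature" and "cat_feature" are hypothetical column names.
rows = [
    {
        "month": i // 100,
        "num_feature": -1.0 if i % 10 == 0 else float(i % 7),
        "cat_feature": None if i % 50 == 0 else ("a" if i % 3 else "b"),
    }
    for i in range(800)
]

# Numerical attribute: negative means missing, fill with the median.
num_fill = impute(rows, "num_feature", lambda v: v < 0, median)
# Categorical attribute: None means missing, fill with the mode.
cat_fill = impute(rows, "cat_feature", lambda v: v is None, mode)

# Months 0 to 5 for training, months 6 to 7 for testing.
train, test = split_by_month(rows, train_months=range(0, 6), test_months=range(6, 8))

# 1-in-100 systematic sample of the training rows.
train_sample = systematic_sample(train, step=100)

print(len(train), len(test), len(train_sample))
```

Splitting on “month” keeps every test row later in time than every training row, which is why the split is not a random shuffle.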

All Python code is in the Code folder.
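
Separately from those scripts, the effectiveness measures named above can be illustrated with a small dependency-free sketch. The function names, labels, and scores below are made up for illustration and are not results from this project. ROC AUC is computed here as the fraction of (fraud, non-fraud) pairs in which the fraud case gets the higher score, which is one standard way to define it.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the fraud class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def effectiveness_metrics(y_true, y_pred):
    """Precision, recall, F1 and Matthews correlation coefficient."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


def roc_auc(y_true, scores):
    """Pairwise ROC AUC; a tie between a positive and a negative counts 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: 3 fraud cases and 7 legitimate ones.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.1, 0.05, 0.4, 0.0]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold of 0.5 is an assumption

counts = confusion_counts(y_true, y_pred)
metrics = effectiveness_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(counts, metrics, auc)
```

On imbalanced data such as fraud labels, the Matthews correlation coefficient uses all four cells of the confusion matrix, so it is not inflated by the large number of true negatives the way accuracy would be.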

Please feel free to request the full and updated version of the report by sending your inquiry to my email address: k10lu@torontomu.ca. I encourage open communication, and I am here to assist with any information or queries you may have. Your engagement is greatly appreciated.
