Fraud detection using Deep Neural Networks to predict fraudulent transactions in financial data. 🚨🤖 Complete process from EDA and data preprocessing to model training and evaluation. 📊🔍

sergio11/online_payment_fraud

Fraud Detection Model with Deep Neural Networks (DNN)

This project focuses on developing an advanced Fraud Detection model using Deep Neural Networks (DNN) to identify fraudulent transactions in financial data. Fraud detection is an essential problem in the financial industry, where identifying fraudulent activities promptly can save significant amounts of money and protect customers.

The goal of this project is to build an accurate, efficient, and scalable model capable of detecting fraud in a highly imbalanced dataset, where fraudulent transactions represent only a small fraction of all transactions. This project follows a comprehensive approach to solving the problem by employing Exploratory Data Analysis (EDA), data preprocessing, and leveraging deep learning techniques to train a model capable of distinguishing between fraudulent and legitimate transactions.

🙏 I would like to extend my heartfelt gratitude to Santiago Hernández, an expert in Cybersecurity and Artificial Intelligence. His incredible course on Deep Learning, available on Udemy, was instrumental in shaping the development of this project. The insights and techniques learned from his course were crucial in crafting the neural network architecture used in this classifier.

We would like to express our gratitude to Jainil Shah for creating and sharing the Online Payment Fraud Detection dataset on Kaggle. This dataset, which contains detailed historical information about fraudulent transactions, has been invaluable in building and training the machine learning model for detecting fraud in online payments.

🌟 The dataset can be found on Kaggle. Your contribution is greatly appreciated! 🙌

⚠️ Disclaimer

This project was developed for educational and research purposes only. It is an experimental implementation of deep learning techniques for fraud detection and should not be used in production systems or real-world financial applications.

The model presented in this repository has not been audited for regulatory compliance, financial security, or operational robustness. Fraud detection in real financial environments requires rigorous testing, domain expertise, and compliance with legal and ethical standards.

Users should not rely on this project for real-time fraud prevention or financial decision-making. Always consult industry professionals and use verified fraud detection solutions in real-world applications.

🧑‍🔬 Exploratory Data Analysis (EDA)

During the EDA phase, the following insights were gathered from the dataset:

  • Fraud Percentage: Fraudulent transactions account for only 0.13% of all transactions in the dataset, making it highly imbalanced.
  • Fraud by Transaction Type: Fraud occurred mostly in cash-out and transfer transactions and was rare in payment transactions.
  • Fraud Flagging (isFlaggedFraud): Only 16 of the 8,213 fraudulent transactions were flagged, indicating a need for improved fraud detection.
  • Missed Fraud: The remaining 99.805% of fraudulent transactions were not flagged, which points to a significant weakness in the existing flagging rule.
  • Fraud Transaction Amount Range: Fraudulent transactions predominantly fell between ₹1.3 lakh and ₹3.6 lakh, with the majority between ₹3.4 lakh and ₹3.6 lakh.

Key Conclusions:

  • Targeted Fraudulent Amounts: Focus on high-value transactions, particularly in the ₹1 lakh to ₹4 lakh range.
  • Fraud Mode: Fraud is most prevalent in cash-out and transfer transactions, which should be prioritized in fraud prevention efforts.
  • Improvements Needed: The existing flagging process misses almost all fraudulent transactions, leaving a significant detection gap.
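The headline figures above come down to a few counts and ratios. The sketch below shows those computations on a handful of made-up records. The `isFraud` label column is an assumption (this README only names `type` and `isFlaggedFraud`), and the records are illustrative rather than taken from the dataset. The last lines re-derive the 99.805% figure from the counts reported above.

```python
from collections import Counter

# Illustrative records only; `isFraud` is an assumed label column name.
transactions = [
    {"type": "TRANSFER", "isFraud": 1, "isFlaggedFraud": 0},
    {"type": "CASH_OUT", "isFraud": 1, "isFlaggedFraud": 0},
    {"type": "PAYMENT", "isFraud": 0, "isFlaggedFraud": 0},
    {"type": "CASH_OUT", "isFraud": 0, "isFlaggedFraud": 0},
    {"type": "TRANSFER", "isFraud": 1, "isFlaggedFraud": 1},
]

fraud = [t for t in transactions if t["isFraud"] == 1]

# Share of all transactions that are fraudulent.
fraud_rate = len(fraud) / len(transactions)

# Fraud counts per transaction type.
fraud_by_type = Counter(t["type"] for t in fraud)

# Number of fraudulent transactions caught by the existing flag.
flagged_fraud = sum(t["isFlaggedFraud"] for t in fraud)

# Check of the reported figure: 16 of 8,213 fraud transactions were flagged.
unflagged_share = (8213 - 16) / 8213
print(f"{unflagged_share:.3%}")  # prints 99.805%
```

On the full dataset the same quantities would more likely be computed with pandas, for example `df["isFraud"].mean()` for the fraud rate, again assuming that column name.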

🧹 Data Preprocessing

The dataset required significant cleaning before being fed into the model:

Steps taken:

  1. Removed Irrelevant Features:
    • Dropped the nameDest and nameOrig columns because of their high cardinality and low impact on fraud detection.
  2. Feature Engineering:
    • Created balance_diff_org and balance_diff_dest as the difference between the old and new balances of the origin and destination accounts, respectively.
  3. Encoded Categorical Variables:
    • Applied One-Hot Encoding to the type column, turning each transaction type into a binary column.
  4. Scaled Numerical Features:
    • Scaled amount, balance_diff_org, and balance_diff_dest with MinMaxScaler so that they share a common range ([0, 1] by default).
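To make the four steps concrete, here is a library-free sketch of what each one computes. It is not the project's code, which uses pandas and `MinMaxScaler`. The balance column names (`oldbalanceOrg`, `newbalanceOrig`, `oldbalanceDest`, `newbalanceDest`), the "old minus new" sign convention, and the sample rows are assumptions made for illustration.

```python
def preprocess(rows):
    """Apply the four preprocessing steps to a list of transaction dicts."""
    types = sorted({row["type"] for row in rows})
    features = []
    for row in rows:
        # Step 1: nameOrig and nameDest are dropped by simply not copying them.
        feat = {
            "amount": row["amount"],
            # Step 2: old minus new balance (the sign convention is an assumption).
            "balance_diff_org": row["oldbalanceOrg"] - row["newbalanceOrig"],
            "balance_diff_dest": row["oldbalanceDest"] - row["newbalanceDest"],
        }
        # Step 3: one binary column per transaction type.
        for t in types:
            feat[f"type_{t}"] = int(row["type"] == t)
        features.append(feat)

    # Step 4: min-max scaling, x' = (x - min) / (max - min), per column.
    for col in ("amount", "balance_diff_org", "balance_diff_dest"):
        values = [f[col] for f in features]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        for f in features:
            f[col] = (f[col] - lo) / span
    return features


# Hypothetical sample rows.
rows = [
    {"type": "TRANSFER", "amount": 100.0, "nameOrig": "C1", "nameDest": "C2",
     "oldbalanceOrg": 100.0, "newbalanceOrig": 0.0,
     "oldbalanceDest": 0.0, "newbalanceDest": 100.0},
    {"type": "PAYMENT", "amount": 50.0, "nameOrig": "C3", "nameDest": "M1",
     "oldbalanceOrg": 500.0, "newbalanceOrig": 450.0,
     "oldbalanceDest": 0.0, "newbalanceDest": 0.0},
    {"type": "CASH_OUT", "amount": 300.0, "nameOrig": "C4", "nameDest": "C5",
     "oldbalanceOrg": 300.0, "newbalanceOrig": 0.0,
     "oldbalanceDest": 200.0, "newbalanceDest": 500.0},
]

features = preprocess(rows)
print(features[0])
```

After scaling, the smallest value in each numeric column maps to 0 and the largest to 1, which is what `MinMaxScaler` does with its default settings.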

⚖️ Handling Class Imbalance

Given that fraud accounts for only 0.13% of the total transactions, handling the class imbalance was a crucial step.

Steps taken:

  1. SMOTE (Synthetic Minority Over-sampling Technique):
    • Applied SMOTE to oversample the minority class (fraud) and balance the dataset.
    • Ensured that both fraud and non-fraud classes had a similar number of instances for training.
  2. Class Weights:
    • Used class weighting during model training to give higher importance to the minority class (fraud) so that the model does not become biased toward the majority class.
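The two ideas can be sketched without any libraries. The first function computes "balanced" class weights with the common heuristic `n_samples / (n_classes * class_count)`. The second shows the interpolation step at the core of SMOTE, which places a synthetic point between a minority sample and one of its minority-class neighbours. Both are simplified illustrations under stated assumptions: the label counts are made up to mimic a 0.13% fraud rate, the neighbour is given rather than found by a nearest-neighbour search, and the project itself relies on imbalanced-learn rather than this code.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


def smote_like_sample(x, neighbor, rng):
    """Return one synthetic point on the segment between x and neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(x, neighbor)]


# Illustrative labels: 13 fraud cases in 10,000 transactions (0.13%).
labels = [0] * 9987 + [1] * 13
weights = balanced_class_weights(labels)

rng = random.Random(0)
synthetic = smote_like_sample([1.0, 2.0], [3.0, 6.0], rng)

print(weights)
print(synthetic)
```

With these counts the fraud class receives a weight several hundred times larger than the non-fraud class, which is the effect class weighting is meant to have on the loss.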

🧠 Neural Network Architecture

The final model architecture chosen was a Deep Neural Network (DNN). The choice of DNN was based on the following factors:

  • Non-linear relationships: Fraud detection involves complex patterns that linear models may struggle to capture.
  • Large dataset: With a large dataset and multiple features, DNNs are effective at learning intricate patterns.
  • Feature Interactions: DNNs are capable of learning interactions between features, which is crucial in fraud detection.

Architecture:

  • Input: 20 features, one per column of X_train.
  • Hidden Layers:
    • First Hidden Layer: 128 nodes with ReLU activation to capture complex patterns.
    • Second Hidden Layer: 64 nodes with ReLU activation.
    • Third Hidden Layer: 32 nodes with ReLU activation for further abstraction.
    • Each hidden layer is followed by a Dropout layer (rate 0.2) for regularization.
  • Output Layer: 1 node with a sigmoid activation function that outputs the probability of fraud (close to 1 for fraud, close to 0 for non-fraud).

```python
from keras import Input
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import Adam

# X_train is the preprocessed training feature matrix from the steps above.
model = Sequential([
    Input(shape=(X_train.shape[1],)),
    # First hidden layer, followed by dropout to reduce overfitting
    Dense(128, activation='relu'),
    Dropout(0.2),
    # Second hidden layer
    Dense(64, activation='relu'),
    Dropout(0.2),
    # Third hidden layer
    Dense(32, activation='relu'),
    Dropout(0.2),
    # Output layer: sigmoid gives the fraud probability for binary classification
    Dense(1, activation='sigmoid'),
])

# Compile the model
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss='binary_crossentropy',
    metrics=['accuracy'],
)

# Print the layer summary to verify the architecture
model.summary()
```
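As a sanity check on what `model.summary()` should report, the trainable parameter count can be worked out by hand. A dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases, and dropout layers add no parameters. The sketch assumes 20 input features, as stated in the architecture description, and the layer widths used in the code above.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


n_features = 20  # assumed from the architecture description
layer_sizes = [n_features, 128, 64, 32, 1]

# One entry per Dense layer; Dropout layers contribute nothing.
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer, total)  # prints [2688, 8256, 2080, 33] 13057
```

If the real feature count differs from 20, only the first layer's figure changes.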

⚡ Model Training with Early Stopping

To ensure optimal training and avoid overfitting, EarlyStopping was implemented.

Key points:

  • Monitor: We monitored the validation loss during training.
  • Patience: Set to 10 epochs, meaning the model would stop training if the validation loss didn't improve after 10 consecutive epochs.
  • Restoring Best Weights: The model's best weights were restored from the epoch where validation loss was the lowest, ensuring the model was in the best state at the end of training.
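In Keras, this behaviour is usually configured with the `EarlyStopping` callback, for example `EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)`; the repository's exact call is not shown here. The logic behind those three settings can be illustrated with a small simulation over a hypothetical list of per-epoch validation losses. This is a simplified sketch, not the Keras implementation.

```python
def simulate_early_stopping(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; `best_epoch` is the epoch whose weights would be
    restored.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted.
    return len(val_losses) - 1, best_epoch


# Hypothetical curve: improves for three epochs, then plateaus.
losses = [0.50, 0.40, 0.30] + [0.35] * 12
stop_epoch, best_epoch = simulate_early_stopping(losses, patience=10)
print(stop_epoch, best_epoch)  # prints 12 2
```

Here training halts at epoch index 12, ten epochs after the last improvement, and the weights from epoch index 2 are the ones kept.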

📊 Model Evaluation: Confusion Matrix & Classification Report

Confusion Matrix:

The confusion matrix provides a detailed breakdown of the model's predictions, showcasing the model's performance in distinguishing between fraud and non-fraud transactions.

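The confusion matrix for the model's predictions on the test set is as follows (rows are actual classes, columns are predicted classes, with non-fraud first):

[[593518  41923]
 [ 14886 620555]]

Each row sums to 635,441, so the test set contains equal numbers of fraud and non-fraud samples.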

Explanation:

  • True Negatives (TN): 593,518 non-fraud transactions correctly identified as non-fraud.
  • False Positives (FP): 41,923 non-fraud transactions incorrectly classified as fraud.
  • False Negatives (FN): 14,886 fraud transactions incorrectly classified as non-fraud.
  • True Positives (TP): 620,555 fraud transactions correctly identified as fraud.

Classification Report:

The classification report provides key metrics for evaluating the model's performance across both classes (fraud and non-fraud).

  • Precision (Fraud): 0.9367. Of the transactions predicted as fraud, 93.67% were actually fraudulent.

  • Recall (Fraud): 0.9766. The model detected 97.66% of all fraudulent transactions.

  • F1-score (Fraud): 0.9562. The harmonic mean of precision and recall for the fraud class.

  • Precision (Non-Fraud): 0.9755. Of the transactions predicted as non-fraud, 97.55% were actually non-fraudulent.

  • Recall (Non-Fraud): 0.9340. The model correctly identified 93.40% of all non-fraudulent transactions.

  • F1-score (Non-Fraud): 0.9543. The harmonic mean of precision and recall for the non-fraud class.

Overall Performance:

  • Accuracy: 0.9553. The model classified 95.53% of the test-set transactions correctly.
  • Macro Average:
    • Precision: 0.9561
    • Recall: 0.9553
    • F1-score: 0.9553
  • Weighted Average:
    • Precision: 0.9561
    • Recall: 0.9553
    • F1-score: 0.9553
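Every figure in the classification report can be re-derived from the four confusion-matrix counts. The short sketch below does this in plain Python (the project itself presumably used scikit-learn's reporting utilities; that is an assumption). Because both classes have the same number of test samples, the macro and weighted averages coincide.

```python
# Counts taken from the confusion matrix above.
tn, fp, fn, tp = 593518, 41923, 14886, 620555


def precision_recall_f1(true_pos, false_pos, false_neg):
    """Per-class precision, recall and F1 from raw counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Fraud is the positive class.
fraud = precision_recall_f1(tp, fp, fn)

# For non-fraud the roles swap: TN are its hits, FN its false alarms, FP its misses.
non_fraud = precision_recall_f1(tn, fn, fp)

accuracy = (tp + tn) / (tp + tn + fp + fn)

# Macro average: unweighted mean of the per-class metrics.
macro = [(a + b) / 2 for a, b in zip(fraud, non_fraud)]

for name, values in (("fraud", fraud), ("non-fraud", non_fraud), ("macro", macro)):
    print(name, [round(v, 4) for v in values])
print("accuracy", round(accuracy, 4))
```

The printed values match the report: 0.9367 / 0.9766 / 0.9562 for fraud, 0.9755 / 0.9340 / 0.9543 for non-fraud, and 0.9553 accuracy.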

Key Insights from Evaluation:

  1. High Recall for Fraud Detection (Class 1.0): The model captures 97.66% of fraudulent transactions, and 93.67% of its fraud predictions are correct.

  2. Good Performance for Non-Fraud (Class 0.0): Recall for non-fraud is slightly lower (93.40%), but precision remains high (97.55%), so most transactions predicted as non-fraud are indeed legitimate.

  3. Class Imbalance Handling: Although fraud represents only 0.13% of the original transactions, training with SMOTE and class weights produced a model that detects fraud effectively rather than defaulting to the majority (non-fraud) class.

  4. Error Analysis: The model still produces 41,923 false positives (6.6% of non-fraud test transactions) and 14,886 false negatives (2.3% of fraud test transactions). Both error rates are low relative to the class sizes.

🔮 Conclusion:

  • On the test set, the model identifies fraudulent transactions with 97.66% recall and 93.67% precision, while keeping precision and recall for non-fraud transactions at 97.55% and 93.40%.
  • The DNN model has shown its capability to handle complex patterns in fraud detection, especially with the help of SMOTE for balancing the dataset and class weighting during training.

📚 Requirements

  • Python 3.x
  • TensorFlow (Keras)
  • scikit-learn
  • pandas
  • imbalanced-learn
  • matplotlib
  • seaborn

🙏 Acknowledgments

Thanks to Jainil Shah's comprehensive Online Payment Fraud Detection dataset, we were able to explore key features and gain valuable insights into the patterns that distinguish fraudulent transactions from legitimate ones. We highly appreciate this contribution to the data science community as a resource for further research and development in fraud detection.

A huge thank you to jainilcoder for providing the dataset that made this project possible! 🌟

Please Share & Star the repository to keep me motivated.

License ⚖️

This project is licensed under the MIT License, an open-source software license that allows developers to freely use, copy, modify, and distribute the software. 🛠️ This includes use in both personal and commercial projects, with the only requirement being that the original copyright notice is retained. 📄

Please note the following limitations:

  • The software is provided "as is", without any warranties, express or implied. 🚫🛡️
  • If you distribute the software, whether in original or modified form, you must include the original copyright notice and license. 📑
  • The license allows for commercial use, but you cannot claim ownership over the software itself. 🏷️

The goal of this license is to maximize freedom for developers while maintaining recognition for the original creators.

MIT License

Copyright (c) 2024 Dream software - Sergio Sánchez

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
