TanyaSingh103/PlagiarismCheck

Code Plagiarism Detection

This repository contains a machine learning-based tool for detecting plagiarism in code submissions. The model compares pairs of code files and uses a Random Forest classifier to label each pair as plagiarized or not plagiarized. The project covers preprocessing, data augmentation, model training, and evaluation.

Contents

  • notebook.ipynb: Jupyter notebook with the complete implementation.
  • data/: Directory containing C++ code files used for training and testing.
  • README.md: This README file.
  • alternate/: Directory containing previous versions of the model and other approaches that were tried.

Overview

Problem Statement

The objective of this project is to develop a plagiarism detection tool that checks students' code submissions for similarities and identifies the plagiarized ones.

Methodology

  1. Data Collection: Pairs of code files labeled as either plagiarized (1) or not plagiarized (0).
  2. Preprocessing: Removing comments, #include directives, and other non-essential parts of the code so that the comparison focuses on program logic.
  3. Data Augmentation: Creating additional samples by shuffling code lines to increase the dataset size.
  4. Feature Extraction: Using TF-IDF to convert code samples into numerical vectors.
  5. Model Training: Using a Random Forest classifier with hyperparameter tuning.
  6. Evaluation: Assessing the model using cross-validation and classification metrics.
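The preprocessing, augmentation, and TF-IDF steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the function names `preprocess`, `augment`, and `pair_similarity` are hypothetical, comment stripping is a regex approximation that ignores comment markers inside string literals, and reducing a pair of files to a single TF-IDF cosine similarity is one assumed way of turning two files into a feature. The notebook may combine the vectors differently.

```python
import random
import re
from typing import List

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def preprocess(code: str) -> str:
    """Strip comments, #include lines, and blank lines from C++ source."""
    # Remove /* ... */ block comments, including multi-line ones.
    # Regex approximation: markers inside string literals are not handled.
    code = re.sub(r"/\*.*?\*/", "", code, flags=re.DOTALL)
    # Remove // line comments.
    code = re.sub(r"//[^\n]*", "", code)
    kept = []
    for line in code.splitlines():
        line = line.strip()
        # Skip blank lines and #include directives.
        if line and not line.startswith("#include"):
            kept.append(line)
    return "\n".join(kept)


def augment(code: str, n_copies: int, seed: int = 0) -> List[str]:
    """Return n_copies variants of a snippet with its lines shuffled."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    lines = code.splitlines()
    variants = []
    for _ in range(n_copies):
        shuffled = lines[:]
        rng.shuffle(shuffled)
        variants.append("\n".join(shuffled))
    return variants


def pair_similarity(code_a: str, code_b: str) -> float:
    """Cosine similarity of the TF-IDF vectors of two preprocessed files."""
    # Assumed tokenizer: identifiers, integer literals, and single punctuation
    # characters, so short names like `i` and operators are kept as tokens.
    vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]\w*|\d+|[^\s\w]")
    tfidf = vectorizer.fit_transform([preprocess(code_a), preprocess(code_b)])
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])


original = "#include <iostream>\n// add two numbers\nint add(int x, int y) { return x + y; }"
reformatted = "int add(int x, int y) {\n  /* sum */ return x + y;\n}"
print(preprocess(original))  # int add(int x, int y) { return x + y; }
print(round(pair_similarity(original, reformatted), 3))  # 1.0
```

The two example files differ only in comments, includes, and layout, so after preprocessing they yield the same tokens and a similarity of 1.0. Because TF-IDF ignores token order, a line-shuffled variant from `augment` also produces the same vector as its source under this sketch.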

Results

  • Best Parameters:
    • max_depth: None
    • min_samples_leaf: 1
    • min_samples_split: 2
    • n_estimators: 100
  • Cross-Validation Scores: [0.859, 0.953, 0.671]
  • Mean Cross-Validation Score: 0.828
  • Classification Report:
    • Precision (Class 0): 1.00
    • Recall (Class 0): 0.83
    • F1-Score (Class 0): 0.91
    • Precision (Class 1): 0.87
    • Recall (Class 1): 1.00
    • F1-Score (Class 1): 0.93
    • Overall Accuracy: 0.92
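The reported figures come from hyperparameter tuning, cross-validation, and a per-class classification report. A scikit-learn sketch of that evaluation loop is below, with the tuning step written as a grid search. It is not the notebook's code: the data is a synthetic stand-in from `make_classification`, the parameter grid and the 75/25 train/test split are assumptions, and `cv=3` is inferred only from the three reported fold scores. Its printed values will not reproduce the results above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for the pair-feature matrix X and the 0/1 plagiarism labels y.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed search space; it includes the best values reported above as candidates.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)
best_model = search.best_estimator_

# Three folds, matching the three cross-validation scores listed above.
cv_scores = cross_val_score(best_model, X_train, y_train, cv=3)
print("Best parameters:", search.best_params_)
print("Cross-validation scores:", np.round(cv_scores, 3))
print("Mean cross-validation score:", round(float(cv_scores.mean()), 3))

# Per-class precision, recall, F1, and overall accuracy on the held-out split.
print(classification_report(y_test, best_model.predict(X_test)))
```

Fitting the grid search on the training split only and reporting the classification metrics on the held-out split keeps the test data out of model selection. Whether the notebook used the same split is not stated.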

Usage

To use this project, follow these steps:

  1. Clone the repository:
    git clone https://github.com/TanyaSingh103/PlagiarismCheck.git
  2. Navigate to the project directory:
    cd PlagiarismCheck
  3. Open the Jupyter notebook:
    jupyter notebook notebook.ipynb

Dependencies

  • Python 3.6+
  • scikit-learn
  • imbalanced-learn
  • NumPy
  • pandas
  • Jupyter Notebook
