MGDebugger: Multi-Granularity LLM Debugger

License: MIT | Python 3.8+

Table of Contents

  • Introduction
  • Getting Started
  • Usage
  • Performance
  • Contributing

Introduction

MGDebugger, introduced in the paper "From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging", is a hierarchical LLM code debugging method designed to isolate, identify, and resolve errors at multiple levels of granularity. Using a bottom-up debugging strategy, MGDebugger systematically progresses from individual subfunctions to the overall program, enabling precise error detection and correction.

With MGDebugger, developers can debug complex code and functions efficiently through granular analysis, reducing debugging time and improving the success rate on complex issues.

MGDebugger Overview

[Figure: MGDebugger system architecture overview]

Subfunction Debugging

[Figure: The subfunction debugging module]

Getting Started

Prerequisites

Before running MGDebugger, ensure your environment meets the following requirements:

  • Python: Version 3.8 or later.

  • vLLM: Version 0.6.0 or later, required for model loading and inference. You can follow the official installation guide to set it up (see also the note after this list).

  • Additional dependencies: Install all necessary Python packages using the following command:

    pip install -r requirements.txt
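If vLLM is not already available in your environment, the usual route per vLLM's documentation is to install it from PyPI; GPU/CUDA-specific setups may require following the official installation guide instead:

    pip install "vllm>=0.6.0"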

Configuring the vLLM Server

To launch the vLLM server with the DeepSeek-Coder-V2-Lite-Instruct model, execute the following command:

python -m vllm.entrypoints.openai.api_server \
    --model deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct \
    --trust-remote-code \
    --dtype auto \
    --api-key token-abc123s \
    --port 18889

This will initialize the model and start the server on port 18889.
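Because vLLM exposes an OpenAI-compatible API, you can sanity-check the server before running MGDebugger with any OpenAI-compatible client. The snippet below is a minimal sketch that assumes the port, API key, and model name from the command above:

    from openai import OpenAI

    # Point an OpenAI-compatible client at the local vLLM server started above.
    client = OpenAI(base_url="http://localhost:18889/v1", api_key="token-abc123s")

    response = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct",
        messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
        temperature=0.0,
    )
    print(response.choices[0].message.content)

If this returns a completion, the server is ready and you can proceed with the steps below.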

Usage

Running the Demo

We've prepared a demo code snippet to showcase MGDebugger's debugging capabilities. You can run the demo by executing the following command after starting the vLLM server:

python demo.py

Running Experiments

Once the vLLM server is up and running, start MGDebugger by executing:

python main.py

Tip: You can modify the MODEL and input_seeds parameters in the config.py file to test different models and input configurations.
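For example, switching models amounts to pointing MODEL at a different model served by vLLM. The snippet below is purely illustrative; the exact option names, values, and formats are defined in config.py itself:

    # config.py (illustrative edit; consult the file for the exact option formats)
    MODEL = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # should match the model served by vLLM
    input_seeds = "path/to/your/input_seeds"               # hypothetical value; use the format config.py expects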

Log Management

MGDebugger automatically stores all debugging and error logs in the output_data directory. You can review these logs for deeper insight into the debugging process and for performance analysis.

Performance

The table below highlights the performance of different methods compared to the baseline (No-Debugging) on the HumanEval and MBPP datasets using the DeepSeek-Coder-V2-Lite model.

| Method | HumanEval Acc. (%) | Δ Acc. (%) | HumanEval RSR (%) | MBPP Acc. (%) | Δ Acc. (%) | MBPP RSR (%) |
| --- | --- | --- | --- | --- | --- | --- |
| No-Debugging | 76.8 | -- | -- | 67.2 | -- | -- |
| Simple Feedback | 82.3 | +5.5 | 23.7 | 69.4 | +2.2 | 6.7 |
| Self-Edit | 82.9 | +6.1 | 26.3 | 71.2 | +4.0 | 12.2 |
| LDB (Block) | 84.1 | +7.3 | 31.6 | 74.0 | +6.8 | 20.7 |
| Self-Debugging (Expl.) | 87.2 | +10.4 | 44.7 | 73.4 | +6.2 | 18.9 |
| Self-Debugging (Trace) | 86.0 | +9.2 | 39.5 | 72.6 | +5.3 | 16.5 |
| Reflexion | 90.9 | +14.1 | 60.5 | 76.6 | +9.4 | 28.7 |
| Our Approach | 94.5 | +17.7 | 76.3 | 80.0 | +12.8 | 39.0 |

Our approach achieves the highest accuracy on both HumanEval and MBPP, improving over the No-Debugging baseline by 17.7 and 12.8 percentage points, respectively. Its repair success rate (RSR) is also substantially higher than that of the other methods, demonstrating the effectiveness of hierarchical debugging in fixing diverse code issues.

Contributing

We warmly welcome contributions to MGDebugger! We appreciate your feedback and look forward to building MGDebugger together with the community!
