AleMidolo/GPTIdiomRefactoring

Automated Refactoring of Non-Idiomatic Python Code: A Differentiated Replication with LLMs

Abstract

In the Python ecosystem, the adoption of idiomatic constructs has been encouraged for their expressiveness and their benefits to productivity and even efficiency, despite ongoing debate about familiarity and understandability. Recent research has proposed approaches, based on static code analysis and transformation, to automatically identify non-idiomatic code and refactor it into idiomatic counterparts. Given the potential recently shown by Large Language Models (LLMs) on code-related tasks, in this paper we present the results of a replication study investigating GPT-4's effectiveness in recommending idiomatic refactoring actions. Our results reveal that GPT-4 not only identifies idiomatic constructs effectively, but frequently exceeds the benchmark by proposing refactoring actions where the existing baseline failed. A manual analysis of a random sample confirms the correctness of the obtained recommendations. Overall, our findings underscore the potential of LLMs to accomplish tasks that previously required recommenders built on complex code analyses.

Run the Experiment

  1. Unzip both Data.zip and Results.zip.

  2. To generate Pythonic idioms using GPT-4, run the main.py file.

  3. For performance evaluation, execute the metrics.py file after the generation process is complete.
    Note: Refer to the documentation within each file before running them.


Results

The results of the generation are stored in the Results directory. Its structure is as follows:

1. all_refactorings

This directory contains all generation output files. Each file corresponds to a specific Pythonic idiom and is structured as follows:

  • file_html: URL of the original code.
  • method_content: Code of the original method.
  • file_name: Name of the file containing the original method.
  • lineno: Starting line of the method in the file.
  • old_code: Code within method_content that needs refactoring.
  • bench_code: Refactored code proposed in the benchmark.
  • count_bench: Number of refactorings proposed by the benchmark.
  • gpt_code: Refactored code generated by GPT-4.
  • count_gpt: Number of refactorings proposed by GPT-4.
  • text: GPT-4's response excluding the code (text-only).
  • answer: Complete response from GPT-4 (code + text).

2. performance_evaluation

This directory contains filtered output files categorized by the number of refactorings proposed:

  • bench_more: Cases where the benchmark proposed more refactorings than GPT-4.
  • equals: Cases where GPT-4 and the benchmark proposed the same number of refactorings.
  • gpt_more: Cases where GPT-4 proposed more refactorings than the benchmark.
  • zero: Cases where GPT-4 did not propose any refactoring.
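
The categorization above can be sketched as a small function over the count_bench and count_gpt fields described for all_refactorings. This is an illustrative sketch, not code from the repository; in particular, it assumes that the zero case takes precedence when GPT-4 proposes no refactoring:

```python
def categorize(record):
    """Assign a result record to one of the performance_evaluation
    buckets, comparing the benchmark's and GPT-4's refactoring counts."""
    bench, gpt = record["count_bench"], record["count_gpt"]
    if gpt == 0:
        return "zero"        # GPT-4 proposed no refactoring at all
    if bench > gpt:
        return "bench_more"  # benchmark proposed more refactorings
    if gpt > bench:
        return "gpt_more"    # GPT-4 proposed more refactorings
    return "equals"          # same number of refactorings

# Hypothetical records, for illustration only (not taken from the dataset)
print(categorize({"count_bench": 2, "count_gpt": 3}))  # gpt_more
print(categorize({"count_bench": 1, "count_gpt": 0}))  # zero
```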

3. correctness_evaluation

This directory contains two .csv files, one for GPT-4 and one for the benchmark. Each contains the same columns as the files in all_refactorings, plus two additional columns:

  • correct: Number of refactorings that are correct.
  • wrong: Number of refactorings that are incorrect.
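
From these two columns, an overall correctness ratio can be computed with the standard csv module. The snippet below is a sketch that uses inline sample rows (showing only the two added columns) rather than the actual evaluation files:

```python
import csv
import io

# Inline sample standing in for one of the correctness_evaluation .csv
# files; the real files also carry the all_refactorings columns.
sample = """correct,wrong
3,1
2,0
0,2
"""

correct = wrong = 0
for row in csv.DictReader(io.StringIO(sample)):
    correct += int(row["correct"])
    wrong += int(row["wrong"])

ratio = correct / (correct + wrong)
print(f"{correct} correct, {wrong} wrong, ratio {ratio:.3f}")
```

To run the same aggregation on the real data, replace io.StringIO(sample) with an open() call on one of the two .csv files.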
