Tree Search support (#39)
aorwall authored Jan 13, 2025
1 parent 29062b3 commit 1184309
Showing 130 changed files with 53,881 additions and 4,871 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -160,3 +160,5 @@ notebooks/local_experiments.ipynb
playground
logs
Pipfile
experiments
evals
8 changes: 7 additions & 1 deletion README.md
@@ -1,11 +1,17 @@
# Moatless Tools
Moatless Tools is a hobby project where I experiment with some ideas I have about how LLMs can be used to edit code in large existing codebases. I believe that rather than relying on an agent to reason its way to a solution, it is crucial to build good tools to insert the right context into the prompt and handle the response.

_Right now I'm focusing on [moatless-tree-search](https://github.com/aorwall/moatless-tree-search), an extended version of moatless-tools that builds a tree structure of nodes with parallel solutions and uses tree search to find the optimal trajectory. The code in moatless-tools has been simplified and is now a streamlined version of this expanded codebase._
_For the implementation used in the paper [SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement](https://arxiv.org/abs/2410.20285), please see [moatless-tree-search](https://github.com/aorwall/moatless-tree-search)._

## SWE-Bench
I use the [SWE-bench benchmark](https://www.swebench.com/) as a way to verify my ideas.

### Version 0.0.4: Deepseek V3
With version 0.0.4 I get a 30.7% solve rate (92 instances) using the open-source Deepseek V3 model. The most notable aspect is the extremely low cost: the entire evaluation run costs less than $4 ($0.0127 per instance), which works out to **24 resolved instances per dollar spent**.

* [Deepseek V3 evaluation results](https://experiments.moatless.ai/evaluations/moatless_tools_v4_deepseek_chat_3_temp_0_iter_20_fmt_react)
* [Claude 3.5 Sonnet v20241022 evaluation results](https://experiments.moatless.ai/evaluations/moatless_tools_v4_claude_3_5_sonnet_20241022_temp_0_iter_20_fmt_tool_call)

### Version 0.0.3: Claude 3.5 Sonnet v20241022
With version 0.0.3 I get a 38.3% solve rate with Claude 3.5 Sonnet v20241022. The average cost per instance is $0.30.
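
The Version 0.0.4 figures in the README hunk above can be cross-checked with a little arithmetic. The sketch below is only a sanity check, not part of the commit. The README does not state the benchmark size, so it is inferred here from the quoted solve rate and resolved count; that inference is an assumption.

```python
# Figures quoted in the README for version 0.0.4 (Deepseek V3).
resolved = 92               # resolved instances
solve_rate = 0.307          # 30.7% solve rate
cost_per_instance = 0.0127  # USD per instance

# Assumption: the total number of evaluated instances is not given in the
# README, so it is inferred from the solve rate and rounded to an integer.
total_instances = round(resolved / solve_rate)

total_cost = total_instances * cost_per_instance
resolved_per_dollar = resolved / total_cost

print(total_instances, round(total_cost, 2), round(resolved_per_dollar, 1))
# prints: 300 3.81 24.1
```

Under that assumption the run costs about $3.81, consistent with "less than $4" and roughly 24 resolved instances per dollar.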

11 changes: 11 additions & 0 deletions datasets/easy_dataset.json
@@ -0,0 +1,11 @@
{
"name": "easy",
"description": "Instances with 35 or more resolved solutions that exist in both lite and verified datasets",
"instance_ids": [
"django__django-11099",
"django__django-11133",
"django__django-13658",
"django__django-16255",
"django__django-16527"
]
}
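
As an illustration of how a dataset file like the one above might be consumed, here is a minimal sketch that parses the JSON, checks the three fields it contains, and groups instance IDs by repository. The helper names `parse_dataset` and `repo_of` are hypothetical and do not come from the commit, and the `owner__repo-number` reading of an instance ID is an assumption based on how the IDs look.

```python
import json
from collections import Counter

# The "easy" dataset as added in this commit, embedded so the sketch is
# self-contained; in the repository it lives in datasets/easy_dataset.json.
EASY_DATASET = """
{
  "name": "easy",
  "description": "Instances with 35 or more resolved solutions that exist in both lite and verified datasets",
  "instance_ids": [
    "django__django-11099",
    "django__django-11133",
    "django__django-13658",
    "django__django-16255",
    "django__django-16527"
  ]
}
"""

REQUIRED_FIELDS = {"name", "description", "instance_ids"}


def parse_dataset(text: str) -> dict:
    """Parse a dataset definition and check the fields seen in the diff (hypothetical helper)."""
    data = json.loads(text)
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    ids = data["instance_ids"]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate instance ids")
    return data


def repo_of(instance_id: str) -> str:
    """Map 'django__django-11099' to 'django/django' (assumed ID convention)."""
    project, _, _number = instance_id.rpartition("-")
    return project.replace("__", "/")


dataset = parse_dataset(EASY_DATASET)
per_repo = Counter(repo_of(i) for i in dataset["instance_ids"])
print(dataset["name"], dict(per_repo))
# prints: easy {'django/django': 5}
```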
99 changes: 99 additions & 0 deletions datasets/lite_and_verified_dataset.json
@@ -0,0 +1,99 @@
{
"name": "lite_and_verified",
"description": "All instances that exist in both lite and verified datasets",
"instance_ids": [
"astropy__astropy-12907",
"astropy__astropy-14182",
"astropy__astropy-14365",
"astropy__astropy-14995",
"django__django-10914",
"django__django-11099",
"django__django-11133",
"django__django-11179",
"django__django-11815",
"django__django-11848",
"django__django-11964",
"django__django-11999",
"django__django-12125",
"django__django-12308",
"django__django-12708",
"django__django-13028",
"django__django-13033",
"django__django-13158",
"django__django-13315",
"django__django-13401",
"django__django-13551",
"django__django-13590",
"django__django-13658",
"django__django-13925",
"django__django-13933",
"django__django-13964",
"django__django-14017",
"django__django-14155",
"django__django-14238",
"django__django-14534",
"django__django-14580",
"django__django-14608",
"django__django-14672",
"django__django-14752",
"django__django-14787",
"django__django-14855",
"django__django-14915",
"django__django-14999",
"django__django-15252",
"django__django-15695",
"django__django-15814",
"django__django-15851",
"django__django-16139",
"django__django-16255",
"django__django-16527",
"django__django-16595",
"django__django-17087",
"matplotlib__matplotlib-23299",
"matplotlib__matplotlib-23314",
"matplotlib__matplotlib-23476",
"matplotlib__matplotlib-24149",
"matplotlib__matplotlib-24970",
"matplotlib__matplotlib-25311",
"matplotlib__matplotlib-25332",
"psf__requests-2317",
"pydata__xarray-4094",
"pylint-dev__pylint-7080",
"pytest-dev__pytest-7432",
"pytest-dev__pytest-7490",
"scikit-learn__scikit-learn-10297",
"scikit-learn__scikit-learn-13142",
"scikit-learn__scikit-learn-13439",
"scikit-learn__scikit-learn-13496",
"scikit-learn__scikit-learn-13779",
"scikit-learn__scikit-learn-14087",
"scikit-learn__scikit-learn-14894",
"scikit-learn__scikit-learn-14983",
"scikit-learn__scikit-learn-25747",
"sphinx-doc__sphinx-11445",
"sphinx-doc__sphinx-8595",
"sphinx-doc__sphinx-8721",
"sympy__sympy-12419",
"sympy__sympy-12481",
"sympy__sympy-13031",
"sympy__sympy-13480",
"sympy__sympy-13647",
"sympy__sympy-15345",
"sympy__sympy-16792",
"sympy__sympy-17139",
"sympy__sympy-17630",
"sympy__sympy-17655",
"sympy__sympy-18189",
"sympy__sympy-18199",
"sympy__sympy-18698",
"sympy__sympy-20154",
"sympy__sympy-20590",
"sympy__sympy-21379",
"sympy__sympy-21612",
"sympy__sympy-21847",
"sympy__sympy-22714",
"sympy__sympy-23262",
"sympy__sympy-24066",
"sympy__sympy-24213"
]
}
90 changes: 90 additions & 0 deletions datasets/lite_and_verified_solvable_dataset.json
@@ -0,0 +1,90 @@
{
"name": "lite_and_verified_solvable",
"description": "Instances that exist in both lite and verified datasets and have at least one solution",
"instance_ids": [
"astropy__astropy-12907",
"astropy__astropy-14182",
"astropy__astropy-14365",
"astropy__astropy-14995",
"django__django-10914",
"django__django-11099",
"django__django-11133",
"django__django-11179",
"django__django-11815",
"django__django-11848",
"django__django-11964",
"django__django-11999",
"django__django-12125",
"django__django-12308",
"django__django-12708",
"django__django-13028",
"django__django-13033",
"django__django-13158",
"django__django-13315",
"django__django-13401",
"django__django-13551",
"django__django-13590",
"django__django-13658",
"django__django-13925",
"django__django-13933",
"django__django-13964",
"django__django-14017",
"django__django-14155",
"django__django-14238",
"django__django-14534",
"django__django-14580",
"django__django-14608",
"django__django-14672",
"django__django-14752",
"django__django-14787",
"django__django-14855",
"django__django-14915",
"django__django-14999",
"django__django-15814",
"django__django-15851",
"django__django-16139",
"django__django-16255",
"django__django-16527",
"django__django-16595",
"django__django-17087",
"matplotlib__matplotlib-23299",
"matplotlib__matplotlib-23314",
"matplotlib__matplotlib-24149",
"matplotlib__matplotlib-24970",
"matplotlib__matplotlib-25311",
"matplotlib__matplotlib-25332",
"psf__requests-2317",
"pydata__xarray-4094",
"pylint-dev__pylint-7080",
"pytest-dev__pytest-7432",
"pytest-dev__pytest-7490",
"scikit-learn__scikit-learn-10297",
"scikit-learn__scikit-learn-13142",
"scikit-learn__scikit-learn-13439",
"scikit-learn__scikit-learn-13496",
"scikit-learn__scikit-learn-13779",
"scikit-learn__scikit-learn-14087",
"scikit-learn__scikit-learn-14894",
"scikit-learn__scikit-learn-14983",
"scikit-learn__scikit-learn-25747",
"sphinx-doc__sphinx-8595",
"sphinx-doc__sphinx-8721",
"sympy__sympy-12481",
"sympy__sympy-13031",
"sympy__sympy-13480",
"sympy__sympy-13647",
"sympy__sympy-15345",
"sympy__sympy-16792",
"sympy__sympy-17139",
"sympy__sympy-17655",
"sympy__sympy-18189",
"sympy__sympy-20154",
"sympy__sympy-20590",
"sympy__sympy-21379",
"sympy__sympy-21847",
"sympy__sympy-22714",
"sympy__sympy-23262",
"sympy__sympy-24066",
"sympy__sympy-24213"
]
}
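
The relationship between `lite_and_verified` and `lite_and_verified_solvable` can be checked with a set difference. The sketch below uses a hypothetical helper, `compare`, and a three-ID excerpt of the two files rather than the full lists, so treat it as an illustration, not as code from the commit.

```python
def compare(superset: dict, subset: dict) -> tuple[list[str], list[str]]:
    """Return (ids only in superset, ids only in subset), each sorted (hypothetical helper)."""
    a = set(superset["instance_ids"])
    b = set(subset["instance_ids"])
    return sorted(a - b), sorted(b - a)


# Excerpt only: the sphinx-doc entries from each file shown above.
lite_and_verified = {
    "name": "lite_and_verified",
    "instance_ids": [
        "sphinx-doc__sphinx-11445",
        "sphinx-doc__sphinx-8595",
        "sphinx-doc__sphinx-8721",
    ],
}
solvable = {
    "name": "lite_and_verified_solvable",
    "instance_ids": [
        "sphinx-doc__sphinx-8595",
        "sphinx-doc__sphinx-8721",
    ],
}

dropped, unexpected = compare(lite_and_verified, solvable)
print(dropped)     # prints: ['sphinx-doc__sphinx-11445']
print(unexpected)  # prints: []
```

Comparing the two complete lists as shown in this diff, 93 IDs against 84, leaves nine IDs without a solution: `django__django-15252`, `django__django-15695`, `matplotlib__matplotlib-23476`, `sphinx-doc__sphinx-11445`, `sympy__sympy-12419`, `sympy__sympy-17630`, `sympy__sympy-18199`, `sympy__sympy-18698`, and `sympy__sympy-21612`.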