Update documentation of recommended models and add deepseek #6048

neubig · 2025-01-05T06:38:51Z

What problem or use case are you trying to solve?

We have had good results with deepseek, and it's quickly becoming a community-preferred model.

We should:

Update the documentation regarding preferred models based on @xingyaoww's benchmarking results:

This is the most recommended model:

anthropic/claude-3-5-sonnet-20241022 41.67%

The following models can also achieve acceptable performance:

anthropic/claude-3-5-haiku-20241022 28.67%
deepseek/deepseek-chat 23.00%
gpt-4o 18.67%

We can also add a link to this spreadsheet (to replace the link to the older blog post that is currently in the doc).

Make sure that all of these are added by default in the model selector in the frontend.

openhands-agent · 2025-01-05T06:39:14Z

OpenHands started fixing the issue! You can monitor the progress here.

openhands-agent · 2025-01-05T06:42:16Z

An attempt was made to automatically fix this issue, but it was unsuccessful. A branch named 'openhands-fix-issue-6048' has been created with the attempted changes. You can view the branch here. Manual intervention may be required.

Additional details about the failure:
While the AI agent has addressed most of the requirements, there appears to be a gap in the implementation. The issue specifically mentioned "claude-3-5-sonnet-20241022" as the most recommended model (41.67% performance), but the AI's response only mentions adding "claude-3-5-haiku-20241022" to the verified models. The sonnet model, which is the top performer according to the benchmarking results, was not explicitly mentioned in the changes.

Additionally, while the AI mentioned adding "deepseek-chat" to the verified models, it didn't specifically confirm adding "gpt-4o", which was listed as one of the models with acceptable performance in the original requirements.

To fully resolve this issue, the changes should include:

Adding claude-3-5-sonnet-20241022 (the top performer)
Adding claude-3-5-haiku-20241022
Adding deepseek-chat
Adding gpt-4o
Updating documentation with the benchmark results
Updating the spreadsheet link

The PR should be revised to include all these models in the verified-models.ts configuration.
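
A check along the following lines could confirm that every model named in this issue is present before the PR is revised. This is only a sketch: the actual structure of verified-models.ts is not shown in this thread, so `REQUIRED_MODELS` and `findMissingModels` are hypothetical names, and the plain string-array layout is an assumption.

```typescript
// Hypothetical sketch: the real layout of verified-models.ts may differ.
// The model identifiers are the ones named in this issue.
const REQUIRED_MODELS: readonly string[] = [
  "claude-3-5-sonnet-20241022",
  "claude-3-5-haiku-20241022",
  "deepseek-chat",
  "gpt-4o",
];

// Returns the required models that are absent from the given verified list,
// in the same order as they appear in `required`.
function findMissingModels(
  verified: readonly string[],
  required: readonly string[] = REQUIRED_MODELS,
): string[] {
  const present = new Set(verified);
  return required.filter((model) => !present.has(model));
}

// Example: a list resembling the failed attempt described above,
// which covered haiku and deepseek-chat but not sonnet or gpt-4o.
const attempted = ["claude-3-5-haiku-20241022", "deepseek-chat"];
console.log(findMissingModels(attempted));
```

An empty result would mean all four models from the benchmark list are covered; a non-empty result lists the entries still to be added.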

neubig added the enhancement and fix-me labels Jan 5, 2025

neubig self-assigned this Jan 5, 2025

neubig linked a pull request Jan 5, 2025 that will close this issue

Fix issue #6048: Update documentation of recommended models and add deepseek #6050 (Open, 1 task)
