Update documentation of recommended models and add deepseek #6048

Open
neubig opened this issue Jan 5, 2025 · 2 comments · May be fixed by #6050
Labels: enhancement (New feature or request), fix-me (Attempt to fix this issue with OpenHands)

Comments

@neubig
Contributor

neubig commented Jan 5, 2025

What problem or use case are you trying to solve?

We have seen good results with DeepSeek, and it is quickly becoming a community-preferred model.

We should:

  1. Update the documentation on preferred models based on @xingyaoww's benchmarking results:
[Screenshot of the benchmarking results, 2025-01-05]

This is the most recommended model:

  • anthropic/claude-3-5-sonnet-20241022 41.67%

The following models also achieve acceptable performance:

  • anthropic/claude-3-5-haiku-20241022 28.67%
  • deepseek/deepseek-chat 23.00%
  • gpt-4o 18.67%

We can also add a link to this spreadsheet (to replace the link to the older blog post that is currently in the doc).

  2. Make sure that all of these are added by default in the model selector in the frontend.
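
As a rough illustration of item 1, the figures quoted above could be kept as data and rendered into the documentation list. This is only a sketch: `BenchmarkResult`, `results`, and `renderDocLines` are hypothetical names that do not come from the OpenHands codebase, and the scores are copied from this issue without stating what metric they measure.

```typescript
// Hypothetical shape for one row of the benchmark results quoted in this issue.
interface BenchmarkResult {
  model: string;
  score: number; // percentage exactly as quoted in the issue
}

// Scores copied from the issue text; nothing here is read from the spreadsheet.
const results: BenchmarkResult[] = [
  { model: "anthropic/claude-3-5-haiku-20241022", score: 28.67 },
  { model: "anthropic/claude-3-5-sonnet-20241022", score: 41.67 },
  { model: "gpt-4o", score: 18.67 },
  { model: "deepseek/deepseek-chat", score: 23.0 },
];

// Sort by score (highest first) and format each row as a documentation bullet.
function renderDocLines(rows: BenchmarkResult[]): string[] {
  return [...rows]
    .sort((a, b) => b.score - a.score)
    .map((r) => `- ${r.model} (${r.score.toFixed(2)}%)`);
}

const docLines = renderDocLines(results);
console.log(docLines.join("\n"));
```

Keeping the scores in one place like this would let the documentation and the ordering of the recommendation stay in sync if the benchmark is rerun.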

neubig added the enhancement and fix-me labels Jan 5, 2025
neubig self-assigned this Jan 5, 2025
@openhands-agent
Contributor

OpenHands started fixing the issue! You can monitor the progress here.

@openhands-agent
Contributor

An attempt was made to automatically fix this issue, but it was unsuccessful. A branch named 'openhands-fix-issue-6048' has been created with the attempted changes. You can view the branch here. Manual intervention may be required.

Additional details about the failure:
While the AI agent has addressed most of the requirements, there appears to be a gap in the implementation. The issue specifically mentioned "claude-3-5-sonnet-20241022" as the most recommended model (41.67% performance), but the AI's response only mentions adding "claude-3-5-haiku-20241022" to the verified models. The sonnet model, which is the top performer according to the benchmarking results, was not explicitly mentioned in the changes.

Additionally, while the AI mentioned adding "deepseek-chat" to the verified models, it didn't specifically confirm adding "gpt-4o" which was listed as one of the models with acceptable performance in the original requirements.

To fully resolve this issue, the changes should include:

  1. Adding claude-3-5-sonnet-20241022 (the top performer)
  2. Adding claude-3-5-haiku-20241022
  3. Adding deepseek-chat
  4. Adding gpt-4o
  5. Updating documentation with the benchmark results
  6. Updating the spreadsheet link

The PR should be revised to include all these models in the verified-models.ts configuration.
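
The completeness check described above can be sketched as follows. The contents and export names of verified-models.ts are not shown in this thread, so `verifiedModels`, `requiredModels`, and `missingModels` are hypothetical, and the starting list is only an assumed reading of what the attempted fix added.

```typescript
// Hypothetical stand-in for the list in verified-models.ts after the attempted fix;
// the real file's structure and existing entries are not shown in this issue.
const verifiedModels: string[] = [
  "claude-3-5-haiku-20241022",
  "deepseek-chat",
];

// The four models the issue asks to have available in the model selector.
const requiredModels: string[] = [
  "claude-3-5-sonnet-20241022",
  "claude-3-5-haiku-20241022",
  "deepseek-chat",
  "gpt-4o",
];

// Return the required models that are absent from the verified list,
// preserving the order of the required list.
function missingModels(verified: string[], required: string[]): string[] {
  const present = new Set(verified);
  return required.filter((m) => !present.has(m));
}

const missing = missingModels(verifiedModels, requiredModels);
console.log(missing); // ["claude-3-5-sonnet-20241022", "gpt-4o"]
```

A check along these lines could run as a unit test, so that a revised PR fails fast if any of the four models is left out.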


2 participants