
server: tests - slow inference causes timeout on the CI #5715

Merged
merged 4 commits from hotfix/server-test-increase-timeout-in-idle into master on Feb 25, 2024

Conversation

@phymbert (Collaborator) commented on Feb 25, 2024

Context

Since we fixed the // issue in server.cpp, the "server is busy" step is faster, and the "server is idle" step now times out after 3 seconds on the CI:
https://github.com/ggerganov/llama.cpp/actions/runs/8038306408/job/21954067563#step:11:131

Fix

This fix increases the timeout from 3s to 10s before the inference is considered failed.
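
For illustration, here is a minimal sketch of the kind of health-polling loop with a configurable timeout that this change is about. The helper name, endpoint, and use of `requests` are assumptions for this example, not the actual test code from the repository:

```python
import time
import requests  # assumption: an HTTP client is available in the test environment


def wait_for_server_idle(base_url: str, timeout_s: float = 10.0) -> bool:
    """Poll the server health endpoint until it responds, or give up after timeout_s.

    Illustrative only: the real llama.cpp server tests use their own step
    definitions; the endpoint and helper name here are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            r = requests.get(f"{base_url}/health", timeout=2)
            if r.status_code == 200:
                return True
        except requests.RequestException:
            pass  # server not reachable yet; keep polling
        time.sleep(0.5)
    return False


# With the previous 3s budget, a slow CI runner could still be busy with
# inference when the deadline expired; 10s leaves more headroom.
if not wait_for_server_idle("http://localhost:8080", timeout_s=10.0):
    raise TimeoutError("server did not become idle within the timeout")
```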

@phymbert requested review from ngxson and ggerganov on February 25, 2024 at 17:24
@phymbert (Collaborator, Author)

@ggerganov My bad, I merged #5708 without waiting for CI tests 👎

@phymbert (Collaborator, Author)

@ggerganov Can we remove this log from llama.log? It's annoying:

LOG("sampled token: %5d: '%s'\n", id, llama_token_to_piece(ctx_main, id).c_str());

@ggerganov (Owner) left a comment

> Can we remove this log from llama.log? It's annoying:

Yes, just comment out the line.

@phymbert changed the title from "server: tests - longer inference timeout for CI" to "server: tests - slow inference causes timeout on the CI" on Feb 25, 2024
@phymbert (Collaborator, Author) commented on Feb 25, 2024

Fixed and tested here: https://github.com/phymbert/llama.cpp/actions/runs/8040928813/job/21959594752
I am merging it now since it blocks master and there are so many jobs queued on the project.

@ggerganov Each push to a llama.cpp PR now triggers 110 workflow jobs, so it takes hours before the CI checks pass. Do you really need to test all the architectures in the server CI workflow? At the very least, can I remove the GPU-based BLAS builds, since we cannot start the server anyway on the CPU-based Ubuntu GitHub runners?

@phymbert merged commit e3965cf into master on Feb 25, 2024
51 of 110 checks passed
@phymbert deleted the hotfix/server-test-increase-timeout-in-idle branch on February 25, 2024 at 21:48
@ggerganov (Owner) commented on Feb 26, 2024

> @ggerganov Each push to a llama.cpp PR now triggers 110 workflow jobs, so it takes hours before the CI checks pass. Do you really need to test all the architectures in the server CI workflow? At the very least, can I remove the GPU-based BLAS builds, since we cannot start the server anyway on the CPU-based Ubuntu GitHub runners?

We need just 4 builds:

  • cmake --config Debug -DLLAMA_SANITIZE_ADDRESS
  • cmake --config Debug -DLLAMA_SANITIZE_THREAD
  • cmake --config Debug -DLLAMA_SANITIZE_UNDEFINED
  • cmake --config Release

For now, we cannot test the GPU builds with GitHub CI. We can potentially add them to ggml-ci in the future.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* server: tests - longer inference timeout for CI
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* server: tests - longer inference timeout for CI