[CI] LLM Integration Tests through pytest suite #2023
Conversation
The last run was https://github.com/deepjavalibrary/djl-serving/actions/runs/9325391053, and it was still failing. An example of a failing test is https://github.com/deepjavalibrary/djl-serving/actions/runs/9324373604/job/25669550839; you can see how it shows a summary at the bottom and the full output above. I have made a few CI changes that I haven't merged because the CI isn't stable, but I think it may be better to go ahead and merge first and then do minor fixes afterwards so we can actually make progress. So, I suggest merging this right after the release is done.
@zachgk I took a look at the change you made to the tests. Currently it is not reporting which model failed or sharing any logs that mention which model is being tested. How could we easily find that information with the current change?
@lanking520 The first example isn't reporting anything because it seems like it is crashing the entire actions runner and aborting the job/pytest call. I believe it is failing on the first test, because pytest would otherwise have begun printing a basic progress marker for each completed test. As a comparison, take a look at the second example; that is what a failure is intended to look like.
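To make the comparison concrete, here is a minimal, self-contained pytest example (not from this PR) illustrating the reporting behavior described above: pytest prints one progress character per test as it runs (`.` for a pass, `F` for a failure) and a failure summary at the end, so if the runner dies before the first test completes, none of that output ever appears.

```python
# Hypothetical standalone file, e.g. test_progress_demo.py; run with `pytest test_progress_demo.py`.
# A normal run prints ".F" as the progress line, then a detailed failure summary.


def test_passes():
    # Shown as "." in the progress output.
    assert 1 + 1 == 2


def test_fails():
    # Shown as "F" in the progress output, with assertion details in the summary.
    assert 1 + 1 == 3
```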
Force-pushed from ef3b0d7 to 7fccd76
It is updated to now upload logs for all tests. I also rebased it to pick up the updated tests. The latest run is https://github.com/deepjavalibrary/djl-serving/actions/runs/9473108790. After that run, I also had it start uploading logs for failing test jobs as well.
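As a rough sketch of how per-test log collection can be wired into pytest (this is an illustration, not the PR's actual implementation; the `logs/` and `all_logs/` paths and the hook body are assumptions), a `conftest.py` hook can copy whatever the container wrote into a per-test directory that the CI job then uploads as an artifact:

```python
# conftest.py (hypothetical sketch, not this PR's code)
import shutil
from pathlib import Path

import pytest

LOG_SRC = Path("logs")        # assumed: where the serving container writes its logs
LOG_DEST = Path("all_logs")   # assumed: directory the CI job uploads as an artifact


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    # Archive logs once per test, after the test body (the "call" phase) has run.
    if report.when == "call" and LOG_SRC.exists():
        dest = LOG_DEST / item.name
        dest.mkdir(parents=True, exist_ok=True)
        for log_file in LOG_SRC.glob("*.log"):
            shutil.copy(log_file, dest / log_file.name)
```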
# Runs on g5.12xl
def test_llama2_13b_tp4(self):
    with Runner('tensorrt-llm', 'llama2-13b') as r:
        prepare.build_trtllm_handler_model("llama2-13b")
        r.launch("CUDA_VISIBLE_DEVICES=0,1,2,3")
        client.run("trtllm llama2-13b".split())
Just so I can better understand this section: we are adding env vars to the tensorrt-llm container launch for CUDA device visibility, which makes sense when we limit the number of GPUs, but here we are attaching all GPUs. Is this necessary? If it is, we may want to address the launch containers afterwards, unless there is something else I am missing about how we use trtllm.
I kept this to maintain parity with the previous version, so I didn't spend a lot of time thinking about it. It may not be necessary, but I think that should be a separate PR/discussion.
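For illustration only (not part of this PR; the test name and TP degree are hypothetical), restricting visibility would just mean passing a shorter device list to the same launch call:

```python
# Hypothetical variant: a TP=2 test that exposes only two of the four GPUs
# to the tensorrt-llm container instead of all of them.
def test_llama2_13b_tp2(self):
    with Runner('tensorrt-llm', 'llama2-13b') as r:
        prepare.build_trtllm_handler_model("llama2-13b")
        r.launch("CUDA_VISIBLE_DEVICES=0,1")  # only GPUs 0 and 1 are visible
        client.run("trtllm llama2-13b".split())
```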
        r.launch()
        client.run("huggingface gpt-neo-2.7b".split())


class TestHfHandler:
    # Runs on g5.12xl
Do we want to actually specify the instance, or should we just note the max accelerator count (max TP degree) for the test class? The reason I ask is that when we move up to g6, we wouldn't want to have to update these comments with every instance change, but we do want the comment to help us debug issues when we do change an instance.
# Runs on GPU - max TP=4
or something like that
For now, this is just a comment to help keep track of things. The plan is to eventually change this into a pytest mark so we could do something like run all tests with mark gpu-4. I'll fix this when I start adding the marks.
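As a minimal sketch of that direction (the `gpu_4` mark name and the placeholder body are assumptions, not part of this PR), the class-level comment could become a mark that a runner selects with `-m`:

```python
# Hypothetical sketch of mark-based selection; not this PR's code.
import pytest


@pytest.mark.gpu_4  # assumed mark meaning "needs 4 accelerators (max TP=4)"
class TestHfHandler:

    def test_placeholder(self):
        # Placeholder body; the real tests launch a model container and run the client.
        assert True
```

Registering the mark (for example under `markers` in the pytest configuration) avoids unknown-mark warnings, and `pytest -m gpu_4` would then run only the tests tagged for that accelerator count.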
This moves the llm_integration test suite from actions into the pytest runner. This means that it can be run locally and is somewhat more maintainable.
As future work, other action test suites can also be combined into the unified file. I plan to leave this action dedicated to spinning up many instances and running the entire suite. Later, I plan to add a new action that spins up a single instance and runs a configurable part of the suite (using pytest classes or marks to specify which part of the suite to run).
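As a hedged illustration of what such a single-instance action could forward to pytest (the wrapper script and the `gpu_4` mark are assumptions; only `TestHfHandler` exists in this PR), class- or mark-based selection is already expressible with `-k`/`-m`:

```python
# run_subset.py (hypothetical helper, not part of this PR)
import sys

import pytest

if __name__ == "__main__":
    # e.g. `python run_subset.py TestHfHandler` runs only that test class;
    # a mark expression such as `gpu_4` could be passed to `-m` instead.
    expression = sys.argv[1] if len(sys.argv) > 1 else "TestHfHandler"
    sys.exit(pytest.main(["-k", expression]))
```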
@tosterberg