[CI] LLM Integration Tests through pytest suite #2023

Merged: 1 commit merged into master on Jun 18, 2024
Conversation

zachgk (Contributor) commented on Jun 4, 2024

This moves the llm_integration test suite from actions into the pytest runner. This means that it can be run locally and is somewhat more maintainable.

As future work, other action test suites can also be combined into the unified file. I plan to leave this action dedicated to spinning up many instances and running the entire suite. Later, I plan to add a new action that spins up a single instance and runs a configurable part of the suite (using pytest classes or marks to specify which part to run).

@tosterberg
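
Since the suite now lives in the pytest runner, a local run is a plain pytest invocation. Below is a minimal sketch of selecting one part of the suite locally; the tests.py file name is an assumption, while the class name comes from the snippets later in this review.

# Local run sketch: select a single handler class from the unified suite.
# "tests.py" stands in for the actual suite file; treat the path as an assumption.
import pytest

# -k filters by test/class name, -x stops at the first failure, -v prints one line per test.
pytest.main(["-x", "-v", "-k", "TestHfHandler", "tests.py"])

The equivalent command line, pytest tests.py -k TestHfHandler, performs the same selection.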

zachgk requested review from frankfliu and a team as code owners on June 4, 2024 22:28
zachgk (Contributor, Author) commented on Jun 4, 2024

The last run was https://github.com/deepjavalibrary/djl-serving/actions/runs/9325391053. It was still failing on TestTrtLlmHandler2 due to what I believe is an OOM (GitHub Actions doesn't display it clearly), but it passed every other test.

An example of a failing test is https://github.com/deepjavalibrary/djl-serving/actions/runs/9324373604/job/25669550839. You can see how it shows a summary at the bottom and the full output above it.

I have made a few CI changes that I haven't merged because the CI isn't stable. However, I think it may be better to merge this first and then make minor fixes afterwards so that we can actually make progress. So, I suggest merging this right after the release is done.

lanking520 (Contributor) commented:
@zachgk I took a look at the change you made to the tests. Currently it does not report which model failed or share any logs indicating which model is being tested. How could we easily find that information with the current change?

zachgk (Contributor, Author) commented on Jun 5, 2024

@lanking520 The first example isn't reporting anything because it seems to crash the entire actions runner and abort the job/pytest call. I believe it is failing on the first test, because pytest would otherwise begin printing a basic progress marker: a . for each successful test and an F for each failed one. It needs some further investigation.

As a comparison, take a look at the second example. That is what a failure is intended to look like.
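
For context on those markers: generic pytest progress output for a run with one failure looks roughly like the following (placeholder names, not output from the linked runs):

tests.py ..F.......                                              [100%]

=================================== FAILURES ===================================
_________________________ TestExample.test_placeholder _________________________
... full captured output for the failing test ...
=========================== short test summary info ============================
FAILED tests.py::TestExample::test_placeholder - AssertionError: ...

When the runner itself dies, none of this gets printed, which is consistent with the empty report in the first example.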

zachgk force-pushed the tests-1 branch 2 times, most recently from ef3b0d7 to 7fccd76, on June 11, 2024 23:08
zachgk (Contributor, Author) commented on Jun 11, 2024

It is now updated to upload logs for all tests. I also rebased it onto the updated tests. The latest run is https://github.com/deepjavalibrary/djl-serving/actions/runs/9473108790. After that run, I also had it start uploading logs for failing test jobs as well.
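
However the upload is actually wired in this PR, per-test log capture in pytest is commonly done with an autouse fixture along these lines (a hypothetical sketch, not the code in this PR; the logs/ and all_logs/ paths are assumptions):

import shutil
from pathlib import Path

import pytest


@pytest.fixture(autouse=True)
def collect_container_logs(request):
    """Hypothetical sketch: after each test, copy the serving container's log
    directory into a per-test folder that a CI artifact-upload step can pick up."""
    yield
    src = Path("logs")  # assumed location of the container logs
    if src.is_dir():
        dest = Path("all_logs") / request.node.name
        shutil.copytree(src, dest, dirs_exist_ok=True)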

Comment on lines +100 to +111
    # Runs on g5.12xl
    def test_llama2_13b_tp4(self):
        with Runner('tensorrt-llm', 'llama2-13b') as r:
            prepare.build_trtllm_handler_model("llama2-13b")
            r.launch("CUDA_VISIBLE_DEVICES=0,1,2,3")
            client.run("trtllm llama2-13b".split())
Reviewer (Contributor) commented:

So I can better understand this section: we are adding env vars to the tensorrt-llm container launch for CUDA device visibility, which makes sense when we limit the number of GPUs, but here we are attaching all GPUs -- is this necessary? If it is, we may want to address the launch containers afterwards, unless there is something else I am missing about how we use trtllm.

zachgk (Contributor, Author) replied:

I kept this to maintain parity with the previous version, so I didn't spend a lot of time thinking about it. It may not be necessary, but I think that should be a separate PR/discussion.

        r.launch()
        client.run("huggingface gpt-neo-2.7b".split())


class TestHfHandler:
    # Runs on g5.12xl
Reviewer (Contributor) commented:


Do we want to actually specify the instance, or should we just note the max accelerator count (max TP degree) for the test class? The reason I ask is that when we move up to g6, we don't want to have to update these comments with every instance change, but we do want the comment to help us debug issues when we do change an instance. Something like:

    # Runs on GPU - max TP=4

zachgk (Contributor, Author) replied:

For now, this is just a comment to help keep track of things. The plan is to eventually change this into a pytest mark so we could do something like run all tests with the mark gpu-4. I'll fix this when I start adding the marks.
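
For illustration, marks along those lines could look roughly like this once added (a sketch only; the gpu_4 name and conftest wiring are assumptions, not something this PR ships):

# conftest.py (sketch): register the custom marker so pytest doesn't warn about it.
def pytest_configure(config):
    config.addinivalue_line("markers",
                            "gpu_4: tests that need up to 4 accelerators")


# tests.py (sketch): tag the class instead of relying on an instance-type comment.
import pytest


@pytest.mark.gpu_4
class TestHfHandler:
    ...

A single-instance action could then run just that slice with pytest -m gpu_4.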

Commit message:

This moves the llm_integration test suite from actions into the combined pytest runner. As future work, other action test suites can also be combined into the unified file.
zachgk merged commit 13fd025 into master on Jun 18, 2024 (9 checks passed).
zachgk deleted the tests-1 branch on June 18, 2024 21:12.