[CI] LLM Integration Tests through pytest suite #2023

Merged: 1 commit merged into master on Jun 18, 2024
Conversation

zachgk (Contributor) commented on Jun 4, 2024

This moves the llm_integration test suite from actions into the pytest runner. This means that it can be run locally and is somewhat more maintainable.

As future work, other action test suites can also be combined into the unified file. I plan to leave this action dedicated to spinning up many instances and running the entire suite. Later, I plan to add a new action that spins up a single instance and runs a configurable part of the suite (using pytest classes or marks to specify which part to run).

@tosterberg
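
Since the suite now lives in the pytest runner, a local run is a plain pytest invocation. Below is a minimal sketch of selecting one part of the suite locally; the tests.py file name is an assumption, while the class name comes from the snippets later in this review.

# Local run sketch: select a single handler class from the unified suite.
# "tests.py" stands in for the actual suite file; treat the path as an assumption.
import pytest

# -k filters by test/class name, -x stops at the first failure, -v prints one line per test.
pytest.main(["-x", "-v", "-k", "TestHfHandler", "tests.py"])

The equivalent command line, pytest tests.py -k TestHfHandler, performs the same selection.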

zachgk requested review from frankfliu and a team as code owners on June 4, 2024 22:28
zachgk (Contributor, Author) commented on Jun 4, 2024

The last run was https://github.com/deepjavalibrary/djl-serving/actions/runs/9325391053. It was still failing on TestTrtLlmHandler2 due to what I believe is an OOM (GitHub Actions doesn't display it clearly), but it passed every other test.

An example of a failing test is https://github.com/deepjavalibrary/djl-serving/actions/runs/9324373604/job/25669550839. You can see how it shows a summary at the bottom and the full output above it.

I have made a few CI changes that I haven't merged because the CI isn't stable. However, I think it may be better to merge this first and then make minor fixes afterwards so that we can actually make progress. So, I suggest merging this right after the release is done.

lanking520 (Contributor) commented:
@zachgk I took a look at the change you made to the tests. Currently it does not report which model failed or share any logs indicating which model is being tested. How could we easily find that information with the current change?

zachgk (Contributor, Author) commented on Jun 5, 2024

@lanking520 The first example isn't reporting anything because it seems to crash the entire actions runner and abort the job/pytest call. I believe it is failing on the first test, because pytest would otherwise begin printing a basic progress marker: a . for each successful test and an F for each failed one. It needs some further investigation.

As a comparison, take a look at the second example. That is what a failure is intended to look like.
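
For context on those markers: generic pytest progress output for a run with one failure looks roughly like the following (placeholder names, not output from the linked runs):

tests.py ..F.......                                              [100%]

=================================== FAILURES ===================================
_________________________ TestExample.test_placeholder _________________________
... full captured output for the failing test ...
=========================== short test summary info ============================
FAILED tests.py::TestExample::test_placeholder - AssertionError: ...

When the runner itself dies, none of this gets printed, which is consistent with the empty report in the first example.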

zachgk force-pushed the tests-1 branch 2 times, most recently from ef3b0d7 to 7fccd76, on June 11, 2024 23:08
zachgk (Contributor, Author) commented on Jun 11, 2024

It is now updated to upload logs for all tests. I also rebased it onto the updated tests. The latest run is https://github.com/deepjavalibrary/djl-serving/actions/runs/9473108790. After that run, I also had it start uploading logs for failing test jobs as well.
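
However the upload is actually wired in this PR, per-test log capture in pytest is commonly done with an autouse fixture along these lines (a hypothetical sketch, not the code in this PR; the logs/ and all_logs/ paths are assumptions):

import shutil
from pathlib import Path

import pytest


@pytest.fixture(autouse=True)
def collect_container_logs(request):
    """Hypothetical sketch: after each test, copy the serving container's log
    directory into a per-test folder that a CI artifact-upload step can pick up."""
    yield
    src = Path("logs")  # assumed location of the container logs
    if src.is_dir():
        dest = Path("all_logs") / request.node.name
        shutil.copytree(src, dest, dirs_exist_ok=True)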

Comment on lines +100 to +111
    # Runs on g5.12xl
    def test_llama2_13b_tp4(self):
        with Runner('tensorrt-llm', 'llama2-13b') as r:
            prepare.build_trtllm_handler_model("llama2-13b")
            r.launch("CUDA_VISIBLE_DEVICES=0,1,2,3")
            client.run("trtllm llama2-13b".split())
Reviewer (Contributor) commented:

So I can better understand this section: we are adding env vars to the tensorrt-llm container launch for CUDA device visibility, which makes sense when we limit the number of GPUs, but here we are attaching all GPUs -- is this necessary? If it is, we may want to address the launch containers afterwards, unless there is something else I am missing about how we use trtllm.

zachgk (Contributor, Author) replied:

I kept this to maintain parity with the previous version, so I didn't spend a lot of time thinking about it. It may not be necessary, but I think that should be a separate PR/discussion.

        r.launch()
        client.run("huggingface gpt-neo-2.7b".split())


class TestHfHandler:
    # Runs on g5.12xl
Reviewer (Contributor) commented:


Do we want to actually specify the instance, or should we just note the max accelerator count (max TP degree) for the test class? The reason I ask is that when we move up to g6, we don't want to have to update these comments with every instance change, but we do want the comment to help us debug issues when we do change an instance. Something like:

    # Runs on GPU - max TP=4

zachgk (Contributor, Author) replied:

For now, this is just a comment to help keep track of things. The plan is to eventually change this into a pytest mark so we could do something like run all tests with the mark gpu-4. I'll fix this when I start adding the marks.
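
For illustration, marks along those lines could look roughly like this once added (a sketch only; the gpu_4 name and conftest wiring are assumptions, not something this PR ships):

# conftest.py (sketch): register the custom marker so pytest doesn't warn about it.
def pytest_configure(config):
    config.addinivalue_line("markers",
                            "gpu_4: tests that need up to 4 accelerators")


# tests.py (sketch): tag the class instead of relying on an instance-type comment.
import pytest


@pytest.mark.gpu_4
class TestHfHandler:
    ...

A single-instance action could then run just that slice with pytest -m gpu_4.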

Commit message:

This moves the llm_integration test suite from actions into the combined pytest runner. As future work, other action test suites can also be combined into the unified file.
zachgk merged commit 13fd025 into master on Jun 18, 2024 (9 checks passed).
zachgk deleted the tests-1 branch on June 18, 2024 21:12.