Tokenization with truncation, offset support and tests #47
Conversation
Signed-off-by: Jefferson Fialho <jfialho@ibm.com>
Force-pushed from 6a521af to 4fbef92 (Compare)
batch_encoding = self.tokenizer.encode_plus(
    text=req.text,
    return_offsets_mapping=request.return_offsets
)
@njhill, here the code calls encode_plus directly because adding an async wrapper would require changing the vLLM code. But I don't think it has a performance impact, because the actual encoding in encode_async is not offloaded to a thread pool or anything:
async def encode_async(
        self,
        prompt: str,
        request_id: Optional[str] = None,
        lora_request: Optional[LoRARequest] = None) -> List[int]:
    tokenizer = await self.get_lora_tokenizer_async(lora_request)
    ret = tokenizer.encode(prompt)
    self._raise_if_input_too_long(ret, lora_request)
    return ret
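For reference, here is a minimal sketch (not the exact PR code) of how a direct `encode_plus` call with truncation and offset mapping could look; the tokenizer and the `truncate_to` / `return_offsets` parameter names are assumptions for illustration only:

```python
from typing import Optional

from transformers import AutoTokenizer

# Illustrative sketch: a plain Hugging Face fast tokenizer standing in for the
# tokenizer held by vLLM's tokenizer group.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def tokenize_text(text: str, truncate_to: Optional[int], return_offsets: bool):
    kwargs = {"return_offsets_mapping": return_offsets}
    if truncate_to:
        # encode_plus honors truncation/max_length just like __call__
        kwargs.update(truncation=True, max_length=truncate_to)
    batch_encoding = tokenizer.encode_plus(text=text, **kwargs)
    # offset_mapping is only present when return_offsets_mapping=True
    return batch_encoding["input_ids"], batch_encoding.get("offset_mapping")
```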
for 515
@prashantgupta24 can you help coordinate getting this into the tgis adapter?
    tokenized results.
    """
    # Log the incoming tokenization request for metrics
    service_metrics.observe_tokenization_request(request)
Suggested change:
- service_metrics.observe_tokenization_request(request)
+ service_metrics.count_tokenization_request(request)
I'm going to be making this change in the adapter repo
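For illustration, a hedged sketch of what a counter-style metrics helper could look like; the class, metric, and method names here are assumptions, not the actual adapter code:

```python
from prometheus_client import Counter

class ServiceMetrics:
    """Assumed shape of the metrics helper, for illustration only."""

    def __init__(self) -> None:
        self.tokenization_requests = Counter(
            "tokenization_requests_total",
            "Number of tokenization requests received",
        )

    def count_tokenization_request(self, request) -> None:
        # A plain counter increment; nothing about the request is observed in
        # the Histogram/Summary sense, hence the count_* naming.
        self.tokenization_requests.inc()
```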
Closing since this was ported to the vllm-tgis-adapter repo. Thanks @fialhocoelho.
…U issue (#47)

This PR contains some changes:
- It enables the `only_last_token` option when calling FMS. This only copies the logits from the last token back in the first iteration, which leads to faster TTFT. Will need to check whether this creates issues for some vLLM features like `prompt_logprobs` at a later stage.
- There is a bug in torch 2.3.1 which gives incorrect results on CPU when using prompts of length > 512 tokens. Setting `attn_algorithm=math` works around this for the time being.
- Change the mask format to what FMS is currently using (note: I have not tested the right-padding part of the code, since we are not using it).
- Load the model with the precision specified by the vLLM user (rather than hard-coded).
- Remove some unused stuff.
- Refresh aiu_setup to the latest version from aiu-fms.

---------

Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
vllm/tests/test_server_tokenize_truncate.py
`tokenizer.encode_plus()`: The `tokenizer.encode_plus()` function is used directly to avoid modifying the `tokenizer_group.encode_async` function, which is part of the upstream code/project. This ensures compatibility and maintains the integrity of the upstream codebase. This implementation won't affect performance because `tokenizer_group.encode_async` doesn't run `Tokenizer.encode` in a background thread.
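As a side note, if the blocking `encode_plus` call ever did become a bottleneck, it could be offloaded to a thread explicitly. This is a minimal sketch under that assumption, not something the upstream `encode_async` currently does:

```python
import asyncio

async def encode_plus_offloaded(tokenizer, text: str, return_offsets: bool):
    # Run the synchronous Hugging Face call in the default thread pool so the
    # event loop is not blocked. Upstream encode_async does not do this today,
    # which is why calling encode_plus directly is performance-equivalent.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None,
        lambda: tokenizer.encode_plus(text=text, return_offsets_mapping=return_offsets),
    )
```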