[Feature] Add docs for Offline Engine token-in token-out #2968
Labels: documentation (Improvements or additions to documentation), good first issue (Good for newcomers), RLHF (Using SGLang for post training)
zhaochenyang20 added the documentation, good first issue, and RLHF labels on Jan 18, 2025
Also, a bug report:

```python
class SGLangLLMRayActor:
    def __init__(self, *args, **kwargs):
        # Some of the parameters lead to errors in token-in-token-out mode
        self.llm = sglang.Engine(
            model_path=args[0],
            trust_remote_code=kwargs.get("trust_remote_code", True),
            dtype=kwargs.get("dtype", "auto"),
            tp_size=kwargs.get("tensor_parallel_size", 1),
            device="cuda",
            random_seed=kwargs.get("seed", 42),
            # disable_radix_cache=not kwargs.get("enable_prefix_caching", False),
            # disable_cuda_graph=not kwargs.get("enforce_eager", False),
            # disable_cuda_graph_padding=not kwargs.get("enable_prefix_caching", False),
            # context_length=kwargs.get("max_model_len", None),
            log_level="info",
            skip_tokenizer_init=True,
        )

    def generate(self, sampling_params, prompt_token_ids, stop_token_ids):
        # min_tokens and include_stop_str_in_output are not used in SGLang
        sampling_params = dict(
            max_new_tokens=sampling_params.get("max_tokens", 1024),
            top_p=sampling_params.get("top_p", 1),
            top_k=sampling_params.get("top_k", 50),
            temperature=sampling_params.get("temperature", 1),
            repetition_penalty=sampling_params.get("repetition_penalty", 1),
            skip_special_tokens=sampling_params.get("skip_special_tokens", False),
            stop_token_ids=stop_token_ids,
        )
        outputs = self.llm.generate(input_ids=prompt_token_ids, sampling_params=sampling_params)
        return outputs
```

In this snippet, if we uncomment all of the commented-out parameters, the engine fails with an error like:
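As an aside, the vLLM-to-SGLang sampling-parameter mapping buried inside `generate` above can be factored out into a small pure function. This is a hypothetical refactoring sketch (the helper name `to_sglang_sampling_params` is mine, not part of SGLang); the keys and defaults are taken directly from the actor code above.

```python
def to_sglang_sampling_params(vllm_params, stop_token_ids):
    """Translate a vLLM-style sampling-params dict into the dict that
    SGLang's Engine.generate accepts (hypothetical helper; the key
    mapping mirrors the SGLangLLMRayActor.generate method above)."""
    return dict(
        max_new_tokens=vllm_params.get("max_tokens", 1024),
        top_p=vllm_params.get("top_p", 1),
        top_k=vllm_params.get("top_k", 50),
        temperature=vllm_params.get("temperature", 1),
        repetition_penalty=vllm_params.get("repetition_penalty", 1),
        skip_special_tokens=vllm_params.get("skip_special_tokens", False),
        stop_token_ids=stop_token_ids,
    )

# Unspecified fields fall back to the actor's defaults.
params = to_sglang_sampling_params({"max_tokens": 256, "temperature": 0.7}, [2])
```

Keeping this mapping in one place makes it easier to spot which vLLM options (e.g. `min_tokens`, `include_stop_str_in_output`) silently have no SGLang counterpart.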
Here is a code snippet for debugging:
@shuaills Thanks for the help!
Checklist

Motivation

We already have a token-in-token-out pipeline in the Server, but we need it for the Engine as well.

Also, I set skip_special_tokens=False, but there is still no eos at the end.

Related resources

No response
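Regarding the missing eos: one possible workaround (an assumption on my part, not a confirmed fix for the underlying issue) is to post-process the returned token ids and append the eos id when it is absent. `ensure_eos` below is a hypothetical helper, not an SGLang API:

```python
def ensure_eos(output_ids, eos_token_id):
    """Hypothetical post-processing helper: append eos_token_id if the
    generated token ids do not already end with it."""
    if not output_ids or output_ids[-1] != eos_token_id:
        return output_ids + [eos_token_id]
    return output_ids

print(ensure_eos([5, 9, 17], 2))  # → [5, 9, 17, 2]
print(ensure_eos([5, 9, 2], 2))   # → [5, 9, 2]
```

This only papers over the symptom; the docs should still clarify whether the Engine is expected to emit the eos id itself in token-in-token-out mode.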