Add cancel() method to interrupt a stream #733

Open · wants to merge 1 commit into main

Conversation

simonchatts

Fixes #599.

Thanks for all your work on this project!
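
The diff itself isn't reproduced in this thread, so the following is only a sketch of how a cancel flag of this kind can be wired around a llama-cpp-python stream; the `CancellableStream` class and the placement of `cancel()` are assumptions for illustration, not the PR's actual code:

```python
# Sketch only: CancellableStream and the cancel() placement are illustrative
# assumptions, not the code in this PR.
from llama_cpp import Llama

class CancellableStream:
    """Wraps a llama-cpp-python streaming generator with a cancel flag."""

    def __init__(self, stream):
        self._stream = stream
        self._cancelled = False

    def cancel(self):
        # Takes effect before the next chunk is yielded, not mid-token.
        self._cancelled = True

    def __iter__(self):
        for chunk in self._stream:
            if self._cancelled:
                break
            yield chunk

llm = Llama(model_path="model.gguf")  # placeholder path
stream = CancellableStream(llm("Q: Name the planets.\nA:", stream=True))
for chunk in stream:
    print(chunk["choices"][0]["text"], end="", flush=True)
    # another thread may call stream.cancel() here to stop the loop
```

Note that the flag can only be observed between yielded chunks, which is the limitation raised in the discussion below.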

@tk-master
Contributor

Please accept this PR, @abetlen.

@tk-master
Contributor

Actually, I found an issue with this method: it only cancels after a token has been generated. If the LLM is slow or gets stuck processing the prompt, this doesn't cancel it.

We need a better method.

@tk-master
Contributor

I'm coming back to this because I need to figure out a better way to interrupt generation programmatically.

For a console-based scenario it's pretty easy in Python: all I have to do is wrap the code in try/except KeyboardInterrupt, and then I can press Ctrl+C at any point to gracefully interrupt the LLM.
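
A minimal version of that pattern (the model path and prompt are placeholders):

```python
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # placeholder path

try:
    for chunk in llm("Tell me a story.", stream=True):
        print(chunk["choices"][0]["text"], end="", flush=True)
except KeyboardInterrupt:
    # Ctrl+C lands here; whatever was generated so far has already been printed.
    print("\n[interrupted]")
```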

But if I'm using a front-end user interface, say with a "Stop generating" button that calls a Python function, I haven't managed to make it work properly, because of the issue I mentioned in the previous post.

@abetlen sorry to bother you again, but do you have any suggestions or ideas on how to accomplish this?
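
A common workaround for the UI case is to share a stop flag between the button handler and the generation loop; the sketch below (all names are placeholders, not this project's API) inherits the same limitation, since the flag is only checked between tokens and cannot interrupt prompt processing:

```python
import threading
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")   # placeholder path
stop_event = threading.Event()         # the "Stop generating" button calls stop_event.set()

def generate(prompt: str) -> str:
    pieces = []
    for chunk in llm(prompt, stream=True):
        if stop_event.is_set():        # only checked between tokens, so a stalled
            break                      # prompt-processing phase still can't be interrupted
        pieces.append(chunk["choices"][0]["text"])
    stop_event.clear()
    return "".join(pieces)
```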

abetlen force-pushed the main branch 2 times, most recently from 8c93cf8 to cc0fe43 (November 14, 2023)
@woheller69

Why not add it now and improve it later if a better solution turns up? For now this would work in most cases.

@woheller69

Has anyone found a reasonable solution for this? Or am I the only one unwilling to wait until the model finishes, short of killing the job and losing the context?

@jewser

jewser commented May 11, 2024

Any chance this gets merged for now?

@madprops

It indeed blocks until the first token is produced, but cancelling it after that is trivial. The other similar issue is cancelling a model that is loading.

@woheller69

The gpt4all Python bindings offer a similar mechanism, which allows stopping at the next token.

@ekcrisp

ekcrisp commented Nov 21, 2024

+1, can we merge this?

@kingbri1

Take a look at ggerganov/llama.cpp#10509, which should permanently solve this problem on llama.cpp's side.

Successfully merging this pull request may close these issues.

Dynamically interrupt token generation