
[Feature Request] Add a token cutoff #94

Closed
Reledia opened this issue Apr 10, 2024 · 11 comments
Labels: feature request, good first issue

Comments


Reledia commented Apr 10, 2024

Hello. Some pages I save contain essays or stories that span 20k+ tokens. This not only uses much more credits/money per request, but it also takes a lot of time (from 4s to >50s), and the prompt gets lost along the way.

Would it be possible to add an env variable so we can set the max number of tokens to send in a single request?

@MohamedBassem (Collaborator)

Hi, there's already a hardcoded token cutoff of 1500 tokens (to fit in the context window of some models). But making this configurable makes a ton of sense :) Should be easy to implement and include in the next release. Thanks for the suggestion!
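A configurable cutoff could look roughly like the sketch below; the env variable name and helper are invented for illustration and are not necessarily what Hoarder ships.

```ts
// Hypothetical sketch, not Hoarder's actual config: read the cutoff from an
// env variable (name invented here) and fall back to the hardcoded 1500.
const DEFAULT_MAX_TOKENS = 1500;

function getMaxTokens(): number {
  const raw = process.env.MAX_CONTENT_TOKENS;
  const parsed = raw ? Number.parseInt(raw, 10) : NaN;
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_TOKENS;
}
```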

MohamedBassem added the feature request and good first issue labels on Apr 11, 2024

Reledia commented Apr 11, 2024

I'm happy to hear this is already implemented, but I'm seeing something different in my logs:

(screenshot attached: Screenshot_20240411_125943_Vivaldi.jpg)

My version of Hoarder should be the latest available.

@MohamedBassem (Collaborator)

Oh wow, that's a lot of tokens. Hmmm, I have a guess. I'm currently approximating tokens to words. So technically speaking I'm truncating the content to 1500 words not tokens. Maybe that's where the difference is coming from? Do you have a link that you are willing to share for me to debug this? Thanks!
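A minimal sketch of the word-based approximation described here, with illustrative names rather than the actual Hoarder code:

```ts
// Illustrative sketch of the word-for-token approximation described above
// (not the actual Hoarder code): split on spaces and keep the first 1500.
function truncateContent(content: string, limit = 1500): string {
  const words = content.split(" ");
  if (words.length <= limit) {
    return content;
  }
  return words.slice(0, limit).join(" ");
}
```

Since tokenizers typically emit more than one token per English word, a 1500-word cut can still come out somewhat above 1500 model tokens even when this logic works correctly.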


Reledia commented Apr 11, 2024

I don't wanna share the actual link because it's very explicit, but it was a chapter of a fanfiction from AO3, so it's very heavy on words.

@MohamedBassem (Collaborator)

Turns out I had a bug in the content truncation logic. Sending a fix now.


Reledia commented Apr 15, 2024

Looking at the commit, was the truncate function sending only the words after the 1500th one?

@MohamedBassem (Collaborator)

Yeah, noob mistake :)
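Reconstructing from the discussion (not the literal commit), the bug would amount to slicing from the limit instead of up to it:

```ts
// Reconstruction based on the discussion above, not the literal commit:
// the buggy helper sliced from index `limit` onward, so long pages got
// everything *after* the first 1500 words instead of the first 1500.
function truncateContentBuggy(content: string, limit = 1500): string {
  const words = content.split(" ");
  if (words.length <= limit) {
    return content; // pages under the limit were never affected
  }
  return words.slice(limit).join(" "); // bug: should be words.slice(0, limit)
}
```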


Reledia commented Apr 15, 2024

Don't worry, it happens to everyone :)
But maybe in the next release it would be good to tell users to re-process all their previous bookmarks?

@MohamedBassem (Collaborator)

The bug only affects bookmarks with content longer than 1500 words though, so it's not all bad. I'll mention it in the release notes and let people decide whether they want to re-process or not (given that re-processing can be expensive for those using OpenAI, for example).


Reledia commented Apr 15, 2024

Seems like the best solution. Thanks for your time!

@MohamedBassem (Collaborator)

Hotfix 0.12.1 released with the fix. Thanks for the report!
