feat: Support chunking strategy in file_search tool in openai_dart #496

Merged
davidmigloz merged 1 commit into main from chunking-strategy on Jul 20, 2024


@davidmigloz (Owner) commented Jul 20, 2024

By default, max_chunk_size_tokens is set to 800 and chunk_overlap_tokens is set to 400, meaning every file is indexed by being split up into 800-token chunks, with 400-token overlap between consecutive chunks.
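To make the default numbers concrete, here is a minimal Python sketch of fixed-stride chunking over an already-tokenized file. It assumes a plain sliding window whose stride is max_chunk_size_tokens - chunk_overlap_tokens. OpenAI does not document its chunker at this level of detail, and chunk_tokens is a hypothetical helper, not part of openai_dart.

```python
def chunk_tokens(tokens, max_chunk_size_tokens=800, chunk_overlap_tokens=400):
    """Split a token sequence into overlapping fixed-size windows.

    Hypothetical helper: assumes a simple sliding window, which may differ
    from how OpenAI actually chunks files.
    """
    if not 0 <= chunk_overlap_tokens < max_chunk_size_tokens:
        raise ValueError("overlap must be >= 0 and smaller than the chunk size")
    if not tokens:
        return []
    # Each window starts this many tokens after the previous one (400 by default).
    stride = max_chunk_size_tokens - chunk_overlap_tokens
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + max_chunk_size_tokens])
        # Stop once the current window reaches the end of the input.
        if start + max_chunk_size_tokens >= len(tokens):
            break
        start += stride
    return chunks


# Integers stand in for token IDs here.
chunks = chunk_tokens(list(range(2000)))
print([c[0] for c in chunks])  # [0, 400, 800, 1200]
```

In this sketch a 2,000-token file yields four 800-token chunks starting at tokens 0, 400, 800 and 1200, so with the 800/400 defaults most tokens end up in two consecutive chunks.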

You can adjust this by setting chunking_strategy when adding files to the vector store. There are certain limitations to chunking_strategy:

  • max_chunk_size_tokens must be between 100 and 4096 inclusive.
  • chunk_overlap_tokens must be non-negative and should not exceed max_chunk_size_tokens / 2.
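The two limits amount to a simple pre-flight check. The Python sketch below validates both and returns a chunking_strategy payload, assuming the "static" strategy object described in OpenAI's REST API reference, which nests both fields under a "static" key. static_chunking_strategy is a hypothetical name, and this is not the openai_dart API introduced by this PR.

```python
def static_chunking_strategy(max_chunk_size_tokens=800, chunk_overlap_tokens=400):
    """Check the documented limits and build a chunking_strategy payload.

    Hypothetical helper. The payload shape is our reading of OpenAI's
    REST API reference, not the openai_dart types added in this PR.
    """
    # Limit 1: max_chunk_size_tokens must be between 100 and 4096 inclusive.
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    # Limit 2a: chunk_overlap_tokens must be non-negative.
    if chunk_overlap_tokens < 0:
        raise ValueError("chunk_overlap_tokens must be non-negative")
    # Limit 2b: overlap must not exceed half the chunk size (no float division).
    if 2 * chunk_overlap_tokens > max_chunk_size_tokens:
        raise ValueError("chunk_overlap_tokens must not exceed max_chunk_size_tokens / 2")
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }


print(static_chunking_strategy(1200, 300))
```

Comparing 2 * chunk_overlap_tokens against the chunk size, rather than dividing, keeps odd chunk sizes exact: with max_chunk_size_tokens = 101, an overlap of 50 passes and 51 is rejected.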

Ref: https://platform.openai.com/docs/assistants/tools/file-search

@davidmigloz davidmigloz self-assigned this Jul 20, 2024
@davidmigloz davidmigloz added t:enhancement New feature or request p:openai_dart openai_dart package. labels Jul 20, 2024
@davidmigloz davidmigloz added this to the v0.8.0 milestone Jul 20, 2024
@davidmigloz davidmigloz merged commit cfa974a into main Jul 20, 2024
1 check passed
@davidmigloz davidmigloz deleted the chunking-strategy branch July 20, 2024 09:00
KennethKnudsen97 pushed a commit to KennethKnudsen97/langchain_dart that referenced this pull request Oct 1, 2024
Projects
Status: Done