text-splitters: fix state persistence issue in ExperimentalMarkdownSyntaxTextSplitter #28373

Merged (7 commits) on Dec 18, 2024

@chkaty chkaty commented Nov 27, 2024

  • Description:
    This PR resolves an issue with the ExperimentalMarkdownSyntaxTextSplitter class, which retained internal state across multiple calls to the split_text method. As a result, chunks accumulated in instance attributes, leading to incorrect outputs when processing multiple Markdown files sequentially with the same splitter instance.

    • Modified libs/text-splitters/langchain_text_splitters/markdown.py to reset the relevant internal attributes at the start of each split_text invocation, so that each call processes its input independently.
    • Added unit tests in libs/text-splitters/tests/unit_tests/test_text_splitters.py to verify the fix and confirm that state does not persist across calls.
  • Issue:
    Fixes #26440.

  • Dependencies:
    No additional dependencies are introduced with this change.

  • Unit tests were added to verify the changes.

  • Updated documentation where necessary.

  • Ran make format, make lint, and make test to ensure compliance with project standards.
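
For readers unfamiliar with this class of bug, here is a minimal sketch of the failure mode and the reset-at-start fix described above. It is not the code from markdown.py: `ToyHeadingSplitter`, its attributes, and the `reset_on_call` flag are hypothetical names invented for illustration, and the heading-based splitting is deliberately simplified.

```python
from typing import List


class ToyHeadingSplitter:
    """Hypothetical stand-in for a splitter that keeps working state on self."""

    def __init__(self, reset_on_call: bool = True) -> None:
        # reset_on_call=False reproduces the state-persistence bug.
        self.reset_on_call = reset_on_call
        self.chunks: List[str] = []
        self.current: List[str] = []

    def _reset(self) -> None:
        # Rebind (rather than clear) so lists returned earlier are unaffected.
        self.chunks = []
        self.current = []

    def _flush(self) -> None:
        # Close the chunk being built, if any.
        if self.current:
            self.chunks.append("\n".join(self.current))
            self.current = []

    def split_text(self, text: str) -> List[str]:
        # The fix: start every call from a clean slate.
        if self.reset_on_call:
            self._reset()
        for line in text.splitlines():
            if line.startswith("#"):
                self._flush()  # a heading starts a new chunk
            self.current.append(line)
        self._flush()
        return list(self.chunks)


doc_a = "# A\nalpha"
doc_b = "# B\nbeta"

buggy = ToyHeadingSplitter(reset_on_call=False)
buggy.split_text(doc_a)
leaked = buggy.split_text(doc_b)   # still contains the chunk from doc_a

fixed = ToyHeadingSplitter()
fixed.split_text(doc_a)
clean = fixed.split_text(doc_b)    # contains only the chunk from doc_b
```

A regression test in the same spirit calls split_text twice on one instance and asserts that the second result equals what a fresh instance would return for the second document.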

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. Ɑ: text splitters Related to text splitters package 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Nov 27, 2024
@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Dec 18, 2024
# Conflicts:
#	libs/text-splitters/langchain_text_splitters/markdown.py
@ccurme ccurme enabled auto-merge (squash) December 18, 2024 20:26
@ccurme ccurme merged commit 3256b5d into langchain-ai:master Dec 18, 2024
56 checks passed
Successfully merging this pull request may close these issues.

ExperimentalMarkdownSyntaxTextSplitter mixes text between split_text calls
4 participants