text-splitters: fix state persistence issue in ExperimentalMarkdownSyntaxTextSplitter #28373
Description:
This PR resolves an issue with the `ExperimentalMarkdownSyntaxTextSplitter` class, which retained internal state across multiple calls to the `split_text` method. Chunks accumulated in instance (`self`) attributes from one call to the next, so processing multiple Markdown files sequentially with the same splitter produced incorrect output.

- Updated `libs\text-splitters\langchain_text_splitters\markdown.py` to reset the relevant internal attributes at the start of each `split_text` invocation. This ensures each call processes its input independently.
- Added tests in `libs\text-splitters\tests\unit_tests\test_text_splitters.py` to verify the fix and ensure the state does not persist across calls.

Issue:
Fixes #26440.
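For reviewers who want to see the failure mode in isolation, here is a minimal sketch of the state-persistence problem and the reset-at-start fix described above. `ToyMarkdownSplitter`, `Chunk`, and the `reset_state` flag are hypothetical names invented for this illustration; this is not the code in `markdown.py`, and the real class tracks more state than shown here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """A line of body text plus the heading it appeared under."""
    text: str
    heading: str = ""


class ToyMarkdownSplitter:
    """Hypothetical stand-in, NOT the real ExperimentalMarkdownSyntaxTextSplitter."""

    def __init__(self, reset_state: bool = True) -> None:
        # reset_state=False reproduces the buggy behaviour for comparison.
        self.reset_state = reset_state
        self.chunks: List[Chunk] = []
        self.current_heading = ""

    def split_text(self, text: str) -> List[Chunk]:
        if self.reset_state:
            # The fix: clear per-call state before processing new input.
            self.chunks = []
            self.current_heading = ""
        for line in text.splitlines():
            if line.startswith("#"):
                # A heading line updates the heading for following text.
                self.current_heading = line.lstrip("#").strip()
            elif line.strip():
                self.chunks.append(Chunk(line.strip(), self.current_heading))
        # Return a copy so callers do not alias the internal list.
        return list(self.chunks)


buggy = ToyMarkdownSplitter(reset_state=False)
buggy.split_text("# A\nalpha")
print(len(buggy.split_text("# B\nbeta")))  # 2: the chunk from the first call leaks in

fixed = ToyMarkdownSplitter()
fixed.split_text("# A\nalpha")
print(len(fixed.split_text("# B\nbeta")))  # 1: each call is independent
```

Resetting inside `split_text` rather than only in `__init__` keeps a single splitter instance safe to reuse across documents, which is the scenario the issue describes.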
Dependencies:
No additional dependencies are introduced with this change.
Unit tests were added to verify the changes.
Updated documentation where necessary.
Ran `make format`, `make lint`, and `make test` to ensure compliance with project standards.