docs/how_to/extraction_long_text/ #27487
Replies: 4 comments
-
The Pydantic schema method from this example has issues, described in GitHub #24225.
-
How do I specify enum values for a Pydantic schema field? The following pseudo-code leads to:

    from typing import Literal

    from pydantic import BaseModel

    class TestAttribute(BaseModel):
        # Restrict the field to a fixed set of string values
        type: Literal["attribute1", "attribute2", "attribute3", "attribute4"]

    class TestExtract(BaseModel):
        test_attributes: list[TestAttribute]
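For comparison, here is a minimal sketch of two common ways to restrict a Pydantic field to a fixed set of values: `typing.Literal` (as in the question) and an `Enum` subclass. It assumes Pydantic is installed and Python 3.9+; `AttributeType`, `LiteralAttribute`, `EnumAttribute`, and `is_valid` are hypothetical names introduced only for illustration, and the sketch does not claim to reproduce or resolve whatever the original pseudo-code produced.

```python
from enum import Enum
from typing import Literal

from pydantic import BaseModel, ValidationError


class AttributeType(str, Enum):
    """Hypothetical enum mirroring two of the Literal values in the question."""

    ATTRIBUTE1 = "attribute1"
    ATTRIBUTE2 = "attribute2"


class LiteralAttribute(BaseModel):
    # Option A: constrain the field with typing.Literal
    type: Literal["attribute1", "attribute2"]


class EnumAttribute(BaseModel):
    # Option B: constrain the field with an Enum subclass
    type: AttributeType


class TestExtract(BaseModel):
    test_attributes: list[EnumAttribute]


def is_valid(model_cls, value: str) -> bool:
    """Return True if `value` is accepted for the model's `type` field."""
    try:
        model_cls(type=value)
        return True
    except ValidationError:
        return False
```

With either option, a value outside the allowed set should raise a `ValidationError` when the model is constructed, so `is_valid(LiteralAttribute, "other")` and `is_valid(EnumAttribute, "other")` both return `False`.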
-
Great tutorial. Thanks! There could be a combined strategy where brute force is applied but, where required, relevant context from other parts of the document is added. For example, in PDFs where one section asks the reader to refer to another section. Other splitting methods could also be referenced in the tutorial.
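A rough sketch of what such a combined strategy might look like, using only the standard library. It is not taken from the tutorial: the `Section <n>` heading format, the `see Section <n>` cross-reference pattern, and the function names `split_sections` and `chunks_with_references` are all assumptions made for illustration. Each chunk would then be passed to the extractor as in the brute-force approach.

```python
import re


def split_sections(text: str) -> dict[str, str]:
    """Split text on hypothetical headings of the form 'Section <n>'."""
    parts = re.split(r"(?m)^Section (\d+)[ \t]*$", text)
    # parts looks like [preamble, number1, body1, number2, body2, ...]
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def chunks_with_references(text: str) -> list[str]:
    """Build one chunk per section, appending any sections it refers to."""
    sections = split_sections(text)
    chunks = []
    for number, body in sections.items():
        chunk = f"Section {number}\n{body}"
        refs = re.findall(r"see Section (\d+)", body, flags=re.IGNORECASE)
        # dict.fromkeys removes duplicate references while keeping order
        for ref in dict.fromkeys(refs):
            if ref != number and ref in sections:
                chunk += f"\n\n[Referenced context: Section {ref}]\n{sections[ref]}"
        chunks.append(chunk)
    return chunks


sample = (
    "Section 1\n"
    "Pricing is defined here; see Section 3 for discounts.\n"
    "Section 2\n"
    "Shipping terms.\n"
    "Section 3\n"
    "Discounts are 10% for bulk orders."
)
chunks = chunks_with_references(sample)
```

Here the first chunk carries the text of Section 3 along with Section 1, while the other two chunks are left as plain sections. Real documents would need a more robust way to detect headings and references.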
-
If this happens:

    UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2 in position 36460: illegal multibyte sequence

Try this:

    loader = BSHTMLLoader("car.html", open_encoding="utf-8")
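A small standard-library sketch of why this likely happens: on systems whose default encoding is cp949, a UTF-8 file containing a byte sequence such as `e2 80 99` (the character U+2019) cannot be decoded unless the encoding is given explicitly. The file contents and the helper `read_with` are made up for illustration; only the encodings come from the error message above.

```python
import os
import tempfile

# U+2019 (right single quotation mark) is encoded as bytes e2 80 99 in UTF-8.
html = "<html><body><p>It\u2019s a car.</p></body></html>"

path = os.path.join(tempfile.mkdtemp(), "car.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(html)


def read_with(encoding: str) -> str:
    """Read the sample file using the given text encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Simulate a platform whose default encoding is cp949.
try:
    read_with("cp949")
    cp949_failed = False
except UnicodeDecodeError:
    cp949_failed = True

# Passing the encoding explicitly reads the file correctly.
text = read_with("utf-8")
```

Passing `open_encoding="utf-8"` to the loader plays the same role as the explicit `encoding="utf-8"` argument to `open` here.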
-
When working with files, like PDFs, you're likely to encounter text that exceeds your language model's context window. To process this text, consider these strategies:
https://python.langchain.com/docs/how_to/extraction_long_text/