-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add s3 file loader #599
Add s3 file loader #599
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
@jasondotparse is attempting to deploy a commit to the LangChain Team on Vercel. A member of the Team first needs to authorize it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few comments
langchain/src/document_loaders/s3.ts
Outdated
public async load() { | ||
const { S3Client, GetObjectCommand } = await S3LoaderImports(); | ||
|
||
const tempDir = this._fs.mkdtempSync( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd want to not use fs
and work with a Buffer in memory. This would enable using this outside of Node, eg in Edge environments
I'm going to open a PR very soon to make the Unstructured loader work with Buffers directly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I just followed the logic for how the python package implemented the feature so that's why I did it this way. I wouldn't be opposed to coming back to this later and making the change to avoid using fs after the Unstructured class has been updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
This reverts commit 2954cd7.
Summary
Add a new loader type exactly like this one which already exists in the python repo.
Notes
unstructured
python package which we do not have at our disposal in node. Therefore, the best I could do was allow a user to pass in the url of an existing unstructured API endpoint and make use of that within the S3 loader. I'd be open to suggestions if someone thinks a better solution is possible, but for now this seemed like the optimal path forward.S3Loader
for unit testing purposes. After multiple days of trying to mock these imports for a unit test, I finally fell back on just using dependency injection and passing them into the S3Loader class constructor so they could easily be stubbed in the test file.