This repository has been archived by the owner on Jan 9, 2025. It is now read-only.

feat(document): improve document operator #287

Merged
chuang8511 merged 5 commits into main from chunhao/ins-5821-demo-for-document-operator
Aug 15, 2024


@chuang8511 (Contributor) commented Aug 15, 2024

Because

  • For ins-5826
    • we want to be able to pass the filename to the artifact component, or to other components in the future
  • For ins-5821
    • we want to convert PDF to images
  • For ins-5823
    • we want to convert XLSX to Markdown

This commit

  • For ins-5826
    • adds an optional filename to the input of
      • Convert To Markdown
      • Convert To Text
    • adds a filename to the output of Convert To Markdown
      • with the .md extension, if input.filename is set
    • adds a filename to the output of Convert To Text
      • with the .txt extension, if input.filename is set
  • For ins-5821
    • converts PDF to images, with filenames
  • For ins-5823
    • converts XLSX to Markdown, with a separate table for each sheet
  • Other
    • removes JPG-to-text conversion from Convert To Text because of a bug whose cause is still unknown
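
The output filename rule for ins-5826 can be sketched as follows. This is only an illustration in Python, not the code from this PR: the helper name `output_filename` is hypothetical, and it assumes the output name is formed by swapping the input file's extension for `.md` or `.txt`, which the description above does not state explicitly.

```python
import os
from typing import Optional


def output_filename(input_filename: Optional[str], extension: str) -> Optional[str]:
    """Hypothetical helper: derive the output filename from the optional input one.

    Assumption: the input extension is replaced by `extension`
    (".md" for Convert To Markdown, ".txt" for Convert To Text).
    """
    # The input filename is optional; without it, no output filename is set.
    if not input_filename:
        return None
    # Drop the last extension (if any) and append the target one.
    stem, _ = os.path.splitext(input_filename)
    return stem + extension
```

With this sketch, an input filename of `report.pdf` would yield `report.md` for Convert To Markdown and `report.txt` for Convert To Text, while a missing input filename leaves the output filename unset.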

linear bot commented Aug 15, 2024

@chuang8511 chuang8511 marked this pull request as draft August 15, 2024 13:29
@chuang8511 chuang8511 changed the title feat: improve document operator feat(document): improve document operator Aug 15, 2024
@chuang8511 chuang8511 marked this pull request as ready for review August 15, 2024 15:33
@chuang8511 chuang8511 merged commit c0d8d31 into main Aug 15, 2024
8 checks passed
@chuang8511 chuang8511 deleted the chunhao/ins-5821-demo-for-document-operator branch August 15, 2024 16:05
donch1989 pushed a commit that referenced this pull request Aug 29, 2024
🤖 I have created a release *beep* *boop*
---


## [0.26.0-beta](v0.25.0-beta...v0.26.0-beta) (2024-08-29)


### Features

* add milvus component ([#299](#299)) ([a48c211](a48c211))
* add zilliz component ([#297](#297)) ([92726ef](92726ef))
* **artifact:** improve artifact component ([#289](#289)) ([44ea196](44ea196))
* **document:** improve document operator ([#287](#287)) ([c0d8d31](c0d8d31))
* **hubspot:** add 4 tasks and modify Retrieve Association task and Get Thread task ([#265](#265)) ([62903ec](62903ec))
* introduce interfaces InputReader and OutputWriter ([#294](#294)) ([e26ecef](e26ecef))
* **jira:** add action tasks ([#241](#241)) ([e756e31](e756e31))
* make the API key be optional for Instill-Credit-supported component ([#305](#305)) ([0f9a7b2](0f9a7b2))
* **openai:** revert go-openai and add support for streaming ([#301](#301)) ([aa605fa](aa605fa))
* **openai:** use go-openai client ([#295](#295)) ([aa20a16](aa20a16))
* **sql:** add ssl/tls input as base64 encoded and move engine to setup ([#282](#282)) ([390e2b8](390e2b8))
* use error type for component definition not found error ([#302](#302)) ([cfcee78](cfcee78))
* **web:** improve web operator ([#292](#292)) ([1da84af](1da84af))


### Bug Fixes

* **document:** catch the error if there is no data in sheet ([#296](#296)) ([21bebbd](21bebbd))
* **hubspot:** fix test code ([#298](#298)) ([98c4261](98c4261))
* **text:** fix chunk position bugs ([#307](#307)) ([cfc9076](cfc9076))
* **text:** fix the bug if there are 2 exact same chunks ([#308](#308)) ([b58909f](b58909f))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).