This repository has been archived by the owner on Jan 9, 2025. It is now read-only.

feat: add token count for each chunk #235

Merged
merged 4 commits into from
Jul 26, 2024

Conversation

chuang8511
Contributor

Because

  • building RAG requires knowing the token count of each chunk
  • the token count is also useful when analysing the raw text

This commit

  • add a token count for each chunk
  • distinguish the two token counts: the sum of the per-chunk token counts and the token count of the raw text
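
The two counts can differ, for example when neighbouring chunks overlap. The description does not show the implementation, so the following is only a minimal sketch under stated assumptions: `count_tokens` is a whitespace word counter standing in for whatever tokenizer the component actually uses, and `Chunk`, `chunk_words`, `size`, and `overlap` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


def count_tokens(text: str) -> int:
    """Stand-in tokenizer (assumption): counts whitespace-separated words."""
    return len(text.split())


@dataclass
class Chunk:
    text: str
    token_count: int  # token count of this chunk only


def chunk_words(text: str, size: int, overlap: int) -> list[Chunk]:
    """Split text into windows of `size` words; each window repeats the
    last `overlap` words of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        piece = " ".join(words[start:start + size])
        chunks.append(Chunk(piece, count_tokens(piece)))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


raw = "one two three four five six seven eight nine ten"
chunks = chunk_words(raw, size=4, overlap=1)

raw_token_count = count_tokens(raw)                    # counted once on the raw text
chunk_token_sum = sum(c.token_count for c in chunks)   # overlapping words counted again

print(raw_token_count, chunk_token_sum)
```

With a one-word overlap, the repeated words are counted once per chunk, so the summed figure is larger than the raw-text figure. Reporting both values keeps that difference visible.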


linear bot commented Jul 23, 2024

@donch1989 donch1989 merged commit bb69104 into main Jul 26, 2024
8 checks passed
@donch1989 donch1989 deleted the chunhao/ins-5406 branch July 26, 2024 08:22
donch1989 pushed a commit that referenced this pull request Jul 31, 2024
🤖 I have created a release *beep* *boop*
---


## [0.24.0-beta](v0.23.0-beta...v0.24.0-beta) (2024-07-31)


### Features

* add audio operator
([#236](#236))
([fe8abff](fe8abff))
* add handler to auto-fill missing default values
([#210](#210))
([dcad3f0](dcad3f0))
* add HubSpot component
([#199](#199))
([b3936a8](b3936a8))
* add Jira component
([#205](#205))
([51f3ed7](51f3ed7))
* add Ollama component
([#224](#224))
([810f850](810f850))
* add sql component
([#193](#193))
([9a373f3](9a373f3))
* add token count for each chunk
([#235](#235))
([bb69104](bb69104))
* add video operator to fulfil unstructured data process
([#238](#238))
([a1459d7](a1459d7))
* **document:** add docx doc pptx ppt html to transform to text in
markdown format
([#232](#232))
([2932db9](2932db9))
* **document:** move ConvertToText task from text operator to document
operator ([#248](#248))
([699ca70](699ca70))
* introduce event handler interface
([#253](#253))
([9599b42](9599b42))
* **restapi:** recategorize the restapi component as a generic component
([#249](#249))
([fbfc3a3](fbfc3a3))
* **website:** add scrape sitemap function
([#239](#239))
([8648326](8648326))


### Bug Fixes

* bug of duplicate document
([#256](#256))
([e028a6e](e028a6e))
* bug of json without setting array for images
([#259](#259))
([4aeae69](4aeae69))
* change md format to html tag for correct frontend link
([#240](#240))
([7e16b2b](7e16b2b))
* revert the alias because they are same as package name
([#243](#243))
([1d9c42d](1d9c42d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).