update docs

jparkerweb · Nov 6, 2024 · 1fd5ce6
1 parent fcbabfc

Showing 2 changed files with 26 additions and 24 deletions.
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # 🍱 semantic-chunking
 
-Semantically create chunks from large texts. Useful for workflows involving large language models (LLMs).
+NPM package for semantically creating chunks from large texts. Useful for workflows involving large language models (LLMs).
 
 ## Features
 
@@ -12,6 +12,16 @@ Semantically create chunks from large texts. Useful for workflows involving larg
 - Chunk prefix support for RAG workflows
 - Web UI for experimenting with settings
 
+## Semantic Chunking Workflow
+_how it works_
+
+1. **Sentence Splitting**: The input text is split into an array of sentences.
+2. **Embedding Generation**: A vector is created for each sentence using the specified ONNX model.
+3. **Similarity Calculation**: Cosine similarity scores are calculated for each sentence pair.
+4. **Chunk Formation**: Sentences are grouped into chunks based on the similarity threshold and max token size.
+5. **Chunk Rebalancing**: Optionally, similar adjacent chunks are combined into larger ones up to the max token size.
+6. **Output**: The final chunks are returned as an array of objects, each containing the properties described below.
+
 ## Installation
 
 ```bash
@@ -83,15 +93,6 @@ The output is an array of chunks, each containing the following properties:
 - `embedding`: Array - The embedding vector (if `returnEmbedding` is `true`).
 - `token_length`: Integer - The token length (if `returnTokenLength` is `true`).
 
-## Semantic Chunking Workflow
-
-1. **Sentence Splitting**: The input text is split into an array of sentences.
-2. **Embedding Generation**: A vector is created for each sentence using the specified ONNX model.
-3. **Similarity Calculation**: Cosine similarity scores are calculated for each sentence pair.
-4. **Chunk Formation**: Sentences are grouped into chunks based on the similarity threshold and max token size.
-5. **Chunk Rebalancing**: Optionally, similar adjacent chunks are combined into larger ones up to the max token size.
-6. **Output**: The final chunks are returned as an array of objects, each containing the properties described above.
-
 ## Examples
 
 Example 1: Basic usage with custom similarity threshold:
@@ -219,6 +220,7 @@ The behavior of the `chunkit` function can be finely tuned using several optiona
 | Xenova/all-MiniLM-L6-v2                      | true      | [https://huggingface.co/Xenova/all-MiniLM-L6-v2](https://huggingface.co/Xenova/all-MiniLM-L6-v2)                                           | 23 MB   |
 | Xenova/all-MiniLM-L6-v2                      | false     | [https://huggingface.co/Xenova/all-MiniLM-L6-v2](https://huggingface.co/Xenova/all-MiniLM-L6-v2)                                           | 90.4 MB |
 | Xenova/paraphrase-multilingual-MiniLM-L12-v2 | true      | [https://huggingface.co/Xenova/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/Xenova/paraphrase-multilingual-MiniLM-L12-v2) | 118 MB  |
+| thenlper/gte-base                            | false     | [https://huggingface.co/thenlper/gte-base](https://huggingface.co/thenlper/gte-base)                                                       | 436 MB  |
 | Xenova/all-distilroberta-v1                  | true      | [https://huggingface.co/Xenova/all-distilroberta-v1](https://huggingface.co/Xenova/all-distilroberta-v1)                                   | 82.1 MB |
 | Xenova/all-distilroberta-v1                  | false     | [https://huggingface.co/Xenova/all-distilroberta-v1](https://huggingface.co/Xenova/all-distilroberta-v1)                                   | 326 MB  |
 | BAAI/bge-base-en-v1.5                        | false     | [https://huggingface.co/BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5)                                               | 436 MB  |
@@ -241,8 +243,7 @@ The Semantic Chunking Web UI allows you to experiment with the chunking paramete
 - Example texts for testing
 - Dark mode interface
 
-
-
+![Semantic Chunking Web UI](./img/semantic-chunking_web-ui.gif)
 
 ---
 

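The workflow steps added to README.md above can be illustrated with a small, self-contained JavaScript sketch of steps 3 and 4 (similarity calculation and chunk formation). This is not the package's implementation. It assumes the sentence embeddings are already available (toy 2-D vectors stand in for ONNX model output), approximates token length by counting whitespace-separated words, compares each sentence only with the one before it, and leaves out the optional rebalancing step. `cosineSimilarity`, `countTokens`, and `groupSentences` are hypothetical helpers, not exports of `semantic-chunking`.

```javascript
// Hypothetical sketch of workflow steps 3-4; NOT the semantic-chunking implementation.

// Step 3: cosine similarity between two embedding vectors of equal length.
// Assumes neither vector is all zeros (that would divide by zero).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumption: whitespace-separated words stand in for real model tokens.
function countTokens(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Step 4: walk the sentences in order and start a new chunk whenever the next
// sentence is not similar enough to the previous one, or adding it would
// push the chunk past maxTokens.
function groupSentences(sentences, embeddings, similarityThreshold, maxTokens) {
  if (sentences.length === 0) return [];
  const chunks = [];
  let current = [sentences[0]];
  let currentTokens = countTokens(sentences[0]);
  for (let i = 1; i < sentences.length; i++) {
    const sim = cosineSimilarity(embeddings[i - 1], embeddings[i]);
    const tokens = countTokens(sentences[i]);
    if (sim >= similarityThreshold && currentTokens + tokens <= maxTokens) {
      current.push(sentences[i]);
      currentTokens += tokens;
    } else {
      chunks.push(current.join(" "));
      current = [sentences[i]];
      currentTokens = tokens;
    }
  }
  chunks.push(current.join(" "));
  return chunks;
}

// Usage with toy embeddings: the first two sentences point in nearly the
// same direction, the third is almost orthogonal to the second.
const sentences = ["Cats purr.", "Cats nap a lot.", "Stocks fell today."];
const embeddings = [[1, 0.1], [0.9, 0.2], [0, 1]];
console.log(groupSentences(sentences, embeddings, 0.8, 50));
// returns ["Cats purr. Cats nap a lot.", "Stocks fell today."]
```

Comparing each sentence only with its predecessor keeps this pass linear in the number of sentences. The package may compare against more context than that, so treat the threshold and token limit here as illustrative values rather than recommended settings.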
diff --git a/webui/README.md b/webui/README.md
@@ -1,6 +1,6 @@
 # 🍱 Semantic Chunking Web UI
 
-A web-based interface for experimenting with and tuning Semantic Chunking settings. This tool provides a visual way to test and configure the `semantic-chunking` library's settings to get optimal results for your specific use case.
+A web-based interface for experimenting with and tuning Semantic Chunking settings. This tool provides a visual way to test and configure the `semantic-chunking` library's settings to get optimal results for your specific use case. Once you've found the best settings, you can generate code to implement them in your project.
 
 ## Features
 
@@ -13,6 +13,8 @@ A web-based interface for experimenting with and tuning Semantic Chunking settin
 - Example texts for testing
 - Dark mode interface
 
+![Semantic Chunking Web UI](../img/semantic-chunking_web-ui.gif)
+
 ## Getting Started
 
 ### Prerequisites
@@ -22,36 +24,30 @@ A web-based interface for experimenting with and tuning Semantic Chunking settin
 ### Installation
 
 1. Clone the repository: 
----bash
+```bash
 git clone https://github.com/jparkerweb/semantic-chunking.git
 ```
 
-
 2. Navigate to the webui directory:
----bash
+```bash
 cd semantic-chunking/webui
 ```
 
-
 3. Install dependencies:
----bash
+```bash
 npm install
 ```
 
 4. Start the server:
----bash
+```bash
 npm start
 ```
 
-
 5. Open your browser and visit:
----bash
+```
 http://localhost:3000
 ```
 
----
-
-
 ## Usage
 
 ### Basic Controls
@@ -104,3 +100,8 @@ The web UI is built with:
 ## License
 
 This project is licensed under the MIT License - see the LICENSE file for details.
+
+## Appreciation
+
+If you enjoy this package, please consider sending me a tip to support my work 😀
+# [🍵 tip me here](https://ko-fi.com/jparkerweb)