Commit

Fixed up readme

rmusser01 committed May 8, 2024
1 parent 1efdf0d commit 8512db7

Showing 3 changed files with 128 additions and 64 deletions.

130 changes: 69 additions & 61 deletions README.md
@@ -16,14 +16,41 @@ Original: `YouTube contains an incredible amount of knowledge, much of which is
* **Download Audio+Video from a list of videos in a text file (can be file paths or URLs) and have them all summarized:**
* `python diarize.py ./local/file_on_your/system --api_name <API_name>`

### Table of Contents
- [What?](#what)
- [Using](#using)
- [Setup](#setup)
- [Pieces/What's in the Repo](#pieces)
- [Setting up a Local LLM Inference Engine](#localllm)
- [Credits](#credits)

### <a name="what"></a>What?
- Use the script to transcribe a local file or remote URL.
* Any URL yt-dlp supports _should_ work.
* If you pass an API name (anthropic/cohere/groq/openai) as a second argument, and add your API key to the config file, you can have your resulting transcriptions summarized as well.
* Alternatively, you can pass `llama`/`ooba`/`kobold`/`tabby` and have the script send the request to your local API endpoint for summarization. You will need to modify the `llama_api_IP` value in `config.txt` to reflect the `IP:Port` of your local server.
* Or pass the `--api_url` argument with the `IP:Port` to avoid making changes to the `config.txt` file.
* If the self-hosted server requires an API key, modify the appropriate api_key variable in the `config.txt` file.
* The current approach to summarization is naive: the whole transcript is dumped into the prompt and the model is asked for a summary. This works for big-context LLMs, but not everyone has access to them, and some transcriptions will be longer still, so an approach that can handle those cases is needed (a rough sketch of what chunked summarization could look like follows the API list below).
- **APIs Supported**
1. Anthropic
2. Cohere
3. Groq
4. Llama.cpp
5. Kobold.cpp
6. TabbyAPI
7. OpenAI
8. Oobabooga
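
For the curious, a rough sketch of what a chunked ("summarize, then summarize the summaries") pass could look like. This is purely illustrative and not part of the repo; `summarize_chunk` is a hypothetical stand-in for any of the API calls listed above:
```
# Illustrative sketch only: split a long transcript into pieces, summarize each,
# then summarize the partial summaries. `summarize_chunk` is hypothetical and
# would wrap one of the APIs listed above (Anthropic, Cohere, llama.cpp, ...).

def chunk_text(text, max_chars=12000):
    """Split text into roughly max_chars-sized chunks on sentence boundaries."""
    chunks, current = [], ""
    for sentence in text.split(". "):
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += sentence + ". "
    if current.strip():
        chunks.append(current.strip())
    return chunks

def summarize_long_transcript(text, summarize_chunk):
    partial_summaries = [summarize_chunk(chunk) for chunk in chunk_text(text)]
    # Second pass: condense the per-chunk summaries into a single summary.
    return summarize_chunk("\n\n".join(partial_summaries))
```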


### <a name="using"></a>Using
- Single file (remote URL) transcription
* Single URL: `python diarize.py https://example.com/video.mp4`
- Single file (local) transcription
* Transcribe a local file: `python diarize.py /path/to/your/localfile.mp4`
- Multiple files (local & remote)
* List of files (can be URLs and local files mixed): `python diarize.py ./path/to/your/text_file.txt`


Save time by using the `config.txt` file; it lets you set these options once and have them applied whenever the script runs (a sketch of how the script reads them follows the options listing below).
```
@@ -54,51 +81,19 @@ options:
Log level (default: INFO)
>python diarize.py ./local/file_on_your/system --api_name anthropic
-
>python diarize.py https://www.youtube.com/watch?v=4nd1CDZP21s --api_name anthropic
-
>python diarize.py https://www.youtube.com/watch?v=4nd1CDZP21s --api_name openai
-
>python diarize.py https://www.youtube.com/watch?v=4nd1CDZP21s --api_name anthropic --api_key lolyearight
-
>python diarize.py https://www.youtube.com/watch?v=4nd1CDZP21s --api_name openai --api_key lolyearight
By default, videos, transcriptions and summaries are stored in a folder named after the video under './Results', unless otherwise specified in the config file.
```
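
For reference, `diarize.py` pulls these values in with Python's `configparser`; a minimal sketch of the pattern (the keys and defaults shown here mirror this commit's `config.txt`):
```
import configparser

config = configparser.ConfigParser()
config.read('config.txt')

# Cloud API settings live under [API], local endpoints under [Local-API].
openai_model = config.get('API', 'openai_model', fallback='gpt-4-turbo')
llama_api_IP = config.get('Local-API', 'llama_api_IP', fallback='http://127.0.0.1:8080/completion')
kobold_api_IP = config.get('Local-API', 'kobold_api_IP', fallback='http://127.0.0.1:5001/api/v1/generate')
```
If you would rather not touch `config.txt`, the `--api_url` flag mentioned above overrides the endpoint at run time.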


### <a name="setup"></a>Setup
- **Linux**
1. Download the necessary packages (Python 3, ffmpeg: `sudo apt install ffmpeg` or `dnf install ffmpeg`, ...); a quick check that ffmpeg is on your PATH is sketched after this list.
2. Create a virtual env: `python -m venv ./`
@@ -121,6 +116,41 @@ By default videos, transcriptions and summaries are stored in a folder with the
* FIXME
8. For feeding the transcriptions to the API of your choice, simply use the corresponding script for your API provider.
* FIXME: add scripts for the OpenAI API (generic) and others
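
Not something `diarize.py` does for you, just a hedged convenience check that the external dependency is actually installed before you kick off a long run:
```
import shutil
import sys

# ffmpeg must be on PATH for the audio conversion step.
if shutil.which("ffmpeg") is None:
    sys.exit("ffmpeg not found; install it (e.g. 'sudo apt install ffmpeg') and retry.")
print("ffmpeg found:", shutil.which("ffmpeg"))
```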


### <a name="pieces"></a>Pieces & What's in the repo?
- **Workflow**
1. Setup python + packages
2. Setup ffmpeg
3. Run `python diarize.py <video_url>` or `python diarize.py <List_of_videos.txt>`
4. If you want summarization, add your API keys (if not using a local LLM) to the `config.txt` file, and then re-run the script, passing in the name of the API [or URL endpoint - to be added] to the script.
* `python diarize.py https://www.youtube.com/watch?v=4nd1CDZP21s --api_name anthropic` - This will attempt to download the video, then upload the resulting JSON transcript to the Anthropic API endpoint, using the values set in the config file (API key and model) to request summarization (a condensed sketch of such a request appears at the end of this section).
- Anthropic:
* Opus: `claude-3-opus-20240229`
* Sonnet: `claude-3-sonnet-20240229`
* Haiku: `claude-3-haiku-20240307`
- Cohere:
* `command-r`
* `command-r-plus`
- OpenAI:
* `gpt-4-turbo`
* `gpt-4-turbo-preview`
* `gpt-4`
- **What's in the repo?**
- `diarize.py` - download, transcribe and diarize audio
1. First uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) to download audio (and optionally video) from the supplied URL
2. Next, it uses [ffmpeg](https://github.com/FFmpeg/FFmpeg) to convert the resulting `.m4a` file to `.wav`
3. Then it uses [faster_whisper](https://github.com/SYSTRAN/faster-whisper) to transcribe the `.wav` file to `.txt`
4. After that, it uses [pyannote](https://github.com/pyannote/pyannote-audio) to perform diarization
5. Finally, it sends the resulting text to an LLM endpoint of your choice for summarization (a condensed sketch of these steps follows this list).
- `chunker.py` - break text into parts and prepare each part for LLM summarization
- `roller-*.py` - rolling summarization
- [can-ai-code](https://github.com/the-crypt-keeper/can-ai-code) - interview executors to run LLM inference
- `compare.py` - prepare LLM outputs for webapp
- `compare-app.py` - summary viewer webapp
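
To make the five `diarize.py` steps concrete, here is a condensed, hypothetical sketch of the same flow. It is not the script's actual code: the file names, model sizes and the Anthropic request at the end are assumptions for illustration, and the real script adds argument parsing, JSON output and error handling around each step.
```
import subprocess
import requests
import yt_dlp
from faster_whisper import WhisperModel
from pyannote.audio import Pipeline

url = "https://www.youtube.com/watch?v=4nd1CDZP21s"

# 1. Download the audio track with yt-dlp (output name/format assumed here).
with yt_dlp.YoutubeDL({"format": "bestaudio/best", "outtmpl": "audio.%(ext)s"}) as ydl:
    ydl.download([url])

# 2. Convert to 16 kHz mono WAV with ffmpeg (assumes the download produced audio.m4a).
subprocess.run(["ffmpeg", "-y", "-i", "audio.m4a", "-ar", "16000", "-ac", "1", "audio.wav"], check=True)

# 3. Transcribe with faster_whisper.
model = WhisperModel("small", device="cpu", compute_type="int8")
segments, _info = model.transcribe("audio.wav")
transcript = " ".join(segment.text for segment in segments)

# 4. Diarize with pyannote (requires a Hugging Face token for the pretrained pipeline).
diarization = Pipeline.from_pretrained("pyannote/speaker-diarization", use_auth_token="hf_...")("audio.wav")

# 5. Ask an LLM for a summary; Anthropic shown as one example, with a model from the list above.
response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={"x-api-key": "<anthropic_api_key>", "anthropic-version": "2023-06-01", "content-type": "application/json"},
    json={
        "model": "claude-3-haiku-20240307",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Summarize this transcript:\n\n" + transcript}],
    },
)
print(response.json()["content"][0]["text"])
```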


### <a name="localllm"></a>Setting up a Local LLM Inference Engine
- **Setting up Local LLM Runner**
- **Llama.cpp**
- **Linux & Mac**
Expand All @@ -142,29 +172,7 @@ By default videos, transcriptions and summaries are stored in a folder with the



### <a name="credits"></a>Credits
- [original](https://github.com/the-crypt-keeper/tldw)
- [yt-dlp](https://github.com/yt-dlp/yt-dlp)
- [ffmpeg](https://github.com/FFmpeg/FFmpeg)
4 changes: 3 additions & 1 deletion config.txt
@@ -9,9 +9,11 @@ openai_api_key = <openai_api_key>
openai_model = gpt-4-turbo

[Local-API]
kobold_api_key = <kobold api key>
kobold_api_IP = http://127.0.0.1:5001/api/v1/generate
llama_api_key = <llama.cpp api key>
llama_api_IP = http://127.0.0.1:8080/completion
ooba_api_key = <ooba api key>
ooba_api_IP = http://127.0.0.1:5000/api/v1/generate

[Paths]
58 changes: 56 additions & 2 deletions diarize.py
@@ -77,6 +77,8 @@
openai_model = config.get('API', 'openai_model', fallback='gpt-4-turbo')

# Local-Models
kobold_api_IP = config.get('Local-API', 'kobold_api_IP', fallback='http://127.0.0.1:5001/api/v1/generate')
kobold_api_key = config.get('Local-API', 'kobold_api_key', fallback='')
llama_api_IP = config.get('Local-API', 'llama_api_IP', fallback='http://127.0.0.1:8080/v1/chat/completions')
llama_api_key = config.get('Local-API', 'llama_api_key', fallback='')
ooba_api_IP = config.get('Local-API', 'ooba_api_IP', fallback='http://127.0.0.1:5000/api/v1/generate')
@@ -961,6 +963,55 @@ def summarize_with_llama(api_url, file_path, token):



# https://lite.koboldai.net/koboldcpp_api#/api%2Fv1/post_api_v1_generate
def summarize_with_kobold(api_url, file_path):
try:
logging.debug("kobold: Loading JSON data")
with open(file_path, 'r') as file:
segments = json.load(file)

logging.debug("kobold: Extracting text from segments file")
text = extract_text_from_segments(segments)

headers = {
'accept': 'application/json',
'content-type': 'application/json',
}
# FIXME
prompt_text = f"{text} \n\nAs a professional summarizer, create a concise and comprehensive summary of the above text."
logging.debug(prompt_text)
# Values literally c/p from the api docs....
data = {
"max_context_length": 8096,
"max_length": 4096,
"prompt": prompt_text,
}

logging.debug("kobold: Submitting request to API endpoint")
print("kobold: Submitting request to API endpoint")
response = requests.post(api_url, headers=headers, json=data)
response_data = response.json()
logging.debug("kobold: API Response Data: %s", response_data)

if response.status_code == 200:
if 'results' in response_data and len(response_data['results']) > 0:
summary = response_data['results'][0]['text'].strip()
logging.debug("kobold: Summarization successful")
print("Summarization successful.")
return summary
else:
logging.error("Expected data not found in API response.")
return "Expected data not found in API response."
else:
logging.error(f"kobold: API request failed with status code {response.status_code}: {response.text}")
return f"kobold: API request failed: {response.text}"

except Exception as e:
logging.error("kobold: Error in processing: %s", str(e))
return f"kobold: Error occurred while processing summary with kobold: {str(e)}"



# https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API
def summarize_with_oobabooga(api_url, file_path):
try:
@@ -1104,11 +1155,14 @@ def main(input_path, api_name=None, api_key=None, num_speakers=2, whisper_model=
token = llama_api_key
llama_ip = llama_api_IP
summary = summarize_with_llama(llama_ip, json_file_path, token)
elif api_name.lower() == 'kobold':
token = kobold_api_key
kobold_ip = kobold_api_IP
summary = summarize_with_kobold(kobold_ip, json_file_path)
elif api_name.lower() == 'oobabooga':
token = ooba_api_key
ooba_ip = ooba_api_IP
summary = summarize_with_oobabooga(ooba_ip, json_file_path)
else:
logging.warning(f"Unsupported API: {api_name}")
summary = None
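For context on the new KoboldCpp path, a minimal standalone request against the endpoint this commit wires up could look like the following. It is a hedged sketch: the payload and response handling mirror `summarize_with_kobold` above, and the port is the `config.txt` default.
```
import requests

kobold_url = "http://127.0.0.1:5001/api/v1/generate"  # default from config.txt

data = {
    # Same fields summarize_with_kobold sends above.
    "max_context_length": 8096,
    "max_length": 4096,
    "prompt": "<transcript text>\n\nAs a professional summarizer, create a concise and comprehensive summary of the above text.",
}

response = requests.post(kobold_url, headers={"accept": "application/json", "content-type": "application/json"}, json=data)
response.raise_for_status()
print(response.json()["results"][0]["text"].strip())
```
From the command line, the same path is exercised with `python diarize.py <video_url_or_file> --api_name kobold`.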
