Commit

Fixed up readme

rmusser01 committed May 8, 2024
1 parent 1efdf0d commit 8512db7

Showing 3 changed files with 128 additions and 64 deletions.

130 changes: 69 additions & 61 deletions README.md
@@ -16,14 +16,41 @@ Original: `YouTube contains an incredible amount of knowledge, much of which is
* **Download Audio+Video from a list of videos in a text file (can be file paths or URLs) and have them all summarized:**
* `python diarize.py ./local/file_on_your/system --api_name <API_name>`

### Table of Contents
- [What?](#what)
- [Using](#using)
- [Setup](#setup)
- [Pieces/What's in the Repo](#pieces)
- [Setting up a Local LLM Inference Engine](#localllm)
- [Credits](#credits)

### <a name="what"></a>What?
- Use the script to transcribe a local file or remote URL.
* Any URL yt-dlp supports _should_ work.
* If you pass an API name (anthropic/cohere/groq/openai) as a second argument, and add your API key to the config file, you can have your resulting transcriptions summarized as well.
* Alternatively, you can pass `llama`/`ooba`/`kobold`/`tabby` and have the script send the request to your local API endpoint for summarization. You will need to modify the `llama_api_IP` value in `config.txt` to reflect the `IP:Port` of your local server.
* Or pass the `--api_url` argument with the `IP:Port` to avoid making changes to the `config.txt` file.
* If the self-hosted server requires an API key, modify the appropriate api_key variable in the `config.txt` file.
* The current approach to summarization is naive: the whole transcript is dumped into the prompt and the model is asked for a summary. This works for big-context LLMs, but not everyone has access to them, and some transcriptions will be longer still, so an approach that can handle those cases is needed (a rough sketch of what chunked summarization could look like follows the API list below).
- **APIs Supported**
1. Anthropic
2. Cohere
3. Groq
4. Llama.cpp
5. Kobold.cpp
6. TabbyAPI
7. OpenAI
8. Oobabooga
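
For the curious, a rough sketch of what a chunked ("summarize, then summarize the summaries") pass could look like. This is purely illustrative and not part of the repo; `summarize_chunk` is a hypothetical stand-in for any of the API calls listed above:
```
# Illustrative sketch only: split a long transcript into pieces, summarize each,
# then summarize the partial summaries. `summarize_chunk` is hypothetical and
# would wrap one of the APIs listed above (Anthropic, Cohere, llama.cpp, ...).

def chunk_text(text, max_chars=12000):
    """Split text into roughly max_chars-sized chunks on sentence boundaries."""
    chunks, current = [], ""
    for sentence in text.split(". "):
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += sentence + ". "
    if current.strip():
        chunks.append(current.strip())
    return chunks

def summarize_long_transcript(text, summarize_chunk):
    partial_summaries = [summarize_chunk(chunk) for chunk in chunk_text(text)]
    # Second pass: condense the per-chunk summaries into a single summary.
    return summarize_chunk("\n\n".join(partial_summaries))
```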


### <a name="using"></a>Using
- Single file (remote URL) transcription
* Single URL: `python diarize.py https://example.com/video.mp4`
- Single file (local) transcription
* Transcribe a local file: `python diarize.py /path/to/your/localfile.mp4`
- Multiple files (local & remote)
* List of files (can be URLs and local files mixed): `python diarize.py ./path/to/your/text_file.txt`


Save time by using the `config.txt` file; it lets you set these options once and have them applied whenever the script runs (a sketch of how the script reads them follows the options listing below).
```
@@ -54,51 +81,19 @@ options:
Log level (default: INFO)
>python diarize.py ./local/file_on_your/system --api_name anthropic
-
>python diarize.py https://www.youtube.com/watch?v=4nd1CDZP21s --api_name anthropic
-
>python diarize.py https://www.youtube.com/watch?v=4nd1CDZP21s --api_name openai
-
>python diarize.py https://www.youtube.com/watch?v=4nd1CDZP21s --api_name anthropic --api_key lolyearight
-
>python diarize.py https://www.youtube.com/watch?v=4nd1CDZP21s --api_name openai --api_key lolyearight
By default, videos, transcriptions and summaries are stored in a folder named after the video under './Results', unless otherwise specified in the config file.
```
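
For reference, `diarize.py` pulls these values in with Python's `configparser`; a minimal sketch of the pattern (the keys and defaults shown here mirror this commit's `config.txt`):
```
import configparser

config = configparser.ConfigParser()
config.read('config.txt')

# Cloud API settings live under [API], local endpoints under [Local-API].
openai_model = config.get('API', 'openai_model', fallback='gpt-4-turbo')
llama_api_IP = config.get('Local-API', 'llama_api_IP', fallback='http://127.0.0.1:8080/completion')
kobold_api_IP = config.get('Local-API', 'kobold_api_IP', fallback='http://127.0.0.1:5001/api/v1/generate')
```
If you would rather not touch `config.txt`, the `--api_url` flag mentioned above overrides the endpoint at run time.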


### <a name="setup"></a>Setup
- **Linux**
1. Download the necessary packages (Python 3, ffmpeg: `sudo apt install ffmpeg` or `dnf install ffmpeg`, ...); a quick check that ffmpeg is on your PATH is sketched after this list.
2. Create a virtual env: `python -m venv ./`
@@ -121,6 +116,41 @@ By default videos, transcriptions and summaries are stored in a folder with the
* FIXME
8. For feeding the transcriptions to the API of your choice, simply use the corresponding script for your API provider.
* FIXME: add scripts for the OpenAI API (generic) and others
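
Not something `diarize.py` does for you, just a hedged convenience check that the external dependency is actually installed before you kick off a long run:
```
import shutil
import sys

# ffmpeg must be on PATH for the audio conversion step.
if shutil.which("ffmpeg") is None:
    sys.exit("ffmpeg not found; install it (e.g. 'sudo apt install ffmpeg') and retry.")
print("ffmpeg found:", shutil.which("ffmpeg"))
```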


### <a name="pieces"></a>Pieces & What's in the repo?
- **Workflow**
1. Setup python + packages
2. Setup ffmpeg
3. Run `python diarize.py <video_url>` or `python diarize.py <List_of_videos.txt>`
4. If you want summarization, add your API keys (if not using a local LLM) to the `config.txt` file, and then re-run the script, passing in the name of the API [or URL endpoint - to be added] to the script.
* `python diarize.py https://www.youtube.com/watch?v=4nd1CDZP21s --api_name anthropic` - This will attempt to download the video, then upload the resulting JSON transcript to the Anthropic API endpoint, using the values set in the config file (API key and model) to request summarization (a condensed sketch of such a request appears at the end of this section).
- Anthropic:
* Opus: `claude-3-opus-20240229`
* Sonnet: `claude-3-sonnet-20240229`
* Haiku: `claude-3-haiku-20240307`
- Cohere:
* `command-r`
* `command-r-plus`
- OpenAI:
* `gpt-4-turbo`
* `gpt-4-turbo-preview`
* `gpt-4`
- **What's in the repo?**
- `diarize.py` - download, transcribe and diarize audio
1. First uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) to download audio (and optionally video) from the supplied URL
2. Next, it uses [ffmpeg](https://github.com/FFmpeg/FFmpeg) to convert the resulting `.m4a` file to `.wav`
3. Then it uses [faster_whisper](https://github.com/SYSTRAN/faster-whisper) to transcribe the `.wav` file to `.txt`
4. After that, it uses [pyannote](https://github.com/pyannote/pyannote-audio) to perform diarization
5. Finally, it sends the resulting text to an LLM endpoint of your choice for summarization (a condensed sketch of these steps follows this list).
- `chunker.py` - break text into parts and prepare each part for LLM summarization
- `roller-*.py` - rolling summarization
- [can-ai-code](https://github.com/the-crypt-keeper/can-ai-code) - interview executors to run LLM inference
- `compare.py` - prepare LLM outputs for webapp
- `compare-app.py` - summary viewer webapp
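
To make the five `diarize.py` steps concrete, here is a condensed, hypothetical sketch of the same flow. It is not the script's actual code: the file names, model sizes and the Anthropic request at the end are assumptions for illustration, and the real script adds argument parsing, JSON output and error handling around each step.
```
import subprocess
import requests
import yt_dlp
from faster_whisper import WhisperModel
from pyannote.audio import Pipeline

url = "https://www.youtube.com/watch?v=4nd1CDZP21s"

# 1. Download the audio track with yt-dlp (output name/format assumed here).
with yt_dlp.YoutubeDL({"format": "bestaudio/best", "outtmpl": "audio.%(ext)s"}) as ydl:
    ydl.download([url])

# 2. Convert to 16 kHz mono WAV with ffmpeg (assumes the download produced audio.m4a).
subprocess.run(["ffmpeg", "-y", "-i", "audio.m4a", "-ar", "16000", "-ac", "1", "audio.wav"], check=True)

# 3. Transcribe with faster_whisper.
model = WhisperModel("small", device="cpu", compute_type="int8")
segments, _info = model.transcribe("audio.wav")
transcript = " ".join(segment.text for segment in segments)

# 4. Diarize with pyannote (requires a Hugging Face token for the pretrained pipeline).
diarization = Pipeline.from_pretrained("pyannote/speaker-diarization", use_auth_token="hf_...")("audio.wav")

# 5. Ask an LLM for a summary; Anthropic shown as one example, with a model from the list above.
response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={"x-api-key": "<anthropic_api_key>", "anthropic-version": "2023-06-01", "content-type": "application/json"},
    json={
        "model": "claude-3-haiku-20240307",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Summarize this transcript:\n\n" + transcript}],
    },
)
print(response.json()["content"][0]["text"])
```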


### <a name="localllm"></a>Setting up a Local LLM Inference Engine
- **Setting up Local LLM Runner**
- **Llama.cpp**
- **Linux & Mac**
Expand All @@ -142,29 +172,7 @@ By default videos, transcriptions and summaries are stored in a folder with the



### <a name="credits"></a>Credits
- [original](https://github.com/the-crypt-keeper/tldw)
- [yt-dlp](https://github.com/yt-dlp/yt-dlp)
- [ffmpeg](https://github.com/FFmpeg/FFmpeg)
4 changes: 3 additions & 1 deletion config.txt
@@ -9,9 +9,11 @@ openai_api_key = <openai_api_key>
openai_model = gpt-4-turbo

[Local-API]
kobold_api_key = <kobold api key>
kobold_api_IP = http://127.0.0.1:5001/api/v1/generate
llama_api_key = <llama.cpp api key>
llama_api_IP = http://127.0.0.1:8080/completion
ooba_api_key = <ooba api key>
ooba_api_IP = http://127.0.0.1:5000/api/v1/generate

[Paths]
58 changes: 56 additions & 2 deletions diarize.py
@@ -77,6 +77,8 @@
openai_model = config.get('API', 'openai_model', fallback='gpt-4-turbo')

# Local-Models
kobold_api_IP = config.get('Local-API', 'kobold_api_IP', fallback='http://127.0.0.1:5001/api/v1/generate')
kobold_api_key = config.get('Local-API', 'kobold_api_key', fallback='')
llama_api_IP = config.get('Local-API', 'llama_api_IP', fallback='http://127.0.0.1:8080/v1/chat/completions')
llama_api_key = config.get('Local-API', 'llama_api_key', fallback='')
ooba_api_IP = config.get('Local-API', 'ooba_api_IP', fallback='http://127.0.0.1:5000/api/v1/generate')
@@ -961,6 +963,55 @@ def summarize_with_llama(api_url, file_path, token):



# https://lite.koboldai.net/koboldcpp_api#/api%2Fv1/post_api_v1_generate
def summarize_with_kobold(api_url, file_path):
try:
logging.debug("kobold: Loading JSON data")
with open(file_path, 'r') as file:
segments = json.load(file)

logging.debug("kobold: Extracting text from segments file")
text = extract_text_from_segments(segments)

headers = {
'accept': 'application/json',
'content-type': 'application/json',
}
# FIXME
prompt_text = f"{text} \n\nAs a professional summarizer, create a concise and comprehensive summary of the above text."
logging.debug(prompt_text)
# Values literally c/p from the api docs....
data = {
"max_context_length": 8096,
"max_length": 4096,
"prompt": prompt_text,
}

logging.debug("kobold: Submitting request to API endpoint")
print("kobold: Submitting request to API endpoint")
response = requests.post(api_url, headers=headers, json=data)
response_data = response.json()
logging.debug("kobold: API Response Data: %s", response_data)

if response.status_code == 200:
if 'results' in response_data and len(response_data['results']) > 0:
summary = response_data['results'][0]['text'].strip()
logging.debug("kobold: Summarization successful")
print("Summarization successful.")
return summary
else:
logging.error("Expected data not found in API response.")
return "Expected data not found in API response."
else:
logging.error(f"kobold: API request failed with status code {response.status_code}: {response.text}")
return f"kobold: API request failed: {response.text}"

except Exception as e:
logging.error("kobold: Error in processing: %s", str(e))
return f"kobold: Error occurred while processing summary with kobold: {str(e)}"



# https://github.com/oobabooga/text-generation-webui/wiki/12-%E2%80%90-OpenAI-API
def summarize_with_oobabooga(api_url, file_path):
try:
@@ -1104,11 +1155,14 @@ def main(input_path, api_name=None, api_key=None, num_speakers=2, whisper_model=
token = llama_api_key
llama_ip = llama_api_IP
summary = summarize_with_llama(llama_ip, json_file_path, token)
elif api_name.lower() == 'kobold':
token = kobold_api_key
kobold_ip = kobold_api_IP
summary = summarize_with_kobold(kobold_ip, json_file_path)
elif api_name.lower() == 'oobabooga':
token = ooba_api_key
ooba_ip = ooba_api_IP
summary = summarize_with_oobabooga(ooba_ip, json_file_path)
else:
logging.warning(f"Unsupported API: {api_name}")
summary = None
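For context on the new KoboldCpp path, a minimal standalone request against the endpoint this commit wires up could look like the following. It is a hedged sketch: the payload and response handling mirror `summarize_with_kobold` above, and the port is the `config.txt` default.
```
import requests

kobold_url = "http://127.0.0.1:5001/api/v1/generate"  # default from config.txt

data = {
    # Same fields summarize_with_kobold sends above.
    "max_context_length": 8096,
    "max_length": 4096,
    "prompt": "<transcript text>\n\nAs a professional summarizer, create a concise and comprehensive summary of the above text.",
}

response = requests.post(kobold_url, headers={"accept": "application/json", "content-type": "application/json"}, json=data)
response.raise_for_status()
print(response.json()["results"][0]["text"].strip())
```
From the command line, the same path is exercised with `python diarize.py <video_url_or_file> --api_name kobold`.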
