[pull] master from ggerganov:master #165

Closed
wants to merge 72 commits into from
Changes from 1 commit
Commits (72)
c0d6f79
SYCL: Use get_multi_ptr instead of deprecated get_pointer in wkv6 (#1…
qnixsynapse Jan 7, 2025
a4dd490
rpc : code cleanup (#11107)
rgerganov Jan 7, 2025
a3d50bc
ggml-backend : only offload from host buffers (#11120)
slaren Jan 7, 2025
017cc5f
ggml-backend : only offload from host buffers (fix) (#11124)
slaren Jan 7, 2025
53ff6b9
GGUF: C++ refactor, backend support, misc fixes (#11030)
JohannesGaessler Jan 7, 2025
bec2183
fix: Vulkan shader gen binary path when Cross-compiling (#11096)
ag2s20150909 Jan 8, 2025
02f0430
Disable GL_KHR_cooperative_matrix Vulkan extension if not available. …
mbaudier Jan 8, 2025
0d52a69
ci : fix cmake option (#11125)
ggerganov Jan 8, 2025
8cef75c
llamafile : ppc64le MMA INT8 implementation (#10912)
amritahs-ibm Jan 8, 2025
a3c1232
arg : option to exclude arguments from specific examples (#11136)
ggerganov Jan 8, 2025
80ccf5d
ci : pin dependency to specific version (#11137)
ngxson Jan 8, 2025
c792dcf
ggml : allow loading backend with env variable (ggml/1059)
rgerganov Jan 5, 2025
99a3755
sync : ggml
ggerganov Jan 8, 2025
c07d437
llama : avoid hardcoded QK_K (#11061)
ggerganov Jan 8, 2025
4d2b3d8
lora : improve compat with `mergekit-extract-lora` (#11131)
ngxson Jan 8, 2025
f7cd133
ci : use actions from ggml-org (#11140)
ngxson Jan 8, 2025
1bf839b
Enhance user input handling for llama-run (#11138)
ericcurtin Jan 8, 2025
8a1d9c2
gguf-py : move scripts directory (#11116)
VJHack Jan 8, 2025
8d59d91
fix: add missing msg in static_assert (#11143)
hydai Jan 8, 2025
d9feae1
llama-chat : add phi 4 template (#11148)
ngxson Jan 9, 2025
be0e950
media : remove old img [no ci]
ggerganov Jan 9, 2025
f8feb4b
model: Add support for PhiMoE arch (#11003)
phymbert Jan 9, 2025
8eceb88
server : add tooltips to settings and themes btn (#11154)
danbev Jan 9, 2025
1204f97
doc: add cuda guide for fedora (#11135)
teihome Jan 9, 2025
c6860cc
SYCL: Refactor ggml_sycl_compute_forward (#11121)
qnixsynapse Jan 10, 2025
ee7136c
llama: add support for QRWKV6 model architecture (#11001)
MollySophia Jan 10, 2025
c3f9d25
Vulkan: Fix float16 use on devices without float16 support + fix subg…
0cc4m Jan 10, 2025
ff3fcab
convert : add --print-supported-models option (#11172)
danbev Jan 10, 2025
ba8a1f9
examples : add README.md to tts example [no ci] (#11155)
danbev Jan 10, 2025
2739a71
convert : sort print supported models [no ci] (#11179)
danbev Jan 11, 2025
c05e8c9
gguf-py: fixed local detection of gguf package (#11180)
VJHack Jan 11, 2025
afa8a9e
llama : add `llama_vocab`, functions -> methods, naming (#11110)
ggerganov Jan 12, 2025
08f10f6
llama : remove notion of CLS token (#11064)
ggerganov Jan 12, 2025
9a48399
llama : fix chat template gguf key (#11201)
ngxson Jan 12, 2025
924518e
Reset color before we exit (#11205)
ericcurtin Jan 12, 2025
1244cdc
ggml : do not define GGML_USE_CUDA when building with GGML_BACKEND_DL…
rgerganov Jan 13, 2025
8f70fc3
llama : remove 'd' from bad special token log (#11212)
danbev Jan 13, 2025
7426a26
contrib : add naming guidelines (#11177)
ggerganov Jan 13, 2025
00b4c3d
common : support tag-based --hf-repo like on ollama (#11195)
ngxson Jan 13, 2025
ca001f6
contrib : add naming guidelines (cont) (#11177)
ggerganov Jan 13, 2025
437e05f
server : (UI) Support for RTL text as models input or output (#11208)
ebraminio Jan 13, 2025
a29f087
contrib : add naming guidelines (cont) (#11177)
ggerganov Jan 13, 2025
39509fb
cuda : CUDA Graph Compute Function Refactor (precursor for performanc…
aendk Jan 13, 2025
84a4481
cli : auto activate conversation mode if chat template is available (…
ngxson Jan 13, 2025
504af20
server : (UI) Improve messages bubble shape in RTL (#11220)
ebraminio Jan 13, 2025
d00a80e
scripts : sync opencl
ggerganov Jan 14, 2025
48e1ae0
scripts : sync gguf
ggerganov Jan 14, 2025
a4f3f5d
scripts : sync gguf (cont)
ggerganov Jan 14, 2025
44d1e79
sync : ggml
ggerganov Jan 14, 2025
091592d
Refactor test-chat-template.cpp (#11224)
ochafik Jan 14, 2025
c5bf0d1
server : Improve code snippets direction between RTL text (#11221)
ebraminio Jan 14, 2025
bbf3e55
vocab : add dummy tokens for "no_vocab" type (#11231)
ggerganov Jan 14, 2025
b4d92a5
ci : add -no-cnv for tests (#11238)
ngxson Jan 14, 2025
f446c2c
SYCL: Add gated linear attention kernel (#11175)
qnixsynapse Jan 15, 2025
0ccd7f3
examples : add embd_to_audio to tts-outetts.py [no ci] (#11235)
danbev Jan 15, 2025
432df2d
RoPE: fix back, CUDA support for back + noncont. (#11240)
JohannesGaessler Jan 15, 2025
1d85043
fix: ggml: fix vulkan-shaders-gen build (#10448)
sparkleholic Jan 15, 2025
f11cfdf
ci : use -no-cnv in gguf-split tests (#11254)
ggerganov Jan 15, 2025
adc5dd9
vulkan: scale caching for k quants + misc fixes (#11081)
netrunnereve Jan 15, 2025
c67cc98
ggml: aarch64: implement SVE kernels for q4_K_q8_K vector dot (#11227)
fj-y-saito Jan 16, 2025
681149c
llama : add `llama_model_load_from_splits` (#11255)
ngxson Jan 16, 2025
9c8dcef
CUDA: backwards pass for misc. ops, add tests (#11257)
JohannesGaessler Jan 16, 2025
4dbc8b9
llama : add internlm3 support (#11233)
RunningLeon Jan 16, 2025
206bc53
vulkan: optimize coopmat2 q2_k dequant function (#11130)
jeffbolznv Jan 16, 2025
466300f
vulkan: optimize coopmat2 q4_k/q5_k dequant functions. (#11206)
jeffbolznv Jan 16, 2025
bd38dde
vulkan: support copy from f32 to q4_0/q4_1/q5_0/q5_1/q8_0/iq4_nl (#11…
jeffbolznv Jan 16, 2025
7a689c4
README : added kalavai to infrastructure list (#11216)
musoles Jan 17, 2025
960ec65
llama : fix deprecation message: vocabable -> vocab (#11269)
dwrensha Jan 17, 2025
a133566
vocab : fix double-eos check (#11273)
ggerganov Jan 17, 2025
667d728
rpc : early register backend devices (#11262)
rgerganov Jan 17, 2025
3edfa7d
llama.android: add field formatChat to control whether to parse speci…
codezjx Jan 17, 2025
44e18ef
vulkan: fix coopmat2 flash attention for non-contiguous inputs (#11281)
jeffbolznv Jan 18, 2025
examples : add embd_to_audio to tts-outetts.py [no ci] (ggml-org#11235)
This commit contains a suggestion for adding the missing embd_to_audio
function from tts.cpp to tts-outetts.py. This introduces a dependency on
numpy, which I was not sure is acceptable (only PyTorch was mentioned in
the referenced PR).

Also the README has been updated with instructions to run the example
with llama-server and the python script.

Refs: ggml-org#10784 (comment)
danbev authored Jan 15, 2025
commit 0ccd7f3eb2debe477ffe3c44d5353cc388c9418d
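Taken together, the two new helpers are called back to back: convert the spectrogram embeddings returned by the decoder server into audio samples, then write them out as a 16-bit WAV file. A minimal usage sketch (names taken from the diff below; `embd`, `n_codes` and `n_embd` are assumed to come from the decoder server response):

```python
# Minimal sketch of the helpers added in this commit (see the full diff below).
# `embd`, `n_codes` and `n_embd` are assumed to come from the decoder server response.
audio = embd_to_audio(embd, n_codes, n_embd)  # inverse STFT of the spectrogram embeddings
audio[:24000 // 4] = 0.0                      # zero out the first 0.25 s, as the script does
save_wav("output.wav", audio, sample_rate=24000)
```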
37 changes: 37 additions & 0 deletions examples/tts/README.md
@@ -78,3 +78,40 @@ play the audio:
$ aplay output.wav
```

### Running the example with llama-server
Running this example with `llama-server` is also possible and requires two
server instances to be started. One will serve the LLM model and the other
will serve the voice decoder model.

The LLM model server can be started with the following command:
```console
$ ./build/bin/llama-server -m ./models/outetts-0.2-0.5B-q8_0.gguf --port 8020
```

And the voice decoder model server can be started using:
```console
$ ./build/bin/llama-server -m ./models/wavtokenizer-large-75-f16.gguf --port 8021 --embeddings --pooling none
```

Then we can run [tts-outetts.py](tts-outetts.py) to generate the audio.

First, create a virtual environment for Python and install the required
dependencies (this only needs to be done once):
```console
$ python3 -m venv venv
$ source venv/bin/activate
(venv) pip install requests numpy
```

And then run the python script using:
```console
(venv) python ./examples/tts/tts-outetts.py http://localhost:8020 http://localhost:8021 "Hello world"
spectrogram generated: n_codes: 90, n_embd: 1282
converting to audio ...
audio generated: 28800 samples
audio written to file "output.wav"
```
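For reference, the script drives the two servers over HTTP: the text prompt goes to the LLM server, and the resulting audio codes are sent to the decoder server to obtain the spectrogram embeddings. A rough sketch of those two calls is shown below, assuming llama-server's `/completion` and `/embeddings` endpoints; the payload fields are illustrative, not necessarily the exact ones used by tts-outetts.py:

```python
import requests

host_llm = "http://localhost:8020"  # outetts LLM server
host_dec = "http://localhost:8021"  # wavtokenizer decoder server (started with --embeddings)

# 1) generate the audio-code tokens for the text prompt
resp = requests.post(host_llm + "/completion",
                     json={"prompt": "Hello world", "n_predict": 512})
resp.raise_for_status()

# 2) convert the extracted codes into spectrogram embeddings via the decoder server
#    (the real script first extracts the code tokens from the completion response)
resp = requests.post(host_dec + "/embeddings",
                     json={"content": "<code tokens extracted from step 1>"})
resp.raise_for_status()
```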
And to play the audio we can again use aplay or any other media player:
```console
$ aplay output.wav
```
128 changes: 126 additions & 2 deletions examples/tts/tts-outetts.py
@@ -3,6 +3,121 @@
#import struct
import requests
import re
import struct
import numpy as np
from concurrent.futures import ThreadPoolExecutor


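# Periodic Hann window: computing size+1 points and dropping the last one makes
# the window tile correctly for STFT overlap-add.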
def fill_hann_window(size, periodic=True):
    if periodic:
        return np.hanning(size + 1)[:-1]
    return np.hanning(size)


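# Inverse real FFT of one spectrum, returning a real time-domain frame of length n_fft.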
def irfft(n_fft, complex_input):
    return np.fft.irfft(complex_input, n=n_fft)


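# Overlap-add: frames of length n_win are accumulated at hop-spaced offsets,
# then n_pad samples are trimmed from both ends.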
def fold(buffer, n_out, n_win, n_hop, n_pad):
    result = np.zeros(n_out)
    n_frames = len(buffer) // n_win

    for i in range(n_frames):
        start = i * n_hop
        end = start + n_win
        result[start:end] += buffer[i * n_win:(i + 1) * n_win]

    return result[n_pad:-n_pad] if n_pad > 0 else result


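# Per-frame worker for the thread pool: inverse FFT of one spectrum slice, then
# apply the synthesis (Hann) window; also returns hann^2 for later normalization.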
def process_frame(args):
    l, n_fft, ST, hann = args
    frame = irfft(n_fft, ST[l])
    frame = frame * hann
    hann2 = hann * hann
    return frame, hann2


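# Python port of embd_to_audio from tts.cpp: each embedding row holds the
# log-magnitude (first half) and phase (second half) of one spectrogram frame;
# audio is reconstructed with an inverse STFT and Hann-window overlap-add,
# normalized by the folded squared window.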
def embd_to_audio(embd, n_codes, n_embd, n_thread=4):
    embd = np.asarray(embd, dtype=np.float32).reshape(n_codes, n_embd)

    n_fft = 1280
    n_hop = 320
    n_win = 1280
    n_pad = (n_win - n_hop) // 2
    n_out = (n_codes - 1) * n_hop + n_win

    hann = fill_hann_window(n_fft, True)

    E = np.zeros((n_embd, n_codes), dtype=np.float32)
    for l in range(n_codes):
        for k in range(n_embd):
            E[k, l] = embd[l, k]

    half_embd = n_embd // 2
    S = np.zeros((n_codes, half_embd + 1), dtype=np.complex64)

    for k in range(half_embd):
        for l in range(n_codes):
            mag = E[k, l]
            phi = E[k + half_embd, l]

            mag = np.clip(np.exp(mag), 0, 1e2)
            S[l, k] = mag * np.exp(1j * phi)

    res = np.zeros(n_codes * n_fft)
    hann2_buffer = np.zeros(n_codes * n_fft)

    with ThreadPoolExecutor(max_workers=n_thread) as executor:
        args = [(l, n_fft, S, hann) for l in range(n_codes)]
        results = list(executor.map(process_frame, args))

        for l, (frame, hann2) in enumerate(results):
            res[l*n_fft:(l+1)*n_fft] = frame
            hann2_buffer[l*n_fft:(l+1)*n_fft] = hann2

    audio = fold(res, n_out, n_win, n_hop, n_pad)
    env = fold(hann2_buffer, n_out, n_win, n_hop, n_pad)

    mask = env > 1e-10
    audio[mask] /= env[mask]

    return audio


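# Write mono 16-bit PCM WAV: the RIFF/WAVE header is packed manually with struct,
# and the float audio is scaled and clipped to int16.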
def save_wav(filename, audio_data, sample_rate):
    num_channels = 1
    bits_per_sample = 16
    bytes_per_sample = bits_per_sample // 8
    data_size = len(audio_data) * bytes_per_sample
    byte_rate = sample_rate * num_channels * bytes_per_sample
    block_align = num_channels * bytes_per_sample
    chunk_size = 36 + data_size  # 36 = size of header minus first 8 bytes

    header = struct.pack(
        '<4sI4s4sIHHIIHH4sI',
        b'RIFF',
        chunk_size,
        b'WAVE',
        b'fmt ',
        16,  # fmt chunk size
        1,   # audio format (PCM)
        num_channels,
        sample_rate,
        byte_rate,
        block_align,
        bits_per_sample,
        b'data',
        data_size
    )

    audio_data = np.clip(audio_data * 32767, -32768, 32767)
    pcm_data = audio_data.astype(np.int16)

    with open(filename, 'wb') as f:
        f.write(header)
        f.write(pcm_data.tobytes())


def process_text(text: str):
    text = re.sub(r'\d+(\.\d+)?', lambda x: x.group(), text.lower())  # TODO this needs to be fixed
@@ -170,6 +285,15 @@ def process_text(text: str):
print('spectrogram generated: n_codes: %d, n_embd: %d' % (n_codes, n_embd))

# post-process the spectrogram to convert to audio
# TODO: see the tts.cpp:embd_to_audio() and implement it in Python
print('converting to audio ...')
print('TODO: see the tts.cpp:embd_to_audio() and implement it in Python')
audio = embd_to_audio(embd, n_codes, n_embd)
print('audio generated: %d samples' % len(audio))

filename = "output.wav"
sample_rate = 24000 # sampling rate

# zero out first 0.25 seconds
audio[:24000 // 4] = 0.0

save_wav(filename, audio, sample_rate)
print('audio written to file "%s"' % filename)