ruby : support new-segment callback (ggerganov#2506)
* Add Params#new_segment_callback= method

* Add tests for Params#new_segment_callback=

* Group tests for #transcribe

* Don't use static for thread-safety

* Set new_segment_callback only when necessary

* Remove redundant check

* [skip ci] Add Ruby version README

* Revert "Group tests for #transcribe"

This reverts commit 71b65b0.

* Revert "Add tests for Params#new_segment_callback="

This reverts commit 81e6df3.

* Add test for Context#full_n_segments

* Add Context#full_n_segments

* Add tests for lang API

* Add lang API

* Add tests for Context#full_lang_id API

* Add Context#full_lang_id

* Add abnormal test cases for lang

* Raise appropriate errors from lang APIs

* Add tests for Context#full_get_segment_t{0,1} API

* Add Context#full_get_segment_t{0,1}

* Add tests for Context#full_get_segment_speaker_turn_next API

* Add Context#full_get_segment_speaker_turn_next

* Add tests for Context#full_get_segment_text

* Add Context#full_get_segment_text

* Add tests for Params#new_segment_callback=

* Run new segment callback

* Split tests to multiple files

* Use container struct for new segment callback

* Add tests for Params#new_segment_callback_user_data=

* Add Whisper::Params#new_segment_callback_user_data=

* Add GC-related test for new segment callback

* Protect new segment callback related structs from GC

* Add meaningful test for build

* Rename: new_segment_callback_user_data -> new_segment_callback_container

* Add tests for Whisper::Segment

* Add Whisper::Segment and Whisper::Context#each_segment

* Extract c_ruby_whisper_callback_container_allocate()

* Add test for Whisper::Params#on_new_segment

* Add Whisper::Params#on_new_segment

* Assign symbol IDs to variables

* Make extsources.yaml simpler

* Update README

* Add document comments

* Add test for calling Whisper::Params#on_new_segment multiple times

* Add file dependencies to GitHub actions config and .gitignore

* Add more files to ext/.gitignore
KitaitiMakoto authored and adutilleul committed Nov 16, 2024
1 parent a9d704b commit 7011725
Showing 14 changed files with 1,112 additions and 170 deletions.
10 changes: 10 additions & 0 deletions .github/workflows/bindings-ruby.yml
@@ -16,6 +16,9 @@ on:
- ggml/src/ggml-quants.h
- ggml/src/ggml-quants.c
- ggml/src/ggml-cpu-impl.h
- ggml/src/ggml-metal.m
- ggml/src/ggml-metal.metal
- ggml/src/ggml-blas.cpp
- ggml/include/ggml.h
- ggml/include/ggml-alloc.h
- ggml/include/ggml-backend.h
@@ -24,6 +27,8 @@ on:
- ggml/include/ggml-metal.h
- ggml/include/ggml-sycl.h
- ggml/include/ggml-vulkan.h
- ggml/include/ggml-blas.h
- scripts/get-flags.mk
- examples/dr_wav.h
pull_request:
paths:
@@ -41,6 +46,9 @@ on:
- ggml/src/ggml-quants.h
- ggml/src/ggml-quants.c
- ggml/src/ggml-cpu-impl.h
- ggml/src/ggml-metal.m
- ggml/src/ggml-metal.metal
- ggml/src/ggml-blas.cpp
- ggml/include/ggml.h
- ggml/include/ggml-alloc.h
- ggml/include/ggml-backend.h
@@ -49,6 +57,8 @@ on:
- ggml/include/ggml-metal.h
- ggml/include/ggml-sycl.h
- ggml/include/ggml-vulkan.h
- ggml/include/ggml-blas.h
- scripts/get-flags.mk
- examples/dr_wav.h

jobs:
1 change: 0 additions & 1 deletion bindings/ruby/.gitignore
@@ -1,4 +1,3 @@
README.md
LICENSE
pkg/
lib/whisper.*
110 changes: 110 additions & 0 deletions bindings/ruby/README.md
@@ -0,0 +1,110 @@
whispercpp
==========

![whisper.cpp](https://user-images.githubusercontent.com/1991296/235238348-05d0f6a4-da44-4900-a1de-d0707e75b763.jpeg)

Ruby bindings for [whisper.cpp][], an interface to an automatic speech recognition model.

Installation
------------

Install the gem and add it to the application's Gemfile by running:

    $ bundle add whispercpp

If Bundler is not being used to manage dependencies, install the gem by running:

    $ gem install whispercpp

Usage
-----

```ruby
require "whisper"

whisper = Whisper::Context.new("path/to/model.bin")

params = Whisper::Params.new
params.language = "en"
params.offset = 10_000
params.duration = 60_000
params.max_text_tokens = 300
params.translate = true
params.print_timestamps = false

whisper.transcribe("path/to/audio.wav", params) do |whole_text|
  puts whole_text
end

```

### Preparing model ###

Use the bundled script to download model file(s):

```bash
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
sh ./models/download-ggml-model.sh base.en
```

Several model types are available. See the [models][] page for details.

### Preparing audio file ###

Currently, whisper.cpp accepts only 16-bit WAV files.
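
To check an input before handing it to `#transcribe`, the sample width can be read from the WAV header. The following is a minimal sketch in plain Ruby; `wav_bits_per_sample` is a hypothetical helper and not part of the whispercpp gem. It assumes a standard RIFF/WAVE layout and makes no claim about how whisper.cpp itself validates files.

```ruby
require "stringio"

# Hypothetical helper (not part of the whispercpp gem): returns the
# bits-per-sample value from a RIFF/WAVE stream, or nil if no valid
# "fmt " chunk is found.
def wav_bits_per_sample(io)
  riff, _size, wave = io.read(12)&.unpack("a4Va4")
  return nil unless riff == "RIFF" && wave == "WAVE"

  # Walk the RIFF chunks until the "fmt " chunk turns up.
  while (header = io.read(8)) && header.bytesize == 8
    id, size = header.unpack("a4V")
    if id == "fmt "
      fmt = io.read(size)
      return nil if fmt.nil? || fmt.bytesize < 16
      # Fields: format tag, channels, sample rate, byte rate,
      # block align, bits per sample.
      return fmt.unpack("vvVVvv").last
    end
    io.seek(size + (size % 2), IO::SEEK_CUR) # chunks are word-aligned
  end
  nil
end

# In-memory demo: a minimal header for 16-bit mono PCM at 16 kHz.
fmt_chunk = [1, 1, 16_000, 32_000, 2, 16].pack("vvVVvv")
sample = "RIFF".b + [36].pack("V") + "WAVE" +
         "fmt " + [fmt_chunk.bytesize].pack("V") + fmt_chunk +
         "data" + [0].pack("V")
puts wav_bits_per_sample(StringIO.new(sample)) == 16
```

With a real file, the same helper could be called as `File.open("path/to/audio.wav", "rb") { |f| wav_bits_per_sample(f) }`.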

### API ###

Once `Whisper::Context#transcribe` has been called, you can retrieve segments with `#each_segment`:

```ruby
def format_time(time_ms)
  sec, decimal_part = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
end

whisper.transcribe("path/to/audio.wav", params)

whisper.each_segment.with_index do |segment, index|
  line = "[%{nth}: %{st} --> %{ed}] %{text}" % {
    nth: index + 1,
    st: format_time(segment.start_time),
    ed: format_time(segment.end_time),
    text: segment.text
  }
  line << " (speaker turned)" if segment.speaker_next_turn?
  puts line
end

```

You can also register a hook on the params that is called for each new segment:

```ruby
def format_time(time_ms)
  sec, decimal_part = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
end

# Add hook before calling #transcribe
params.on_new_segment do |segment|
  line = "[%{st} --> %{ed}] %{text}" % {
    st: format_time(segment.start_time),
    ed: format_time(segment.end_time),
    text: segment.text
  }
  line << " (speaker turned)" if segment.speaker_next_turn?
  puts line
end

whisper.transcribe("path/to/audio.wav", params)

```
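
The segment readers shown above (`start_time`, `end_time`, `text`) are enough to produce subtitle output. As a rough sketch, segments could be rendered as SubRip (SRT) cues like this. `FakeSegment`, `srt_time` and `to_srt` are hypothetical names introduced for illustration, and the code assumes the times are in milliseconds, as the `format_time` helper in this README implies.

```ruby
# Stand-in for Whisper::Segment so the sketch runs without a model.
# It only mirrors the three readers used in the examples above.
FakeSegment = Struct.new(:start_time, :end_time, :text)

# SRT timestamps use a comma before the milliseconds.
def srt_time(time_ms)
  sec, ms = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  "%02d:%02d:%02d,%03d" % [hour, min, sec, ms]
end

# Builds numbered cues separated by a blank line.
def to_srt(segments)
  segments.each_with_index.map { |seg, i|
    "#{i + 1}\n#{srt_time(seg.start_time)} --> #{srt_time(seg.end_time)}\n#{seg.text.strip}\n"
  }.join("\n")
end

segments = [
  FakeSegment.new(0, 2_500, " Hello."),
  FakeSegment.new(2_500, 3_723_004, " Goodbye.")
]
puts to_srt(segments)
```

With the real bindings, `to_srt(whisper.each_segment)` should work the same way after `#transcribe`, provided `#each_segment` returns an enumerator when called without a block, as its use with `.with_index` above suggests.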

[whisper.cpp]: https://github.com/ggerganov/whisper.cpp
[models]: https://github.com/ggerganov/whisper.cpp/tree/master/models
17 changes: 8 additions & 9 deletions bindings/ruby/Rakefile
@@ -5,17 +5,16 @@ require "yaml"
require "rake/testtask"

extsources = YAML.load_file("extsources.yaml")
extsources.each_pair do |src_dir, dests|
dests.each do |dest|
src = Pathname(src_dir)/File.basename(dest)

file src
file dest => src do |t|
cp t.source, t.name
end
SOURCES = FileList[]
extsources.each do |src|
basename = src.pathmap("%f")
dest = basename == "LICENSE" ? basename : basename.pathmap("ext/%f")
file src
file dest => src do |t|
cp t.source, t.name
end
SOURCES.include dest
end
SOURCES = extsources.values.flatten
CLEAN.include SOURCES
CLEAN.include FileList[
"ext/*.o",
7 changes: 7 additions & 0 deletions bindings/ruby/ext/.gitignore
@@ -11,6 +11,10 @@ ggml-backend.c
ggml-backend.h
ggml-common.h
ggml-cpu-impl.h
ggml-metal.m
ggml-metal.metal
ggml-metal-embed.metal
ggml-blas.cpp
ggml-cuda.h
ggml-impl.h
ggml-kompute.h
@@ -20,9 +24,12 @@ ggml-quants.c
ggml-quants.h
ggml-sycl.h
ggml-vulkan.h
ggml-blas.h
get-flags.mk
whisper.cpp
whisper.h
dr_wav.h
depend
whisper.bundle
whisper.so
whisper.dll