forked from ggerganov/whisper.cpp
Commit
ruby : support new-segment callback (ggerganov#2506)
* Add Params#new_segment_callback= method
* Add tests for Params#new_segment_callback=
* Group tests for #transcribe
* Don't use static for thread-safety
* Set new_segment_callback only when necessary
* Remove redundant check
* [skip ci] Add Ruby version README
* Revert "Group tests for #transcribe" (this reverts commit 71b65b0)
* Revert "Add tests for Params#new_segment_callback=" (this reverts commit 81e6df3)
* Add test for Context#full_n_segments
* Add Context#full_n_segments
* Add tests for lang API
* Add lang API
* Add tests for Context#full_lang_id API
* Add Context#full_lang_id
* Add abnormal test cases for lang
* Raise appropriate errors from lang APIs
* Add tests for Context#full_get_segment_t{0,1} API
* Add Context#full_get_segment_t{0,1}
* Add tests for Context#full_get_segment_speaker_turn_next API
* Add Context#full_get_segment_speaker_turn_next
* Add tests for Context#full_get_segment_text
* Add Context#full_get_segment_text
* Add tests for Params#new_segment_callback=
* Run new segment callback
* Split tests to multiple files
* Use container struct for new segment callback
* Add tests for Params#new_segment_callback_user_data=
* Add Whisper::Params#new_user_callback_user_data=
* Add GC-related test for new segment callback
* Protect new segment callback related structs from GC
* Add meaningful test for build
* Rename: new_segment_callback_user_data -> new_segment_callback_container
* Add tests for Whisper::Segment
* Add Whisper::Segment and Whisper::Context#each_segment
* Extract c_ruby_whisper_callback_container_allocate()
* Add test for Whisper::Params#on_new_segment
* Add Whisper::Params#on_new_segment
* Assign symbol IDs to variables
* Make extsources.yaml simpler
* Update README
* Add document comments
* Add test for calling Whisper::Params#on_new_segment multiple times
* Add file dependencies to GitHub actions config and .gitignore
* Add more files to ext/.gitignore
1 parent a9d704b, commit 7011725
Showing 14 changed files with 1,112 additions and 170 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,3 @@ | ||
README.md | ||
LICENSE | ||
pkg/ | ||
lib/whisper.* |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
whispercpp | ||
========== | ||
|
||
![whisper.cpp](https://user-images.githubusercontent.com/1991296/235238348-05d0f6a4-da44-4900-a1de-d0707e75b763.jpeg) | ||
|
||
Ruby bindings for [whisper.cpp][], an interface to an automatic speech recognition (ASR) model. | ||
|
||
Installation | ||
------------ | ||
|
||
Install the gem and add it to the application's Gemfile by executing: | ||
|
||
$ bundle add whispercpp | ||
|
||
If bundler is not being used to manage dependencies, install the gem by executing: | ||
|
||
$ gem install whispercpp | ||
|
||
Usage | ||
----- | ||
|
||
```ruby | ||
require "whisper" | ||
|
||
whisper = Whisper::Context.new("path/to/model.bin") | ||
|
||
params = Whisper::Params.new | ||
params.language = "en" | ||
params.offset = 10_000 | ||
params.duration = 60_000 | ||
params.max_text_tokens = 300 | ||
params.translate = true | ||
params.print_timestamps = false | ||
|
||
whisper.transcribe("path/to/audio.wav", params) do |whole_text| | ||
  puts whole_text | ||
end | ||
|
||
``` | ||
|
||
### Preparing model ### | ||
|
||
Use the download script in the whisper.cpp repository to fetch model file(s): | ||
|
||
```bash | ||
git clone https://github.com/ggerganov/whisper.cpp.git | ||
cd whisper.cpp | ||
sh ./models/download-ggml-model.sh base.en | ||
``` | ||
|
||
Several types of models are available. See the [models][] page for details. | ||
|
||
### Preparing audio file ### | ||
|
||
Currently, whisper.cpp accepts only 16-bit WAV files. | ||
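
Since only 16-bit WAV input is accepted, it can help to inspect a file before passing it to `#transcribe`. The following is a minimal sketch, not part of the gem: `wav_format` and `pcm16?` are hypothetical helper names, and the code assumes a standard RIFF/WAVE layout with a `fmt ` chunk of at least 16 bytes.

```ruby
require "stringio"

# Hypothetical helper (not provided by whispercpp).
# Walks the RIFF chunks and returns the basic fields of the "fmt " chunk.
def wav_format(io)
  riff, _size, wave = io.read(12).to_s.unpack("a4Va4")
  raise ArgumentError, "not a RIFF/WAVE file" unless riff == "RIFF" && wave == "WAVE"

  while (header = io.read(8)) && header.bytesize == 8
    id, size = header.unpack("a4V")
    if id == "fmt "
      audio_format, channels, sample_rate, _byte_rate, _block_align, bits =
        io.read(16).to_s.unpack("vvVVvv")
      return {
        audio_format: audio_format, # 1 means uncompressed PCM
        channels: channels,
        sample_rate: sample_rate,
        bits_per_sample: bits
      }
    end
    # Skip other chunks; chunk bodies are padded to an even length.
    io.seek(size + (size % 2), IO::SEEK_CUR)
  end
  raise ArgumentError, "fmt chunk not found"
end

# Hypothetical helper: true when the stream is uncompressed 16-bit PCM.
def pcm16?(io)
  fmt = wav_format(io)
  fmt[:audio_format] == 1 && fmt[:bits_per_sample] == 16
end
```

For a file on disk, something like `File.open("path/to/audio.wav", "rb") { |f| pcm16?(f) }` would run the check; files that fail it need to be converted with an external tool first.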
|
||
### API ### | ||
|
||
Once `Whisper::Context#transcribe` has been called, you can retrieve the segments with `Whisper::Context#each_segment`: | ||
|
||
```ruby | ||
def format_time(time_ms) | ||
  sec, decimal_part = time_ms.divmod(1000) | ||
  min, sec = sec.divmod(60) | ||
  hour, min = min.divmod(60) | ||
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part] | ||
end | ||
|
||
whisper.transcribe("path/to/audio.wav", params) | ||
|
||
whisper.each_segment.with_index do |segment, index| | ||
  line = "[%{nth}: %{st} --> %{ed}] %{text}" % { | ||
    nth: index + 1, | ||
    st: format_time(segment.start_time), | ||
    ed: format_time(segment.end_time), | ||
    text: segment.text | ||
  } | ||
  line << " (speaker turned)" if segment.speaker_next_turn? | ||
  puts line | ||
end | ||
|
||
``` | ||
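
The formatting logic in the example above can be tried without a model or audio file. The sketch below uses a stand-in `FakeSegment` struct (an assumption for illustration, not the gem's `Whisper::Segment`) that exposes only the four accessors the README uses; `format_segment` is likewise a hypothetical helper name.

```ruby
# Stand-in for Whisper::Segment, for illustration only.
FakeSegment = Struct.new(:start_time, :end_time, :text, :speaker_turn) do
  def speaker_next_turn?
    speaker_turn
  end
end

# Converts milliseconds to HH:MM:SS.mmm, as in the README example.
def format_time(time_ms)
  sec, decimal_part = time_ms.divmod(1000)
  min, sec = sec.divmod(60)
  hour, min = min.divmod(60)
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
end

# Hypothetical helper: builds one output line for a segment at a zero-based index.
def format_segment(segment, index)
  line = "[%{nth}: %{st} --> %{ed}] %{text}" % {
    nth: index + 1,
    st: format_time(segment.start_time),
    ed: format_time(segment.end_time),
    text: segment.text
  }
  line << " (speaker turned)" if segment.speaker_next_turn?
  line
end

puts format_time(3_723_456) # prints "01:02:03.456"
puts format_segment(FakeSegment.new(0, 1_500, "Hello", true), 0)
# prints "[1: 00:00:00.000 --> 00:00:01.500] Hello (speaker turned)"
```

Returning the line instead of printing it keeps the formatter easy to test and reuse, for example when writing the output to a file.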
|
||
You can also add a hook to params that is called whenever a new segment is produced: | ||
|
||
```ruby | ||
def format_time(time_ms) | ||
  sec, decimal_part = time_ms.divmod(1000) | ||
  min, sec = sec.divmod(60) | ||
  hour, min = min.divmod(60) | ||
  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part] | ||
end | ||
|
||
# Add hook before calling #transcribe | ||
params.on_new_segment do |segment| | ||
  line = "[%{st} --> %{ed}] %{text}" % { | ||
    st: format_time(segment.start_time), | ||
    ed: format_time(segment.end_time), | ||
    text: segment.text | ||
  } | ||
  line << " (speaker turned)" if segment.speaker_next_turn? | ||
  puts line | ||
end | ||
|
||
whisper.transcribe("path/to/audio.wav", params) | ||
|
||
``` | ||
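
The commit message mentions a test for calling `Whisper::Params#on_new_segment` multiple times, but this README does not spell out what happens in that case. As a rough mental model only, the sketch below assumes every registered block is kept and invoked in registration order. `FakeParams` and its `emit` method are hypothetical stand-ins written in plain Ruby; they are not the gem's implementation, which lives in the C extension.

```ruby
# Stand-in for illustration only; this is NOT Whisper::Params.
class FakeParams
  def initialize
    @hooks = []
  end

  # Registers a block; earlier registrations are kept (assumed behavior).
  def on_new_segment(&block)
    raise ArgumentError, "block required" unless block

    @hooks << block
    self
  end

  # Hypothetical method simulating the moment a new segment becomes available.
  def emit(segment)
    @hooks.each { |hook| hook.call(segment) }
  end
end

params = FakeParams.new
log = []
params.on_new_segment { |segment| log << "first: #{segment}" }
params.on_new_segment { |segment| log << "second: #{segment}" }
params.emit("hello")
puts log
```

If the real bindings behave differently, for example by replacing an earlier hook, the gem's own tests are the place to confirm it.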
|
||
[whisper.cpp]: https://github.com/ggerganov/whisper.cpp | ||
[models]: https://github.com/ggerganov/whisper.cpp/tree/master/models |