Bump version to 1.1.0 and update benchmarks #1161

Merged · 5 commits · Nov 21, 2024

Conversation

MahmoudAshraf97
Collaborator

No description provided.

@MahmoudAshraf97
Collaborator Author

OpenAI Whisper Inference

import torch

# torch.set_num_threads(8)
from whisper import load_model, transcribe, load_audio

audio = load_audio("benchmark/benchmark.m4a")
model = load_model("large-v2", device="cpu")


use_cuda = True

if use_cuda:
    # Cast every parameter except the LayerNorm weights to FP16 to ensure FP16 inference
    state_dict = model.state_dict()
    for parameter_name, _ in model.named_parameters():
        if "ln" not in parameter_name:
            state_dict[parameter_name] = state_dict[parameter_name].half()
    model.load_state_dict(state_dict, assign=True)
    model = model.cuda()

result = transcribe(model, audio, beam_size=5, best_of=5, verbose=False)
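
For comparison, a minimal sketch of the corresponding faster-whisper run on GPU (the model name, audio path, and FP16 compute type are assumptions chosen to mirror the OpenAI Whisper settings above, not the exact benchmark script):

from faster_whisper import WhisperModel

# Assumed settings mirroring the OpenAI Whisper call above; use device="cpu"
# and compute_type="float32" or "int8" for the CPU benchmarks
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

segments, info = model.transcribe("benchmark/benchmark.m4a", beam_size=5, best_of=5)

# transcribe() returns a lazy generator, so the segments must be consumed
# for the transcription to actually run
for segment in segments:
    pass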

@MahmoudAshraf97
Collaborator Author

MahmoudAshraf97 commented Nov 21, 2024

Measure CPU memory for whisper.cpp

#!/bin/bash

# Command to run
COMMAND="./main -m models/ggml-large-v2.bin -l auto -fa ../faster-whisper/benchmark/output.wav"
# Run the command and measure memory consumption
OUTPUT=$(/usr/bin/time -v $COMMAND 2>&1)

# Extract the peak memory usage from the output
PEAK_MEMORY=$(echo "$OUTPUT" | grep "Maximum resident set size" | awk '{print $6}')

# Convert to MB for readability
PEAK_MEMORY_MB=$(bc <<< "scale=2; $PEAK_MEMORY / 1024")

# Print the result
echo "Peak memory consumption: $PEAK_MEMORY_MB MB"

GPU memory

import subprocess
import time

import pynvml

def measure_gpu_memory(command):
    # Initialize NVML
    pynvml.nvmlInit()
    peak_memory = 0

    # Record the initial memory usage on GPU 0 so the baseline can be subtracted later
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    initial_memory = mem_info.used

    # Start the benchmarked process
    process = subprocess.Popen(command, shell=True)

    try:
        while process.poll() is None:  # While the process is running
            mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
            peak_memory = max(peak_memory, mem_info.used)
            time.sleep(0.5)
    finally:
        pynvml.nvmlShutdown()

    # Calculate memory usage difference (peak - initial)
    memory_difference = (peak_memory - initial_memory) / 1024 / 1024  # Convert to MB
    return memory_difference

if __name__ == "__main__":
    command = "./main -m models/ggml-large-v2.bin -l auto -fa ../faster-whisper/benchmark/output.wav"
    additional_memory = measure_gpu_memory(command)
    print(f"Additional GPU memory used: {additional_memory:.2f} MB")

MahmoudAshraf97 merged commit 97a4785 into SYSTRAN:master on Nov 21, 2024
3 checks passed
Equipo45 pushed a commit to Equipo45/faster-whisper that referenced this pull request Dec 4, 2024
* update version

* Update CPU benchmarks

* Updated GPU benchmarks

* ..

* more gpu benchmarks