Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test stt #5634

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
11 changes: 11 additions & 0 deletions app/client/api.ts
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,16 @@ export interface SpeechOptions {
onController?: (controller: AbortController) => void;
}

export interface TranscriptionOptions {
model?: "whisper-1";
file: Blob;
language?: string;
prompt?: string;
response_format?: "json" | "text" | "srt" | "verbose_json" | "vtt";
temperature?: number;
onController?: (controller: AbortController) => void;
}

export interface ChatOptions {
messages: RequestMessage[];
config: LLMConfig;
Expand Down Expand Up @@ -98,6 +108,7 @@ export interface LLMModelProvider {
export abstract class LLMApi {
abstract chat(options: ChatOptions): Promise<void>;
abstract speech(options: SpeechOptions): Promise<ArrayBuffer>;
abstract transcription(options: TranscriptionOptions): Promise<string>;
abstract usage(): Promise<LLMUsage>;
abstract models(): Promise<LLMModel[]>;
}
Expand Down
5 changes: 5 additions & 0 deletions app/client/platforms/alibaba.ts
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ import {
LLMApi,
LLMModel,
SpeechOptions,
TranscriptionOptions,
MultimodalContent,
} from "../api";
import Locale from "../../locales";
Expand Down Expand Up @@ -89,6 +90,10 @@ export class QwenApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}
Comment on lines +93 to +95
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Implement the transcription method.

The transcription method is currently a placeholder. To complete this feature:

  1. Implement the actual transcription logic.
  2. Handle potential errors and edge cases.
  3. Ensure the implementation adheres to the TranscriptionOptions interface.
  4. Add appropriate error handling and logging.
  5. Consider adding unit tests for this new functionality.

Would you like assistance in implementing the transcription method or creating a GitHub issue to track this task?


async chat(options: ChatOptions) {
const messages = options.messages.map((v) => ({
role: v.role,
Expand Down
12 changes: 11 additions & 1 deletion app/client/platforms/anthropic.ts
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
import { Anthropic, ApiPath } from "@/app/constant";
import { ChatOptions, getHeaders, LLMApi, SpeechOptions } from "../api";
import {
ChatOptions,
getHeaders,
LLMApi,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import {
useAccessStore,
useAppConfig,
Expand Down Expand Up @@ -77,6 +83,10 @@ export class ClaudeApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}
Comment on lines +86 to +88
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Implement the transcription method.

The transcription method has been added to the ClaudeApi class, which is a good start for introducing transcription functionality. However, the method is currently not implemented.

To complete this feature:

  1. Implement the transcription logic using the Anthropic API or any other appropriate service.
  2. Handle potential errors and edge cases.
  3. Add unit tests to verify the functionality.

Would you like assistance in implementing this method or creating a GitHub issue to track this task?


extractMessage(res: any) {
console.log("[Response] claude response: ", res);

Expand Down
5 changes: 5 additions & 0 deletions app/client/platforms/baidu.ts
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ import {
LLMModel,
MultimodalContent,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import Locale from "../../locales";
import {
Expand Down Expand Up @@ -81,6 +82,10 @@ export class ErnieApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}
Comment on lines +85 to +87
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Implement the transcription method and add documentation.

The transcription method has been added with the correct signature, but it currently throws a "Method not implemented" error. Please implement the method to handle transcription requests. Additionally, consider adding documentation to explain the purpose and expected behavior of this method.

Would you like assistance in implementing the transcription method or creating documentation for it?


async chat(options: ChatOptions) {
const messages = options.messages.map((v) => ({
// "error_code": 336006, "error_msg": "the role of message with even index in the messages must be user or function",
Expand Down
5 changes: 5 additions & 0 deletions app/client/platforms/bytedance.ts
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ import {
LLMModel,
MultimodalContent,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import Locale from "../../locales";
import {
Expand Down Expand Up @@ -83,6 +84,10 @@ export class DoubaoApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}

async chat(options: ChatOptions) {
const messages = options.messages.map((v) => ({
role: v.role,
Expand Down
5 changes: 5 additions & 0 deletions app/client/platforms/google.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ import {
LLMModel,
LLMUsage,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import {
useAccessStore,
Expand Down Expand Up @@ -68,6 +69,10 @@ export class GeminiProApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}

async chat(options: ChatOptions): Promise<void> {
const apiClient = this;
let multimodal = false;
Expand Down
5 changes: 5 additions & 0 deletions app/client/platforms/iflytek.ts
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ import {
LLMApi,
LLMModel,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import Locale from "../../locales";
import {
Expand Down Expand Up @@ -64,6 +65,10 @@ export class SparkApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}
Comment on lines +68 to +70
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Implement the transcription method.

The transcription method has been added with the correct signature, which is good. However, it currently throws a "Method not implemented" error.

To complete this feature:

  1. Implement the transcription logic using the options parameter.
  2. Ensure proper error handling.
  3. Add unit tests for this new method.
  4. Update any relevant documentation.

Would you like assistance in implementing this method or creating a task to track its implementation?


async chat(options: ChatOptions) {
const messages: ChatOptions["messages"] = [];
for (const v of options.messages) {
Expand Down
5 changes: 5 additions & 0 deletions app/client/platforms/moonshot.ts
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ import {
LLMApi,
LLMModel,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import { getClientConfig } from "@/app/config/client";
import { getMessageTextContent } from "@/app/utils";
Expand Down Expand Up @@ -63,6 +64,10 @@ export class MoonshotApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}

async chat(options: ChatOptions) {
const messages: ChatOptions["messages"] = [];
for (const v of options.messages) {
Expand Down
42 changes: 42 additions & 0 deletions app/client/platforms/openai.ts
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ import {
LLMUsage,
MultimodalContent,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import Locale from "../../locales";
import { getClientConfig } from "@/app/config/client";
Expand Down Expand Up @@ -180,6 +181,47 @@ export class ChatGPTApi implements LLMApi {
}
}

async transcription(options: TranscriptionOptions): Promise<string> {
const formData = new FormData();
formData.append("file", options.file, "audio.wav");
formData.append("model", options.model ?? "whisper-1");
if (options.language) formData.append("language", options.language);
if (options.prompt) formData.append("prompt", options.prompt);
if (options.response_format)
formData.append("response_format", options.response_format);
if (options.temperature)
formData.append("temperature", options.temperature.toString());

console.log("[Request] openai audio transcriptions payload: ", options);

const controller = new AbortController();
options.onController?.(controller);

try {
const path = this.path(OpenaiPath.TranscriptionPath, options.model);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Incorrect number of arguments in this.path method call

In line 200, the this.path method is called with two arguments (OpenaiPath.TranscriptionPath, options.model), but the path method is defined to accept only one argument. This will result in a TypeError.

To fix this, update the path method signature to accept the optional model parameter:

-export class ChatGPTApi implements LLMApi {
-  // Existing code...
-  path(path: string): string {
+export class ChatGPTApi implements LLMApi {
+  // Existing code...
+  path(path: string, model?: string): string {
     // Method implementation...
   }

Ensure that you handle the model parameter within the path method as needed.

Committable suggestion was skipped due to low confidence.

const headers = getHeaders(true);
const payload = {
method: "POST",
body: formData,
signal: controller.signal,
headers: headers,
};

// make a fetch request
const requestTimeoutId = setTimeout(
() => controller.abort(),
REQUEST_TIMEOUT_MS,
);
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
const json = await res.json();
return json.text;
Comment on lines +215 to +218
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add response status check before parsing JSON

In lines 214-217, the code attempts to parse the JSON response without verifying if the request was successful. If the request fails, res.json() might throw an error or the response might not contain the expected text property.

Consider checking res.ok before parsing the response and handle errors appropriately:

const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
+ if (!res.ok) {
+   const errorText = await res.text();
+   console.error(`[Response] Transcription request failed: ${errorText}`);
+   throw new Error(`Transcription request failed with status ${res.status}`);
+ }
const json = await res.json();
return json.text;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
const json = await res.json();
return json.text;
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
if (!res.ok) {
const errorText = await res.text();
console.error(`[Response] Transcription request failed: ${errorText}`);
throw new Error(`Transcription request failed with status ${res.status}`);
}
const json = await res.json();
return json.text;

} catch (e) {
console.log("[Request] failed to make a audio transcriptions request", e);
throw e;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance error handling and response parsing in the transcription method

The transcription method implementation looks good overall, but there are a few areas for improvement:

  1. Error handling could be more specific to provide better debugging information.
  2. The response parsing assumes the presence of a text property without checking.
  3. The headers variable is declared but not used in the fetch call.

Consider applying the following improvements:

  1. Enhance error handling:
 } catch (e) {
-  console.log("[Request] failed to make a audio transcriptions request", e);
-  throw e;
+  console.error("[Request] failed to make an audio transcriptions request", e);
+  throw new Error(`Transcription request failed: ${e.message}`);
 }
  1. Add response status check and error handling:
 const res = await fetch(path, payload);
 clearTimeout(requestTimeoutId);
+if (!res.ok) {
+  throw new Error(`Transcription request failed with status ${res.status}`);
+}
 const json = await res.json();
-return json.text;
+return json.text ?? '';
  1. Use the headers variable in the fetch call:
 const payload = {
   method: "POST",
   body: formData,
   signal: controller.signal,
-  headers: headers,
+  headers,
 };

These changes will improve the robustness and reliability of the transcription method.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async transcription(options: TranscriptionOptions): Promise<string> {
const formData = new FormData();
formData.append("file", options.file, "audio.wav");
formData.append("model", options.model ?? "whisper-1");
if (options.language) formData.append("language", options.language);
if (options.prompt) formData.append("prompt", options.prompt);
if (options.response_format)
formData.append("response_format", options.response_format);
if (options.temperature)
formData.append("temperature", options.temperature.toString());
console.log("[Request] openai audio transcriptions payload: ", options);
const controller = new AbortController();
options.onController?.(controller);
try {
const path = this.path(OpenaiPath.TranscriptionPath, options.model);
const headers = getHeaders(true);
const payload = {
method: "POST",
body: formData,
signal: controller.signal,
headers: headers,
};
// make a fetch request
const requestTimeoutId = setTimeout(
() => controller.abort(),
REQUEST_TIMEOUT_MS,
);
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
const json = await res.json();
return json.text;
} catch (e) {
console.log("[Request] failed to make a audio transcriptions request", e);
throw e;
}
}
async transcription(options: TranscriptionOptions): Promise<string> {
const formData = new FormData();
formData.append("file", options.file, "audio.wav");
formData.append("model", options.model ?? "whisper-1");
if (options.language) formData.append("language", options.language);
if (options.prompt) formData.append("prompt", options.prompt);
if (options.response_format)
formData.append("response_format", options.response_format);
if (options.temperature)
formData.append("temperature", options.temperature.toString());
console.log("[Request] openai audio transcriptions payload: ", options);
const controller = new AbortController();
options.onController?.(controller);
try {
const path = this.path(OpenaiPath.TranscriptionPath, options.model);
const headers = getHeaders(true);
const payload = {
method: "POST",
body: formData,
signal: controller.signal,
headers,
};
// make a fetch request
const requestTimeoutId = setTimeout(
() => controller.abort(),
REQUEST_TIMEOUT_MS,
);
const res = await fetch(path, payload);
clearTimeout(requestTimeoutId);
if (!res.ok) {
throw new Error(`Transcription request failed with status ${res.status}`);
}
const json = await res.json();
return json.text ?? '';
} catch (e) {
console.error("[Request] failed to make an audio transcriptions request", e);
throw new Error(`Transcription request failed: ${e.message}`);
}
}


async chat(options: ChatOptions) {
const modelConfig = {
...useAppConfig.getState().modelConfig,
Expand Down
5 changes: 5 additions & 0 deletions app/client/platforms/tencent.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import {
LLMModel,
MultimodalContent,
SpeechOptions,
TranscriptionOptions,
} from "../api";
import Locale from "../../locales";
import {
Expand Down Expand Up @@ -93,6 +94,10 @@ export class HunyuanApi implements LLMApi {
throw new Error("Method not implemented.");
}

transcription(options: TranscriptionOptions): Promise<string> {
throw new Error("Method not implemented.");
}
Comment on lines 96 to +99
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider implementing both speech and transcription methods

Both the speech and transcription methods are currently unimplemented. To fully realize the speech-to-text and text-to-speech functionality mentioned in the PR summary, both methods should be implemented.

Would you like assistance in drafting implementations for both methods? This would ensure the full functionality of the new feature.


async chat(options: ChatOptions) {
const visionModel = isVisionModel(options.config.model);
const messages = options.messages.map((v, index) => ({
Expand Down
55 changes: 54 additions & 1 deletion app/components/chat.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ import React, {
} from "react";

import SendWhiteIcon from "../icons/send-white.svg";
import VoiceWhiteIcon from "../icons/voice-white.svg";
import BrainIcon from "../icons/brain.svg";
import RenameIcon from "../icons/rename.svg";
import ExportIcon from "../icons/share.svg";
Expand Down Expand Up @@ -72,6 +73,7 @@ import {
isDalle3,
showPlugins,
safeLocalStorage,
isFirefox,
} from "../utils";

import { uploadImage as uploadImageRemote } from "@/app/utils/chat";
Expand All @@ -98,7 +100,9 @@ import {
import { useNavigate } from "react-router-dom";
import {
CHAT_PAGE_SIZE,
DEFAULT_STT_ENGINE,
DEFAULT_TTS_ENGINE,
FIREFOX_DEFAULT_STT_ENGINE,
ModelProvider,
Path,
REQUEST_TIMEOUT_MS,
Expand All @@ -118,6 +122,7 @@ import { MultimodalContent } from "../client/api";
const localStorage = safeLocalStorage();
import { ClientApi } from "../client/api";
import { createTTSPlayer } from "../utils/audio";
import { OpenAITranscriptionApi, WebTranscriptionApi } from "../utils/speech";
import { MsEdgeTTS, OUTPUT_FORMAT } from "../utils/ms_edge_tts";

const ttsPlayer = createTTSPlayer();
Expand Down Expand Up @@ -546,6 +551,44 @@ export function ChatActions(props: {
}
}, [chatStore, currentModel, models]);

const [isListening, setIsListening] = useState(false);
const [isTranscription, setIsTranscription] = useState(false);
const [speechApi, setSpeechApi] = useState<any>(null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider using a more specific type for speechApi state

The speechApi state is initialized with any type. Consider using a more specific type to improve type safety.

- const [speechApi, setSpeechApi] = useState<any>(null);
+ const [speechApi, setSpeechApi] = useState<WebTranscriptionApi | OpenAITranscriptionApi | null>(null);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const [speechApi, setSpeechApi] = useState<any>(null);
const [speechApi, setSpeechApi] = useState<WebTranscriptionApi | OpenAITranscriptionApi | null>(null);


useEffect(() => {
if (isFirefox()) config.sttConfig.engine = FIREFOX_DEFAULT_STT_ENGINE;
setSpeechApi(
config.sttConfig.engine === DEFAULT_STT_ENGINE
? new WebTranscriptionApi((transcription) =>
onRecognitionEnd(transcription),
)
: new OpenAITranscriptionApi((transcription) =>
onRecognitionEnd(transcription),
),
);
}, []);

const startListening = async () => {
if (speechApi) {
await speechApi.start();
setIsListening(true);
}
};
const stopListening = async () => {
if (speechApi) {
if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
setIsTranscription(true);
await speechApi.stop();
setIsListening(false);
}
};
const onRecognitionEnd = (finalTranscript: string) => {
console.log(finalTranscript);
if (finalTranscript) props.setUserInput(finalTranscript);
if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
setIsTranscription(false);
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove console.log in production code

There's a console.log statement in the onRecognitionEnd function. Consider removing it or replacing it with a more appropriate logging mechanism for production code.

  const onRecognitionEnd = (finalTranscript: string) => {
-   console.log(finalTranscript);
    if (finalTranscript) props.setUserInput(finalTranscript);
    if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
      setIsTranscription(false);
  };
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const onRecognitionEnd = (finalTranscript: string) => {
console.log(finalTranscript);
if (finalTranscript) props.setUserInput(finalTranscript);
if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
setIsTranscription(false);
};
const onRecognitionEnd = (finalTranscript: string) => {
if (finalTranscript) props.setUserInput(finalTranscript);
if (config.sttConfig.engine !== DEFAULT_STT_ENGINE)
setIsTranscription(false);
};


return (
<div className={styles["chat-input-actions"]}>
{couldStop && (
Expand Down Expand Up @@ -780,6 +823,16 @@ export function ChatActions(props: {
icon={<ShortcutkeyIcon />}
/>
)}

{config.sttConfig.enable && (
<ChatAction
onClick={async () =>
isListening ? await stopListening() : await startListening()
}
text={isListening ? Locale.Chat.StopSpeak : Locale.Chat.StartSpeak}
icon={<VoiceWhiteIcon />}
/>
)}
</div>
);
}
Expand Down Expand Up @@ -1505,7 +1558,7 @@ function _Chat() {
setAttachImages(images);
}

// 快捷键 shortcut keys
// 快捷键
const [showShortcutKeyModal, setShowShortcutKeyModal] = useState(false);

useEffect(() => {
Expand Down
12 changes: 12 additions & 0 deletions app/components/settings.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,7 @@ import { nanoid } from "nanoid";
import { useMaskStore } from "../store/mask";
import { ProviderType } from "../utils/cloud";
import { TTSConfigList } from "./tts-config";
import { STTConfigList } from "./stt-config";

function EditPromptModal(props: { id: string; onClose: () => void }) {
const promptStore = usePromptStore();
Expand Down Expand Up @@ -1703,6 +1704,17 @@ export function Settings() {
/>
</List>

<List>
<STTConfigList
sttConfig={config.sttConfig}
updateConfig={(updater) => {
const sttConfig = { ...config.sttConfig };
updater(sttConfig);
config.update((config) => (config.sttConfig = sttConfig));
}}
/>
</List>

<DangerItems />
</div>
</ErrorBoundary>
Expand Down
Loading
Loading