
# Reference

## ApiStatus

### `client.apiStatus.get() -> Cartesia.ApiInfo`

**🔌 Usage**

```typescript
await client.apiStatus.get();
```

**⚙️ Parameters**

- **requestOptions:** `ApiStatus.RequestOptions`

## Datasets

### `client.datasets.list() -> Cartesia.PaginatedDatasets`

**🔌 Usage**

```typescript
await client.datasets.list();
```

**⚙️ Parameters**

- **requestOptions:** `Datasets.RequestOptions`

### `client.datasets.create({ ...params }) -> Cartesia.Dataset`

**🔌 Usage**

```typescript
await client.datasets.create({
    name: "name",
});
```

**⚙️ Parameters**

- **request:** `Cartesia.CreateDatasetRequest`
- **requestOptions:** `Datasets.RequestOptions`

### `client.datasets.listFiles(id) -> Cartesia.PaginatedDatasetFiles`

**🔌 Usage**

```typescript
await client.datasets.listFiles("id");
```

**⚙️ Parameters**

- **id:** `string`
- **requestOptions:** `Datasets.RequestOptions`

## Infill

### `client.infill.bytes(leftAudio, rightAudio, { ...params }) -> stream.Readable`

**📝 Description**

Generate audio that smoothly connects two existing audio segments. This is useful for inserting new speech between existing speech segments while maintaining natural transitions.

The cost is 1 credit per character of the infill text, plus a fixed cost of 300 credits.

Only the `sonic-preview` model is supported for infill at this time.

At least one of `left_audio` or `right_audio` must be provided.

As with all generative models, there is some inherent variability, but here are some tips we recommend for getting the best results from infill:

- Use longer infill transcripts. This gives the model more flexibility to adapt to the rest of the audio.
- Target natural pauses in the audio when deciding where to clip. This means you don't need word-level timestamps to be as precise.
- Clip right up to the start and end of the audio segment you want infilled, keeping as much silence in the left/right audio segments as possible. This helps the model generate more natural transitions.
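The pricing described above is easy to estimate up front. A minimal sketch — the helper name is ours, and exact character counting and rounding on the server side are assumptions:

```typescript
// Sketch of the infill pricing above: 1 credit per character of the
// infill transcript, plus a fixed 300 credits per request.
// How the API counts characters is an assumption.
function estimateInfillCredits(transcript: string): number {
  const FIXED_COST = 300;
  return transcript.length + FIXED_COST;
}

// estimateInfillCredits("middle segment") -> 314
```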

**🔌 Usage**

```typescript
await client.infill.bytes(fs.createReadStream("/path/to/your/file"), fs.createReadStream("/path/to/your/file"), {
    modelId: "sonic-preview",
    language: "en",
    transcript: "middle segment",
    voiceId: "694f9389-aac1-45b6-b726-9d9369183238",
    outputFormatContainer: "mp3",
    outputFormatSampleRate: 44100,
    outputFormatBitRate: 128000,
    voiceExperimentalControlsSpeed: "slowest",
    voiceExperimentalControlsEmotion: ["surprise:high", "curiosity:high"],
});
```

**⚙️ Parameters**

- **leftAudio:** `File | fs.ReadStream | Blob`
- **rightAudio:** `File | fs.ReadStream | Blob`
- **request:** `Cartesia.InfillBytesRequest`
- **requestOptions:** `Infill.RequestOptions`

## Tts

### `client.tts.bytes({ ...params }) -> stream.Readable`

**🔌 Usage**

```typescript
await client.tts.bytes({
    modelId: "sonic-english",
    transcript: "Hello, world!",
    voice: {
        mode: "id",
        id: "694f9389-aac1-45b6-b726-9d9369183238",
    },
    language: "en",
    outputFormat: {
        container: "mp3",
        sampleRate: 44100,
        bitRate: 128000,
    },
});
```

**⚙️ Parameters**

- **request:** `Cartesia.TtsRequest`
- **requestOptions:** `Tts.RequestOptions`

### `client.tts.sse({ ...params }) -> core.Stream`

**🔌 Usage**

```typescript
const response = await client.tts.sse({
    modelId: "sonic-english",
    transcript: "Hello, world!",
    voice: {
        mode: "id",
        id: "694f9389-aac1-45b6-b726-9d9369183238",
    },
    language: "en",
    outputFormat: {
        container: "raw",
        sampleRate: 44100,
        encoding: "pcm_f32le",
    },
});
for await (const item of response) {
    console.log(item);
}
```

**⚙️ Parameters**

- **request:** `Cartesia.TtsRequest`
- **requestOptions:** `Tts.RequestOptions`
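With the `raw`/`pcm_f32le` output format used above, audio arrives as little-endian 32-bit float samples. If a chunk reaches you as base64 text (the exact SSE payload shape is an assumption here), decoding it back to samples is straightforward:

```typescript
// Decode a base64-encoded pcm_f32le audio chunk into Float32Array
// samples. readFloatLE avoids any alignment concerns with the
// underlying Buffer's backing store.
function decodePcmF32le(base64Audio: string): Float32Array {
  const buf = Buffer.from(base64Audio, "base64");
  const samples = new Float32Array(buf.length / 4);
  for (let i = 0; i < samples.length; i++) {
    samples[i] = buf.readFloatLE(i * 4);
  }
  return samples;
}
```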

## VoiceChanger

### `client.voiceChanger.bytes(clip, { ...params }) -> stream.Readable`

**📝 Description**

Takes an audio file of speech and returns an audio file of the same speech spoken with the same intonation, but in a different voice.

This endpoint is priced at 15 characters per second of input audio.
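The per-second pricing above can be sketched as a small estimator. The helper name is ours, and rounding up to a whole credit is an assumption:

```typescript
// Sketch of the voice changer pricing above: input audio is billed
// at the equivalent of 15 characters per second. Whether fractional
// seconds are rounded up is an assumption.
function estimateVoiceChangerCredits(durationSeconds: number): number {
  return Math.ceil(durationSeconds * 15);
}

// estimateVoiceChangerCredits(10) -> 150
```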

**🔌 Usage**

```typescript
await client.voiceChanger.bytes(fs.createReadStream("/path/to/your/file"), {
    voiceId: "694f9389-aac1-45b6-b726-9d9369183238",
    outputFormatContainer: "mp3",
    outputFormatSampleRate: 44100,
    outputFormatBitRate: 128000,
});
```

**⚙️ Parameters**

- **clip:** `File | fs.ReadStream | Blob`
- **request:** `Cartesia.VoiceChangerBytesRequest`
- **requestOptions:** `VoiceChanger.RequestOptions`

### `client.voiceChanger.sse(clip, { ...params }) -> core.Stream`

**🔌 Usage**

```typescript
const response = await client.voiceChanger.sse(fs.createReadStream("/path/to/your/file"), {
    voiceId: "694f9389-aac1-45b6-b726-9d9369183238",
    outputFormatContainer: "mp3",
    outputFormatSampleRate: 44100,
    outputFormatBitRate: 128000,
});
for await (const item of response) {
    console.log(item);
}
```

**⚙️ Parameters**

- **clip:** `File | fs.ReadStream | Blob`
- **request:** `Cartesia.VoiceChangerSseRequest`
- **requestOptions:** `VoiceChanger.RequestOptions`

## Voices

### `client.voices.list() -> Cartesia.Voice[]`

**🔌 Usage**

```typescript
await client.voices.list();
```

**⚙️ Parameters**

- **requestOptions:** `Voices.RequestOptions`

### `client.voices.clone(clip, { ...params }) -> Cartesia.VoiceMetadata`

**📝 Description**

Clone a voice from an audio clip. This endpoint has two modes: stability and similarity.

Similarity mode clones are more similar to the source clip, but may reproduce background noise. For these, use an audio clip about 5 seconds long.

Stability mode clones are more stable, but may not sound as similar to the source clip. For these, use an audio clip 10-20 seconds long.
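The guidance above maps clip length to mode fairly directly. A small heuristic sketch — the 8-second cutoff is our own assumption, not part of the API:

```typescript
// Heuristic following the clone guidance above: ~5 s clips suit
// similarity mode; 10-20 s clips suit stability mode. The cutoff
// between them is an assumption.
function suggestCloneMode(clipSeconds: number): "similarity" | "stability" {
  return clipSeconds < 8 ? "similarity" : "stability";
}
```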

**🔌 Usage**

```typescript
await client.voices.clone(fs.createReadStream("/path/to/your/file"), {
    name: "A high-stability cloned voice",
    description: "Copied from Cartesia docs",
    mode: "stability",
    language: "en",
    enhance: true,
});
```

**⚙️ Parameters**

- **clip:** `File | fs.ReadStream | Blob`
- **request:** `Cartesia.CloneVoiceRequest`
- **requestOptions:** `Voices.RequestOptions`

### `client.voices.delete(id) -> void`

**🔌 Usage**

```typescript
await client.voices.delete("id");
```

**⚙️ Parameters**

- **id:** `Cartesia.VoiceId`
- **requestOptions:** `Voices.RequestOptions`

### `client.voices.update(id, { ...params }) -> Cartesia.Voice`

**🔌 Usage**

```typescript
await client.voices.update("id", {
    name: "name",
    description: "description",
});
```

**⚙️ Parameters**

- **id:** `Cartesia.VoiceId`
- **request:** `Cartesia.UpdateVoiceRequest`
- **requestOptions:** `Voices.RequestOptions`

### `client.voices.get(id) -> Cartesia.Voice`

**🔌 Usage**

```typescript
await client.voices.get("id");
```

**⚙️ Parameters**

- **id:** `Cartesia.VoiceId`
- **requestOptions:** `Voices.RequestOptions`

### `client.voices.localize({ ...params }) -> Cartesia.EmbeddingResponse`

**🔌 Usage**

```typescript
await client.voices.localize({
    embedding: [1.1, 1.1],
    language: "en",
    originalSpeakerGender: "male",
    dialect: undefined,
});
```

**⚙️ Parameters**

- **request:** `Cartesia.LocalizeVoiceRequest`
- **requestOptions:** `Voices.RequestOptions`

### `client.voices.mix({ ...params }) -> Cartesia.EmbeddingResponse`

**🔌 Usage**

```typescript
await client.voices.mix({
    voices: [
        {
            id: "id",
            weight: 1.1,
        },
        {
            id: "id",
            weight: 1.1,
        },
    ],
});
```

**⚙️ Parameters**

- **request:** `Cartesia.MixVoicesRequest`
- **requestOptions:** `Voices.RequestOptions`
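If you want the weights you pass to `mix` to sum to 1 (whether the API normalizes relative weights server-side is an assumption), a small client-side helper can rescale them while preserving their ratios:

```typescript
// Shape matching the voices array passed to client.voices.mix.
type VoiceWeight = { id: string; weight: number };

// Rescale mix weights so they sum to 1, preserving their ratios.
function normalizeWeights(voices: VoiceWeight[]): VoiceWeight[] {
  const total = voices.reduce((sum, v) => sum + v.weight, 0);
  return voices.map((v) => ({ id: v.id, weight: v.weight / total }));
}
```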

### `client.voices.create({ ...params }) -> Cartesia.Voice`

**📝 Description**

Create a voice from raw features. If you'd like to clone a voice from an audio file, please use Clone Voice instead.

**🔌 Usage**

```typescript
await client.voices.create({
    name: "My Custom Voice",
    description: "A custom voice created through the API",
    embedding: [],
    language: "en",
    baseVoiceId: "123e4567-e89b-12d3-a456-426614174000",
});
```

**⚙️ Parameters**

- **request:** `Cartesia.CreateVoiceRequest`
- **requestOptions:** `Voices.RequestOptions`