The official Java SDK for Camb.ai provides convenient access to text-to-speech, dubbing, translation, transcription, audio separation, voice cloning, and audio generation. Requests use a fluent builder pattern; async jobs follow a typed submit-poll-fetch workflow with TaskStatus enums.

Installation

Gradle

Add the dependency to your build.gradle:
dependencies {
  implementation 'ai.camb:cambai-java-sdk:1.5.9'
}

Maven

Add the dependency to your pom.xml:
<dependency>
  <groupId>ai.camb</groupId>
  <artifactId>cambai-java-sdk</artifactId>
  <version>1.5.9</version>
</dependency>

Authentication

Get your API key from CAMB.AI Studio and set it as an environment variable:
export CAMB_API_KEY=your_api_key_here

Quick Start

Streaming Text-to-Speech

The SDK returns an InputStream for TTS audio, so you can stream the response directly to disk.
Note: the generated SDK classes in this repository are in the default Java package (no package ...; declaration). To keep this snippet runnable without extra packaging changes, put Main in the default package too:
import resources.texttospeech.requests.CreateStreamTtsRequestPayload;
import resources.texttospeech.types.CreateStreamTtsRequestPayloadLanguage;
import resources.texttospeech.types.CreateStreamTtsRequestPayloadSpeechModel;
import types.OutputFormat;
import types.StreamTtsOutputConfiguration;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class Main {
  private static void saveStreamToFile(InputStream stream, String filename) throws IOException {
    try (InputStream in = stream; FileOutputStream out = new FileOutputStream(filename)) {
      byte[] buffer = new byte[4096];
      int bytesRead;
      while ((bytesRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, bytesRead);
      }
    }
  }

  public static void main(String[] args) {
    String apiKey = System.getenv("CAMB_API_KEY");
    if (apiKey == null || apiKey.isEmpty()) {
      throw new IllegalStateException("Missing CAMB_API_KEY environment variable.");
    }

    CambApiClient client = CambApiClient.builder()
      .apiKey(apiKey)
      .build();

    InputStream audioStream = client.textToSpeech().tts(
      CreateStreamTtsRequestPayload.builder()
        .text("Hello! Welcome to Camb.ai text-to-speech.")
        .language(CreateStreamTtsRequestPayloadLanguage.EN_US)
        .voiceId(147320)
        .speechModel(CreateStreamTtsRequestPayloadSpeechModel.MARSFLASH)
        .outputConfiguration(StreamTtsOutputConfiguration.builder().format(OutputFormat.WAV).build())
        .build()
    );

    try {
      saveStreamToFile(audioStream, "output.wav");
      System.out.println("Audio saved to output.wav");
    } catch (IOException e) {
      throw new RuntimeException("Failed to save TTS output file", e);
    }
  }
}
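The snippets in the sections below assume client is already initialized as shown above. They omit imports, and the polling loops call Thread.sleep, so the enclosing method must declare or handle InterruptedException.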
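The copy loop in saveStreamToFile can also be written with the standard library's Files.copy. The following is a minimal sketch rather than part of the SDK: the class name StreamSaver is hypothetical, and a ByteArrayInputStream stands in for the stream that textToSpeech().tts(...) returns.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; not part of the Camb.ai SDK.
class StreamSaver {
  // Copies the whole stream to the target path, replacing any existing file,
  // and closes the stream afterwards. Returns the number of bytes written.
  static long saveStreamToFile(InputStream stream, Path target) throws IOException {
    try (InputStream in = stream) {
      return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }

  public static void main(String[] args) throws IOException {
    // Stand-in for the InputStream returned by the SDK.
    byte[] fakeAudio = "not real audio".getBytes(StandardCharsets.UTF_8);
    Path target = Files.createTempFile("tts-sketch", ".wav");
    long written = saveStreamToFile(new ByteArrayInputStream(fakeAudio), target);
    System.out.println("Wrote " + written + " bytes to " + target);
    Files.delete(target);
  }
}
```

Returning the byte count makes it easy to detect an empty response before treating the file as valid audio.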

Models / Configuration

Camb.ai uses MARS models configured through speechModel on the streaming TTS request:
.speechModel(CreateStreamTtsRequestPayloadSpeechModel.MARSFLASH)
// Sample rate: 22.05kHz
To control the output encoding, set outputConfiguration.format with types.OutputFormat:
| OutputFormat | Value sent to the API |
| --- | --- |
| OutputFormat.WAV | wav |
| OutputFormat.FLAC | flac |
| OutputFormat.MP3 | mp3 |
| OutputFormat.ADTS | adts |
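Because the chosen format determines what bytes end up in the output file, it can help to derive the file name from the format rather than hard-coding it. The sketch below assumes a local stand-in enum named Format that mirrors the four values listed above; it is not the SDK's types.OutputFormat, and using the API value as the file extension is a convention assumed here, not something the SDK prescribes.

```java
import java.util.Locale;

// Hypothetical helper class; not part of the Camb.ai SDK.
class OutputFileNames {
  // Local stand-in for the formats listed above; NOT the SDK's types.OutputFormat.
  enum Format { WAV, FLAC, MP3, ADTS }

  // Builds a file name whose extension is the lowercase name of the format.
  static String fileNameFor(String baseName, Format format) {
    return baseName + "." + format.name().toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    for (Format f : Format.values()) {
      System.out.println(f + " -> " + fileNameFor("output", f));
    }
  }
}
```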

TTS Options

textToSpeech().tts(...) accepts CreateStreamTtsRequestPayload. Set the core request fields plus optional controls with the request builder.
| Builder method | Description |
| --- | --- |
| .text(...) | Text to synthesize. For MARS Instruct, you can include inline emotion or pacing tags in the text. |
| .language(...) | Locale such as CreateStreamTtsRequestPayloadLanguage.EN_US. |
| .voiceId(...) | Voice profile ID from voiceCloning().listVoices(). |
| .speechModel(...) | Model to use, such as MARSFLASH, MARSPRO, or MARSINSTRUCT. |
| .userInstructions(...) | Adds style, tone, pronunciation, or delivery guidance for the request. Available only with MARSINSTRUCT. |
| .outputConfiguration(...) | Output settings such as audio format. |
| .voiceSettings(...) | Voice behavior controls such as speaking rate, reference enhancement, or accent preservation. |
| .inferenceOptions(...) | Advanced generation controls for supported models. |
| .enhanceNamedEntitiesPronunciation(...) | Improves pronunciation for names and other named entities when supported. |

Text-to-Speech

The textToSpeech().tts(...) call streams audio as an InputStream. Add userInstructions when using MARSINSTRUCT to control delivery style:
InputStream audioStream = client.textToSpeech().tts(
  CreateStreamTtsRequestPayload.builder()
    .text("A warm greeting, delivered naturally.")
    .language(CreateStreamTtsRequestPayloadLanguage.EN_US)
    .voiceId(147320)
    .speechModel(CreateStreamTtsRequestPayloadSpeechModel.MARSINSTRUCT)
    .userInstructions("Speak with a friendly, upbeat tone.")
    .outputConfiguration(StreamTtsOutputConfiguration.builder().format(OutputFormat.WAV).build())
    .build()
);
// Write audioStream to a file using saveStreamToFile as shown in Quick Start.

Voice Cloning

Voice cloning uses the voice library to list available voices and create custom clones from short reference audio samples.

List voices

List<ListVoicesListVoicesGetResponseItem> voices = client.voiceCloning().listVoices();
for (ListVoicesListVoicesGetResponseItem item : voices) {
  item.visit(new ListVoicesListVoicesGetResponseItem.Visitor<Void>() {
    @Override
    public Void visit(Voice voice) {
      System.out.println("ID: " + voice.getId() + ", Name: " + voice.getVoiceName());
      return null;
    }
    @Override
    public Void visit(Map<String, Object> value) { return null; }
  });
}

Create a custom voice

Upload a reference recording along with a display name and a gender identifier:
File referenceFile = new File("reference.wav");

CreateCustomVoiceOut created = client.voiceCloning().createCustomVoice(
  referenceFile,
  BodyCreateCustomVoiceCreateCustomVoicePost.builder()
    .voiceName("My Custom Voice")
    .gender(1) // gender values are API-defined; see endpoint docs for mapping
    .description("Warm and conversational.")
    .enhanceAudio(true)
    .language(Languages.EN_US.getValue())
    .build()
);
System.out.println("Created voice_id: " + created.getVoiceId());

Dubbing

Dubbing is an asynchronous pipeline: you submit a job, poll until it succeeds, then fetch dubbed run information:
OrchestratorPipelineCallResult submitted = client.dub().endToEndDubbing(
  EndToEndDubbingRequestPayload.builder()
    .videoUrl("https://example.com/video.mp4")
    .sourceLanguage(Languages.EN_US.getValue())
    .targetLanguages(Collections.singletonList(Languages.HI_IN.getValue()))
    .build()
);

String taskId = submitted.getTaskId().orElseThrow();

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.dub().getEndToEndDubbingStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Dubbing failed");
  Thread.sleep(5000);
}

GetDubbedRunInfoDubResultRunIdGetResponse result = client.dub().getDubbedRunInfo(Optional.of(runId));
result.visit(new GetDubbedRunInfoDubResultRunIdGetResponse.Visitor<Void>() {
  @Override
  public Void visit(DubbingResult r) {
    System.out.println("audio_url: " + r.getAudioUrl());
    return null;
  }
  @Override
  public Void visit(Map<String, DubbingResult> map) {
    map.forEach((lang, r) -> System.out.println(lang + ": " + r.getAudioUrl()));
    return null;
  }
});
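The polling loop above runs until the task reports SUCCESS or ERROR, with no upper bound on how long it waits. One way to bound it is a small generic helper with a deadline. This is a sketch under stated assumptions: Poller and pollUntil are hypothetical names, not SDK APIs, and the demo uses plain strings in place of OrchestratorPipelineResult.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper class; not part of the Camb.ai SDK.
class Poller {
  // Calls fetch repeatedly until isDone accepts the result. Throws
  // IllegalStateException if isFailed accepts a result or the timeout elapses.
  static <T> T pollUntil(Supplier<T> fetch, Predicate<T> isDone, Predicate<T> isFailed,
                         Duration interval, Duration timeout) throws InterruptedException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      T current = fetch.get();
      if (isDone.test(current)) {
        return current;
      }
      if (isFailed.test(current)) {
        throw new IllegalStateException("Task failed: " + current);
      }
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException("Timed out after " + timeout);
      }
      Thread.sleep(interval.toMillis());
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // Stand-in status source: reports PENDING twice, then SUCCESS.
    String[] statuses = {"PENDING", "PENDING", "SUCCESS"};
    int[] calls = {0};
    String result = pollUntil(
      () -> statuses[Math.min(calls[0]++, statuses.length - 1)],
      "SUCCESS"::equals,
      "ERROR"::equals,
      Duration.ofMillis(10),
      Duration.ofSeconds(1));
    System.out.println(result + " after " + calls[0] + " calls");
  }
}
```

With the SDK, the supplier would wrap a status call such as client.dub().getEndToEndDubbingStatus(taskId), and the predicates would compare status.getStatus() against TaskStatus.SUCCESS and TaskStatus.ERROR. The same helper would then serve every submit-poll-fetch workflow in this guide.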

Translation

Translation is asynchronous: create a task, poll status, then fetch translated text:
// createTranslation returns Object; convert via ObjectMappers to read task_id.
Object submittedObj = client.translation().createTranslation(
  CreateTranslationRequestPayload.builder()
    .texts(Arrays.asList("Hello, how are you?", "Welcome to Camb.ai."))
    .sourceLanguage(Languages.EN_US.getValue())
    .targetLanguage(Languages.FR_FR.getValue())
    .build()
);

OrchestratorPipelineCallResult submitted =
  ObjectMappers.JSON_MAPPER.convertValue(submittedObj, OrchestratorPipelineCallResult.class);
String taskId = submitted.getTaskId().orElseThrow();

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.translation().getTranslationTaskStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Translation failed");
  Thread.sleep(2000);
}

TranslationResult result = client.translation().getTranslationResult(Optional.of(runId));
result.getTexts().forEach(System.out::println);
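The loop above sleeps for a fixed 2 seconds between status checks. For tasks whose duration varies widely, a capped exponential backoff is a common alternative: short waits at first, longer waits later. The sketch below only computes the delay schedule. The class and method names are hypothetical, and the 2-second start and 30-second cap are illustrative values, not SDK recommendations.

```java
import java.time.Duration;

// Hypothetical helper class; not part of the Camb.ai SDK.
class Backoff {
  // Returns the delay before poll attempt n (0-based): initial * 2^n, capped at max.
  static Duration delayFor(int attempt, Duration initial, Duration max) {
    long millis = initial.toMillis();
    // Stop doubling once the cap is reached so the value cannot overflow.
    for (int i = 0; i < attempt && millis < max.toMillis(); i++) {
      millis *= 2;
    }
    return Duration.ofMillis(Math.min(millis, max.toMillis()));
  }

  public static void main(String[] args) {
    Duration initial = Duration.ofSeconds(2);
    Duration max = Duration.ofSeconds(30);
    for (int attempt = 0; attempt < 6; attempt++) {
      System.out.println("attempt " + attempt + ": wait "
        + delayFor(attempt, initial, max).toMillis() + " ms");
    }
  }
}
```

In a polling loop, Thread.sleep(delayFor(attempt++, initial, max).toMillis()) would replace the fixed sleep.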

Transcription

Transcription is an asynchronous pipeline. Submit an audio URL, poll until success, then fetch the structured transcript:
OrchestratorPipelineCallResult submitted = client.transcription().createTranscription(
  Optional.empty(),
  Optional.empty(),
  BodyCreateTranscriptionTranscribePost.builder()
    .language(Languages.EN_US.getValue())
    .mediaUrl("https://example.com/audio.mp3")
    .build()
);

String taskId = submitted.getTaskId().orElseThrow();

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.transcription().getTranscriptionTaskStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Transcription failed");
  Thread.sleep(3000);
}

TranscriptionResult result = client.transcription().getTranscriptionResult(
  Optional.of(runId),
  GetTranscriptionResultTranscriptionResultRunIdGetRequest.builder().wordLevelTimestamps(true).build()
);
result.getTranscript().forEach(t ->
  System.out.println(t.getStart() + " - " + t.getEnd() + ": " + t.getText())
);

Audio Separation

Audio separation uploads a local file, polls for completion, then returns download URLs for the separated stems:
OrchestratorPipelineCallResult submitted =
  client.audioSeparation().createAudioSeparation(Optional.of(new File("track.mp3")));

String taskId = submitted.getTaskId().orElseThrow();

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.audioSeparation().getAudioSeparationStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Audio separation failed");
  Thread.sleep(3000);
}

GetAudioSeparationResultOut result = client.audioSeparation().getAudioSeparationRunInfo(Optional.of(runId));
System.out.println("foreground: " + result.getForegroundAudioUrl());
System.out.println("background: " + result.getBackgroundAudioUrl());

Text-to-Voice

Text-to-voice is asynchronous. Create a voice from a description, poll until success, then read preview URLs:
OrchestratorPipelineCallResult submitted = client.textToVoice().createTextToVoice(
  CreateTextToVoiceRequestPayload.builder()
    .text("A confident narrator introducing a documentary.")
    .voiceDescription("A smooth, rich baritone voice with gentle warmth.")
    .build()
);

String taskId = submitted.getTaskId().orElseThrow();

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.textToVoice().getTextToVoiceStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Text-to-voice failed");
  Thread.sleep(2000);
}

GetTextToVoiceResultOut result = client.textToVoice().getTextToVoiceResult(Optional.of(runId));
result.getPreviews().forEach(url -> System.out.println("Preview: " + url));

Text-to-Audio

Text-to-audio is asynchronous. Submit a prompt, poll until it succeeds, then download the resulting audio stream:
OrchestratorPipelineCallResult submitted = client.textToAudio().createTextToAudio(
  CreateTextToAudioRequestPayload.builder()
    .prompt("Heavy rain on a tin roof at night with distant thunder.")
    .duration(15.0)
    .audioType(TextToAudioType.SOUND)
    .build()
);

String taskId = submitted.getTaskId().orElseThrow();

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.textToAudio().getTextToAudioStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Text-to-audio failed");
  Thread.sleep(3000);
}

InputStream audioStream = client.textToAudio().getTextToAudioResult(Optional.of(runId));
saveStreamToFile(audioStream, "soundscape.wav"); // saveStreamToFile defined in Quick Start

Stories

The Stories endpoint ingests a document file and generates narrated audio asynchronously. The client returns a union response for submission, so you extract task_id with visit(...):
CreateStoryStoryPostResponse submitted = client.story().createStory(
  new File("story.pdf"),
  BodyCreateStoryStoryPost.builder()
    .sourceLanguage(Languages.EN_US.getValue())
    .title("My Story")
    .build()
);

final String[] taskIdHolder = new String[1];
submitted.visit(new CreateStoryStoryPostResponse.Visitor<Void>() {
  @Override
  public Void visit(OrchestratorPipelineCallResult value) {
    taskIdHolder[0] = value.getTaskId().orElseThrow();
    return null;
  }
  @Override
  public Void visit(GetSetupStoryResultResponse value) {
    throw new RuntimeException("Unexpected setup response");
  }
});

String taskId = taskIdHolder[0];
Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.story().getStoryStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Stories task failed");
  Thread.sleep(5000);
}

Map<String, Object> runInfo = client.story().getStoryRunInfo(Optional.of(runId));
System.out.println("Story run info: " + runInfo);
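The example above passes the task ID out of the visitor through a one-element array. When a visitor is generic in its return type, the value can be returned from visit(...) directly. The sketch below illustrates that pattern with stand-in types: Submission and its Visitor are hypothetical and only mimic the shape of a two-variant union response. Whether the SDK's own visit(...) returns the visitor's result is an assumption suggested by the Visitor<Void> usage above, so check the generated class before relying on it.

```java
// Hypothetical stand-in types; not part of the Camb.ai SDK.
class VisitorSketch {
  // Visitor over a union with two variants, generic in its result type.
  interface Visitor<T> {
    T visitTask(String taskId);
    T visitSetup(String setupInfo);
  }

  // Minimal union type: exactly one of the two fields is non-null.
  static final class Submission {
    private final String taskId;
    private final String setupInfo;

    private Submission(String taskId, String setupInfo) {
      this.taskId = taskId;
      this.setupInfo = setupInfo;
    }

    static Submission task(String taskId) {
      return new Submission(taskId, null);
    }

    static Submission setup(String setupInfo) {
      return new Submission(null, setupInfo);
    }

    // Dispatches to the matching visitor method and returns its result.
    <T> T visit(Visitor<T> visitor) {
      return taskId != null ? visitor.visitTask(taskId) : visitor.visitSetup(setupInfo);
    }
  }

  // Extracts the task ID by returning it from the visitor; no holder array needed.
  static String taskIdOf(Submission submission) {
    return submission.visit(new Visitor<String>() {
      @Override
      public String visitTask(String taskId) {
        return taskId;
      }

      @Override
      public String visitSetup(String setupInfo) {
        throw new IllegalStateException("Unexpected setup response");
      }
    });
  }

  public static void main(String[] args) {
    System.out.println(taskIdOf(Submission.task("task-123")));
  }
}
```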

Translated TTS

Translated TTS is asynchronous. Create a translated TTS task, poll until success, then inspect the success payload via OrchestratorPipelineResult.getAdditionalProperties():
CreateTranslatedTtsOut created = client.translatedTts().createTranslatedTts(
  CreateTranslatedTtsRequestPayload.builder()
    .text("Good morning, welcome to our service.")
    .voiceId(147320)
    .sourceLanguage(Languages.EN_US.getValue())
    .targetLanguage(Languages.HI_IN.getValue())
    .build()
);

while (true) {
  OrchestratorPipelineResult status =
    client.translatedTts().getTranslatedTtsTaskStatus(created.getTaskId());
  if (status.getStatus() == TaskStatus.SUCCESS) {
    System.out.println(status.getAdditionalProperties());
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Translated TTS failed");
  Thread.sleep(3000);
}

Dictionaries

Dictionaries are shared term mappings that APIs can use to handle terminology consistently across transcription, dubbing, and translation.

List dictionaries

List<DictionaryWithTerms> dictionaries = client.dictionaries().getDictionaries();
dictionaries.forEach(d -> System.out.println(d.getId() + ": " + d.getName()));

Create from file and manage terms

// Create a dictionary from a CSV file.
Object created = client.dictionaries().createDictionaryFromFile(
  new File("terms.csv"),
  BodyCreateDictionaryFromFileDictionariesCreateFromFilePost.builder()
    .dictionaryName("Product Terms")
    .dictionaryDescription("Brand-specific terminology.")
    .build()
);

// Add a term to an existing dictionary.
int dictionaryId = 123;
client.dictionaries().addTermToDictionary(
  dictionaryId,
  AddDictionaryTermPayload.builder()
    .translations(Arrays.asList(
      TermTranslationInput.builder()
        .translation("Camb.ai")
        .language(Languages.HI_IN.getValue())
        .build()
    ))
    .build()
);

// Remove a specific term.
client.dictionaries().deleteDictionaryTerm(dictionaryId, /* termId */ 456);

Custom Providers

Custom hosting providers are implemented as ITtsProvider instances. You call provider.tts(request, requestOptions) directly instead of routing through CambApiClient:
import core.RequestOptions;
import resources.texttospeech.requests.CreateStreamTtsRequestPayload;
import resources.texttospeech.types.CreateStreamTtsRequestPayloadLanguage;
import com.fasterxml.jackson.databind.ObjectMapper;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

import java.io.InputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Minimal Baseten provider implementation (based on the SDK example).
class BasetenProvider implements ITtsProvider {
  private final String apiKey;
  private final String url;
  private final String referenceAudio;
  private final String referenceLanguage;
  private final OkHttpClient httpClient;
  private final ObjectMapper objectMapper;

  public BasetenProvider(String apiKey, String url, String referenceAudio, String referenceLanguage) {
    this.apiKey = apiKey;
    this.url = url;
    this.referenceAudio = referenceAudio;
    this.referenceLanguage = referenceLanguage;
    this.httpClient = new OkHttpClient();
    this.objectMapper = new ObjectMapper();
  }

  @Override
  public InputStream tts(CreateStreamTtsRequestPayload request, RequestOptions requestOptions) {
    String language = request.getLanguage().toString().toLowerCase().replace("_", "-");

    Map<String, Object> payload = new HashMap<>();
    payload.put("text", request.getText());
    payload.put("language", language);
    payload.put("output_duration", null);
    payload.put("reference_audio", referenceAudio);
    payload.put("reference_language", referenceLanguage);
    payload.put("output_format", "flac");
    payload.put("apply_ner_nlp", false);

    request.getOutputConfiguration().ifPresent(config -> {
      config.getFormat().ifPresent(f -> payload.put("output_format", f.toString().toLowerCase()));
    });

    try {
      String json = objectMapper.writeValueAsString(payload);
      RequestBody body = RequestBody.create(json, MediaType.parse("application/json"));

      Request req = new Request.Builder()
        .url(this.url)
        .addHeader("Authorization", "Api-Key " + this.apiKey)
        .post(body)
        .build();

      Response response = httpClient.newCall(req).execute();
      if (!response.isSuccessful()) {
        String errorBody = response.body() != null ? response.body().string() : "<no body>";
        throw new RuntimeException("Baseten API error " + response.code() + ": " + errorBody);
      }

      return response.body().byteStream();
    } catch (IOException e) {
      throw new RuntimeException("Network error calling Baseten: " + e.getMessage(), e);
    }
  }
}

// Usage: instantiate the provider and call tts() directly.
ITtsProvider provider = new BasetenProvider(
  System.getenv("BASETEN_API_KEY"),
  System.getenv("BASETEN_URL"),
  System.getenv("BASETEN_REFERENCE_AUDIO"),
  "en-us"
);

InputStream audioStream = provider.tts(
  CreateStreamTtsRequestPayload.builder()
    .text("Hello from Java via Baseten Mars8-Flash!")
    .language(CreateStreamTtsRequestPayloadLanguage.EN_US)
    .voiceId(1)
    .build(),
  null
);
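// No outputConfiguration is set on this request, so the provider falls back to its "flac" default.
saveStreamToFile(audioStream, "baseten_output.flac");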
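The provider above derives the language tag by lowercasing the enum's string form with toLowerCase(), which depends on the JVM's default locale. Under a Turkish default locale, for example, an uppercase I lowercases to a dotless ı, so a value like HI_IN would not become hi-in. The sketch below shows a locale-independent variant. The LanguageTags class is hypothetical, and it assumes the input looks like EN_US; what the SDK enum's toString() actually returns is not confirmed here.

```java
import java.util.Locale;

// Hypothetical helper class; not part of the Camb.ai SDK.
class LanguageTags {
  // Converts a name like "EN_US" to "en-us" regardless of the default locale.
  static String toApiTag(String enumName) {
    return enumName.toLowerCase(Locale.ROOT).replace('_', '-');
  }

  public static void main(String[] args) {
    Locale previous = Locale.getDefault();
    // Simulate a JVM running with a Turkish default locale.
    Locale.setDefault(Locale.forLanguageTag("tr-TR"));
    try {
      // Locale-sensitive lowercasing, as in the provider code above.
      System.out.println("default locale: " + "HI_IN".toLowerCase().replace('_', '-'));
      // Locale-independent lowercasing.
      System.out.println("Locale.ROOT:    " + toApiTag("HI_IN"));
    } finally {
      Locale.setDefault(previous);
    }
  }
}
```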

Next Steps

Voice Agents

Build real-time voice agents with Pipecat

LiveKit Integration

Create voice agents with LiveKit

API Reference

Explore the full TTS API

Voice Library

Browse available voices

Resources