Java SDK - Camb.ai

The official Java SDK for Camb.ai provides convenient access to text-to-speech, dubbing, translation, transcription, audio separation, voice cloning, and audio generation. Requests use a fluent builder pattern; async jobs follow a typed submit-poll-fetch workflow with TaskStatus enums.

Installation

Gradle

Add the dependency to your build.gradle:

dependencies {
  implementation 'ai.camb:cambai-java-sdk:1.5.9'
}

Maven

Add the dependency to your pom.xml:

<dependency>
  <groupId>ai.camb</groupId>
  <artifactId>cambai-java-sdk</artifactId>
  <version>1.5.9</version>
</dependency>

Authentication

Get your API key from CAMB.AI Studio and set it as an environment variable:

export CAMB_API_KEY=your_api_key_here

Quick Start

Streaming Text-to-Speech

The SDK returns an InputStream for TTS audio so you can stream the response directly to disk:

The top-level generated SDK classes, such as CambApiClient, live in the default Java package (they have no package declaration), and Java cannot import classes from the default package into a named one. To keep this snippet runnable without extra packaging changes, put Main in the default package too.

import resources.texttospeech.requests.CreateStreamTtsRequestPayload;
import resources.texttospeech.types.CreateStreamTtsRequestPayloadLanguage;
import resources.texttospeech.types.CreateStreamTtsRequestPayloadSpeechModel;
import types.OutputFormat;
import types.StreamTtsOutputConfiguration;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class Main {
  private static void saveStreamToFile(InputStream stream, String filename) throws IOException {
    try (InputStream in = stream; FileOutputStream out = new FileOutputStream(filename)) {
      byte[] buffer = new byte[4096];
      int bytesRead;
      while ((bytesRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, bytesRead);
      }
    }
  }

  public static void main(String[] args) {
    String apiKey = System.getenv("CAMB_API_KEY");
    if (apiKey == null || apiKey.isEmpty()) {
      throw new IllegalStateException("Missing CAMB_API_KEY environment variable.");
    }

    CambApiClient client = CambApiClient.builder()
      .apiKey(apiKey)
      .build();

    InputStream audioStream = client.textToSpeech().tts(
      CreateStreamTtsRequestPayload.builder()
        .text("Hello! Welcome to Camb.ai text-to-speech.")
        .language(CreateStreamTtsRequestPayloadLanguage.EN_US)
        .voiceId(147320)
        .speechModel(CreateStreamTtsRequestPayloadSpeechModel.MARSFLASH)
        .outputConfiguration(StreamTtsOutputConfiguration.builder().format(OutputFormat.WAV).build())
        .build()
    );

    try {
      saveStreamToFile(audioStream, "output.wav");
      System.out.println("Audio saved to output.wav");
    } catch (IOException e) {
      throw new RuntimeException("Failed to save TTS output file", e);
    }
  }
}

The snippets in the sections below assume client is already initialized as shown above.
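The manual buffer loop in saveStreamToFile can be replaced by the JDK's Files.copy(InputStream, Path, CopyOption...) (Java 7+). It reads the stream to the end, writes it to a path, and returns the number of bytes copied. This is a minimal sketch, and the StreamFiles class name is hypothetical. REPLACE_EXISTING is passed because Files.copy otherwise fails when the target file already exists.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: an alternative to saveStreamToFile from the Quick Start.
final class StreamFiles {
  private StreamFiles() {}

  // Copies the whole stream to `target`, overwriting any existing file,
  // then closes the stream. Returns the number of bytes written.
  static long save(InputStream stream, Path target) throws IOException {
    try (InputStream in = stream) {
      return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```

Usage: StreamFiles.save(audioStream, Path.of("output.wav")). Path.of requires Java 11; use Paths.get on older JDKs.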

Models / Configuration

Camb.ai uses MARS models configured through speechModel on the streaming TTS request:

.speechModel(CreateStreamTtsRequestPayloadSpeechModel.MARSFLASH)
// Sample rate: 22.05 kHz

To control the output encoding, set outputConfiguration.format with types.OutputFormat:

| `OutputFormat` | Value sent to the API |
| --- | --- |
| `OutputFormat.WAV` | `wav` |
| `OutputFormat.FLAC` | `flac` |
| `OutputFormat.MP3` | `mp3` |
| `OutputFormat.ADTS` | `adts` |
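When saving streamed audio, it helps to derive the file extension from the requested format so the file name matches the bytes on disk. This minimal sketch assumes a local OutputEncoding enum that mirrors the four OutputFormat constants in the table; it is a hypothetical stand-in, not the SDK type. ADTS is a framing for AAC audio, so the sketch uses .aac, its conventional extension.

```java
// Hypothetical stand-in mirroring types.OutputFormat; use the SDK enum in real code.
enum OutputEncoding { WAV, FLAC, MP3, ADTS }

final class AudioFileNames {
  private AudioFileNames() {}

  // Maps an output encoding to the file extension conventionally used for it.
  static String extensionFor(OutputEncoding encoding) {
    switch (encoding) {
      case WAV:  return ".wav";
      case FLAC: return ".flac";
      case MP3:  return ".mp3";
      case ADTS: return ".aac"; // ADTS-framed AAC is usually saved as .aac
      default:   throw new IllegalArgumentException("Unknown encoding: " + encoding);
    }
  }

  // Builds a file name such as "output.flac" from a base name and an encoding.
  static String fileName(String baseName, OutputEncoding encoding) {
    return baseName + extensionFor(encoding);
  }
}
```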

Text-to-Speech

The textToSpeech().tts(...) call streams audio as an InputStream. Add userInstructions when using MARSINSTRUCT to control delivery style:

InputStream audioStream = client.textToSpeech().tts(
  CreateStreamTtsRequestPayload.builder()
    .text("A warm greeting, delivered naturally.")
    .language(CreateStreamTtsRequestPayloadLanguage.EN_US)
    .voiceId(147320)
    .speechModel(CreateStreamTtsRequestPayloadSpeechModel.MARSINSTRUCT)
    .userInstructions("Speak with a friendly, upbeat tone.")
    .outputConfiguration(StreamTtsOutputConfiguration.builder().format(OutputFormat.WAV).build())
    .build()
);
// Write audioStream to a file using saveStreamToFile as shown in Quick Start.
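To check what was actually written, the JDK's javax.sound.sampled API can read a WAV header without any audio hardware. For example, you could confirm the 22.05 kHz sample rate noted for MARSFLASH under Models / Configuration. This is a minimal sketch with a hypothetical WavInfo class. It only covers formats the JDK ships readers for (WAV, AIFF, AU), not FLAC or MP3, and it may reject headers the JDK's parser does not accept.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper that reports basic properties of a saved WAV file.
final class WavInfo {
  private WavInfo() {}

  // Reads only the file header and returns the sample rate in Hz.
  static float sampleRate(File wav) throws IOException, UnsupportedAudioFileException {
    return AudioSystem.getAudioFileFormat(wav).getFormat().getSampleRate();
  }

  // Returns a summary such as "22050.0 Hz, 1 channel(s), 16-bit".
  static String describe(File wav) throws IOException, UnsupportedAudioFileException {
    AudioFormat format = AudioSystem.getAudioFileFormat(wav).getFormat();
    return format.getSampleRate() + " Hz, "
        + format.getChannels() + " channel(s), "
        + format.getSampleSizeInBits() + "-bit";
  }
}
```

Usage: WavInfo.describe(new File("output.wav")).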

Voice Cloning

Voice cloning uses the voice library to list available voices and create custom clones from short reference audio samples:

List voices

List<ListVoicesListVoicesGetResponseItem> voices = client.voiceCloning().listVoices();
for (ListVoicesListVoicesGetResponseItem item : voices) {
  item.visit(new ListVoicesListVoicesGetResponseItem.Visitor<Void>() {
    @Override
    public Void visit(Voice voice) {
      System.out.println("ID: " + voice.getId() + ", Name: " + voice.getVoiceName());
      return null;
    }
    @Override
    public Void visit(Map<String, Object> value) { return null; }
  });
}

Create a custom voice

Upload a reference recording along with a display name and a gender identifier:

File referenceFile = new File("reference.wav");

CreateCustomVoiceOut created = client.voiceCloning().createCustomVoice(
  referenceFile,
  BodyCreateCustomVoiceCreateCustomVoicePost.builder()
    .voiceName("My Custom Voice")
    .gender(1) // gender values are API-defined; see endpoint docs for mapping
    .description("Warm and conversational.")
    .enhanceAudio(true)
    .language(Languages.EN_US.getValue())
    .build()
);
System.out.println("Created voice_id: " + created.getVoiceId());

Dubbing

Dubbing is an asynchronous pipeline: you submit a job, poll until it succeeds, then fetch dubbed run information:

OrchestratorPipelineCallResult submitted = client.dub().endToEndDubbing(
  EndToEndDubbingRequestPayload.builder()
    .videoUrl("https://example.com/video.mp4")
    .sourceLanguage(Languages.EN_US.getValue())
    .targetLanguages(Collections.singletonList(Languages.HI_IN.getValue()))
    .build()
);

String taskId = submitted.getTaskId().orElseThrow();

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.dub().getEndToEndDubbingStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Dubbing failed");
  Thread.sleep(5000);
}

GetDubbedRunInfoDubResultRunIdGetResponse result = client.dub().getDubbedRunInfo(Optional.of(runId));
result.visit(new GetDubbedRunInfoDubResultRunIdGetResponse.Visitor<Void>() {
  @Override
  public Void visit(DubbingResult r) {
    System.out.println("audio_url: " + r.getAudioUrl());
    return null;
  }
  @Override
  public Void visit(Map<String, DubbingResult> map) {
    map.forEach((lang, r) -> System.out.println(lang + ": " + r.getAudioUrl()));
    return null;
  }
});
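Every asynchronous workflow on this page repeats the same while (true) loop. As written, that loop waits indefinitely if a job stays in any status other than SUCCESS or ERROR. This is a minimal sketch of a reusable helper with an overall deadline. The Poller name is hypothetical, and the status type is generic so the helper does not depend on any SDK class. Check the TaskStatus enum for other terminal values you may want isFailed to cover.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical polling helper; not part of the Camb.ai SDK.
final class Poller {
  private Poller() {}

  // Calls fetch every `interval` until isDone or isFailed matches the returned
  // status, or until `timeout` has elapsed. Returns the final successful status.
  static <S> S pollUntil(Supplier<S> fetch,
                         Predicate<S> isDone,
                         Predicate<S> isFailed,
                         Duration interval,
                         Duration timeout) throws InterruptedException, TimeoutException {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      S status = fetch.get();
      if (isDone.test(status)) {
        return status;
      }
      if (isFailed.test(status)) {
        throw new IllegalStateException("Task failed with status: " + status);
      }
      // Overflow-safe comparison, as recommended for System.nanoTime() values.
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("Task did not finish within " + timeout);
      }
      Thread.sleep(interval.toMillis());
    }
  }
}
```

To use it for dubbing, pass these arguments, then read getRunId() from the returned status:
- fetch: () -> client.dub().getEndToEndDubbingStatus(taskId)
- isDone: s -> s.getStatus() == TaskStatus.SUCCESS
- isFailed: s -> s.getStatus() == TaskStatus.ERROR

The same call shape fits the other polling loops below.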

Translation

Translation is asynchronous: create a task, poll status, then fetch translated text:

// createTranslation returns Object; convert via ObjectMappers to read task_id.
Object submittedObj = client.translation().createTranslation(
  CreateTranslationRequestPayload.builder()
    .texts(Arrays.asList("Hello, how are you?", "Welcome to Camb.ai."))
    .sourceLanguage(Languages.EN_US.getValue())
    .targetLanguage(Languages.FR_FR.getValue())
    .build()
);

OrchestratorPipelineCallResult submitted =
  ObjectMappers.JSON_MAPPER.convertValue(submittedObj, OrchestratorPipelineCallResult.class);
String taskId = submitted.getTaskId().orElseThrow();

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.translation().getTranslationTaskStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Translation failed");
  Thread.sleep(2000);
}

TranslationResult result = client.translation().getTranslationResult(Optional.of(runId));
result.getTexts().forEach(System.out::println);

Transcription

Transcription is an asynchronous pipeline. Submit an audio URL, poll until success, then fetch the structured transcript:

OrchestratorPipelineCallResult submitted = client.transcription().createTranscription(
  Optional.empty(),
  Optional.empty(),
  BodyCreateTranscriptionTranscribePost.builder()
    .language(Languages.EN_US.getValue())
    .mediaUrl("https://example.com/audio.mp3")
    .build()
);

String taskId = submitted.getTaskId().orElseThrow();

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.transcription().getTranscriptionTaskStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Transcription failed");
  Thread.sleep(3000);
}

TranscriptionResult result = client.transcription().getTranscriptionResult(
  Optional.of(runId),
  GetTranscriptionResultTranscriptionResultRunIdGetRequest.builder().wordLevelTimestamps(true).build()
);
result.getTranscript().forEach(t ->
  System.out.println(t.getStart() + " - " + t.getEnd() + ": " + t.getText())
);
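A common next step is exporting the transcript as SubRip (.srt) subtitles. This minimal sketch assumes getStart() and getEnd() return times in seconds as numbers; check the TranscriptionResult reference before relying on that. Segment and Srt are hypothetical stand-ins so the example compiles without the SDK.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for one transcript item; times are assumed to be seconds.
final class Segment {
  final double start;
  final double end;
  final String text;

  Segment(double start, double end, String text) {
    this.start = start;
    this.end = end;
    this.text = text;
  }
}

final class Srt {
  private Srt() {}

  // Formats seconds as an SRT timestamp (HH:MM:SS,mmm), rounded to milliseconds.
  static String timestamp(double seconds) {
    long totalMillis = Math.round(seconds * 1000.0);
    long hours = totalMillis / 3_600_000;
    long minutes = (totalMillis / 60_000) % 60;
    long secs = (totalMillis / 1000) % 60;
    long millis = totalMillis % 1000;
    // Locale.ROOT keeps ASCII digits regardless of the JVM's default locale.
    return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d", hours, minutes, secs, millis);
  }

  // Renders numbered SRT cues, one per segment, each followed by a blank line.
  static String render(List<Segment> segments) {
    StringBuilder sb = new StringBuilder();
    int index = 1;
    for (Segment s : segments) {
      sb.append(index++).append('\n')
        .append(timestamp(s.start)).append(" --> ").append(timestamp(s.end)).append('\n')
        .append(s.text.trim()).append("\n\n");
    }
    return sb.toString();
  }
}
```

Map each transcript item to new Segment(t.getStart(), t.getEnd(), t.getText()), then write Srt.render(...) to a .srt file.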

Audio separation

Audio separation works on a local file: upload it, poll for completion, then fetch download URLs for the separated foreground and background stems:

OrchestratorPipelineCallResult submitted =
  client.audioSeparation().createAudioSeparation(Optional.of(new File("track.mp3")));

String taskId = submitted.getTaskId().orElseThrow();

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.audioSeparation().getAudioSeparationStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Audio separation failed");
  Thread.sleep(3000);
}

GetAudioSeparationResultOut result = client.audioSeparation().getAudioSeparationRunInfo(Optional.of(runId));
System.out.println("foreground: " + result.getForegroundAudioUrl());
System.out.println("background: " + result.getBackgroundAudioUrl());
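The separation result exposes URLs rather than audio bytes. This is a minimal sketch for saving a URL to disk with only the JDK. It assumes the URLs can be fetched directly without extra headers, and the UrlDownloader name is hypothetical. URLConnection handles both https: and file: URLs, which also makes the helper easy to test locally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for saving result URLs (stems, previews) to local files.
final class UrlDownloader {
  private UrlDownloader() {}

  // Streams the resource at `url` into `target`, overwriting any existing file.
  // Returns the number of bytes written.
  static long download(String url, Path target) throws IOException {
    URLConnection conn = URI.create(url).toURL().openConnection();
    conn.setConnectTimeout(10_000); // give up if no connection within 10 s
    conn.setReadTimeout(60_000);    // give up if a read blocks for 60 s
    try (InputStream in = conn.getInputStream()) {
      return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```

Pass each URL string from the result together with a target path. Choose the file extension to match the audio format the API actually returns.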

Text-to-voice

Text-to-voice is asynchronous. Create a voice from a description, poll until success, then read preview URLs:

OrchestratorPipelineCallResult submitted = client.textToVoice().createTextToVoice(
  CreateTextToVoiceRequestPayload.builder()
    .text("A confident narrator introducing a documentary.")
    .voiceDescription("A smooth, rich baritone voice with gentle warmth.")
    .build()
);

String taskId = submitted.getTaskId().orElseThrow();

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.textToVoice().getTextToVoiceStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Text-to-voice failed");
  Thread.sleep(2000);
}

GetTextToVoiceResultOut result = client.textToVoice().getTextToVoiceResult(Optional.of(runId));
result.getPreviews().forEach(url -> System.out.println("Preview: " + url));

Text-to-audio

Text-to-audio is asynchronous. Submit a prompt, poll until it succeeds, then download the resulting audio stream:

OrchestratorPipelineCallResult submitted = client.textToAudio().createTextToAudio(
  CreateTextToAudioRequestPayload.builder()
    .prompt("Heavy rain on a tin roof at night with distant thunder.")
    .duration(15.0)
    .audioType(TextToAudioType.SOUND)
    .build()
);

String taskId = submitted.getTaskId().orElseThrow();

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.textToAudio().getTextToAudioStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Text-to-audio failed");
  Thread.sleep(3000);
}

InputStream audioStream = client.textToAudio().getTextToAudioResult(Optional.of(runId));
saveStreamToFile(audioStream, "soundscape.wav"); // saveStreamToFile defined in Quick Start

Stories

The Stories endpoint ingests a document file and generates narrated audio asynchronously. The client returns a union response for submission, so you extract task_id with visit(...):

CreateStoryStoryPostResponse submitted = client.story().createStory(
  new File("story.pdf"),
  BodyCreateStoryStoryPost.builder()
    .sourceLanguage(Languages.EN_US.getValue())
    .title("My Story")
    .build()
);

String taskId = submitted.visit(new CreateStoryStoryPostResponse.Visitor<String>() {
  @Override
  public String visit(OrchestratorPipelineCallResult value) {
    return value.getTaskId().orElseThrow();
  }
  @Override
  public String visit(GetSetupStoryResultResponse value) {
    throw new RuntimeException("Unexpected setup response");
  }
});

Integer runId = null;
while (true) {
  OrchestratorPipelineResult status = client.story().getStoryStatus(taskId);
  if (status.getStatus() == TaskStatus.SUCCESS) {
    runId = status.getRunId().orElseThrow();
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Stories task failed");
  Thread.sleep(5000);
}

Map<String, Object> runInfo = client.story().getStoryRunInfo(Optional.of(runId));
System.out.println("Story run info: " + runInfo);

Translated TTS

Translated TTS is asynchronous. Create a translated TTS task, poll until success, then inspect the success payload via OrchestratorPipelineResult.getAdditionalProperties():

CreateTranslatedTtsOut created = client.translatedTts().createTranslatedTts(
  CreateTranslatedTtsRequestPayload.builder()
    .text("Good morning, welcome to our service.")
    .voiceId(147320)
    .sourceLanguage(Languages.EN_US.getValue())
    .targetLanguage(Languages.HI_IN.getValue())
    .build()
);

while (true) {
  OrchestratorPipelineResult status =
    client.translatedTts().getTranslatedTtsTaskStatus(created.getTaskId());
  if (status.getStatus() == TaskStatus.SUCCESS) {
    System.out.println(status.getAdditionalProperties());
    break;
  }
  if (status.getStatus() == TaskStatus.ERROR) throw new RuntimeException("Translated TTS failed");
  Thread.sleep(3000);
}

Dictionaries

Dictionaries are shared term mappings that the transcription, dubbing, and translation APIs can use to keep terminology consistent:

List dictionaries

List<DictionaryWithTerms> dictionaries = client.dictionaries().getDictionaries();
dictionaries.forEach(d -> System.out.println(d.getId() + ": " + d.getName()));

Create from file and manage terms

// Create a dictionary from a CSV file.
Object created = client.dictionaries().createDictionaryFromFile(
  new File("terms.csv"),
  BodyCreateDictionaryFromFileDictionariesCreateFromFilePost.builder()
    .dictionaryName("Product Terms")
    .dictionaryDescription("Brand-specific terminology.")
    .build()
);

// Add a term to an existing dictionary.
int dictionaryId = 123;
client.dictionaries().addTermToDictionary(
  dictionaryId,
  AddDictionaryTermPayload.builder()
    .translations(Arrays.asList(
      TermTranslationInput.builder()
        .translation("Camb.ai")
        .language(Languages.HI_IN.getValue())
        .build()
    ))
    .build()
);

// Remove a specific term.
client.dictionaries().deleteDictionaryTerm(dictionaryId, /* termId */ 456);
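The dictionaryId = 123 above is a placeholder. Rather than hardcoding it, you can look the ID up by name in the list returned by getDictionaries(). This is a minimal sketch: the Lookup helper is hypothetical and generic so it compiles without the SDK. With the SDK, you would pass DictionaryWithTerms::getName and DictionaryWithTerms::getId as the two functions.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical lookup helper; not part of the Camb.ai SDK.
final class Lookup {
  private Lookup() {}

  // Returns the ID of the first item whose name equals `name`, or empty if none match.
  static <T, K> Optional<K> idByName(List<T> items,
                                     Function<T, String> nameOf,
                                     Function<T, K> idOf,
                                     String name) {
    return items.stream()
        .filter(item -> Objects.equals(nameOf.apply(item), name))
        .map(idOf)
        .findFirst();
  }
}
```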

Custom Providers

Custom hosting providers are implemented as ITtsProvider instances. You call provider.tts(request, requestOptions) directly instead of routing through CambApiClient:

import core.RequestOptions;
import resources.texttospeech.requests.CreateStreamTtsRequestPayload;
import resources.texttospeech.types.CreateStreamTtsRequestPayloadLanguage;
import com.fasterxml.jackson.databind.ObjectMapper;
import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

import java.io.InputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Minimal Baseten provider implementation (based on the SDK example).
class BasetenProvider implements ITtsProvider {
  private final String apiKey;
  private final String url;
  private final String referenceAudio;
  private final String referenceLanguage;
  private final OkHttpClient httpClient;
  private final ObjectMapper objectMapper;

  public BasetenProvider(String apiKey, String url, String referenceAudio, String referenceLanguage) {
    this.apiKey = apiKey;
    this.url = url;
    this.referenceAudio = referenceAudio;
    this.referenceLanguage = referenceLanguage;
    this.httpClient = new OkHttpClient();
    this.objectMapper = new ObjectMapper();
  }

  @Override
  public InputStream tts(CreateStreamTtsRequestPayload request, RequestOptions requestOptions) {
    String language = request.getLanguage().toString().toLowerCase().replace("_", "-");

    Map<String, Object> payload = new HashMap<>();
    payload.put("text", request.getText());
    payload.put("language", language);
    payload.put("output_duration", null);
    payload.put("reference_audio", referenceAudio);
    payload.put("reference_language", referenceLanguage);
    payload.put("output_format", "flac");
    payload.put("apply_ner_nlp", false);

    request.getOutputConfiguration().ifPresent(config -> {
      config.getFormat().ifPresent(f -> payload.put("output_format", f.toString().toLowerCase()));
    });

    try {
      String json = objectMapper.writeValueAsString(payload);
      RequestBody body = RequestBody.create(json, MediaType.parse("application/json"));

      Request req = new Request.Builder()
        .url(this.url)
        .addHeader("Authorization", "Api-Key " + this.apiKey)
        .post(body)
        .build();

      Response response = httpClient.newCall(req).execute();
      if (!response.isSuccessful()) {
        String errorBody = response.body() != null ? response.body().string() : "<no body>";
        throw new RuntimeException("Baseten API error " + response.code() + ": " + errorBody);
      }

      return response.body().byteStream();
    } catch (IOException e) {
      throw new RuntimeException("Network error calling Baseten: " + e.getMessage(), e);
    }
  }
}

// Usage: instantiate the provider and call tts() directly.
ITtsProvider provider = new BasetenProvider(
  System.getenv("BASETEN_API_KEY"),
  System.getenv("BASETEN_URL"),
  System.getenv("BASETEN_REFERENCE_AUDIO"),
  "en-us"
);

InputStream audioStream = provider.tts(
  CreateStreamTtsRequestPayload.builder()
    .text("Hello from Java via Baseten Mars8-Flash!")
    .language(CreateStreamTtsRequestPayloadLanguage.EN_US)
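    .voiceId(1) // not read by BasetenProvider, which clones from referenceAudio instead
    .build(),
  null // no RequestOptions
);
// BasetenProvider requests FLAC unless outputConfiguration sets another format.
saveStreamToFile(audioStream, "baseten_output.flac");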

Next Steps

Voice Agents

Build real-time voice agents with Pipecat

LiveKit Integration

Create voice agents with LiveKit

API Reference

Explore the full TTS API

Voice Library

Browse available voices
