The Token-as-a-Service (TaaS) API provides an OpenAI-compatible interface for LLM inference, embeddings, reranking, and audio processing. All endpoints use standard HTTP methods and return JSON, except /v1/audio/speech, which returns binary audio.
Base URL: https://taas.cloudsigma.com/v1
Authenticate every request with a TaaS Bearer token in the Authorization header.
Tokens are prefixed with taas_ and are scoped per team or project.
Authorization: Bearer taas_xxxxxxxxxxxxxxxx
Generate API tokens from Settings → API Tokens in the TaaS console. Keep tokens secret — they grant full access to your account's quota.
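The token only needs to be configured once per client. A minimal sketch using Python's requests with a placeholder token (substitute your own from the console):

```python
import requests

def make_taas_session(token: str) -> requests.Session:
    """Build a session that sends the Bearer token on every request."""
    session = requests.Session()
    session.headers.update({"Authorization": f"Bearer {token}"})
    return session

session = make_taas_session("taas_xxxxxxxxxxxxxxxx")
# Every call through this session now carries the Authorization header:
# session.get("https://taas.cloudsigma.com/v1/models")
```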
All errors return a JSON object with a detail or error field:
| Status | Meaning |
|---|---|
| 400 | Bad Request — invalid parameters or malformed body |
| 401 | Unauthorized — missing or invalid Bearer token |
| 403 | Forbidden — token lacks permission for this action |
| 404 | Not Found — model or resource doesn't exist |
| 422 | Validation Error — check request body fields |
| 429 | Rate Limited — slow down or contact support for higher limits |
| 500 | Server Error — try again or contact support |
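A retry sketch built on the table above; the backoff values and attempt count are illustrative choices, not API requirements:

```python
import time
import requests

def should_retry(status: int) -> bool:
    """Per the status table, only 429 and 500 are worth retrying."""
    return status in (429, 500)

def post_with_retry(url, headers, payload, attempts=3):
    """Exponential backoff on retryable statuses; client errors
    (400/401/403/404/422) are raised immediately."""
    for attempt in range(attempts):
        resp = requests.post(url, headers=headers, json=payload)
        if not should_retry(resp.status_code):
            resp.raise_for_status()
            return resp.json()
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    resp.raise_for_status()
```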
Returns the list of all models available to your account. Each model object includes its identifier, type, and owner. Use the id field as the model parameter in inference requests.
| Field | Type | Description |
|---|---|---|
| id | string | Model identifier to use in API calls (e.g. claude-sonnet-4) |
| object | string | Always "model" |
| owned_by | string | Provider name (e.g. anthropic, openai) |
```bash
# List available models
curl https://taas.cloudsigma.com/v1/models \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx"
```

```python
import requests

response = requests.get(
    "https://taas.cloudsigma.com/v1/models",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"}
)
models = response.json()
for m in models:
    print(m["id"])  # e.g. "claude-sonnet-4"
```

```javascript
const response = await fetch(
  "https://taas.cloudsigma.com/v1/models",
  { headers: { "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx" } }
);
const models = await response.json();
models.forEach(m => console.log(m.id));
```
```json
[
  {
    "id": "claude-sonnet-4",
    "object": "model",
    "owned_by": "anthropic"
  },
  {
    "id": "gpt-4o",
    "object": "model",
    "owned_by": "openai"
  },
  {
    "id": "minimax-m2",
    "object": "model",
    "owned_by": "minimax"
  }
]
```
Generate a chat completion response from a model. Supports streaming via Server-Sent Events. Compatible with OpenAI's chat/completions API — existing OpenAI clients work by changing the base URL and API key.
| Field | Type | Description |
|---|---|---|
| model (required) | string | Model ID from /v1/models (e.g. claude-sonnet-4) |
| messages (required) | array | Array of {role, content} objects. Roles: system, user, assistant |
| stream | boolean | Stream tokens via SSE (default: false) |
| temperature | number | Sampling temperature 0–2 (default: 1.0). Higher = more creative |
| max_tokens | integer | Maximum tokens to generate |
| top_p | number | Nucleus sampling probability 0–1 (default: 1.0) |
| Field | Type | Description |
|---|---|---|
| choices[].message.role | string | Always "assistant" |
| choices[].message.content | string | Generated text |
| usage.prompt_tokens | integer | Tokens in the input messages |
| usage.completion_tokens | integer | Tokens generated |
| usage.total_tokens | integer | Sum of prompt + completion tokens |
```bash
# Chat completion
curl -X POST https://taas.cloudsigma.com/v1/chat/completions \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum entanglement."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
```

```python
import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/chat/completions",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "claude-sonnet-4",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum entanglement."}
        ],
        "temperature": 0.7,
        "max_tokens": 512
    }
)
result = response.json()
print(result["choices"][0]["message"]["content"])
```

```javascript
const response = await fetch(
  "https://taas.cloudsigma.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4",
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Explain quantum entanglement." }
      ],
      temperature: 0.7,
      max_tokens: 512,
    }),
  }
);
const data = await response.json();
console.log(data.choices[0].message.content);
```
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "claude-sonnet-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum entanglement is a phenomenon..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 142,
    "total_tokens": 170
  }
}
```
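Setting stream to true switches the response to Server-Sent Events. A consumer sketch, assuming OpenAI-style chunks (a data: prefix per line, choices[0].delta.content carrying each text piece, and a data: [DONE] sentinel) based on the endpoint's stated OpenAI compatibility:

```python
import json
import requests

def parse_sse_line(line: str):
    """Return the text delta from one SSE line, or None for keep-alives
    and the final [DONE] sentinel. The chunk shape is an assumption
    inferred from OpenAI compatibility."""
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data == "[DONE]":
        return None
    chunk = json.loads(data)
    return chunk["choices"][0].get("delta", {}).get("content")

def stream_chat(url, headers, payload):
    payload = {**payload, "stream": True}
    with requests.post(url, headers=headers, json=payload, stream=True) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            piece = parse_sse_line(line) if line else None
            if piece:
                print(piece, end="", flush=True)
```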
Generate vector embeddings for one or more text inputs. Use embeddings for semantic search, similarity comparison, clustering, and retrieval-augmented generation (RAG).
| Field | Type | Description |
|---|---|---|
| model (required) | string | Embedding model ID (e.g. bge-m3) |
| input (required) | string or array | Text string or array of strings to embed. Max 2048 tokens per string. |
| Field | Type | Description |
|---|---|---|
| data[].embedding | array | Dense float vector representing the input text |
| data[].index | integer | Index of the input string this embedding corresponds to |
| usage.prompt_tokens | integer | Total tokens processed |
| usage.total_tokens | integer | Same as prompt_tokens for embeddings |
```bash
# Generate embeddings
curl -X POST https://taas.cloudsigma.com/v1/embeddings \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": ["Hello world", "How are you?"]
  }'
```

```python
import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/embeddings",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "bge-m3",
        "input": ["Hello world", "How are you?"]
    }
)
data = response.json()
vector = data["data"][0]["embedding"]
print(f"Dimensions: {len(vector)}")
```

```javascript
const response = await fetch(
  "https://taas.cloudsigma.com/v1/embeddings",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "bge-m3",
      input: ["Hello world", "How are you?"],
    }),
  }
);
const data = await response.json();
console.log(`Dims: ${data.data[0].embedding.length}`);
```
```json
{
  "object": "list",
  "model": "bge-m3",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0234, -0.0871, 0.1203, ...]
    },
    {
      "object": "embedding",
      "index": 1,
      "embedding": [0.0512, -0.0344, 0.0987, ...]
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "total_tokens": 7
  }
}
```
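Vectors returned by this endpoint can be compared with cosine similarity for semantic search. A self-contained sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors.
    1.0 means identical direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

To rank candidate texts against a query, embed the query and each candidate with /v1/embeddings, then sort candidates by their similarity to the query vector.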
Rerank a list of documents by relevance to a query. Ideal for improving retrieval quality in RAG pipelines — pass candidate documents from a vector search and get them sorted by true semantic relevance.
| Field | Type | Description |
|---|---|---|
| model (required) | string | Reranker model ID (e.g. bge-reranker-v2-m3) |
| query (required) | string | The search query to rank documents against |
| documents (required) | array | Array of document strings to score and rank |
| Field | Type | Description |
|---|---|---|
| results[].index | integer | Original index of the document in the input array |
| results[].relevance_score | number | Relevance score (higher = more relevant). Results are sorted descending. |
```bash
# Rerank documents
curl -X POST https://taas.cloudsigma.com/v1/rerank \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is quantum computing?",
    "documents": [
      "Quantum computing uses qubits to process information.",
      "Classical computers use binary bits.",
      "The weather today is sunny."
    ]
  }'
```

```python
import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/rerank",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "bge-reranker-v2-m3",
        "query": "What is quantum computing?",
        "documents": [
            "Quantum computing uses qubits to process information.",
            "Classical computers use binary bits.",
            "The weather today is sunny."
        ]
    }
)
results = response.json()["results"]
for r in results:
    print(f"idx={r['index']} score={r['relevance_score']:.4f}")
```

```javascript
const response = await fetch(
  "https://taas.cloudsigma.com/v1/rerank",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "bge-reranker-v2-m3",
      query: "What is quantum computing?",
      documents: [
        "Quantum computing uses qubits to process information.",
        "Classical computers use binary bits.",
        "The weather today is sunny.",
      ],
    }),
  }
);
const { results } = await response.json();
results.forEach(r => console.log(r.index, r.relevance_score));
```
```json
{
  "model": "bge-reranker-v2-m3",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.9823
    },
    {
      "index": 1,
      "relevance_score": 0.4512
    },
    {
      "index": 2,
      "relevance_score": 0.0031
    }
  ]
}
```
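In a RAG pipeline the scored results are usually mapped back to the document texts. A small sketch; k and min_score are illustrative choices, not API parameters:

```python
def top_documents(documents, results, k=2, min_score=0.1):
    """Map reranker results back to the original texts, keeping the
    best k above an illustrative score cutoff. Results arrive from
    the API already sorted by relevance_score descending."""
    kept = [r for r in results if r["relevance_score"] >= min_score]
    return [documents[r["index"]] for r in kept[:k]]
```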
Transcribe audio to text using Whisper or another speech-to-text model. Send the audio file as multipart/form-data. Supports WAV, MP3, M4A, FLAC, OGG, and WebM formats.
| Field | Type | Description |
|---|---|---|
| file (required) | file | Audio file to transcribe. Max 25 MB. |
| model | string | Model to use (default: whisper) |
| language | string | ISO-639-1 language code (e.g. en, de). Auto-detected if omitted. |
| Field | Type | Description |
|---|---|---|
| text | string | The transcribed text from the audio file |
```bash
# Transcribe audio file
curl -X POST https://taas.cloudsigma.com/v1/audio/transcriptions \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -F "file=@recording.wav" \
  -F "model=whisper"
```

```python
import requests

with open("recording.wav", "rb") as f:
    response = requests.post(
        "https://taas.cloudsigma.com/v1/audio/transcriptions",
        headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
        files={"file": ("recording.wav", f, "audio/wav")},
        data={"model": "whisper"}
    )
result = response.json()
print(result["text"])
```

```javascript
const formData = new FormData();
formData.append("file", audioBlob, "recording.wav");
formData.append("model", "whisper");

const response = await fetch(
  "https://taas.cloudsigma.com/v1/audio/transcriptions",
  {
    method: "POST",
    headers: { "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx" },
    body: formData,
  }
);
const { text } = await response.json();
console.log(text);
```
```json
{
  "text": "Hello, this is a test recording for the transcription API."
}
```
Convert text to natural-sounding audio. The response is a binary audio stream (WAV or MP3). Use the Kokoro model for high-quality multilingual speech synthesis.
| Field | Type | Description |
|---|---|---|
| model (required) | string | TTS model ID (e.g. kokoro) |
| input (required) | string | Text to synthesize into speech. Max 4096 characters. |
| voice | string | Voice identifier (model-specific). Omit for model default. |
| response_format | string | Audio format: mp3 or wav (default: mp3) |
| speed | number | Speaking speed multiplier 0.25–4.0 (default: 1.0) |
Returns raw audio binary data with Content-Type: audio/mpeg (MP3) or audio/wav. Save the response body directly to a file.
```bash
# Text to speech — save to file
curl -X POST https://taas.cloudsigma.com/v1/audio/speech \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Welcome to Token-as-a-Service.",
    "voice": "af_sarah"
  }' \
  --output speech.mp3
```

```python
import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/audio/speech",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "kokoro",
        "input": "Welcome to Token-as-a-Service.",
        "voice": "af_sarah"
    }
)
with open("speech.mp3", "wb") as f:
    f.write(response.content)
print("Saved speech.mp3")
```

```javascript
const response = await fetch(
  "https://taas.cloudsigma.com/v1/audio/speech",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "kokoro",
      input: "Welcome to Token-as-a-Service.",
      voice: "af_sarah",
    }),
  }
);
const audioBuffer = await response.arrayBuffer();
// Play or save the audio binary
```
```text
Binary audio data (MP3 or WAV)
Content-Type: audio/mpeg or audio/wav
Save the response body directly, e.g. to speech.mp3
```
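The optional response_format and speed fields can be combined. A sketch with illustrative values (WAV output, slightly slowed speech):

```python
import requests

def build_speech_request(text, fmt="wav", speed=0.9):
    """Payload sketch exercising the optional response_format and
    speed parameters; the values here are illustrative choices."""
    return {
        "model": "kokoro",
        "input": text,
        "response_format": fmt,  # wav instead of the mp3 default
        "speed": speed,          # 0.25-4.0; below 1.0 slows the voice
    }

def synthesize_to_file(text, out_path="speech.wav"):
    resp = requests.post(
        "https://taas.cloudsigma.com/v1/audio/speech",
        headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
        json=build_speech_request(text),
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)
```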
Returns the current health status of the TaaS API gateway. Use this endpoint to verify connectivity and confirm the service is operational before sending inference requests. No authentication required.
| Field | Type | Description |
|---|---|---|
| status | string | Always "ok" when the service is healthy |
```bash
# Check API health
curl https://taas.cloudsigma.com/health
```

```python
import requests

response = requests.get("https://taas.cloudsigma.com/health")
print(response.json())  # {"status": "ok"}
```

```javascript
const response = await fetch(
  "https://taas.cloudsigma.com/health"
);
const data = await response.json();
console.log(data.status);  // "ok"
```
```json
{
  "status": "ok"
}
```