TaaS API Reference

The Token-as-a-Service (TaaS) API provides an OpenAI-compatible interface for LLM inference, embeddings, reranking, and audio processing. All endpoints use standard HTTP methods and return JSON, with the exception of /v1/audio/speech, which returns binary audio.

Base URL: https://taas.cloudsigma.com/v1

Authentication

Authenticate every request with a TaaS Bearer token in the Authorization header. Tokens are prefixed with taas_ and are scoped per team or project.

Authorization Header:
Authorization: Bearer taas_xxxxxxxxxxxxxxxx

Generate API tokens from Settings → API Tokens in the TaaS console. Keep tokens secret — they grant full access to your account's quota.
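
Rather than repeating the Authorization header on every call, a client can attach the token once. A minimal sketch using Python's requests library (the session pattern is a client-side convenience, not an API requirement; replace the placeholder token with your own):

```python
import requests

# Reusable session that sends the TaaS Bearer token with every request.
session = requests.Session()
session.headers.update({"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"})

# All subsequent calls inherit the header, e.g.:
# models = session.get("https://taas.cloudsigma.com/v1/models").json()
```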

Error Handling

All errors return a JSON object with a detail or error field:

Status  Meaning
400     Bad Request — invalid parameters or malformed body
401     Unauthorized — missing or invalid Bearer token
403     Forbidden — token lacks permission for this action
404     Not Found — model or resource doesn't exist
422     Validation Error — check request body fields
429     Rate Limited — slow down or contact support for higher limits
500     Server Error — try again or contact support
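
For transient failures such as 429 and 5xx, a small client-side retry with exponential backoff is usually sufficient. A sketch in Python (the helper name and backoff schedule are illustrative, not part of the API):

```python
import time
import requests

def post_with_retry(url, payload, token, max_retries=3):
    """POST with simple exponential backoff on 429 and 5xx responses."""
    headers = {"Authorization": f"Bearer {token}"}
    for attempt in range(max_retries + 1):
        response = requests.post(url, json=payload, headers=headers)
        # Retry only on rate limits and transient server errors.
        if response.status_code not in (429, 500, 502, 503) or attempt == max_retries:
            break
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    response.raise_for_status()
    return response.json()
```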

GET List Models

/v1/models

Returns the list of all models available to your account. Each model object includes its identifier, type, and owner. Use the id field as the model parameter in inference requests.

Response Fields

Field     Type    Description
id        string  Model identifier to use in API calls (e.g. claude-sonnet-4)
object    string  Always "model"
owned_by  string  Provider name (e.g. anthropic, openai)

cURL:

# List available models
curl https://taas.cloudsigma.com/v1/models \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx"
Python:

import requests

response = requests.get(
    "https://taas.cloudsigma.com/v1/models",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"}
)
models = response.json()
for m in models:
    print(m["id"])  # e.g. "claude-sonnet-4"
JavaScript:

const response = await fetch(
  "https://taas.cloudsigma.com/v1/models",
  {
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"
    }
  }
);
const models = await response.json();
models.forEach(m => console.log(m.id));
Response · 200
[
  {
    "id": "claude-sonnet-4",
    "object": "model",
    "owned_by": "anthropic"
  },
  {
    "id": "gpt-4o",
    "object": "model",
    "owned_by": "openai"
  },
  {
    "id": "minimax-m2",
    "object": "model",
    "owned_by": "minimax"
  }
]

POST Chat Completions

/v1/chat/completions

Generate a chat completion response from a model. Supports streaming via Server-Sent Events. Compatible with OpenAI's chat/completions API — existing OpenAI clients work by changing the base URL and API key.

Request Body

Field                Type     Description
model (required)     string   Model ID from /v1/models (e.g. claude-sonnet-4)
messages (required)  array    Array of {role, content} objects. Roles: system, user, assistant
stream               boolean  Stream tokens via SSE (default: false)
temperature          number   Sampling temperature 0–2 (default: 1.0). Higher = more creative
max_tokens           integer  Maximum tokens to generate
top_p                number   Nucleus sampling probability 0–1 (default: 1.0)

Response Fields

Field                      Type     Description
choices[].message.role     string   Always "assistant"
choices[].message.content  string   Generated text
usage.prompt_tokens        integer  Tokens in the input messages
usage.completion_tokens    integer  Tokens generated
usage.total_tokens         integer  Sum of prompt + completion tokens

cURL:

# Chat completion
curl -X POST https://taas.cloudsigma.com/v1/chat/completions \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum entanglement."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
Python:

import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/chat/completions",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "claude-sonnet-4",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum entanglement."}
        ],
        "temperature": 0.7,
        "max_tokens": 512
    }
)
result = response.json()
print(result["choices"][0]["message"]["content"])
JavaScript:

const response = await fetch(
  "https://taas.cloudsigma.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4",
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Explain quantum entanglement." }
      ],
      temperature: 0.7,
      max_tokens: 512,
    }),
  }
);
const data = await response.json();
console.log(data.choices[0].message.content);
Response · 200
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "claude-sonnet-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum entanglement is a phenomenon..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 142,
    "total_tokens": 170
  }
}
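
With stream set to true, tokens arrive as Server-Sent Events. A sketch of consuming the stream with requests, assuming the usual OpenAI-compatible SSE convention of data: lines terminated by data: [DONE] (the exact chunk shape may vary by model):

```python
import json
import requests

def stream_chat(payload, token):
    """Yield content deltas from a streaming chat completion.

    Assumes the OpenAI-compatible SSE format: each event is a
    `data: {...}` line and the stream ends with `data: [DONE]`.
    """
    response = requests.post(
        "https://taas.cloudsigma.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {token}"},
        json={**payload, "stream": True},
        stream=True,
    )
    for line in response.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Usage:
# payload = {"model": "claude-sonnet-4",
#            "messages": [{"role": "user", "content": "Hi"}]}
# for piece in stream_chat(payload, "taas_xxxxxxxxxxxxxxxx"):
#     print(piece, end="", flush=True)
```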

POST Embeddings

/v1/embeddings

Generate vector embeddings for one or more text inputs. Use embeddings for semantic search, similarity comparison, clustering, and retrieval-augmented generation (RAG).

Request Body

Field             Type             Description
model (required)  string           Embedding model ID (e.g. bge-m3)
input (required)  string or array  Text string or array of strings to embed. Max 2048 tokens per string.

Response Fields

Field                Type     Description
data[].embedding     array    Dense float vector representing the input text
data[].index         integer  Index of the input string this embedding corresponds to
usage.prompt_tokens  integer  Total tokens processed
usage.total_tokens   integer  Same as prompt_tokens for embeddings

cURL:

# Generate embeddings
curl -X POST https://taas.cloudsigma.com/v1/embeddings \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": ["Hello world", "How are you?"]
  }'
Python:

import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/embeddings",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "bge-m3",
        "input": ["Hello world", "How are you?"]
    }
)
data = response.json()
vector = data["data"][0]["embedding"]
print(f"Dimensions: {len(vector)}")
JavaScript:

const response = await fetch(
  "https://taas.cloudsigma.com/v1/embeddings",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "bge-m3",
      input: ["Hello world", "How are you?"],
    }),
  }
);
const data = await response.json();
console.log(`Dims: ${data.data[0].embedding.length}`);
Response · 200
{
  "object": "list",
  "model": "bge-m3",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0234, -0.0871, 0.1203, ...]
    },
    {
      "object": "embedding",
      "index": 1,
      "embedding": [0.0512, -0.0344, 0.0987, ...]
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "total_tokens": 7
  }
}
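
A common use of the returned vectors is similarity comparison, e.g. cosine similarity between the two embeddings above. A plain-Python sketch (production code would typically use NumPy):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# With the response above:
# sim = cosine_similarity(data["data"][0]["embedding"],
#                         data["data"][1]["embedding"])
```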

POST Rerank

/v1/rerank

Rerank a list of documents by relevance to a query. Ideal for improving retrieval quality in RAG pipelines — pass candidate documents from a vector search and get them sorted by true semantic relevance.

Request Body

Field                 Type    Description
model (required)      string  Reranker model ID (e.g. bge-reranker-v2-m3)
query (required)      string  The search query to rank documents against
documents (required)  array   Array of document strings to score and rank

Response Fields

Field                      Type     Description
results[].index            integer  Original index of the document in the input array
results[].relevance_score  number   Relevance score (higher = more relevant). Results are sorted descending.

cURL:

# Rerank documents
curl -X POST https://taas.cloudsigma.com/v1/rerank \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is quantum computing?",
    "documents": [
      "Quantum computing uses qubits to process information.",
      "Classical computers use binary bits.",
      "The weather today is sunny."
    ]
  }'
Python:

import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/rerank",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "bge-reranker-v2-m3",
        "query": "What is quantum computing?",
        "documents": [
            "Quantum computing uses qubits to process information.",
            "Classical computers use binary bits.",
            "The weather today is sunny."
        ]
    }
)
results = response.json()["results"]
for r in results:
    print(f"idx={r['index']} score={r['relevance_score']:.4f}")
JavaScript:

const response = await fetch(
  "https://taas.cloudsigma.com/v1/rerank",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "bge-reranker-v2-m3",
      query: "What is quantum computing?",
      documents: [
        "Quantum computing uses qubits to process information.",
        "Classical computers use binary bits.",
        "The weather today is sunny.",
      ],
    }),
  }
);
const { results } = await response.json();
results.forEach(r => console.log(r.index, r.relevance_score));
Response · 200
{
  "model": "bge-reranker-v2-m3",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.9823
    },
    {
      "index": 1,
      "relevance_score": 0.4512
    },
    {
      "index": 2,
      "relevance_score": 0.0031
    }
  ]
}
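
The results reference documents by index, so a typical RAG pipeline maps them back to the original strings and keeps only those above a relevance threshold. A sketch (the helper name and threshold are illustrative):

```python
def top_documents(documents, results, min_score=0.5):
    """Map reranker results back to document text, keeping only those
    above a relevance threshold. Results arrive already sorted by
    descending relevance_score, so order is preserved."""
    return [documents[r["index"]] for r in results
            if r["relevance_score"] >= min_score]

# With the response above, only the qubit document clears a 0.5 threshold.
```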

POST Audio Transcriptions

/v1/audio/transcriptions

Transcribe audio to text using Whisper or another speech-to-text model. Send the audio file as multipart/form-data. Supports WAV, MP3, M4A, FLAC, OGG, and WebM formats.

Request (multipart/form-data)

Field            Type    Description
file (required)  file    Audio file to transcribe. Max 25 MB.
model            string  Model to use (default: whisper)
language         string  ISO-639-1 language code (e.g. en, de). Auto-detected if omitted.

Response Fields

Field  Type    Description
text   string  The transcribed text from the audio file

cURL:

# Transcribe audio file
curl -X POST https://taas.cloudsigma.com/v1/audio/transcriptions \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -F "file=@recording.wav" \
  -F "model=whisper"
Python:

import requests

with open("recording.wav", "rb") as f:
    response = requests.post(
        "https://taas.cloudsigma.com/v1/audio/transcriptions",
        headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
        files={"file": ("recording.wav", f, "audio/wav")},
        data={"model": "whisper"}
    )
result = response.json()
print(result["text"])
JavaScript:

const formData = new FormData();
formData.append("file", audioBlob, "recording.wav");
formData.append("model", "whisper");

const response = await fetch(
  "https://taas.cloudsigma.com/v1/audio/transcriptions",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"
    },
    body: formData,
  }
);
const { text } = await response.json();
console.log(text);
Response · 200
{
  "text": "Hello, this is a test recording for the transcription API."
}

POST Text to Speech

/v1/audio/speech

Convert text to natural-sounding audio. The response is a binary audio stream (WAV or MP3). Use the Kokoro model for high-quality multilingual speech synthesis.

Request Body

Field             Type    Description
model (required)  string  TTS model ID (e.g. kokoro)
input (required)  string  Text to synthesize into speech. Max 4096 characters.
voice             string  Voice identifier (model-specific). Omit for model default.
response_format   string  Audio format: mp3 or wav (default: mp3)
speed             number  Speaking speed multiplier 0.25–4.0 (default: 1.0)

Response

Returns raw audio binary data with Content-Type: audio/mpeg (MP3) or audio/wav. Save the response body directly to a file.

cURL:

# Text to speech — save to file
curl -X POST https://taas.cloudsigma.com/v1/audio/speech \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Welcome to Token-as-a-Service.",
    "voice": "af_sarah"
  }' \
  --output speech.mp3
Python:

import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/audio/speech",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "kokoro",
        "input": "Welcome to Token-as-a-Service.",
        "voice": "af_sarah"
    }
)
with open("speech.mp3", "wb") as f:
    f.write(response.content)
print("Saved speech.mp3")
JavaScript:

const response = await fetch(
  "https://taas.cloudsigma.com/v1/audio/speech",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "kokoro",
      input: "Welcome to Token-as-a-Service.",
      voice: "af_sarah",
    }),
  }
);
const audioBuffer = await response.arrayBuffer();
// Play or save the audio binary
Response · 200
# Binary audio data (MP3 or WAV)
# Content-Type: audio/mpeg
# Save response body directly to speech.mp3

GET Health Check

/health

Returns the current health status of the TaaS API gateway. Use this endpoint to verify connectivity and confirm the service is operational before sending inference requests. No authentication required.

Response Fields

Field   Type    Description
status  string  Always "ok" when the service is healthy

cURL:

# Check API health
curl https://taas.cloudsigma.com/health
Python:

import requests

response = requests.get("https://taas.cloudsigma.com/health")
print(response.json())  # {"status": "ok"}
JavaScript:

const response = await fetch(
  "https://taas.cloudsigma.com/health"
);
const data = await response.json();
console.log(data.status); // "ok"
Response · 200
{
  "status": "ok"
}