TaaS API Reference

The Token-as-a-Service (TaaS) API provides an OpenAI-compatible interface for LLM inference, embeddings, reranking, and audio processing. All endpoints use standard HTTP methods and return JSON, with the exception of /v1/audio/speech, which returns binary audio.

Base URL: https://taas.cloudsigma.com/v1

Authentication

Authenticate every request with a TaaS Bearer token in the Authorization header. Tokens are prefixed with taas_ and are scoped per team or project.

Authorization Header:
Authorization: Bearer taas_xxxxxxxxxxxxxxxx

Generate API tokens from Settings → API Tokens in the TaaS console. Keep tokens secret — they grant full access to your account's quota.
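
Rather than repeating the Authorization header on every call, a client can attach the token once. A minimal sketch using Python's requests library (the session pattern is a client-side convenience, not an API requirement; replace the placeholder token with your own):

```python
import requests

# Reusable session that sends the TaaS Bearer token with every request.
session = requests.Session()
session.headers.update({"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"})

# All subsequent calls inherit the header, e.g.:
# models = session.get("https://taas.cloudsigma.com/v1/models").json()
```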

Error Handling

All errors return a JSON object with a detail or error field:

Status  Meaning
400     Bad Request — invalid parameters or malformed body
401     Unauthorized — missing or invalid Bearer token
403     Forbidden — token lacks permission for this action
404     Not Found — model or resource doesn't exist
422     Validation Error — check request body fields
429     Rate Limited — slow down or contact support for higher limits
500     Server Error — try again or contact support
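
For transient failures such as 429 and 5xx, a small client-side retry with exponential backoff is usually sufficient. A sketch in Python (the helper name and backoff schedule are illustrative, not part of the API):

```python
import time
import requests

def post_with_retry(url, payload, token, max_retries=3):
    """POST with simple exponential backoff on 429 and 5xx responses."""
    headers = {"Authorization": f"Bearer {token}"}
    for attempt in range(max_retries + 1):
        response = requests.post(url, json=payload, headers=headers)
        # Retry only on rate limits and transient server errors.
        if response.status_code not in (429, 500, 502, 503) or attempt == max_retries:
            break
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    response.raise_for_status()
    return response.json()
```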

GET List Models

/v1/models

Returns the list of all models available to your account. Each model object includes its identifier, type, and owner. Use the id field as the model parameter in inference requests.

Response Fields

Field     Type    Description
id        string  Model identifier to use in API calls (e.g. claude-sonnet-4)
object    string  Always "model"
owned_by  string  Provider name (e.g. anthropic, openai)

cURL:

# List available models
curl https://taas.cloudsigma.com/v1/models \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx"
Python:

import requests

response = requests.get(
    "https://taas.cloudsigma.com/v1/models",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"}
)
models = response.json()
for m in models:
    print(m["id"])  # e.g. "claude-sonnet-4"
JavaScript:

const response = await fetch(
  "https://taas.cloudsigma.com/v1/models",
  {
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"
    }
  }
);
const models = await response.json();
models.forEach(m => console.log(m.id));
Response · 200
[
  {
    "id": "claude-sonnet-4",
    "object": "model",
    "owned_by": "anthropic"
  },
  {
    "id": "gpt-4o",
    "object": "model",
    "owned_by": "openai"
  },
  {
    "id": "minimax-m2",
    "object": "model",
    "owned_by": "minimax"
  }
]

POST Chat Completions

/v1/chat/completions

Generate a chat completion response from a model. Supports streaming via Server-Sent Events. Compatible with OpenAI's chat/completions API — existing OpenAI clients work by changing the base URL and API key.

Request Body

Field                Type     Description
model (required)     string   Model ID from /v1/models (e.g. claude-sonnet-4)
messages (required)  array    Array of {role, content} objects. Roles: system, user, assistant
stream               boolean  Stream tokens via SSE (default: false)
temperature          number   Sampling temperature 0–2 (default: 1.0). Higher = more creative
max_tokens           integer  Maximum tokens to generate
top_p                number   Nucleus sampling probability 0–1 (default: 1.0)

Response Fields

Field                      Type     Description
choices[].message.role     string   Always "assistant"
choices[].message.content  string   Generated text
usage.prompt_tokens        integer  Tokens in the input messages
usage.completion_tokens    integer  Tokens generated
usage.total_tokens         integer  Sum of prompt + completion tokens

cURL:

# Chat completion
curl -X POST https://taas.cloudsigma.com/v1/chat/completions \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum entanglement."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'
Python:

import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/chat/completions",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "claude-sonnet-4",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum entanglement."}
        ],
        "temperature": 0.7,
        "max_tokens": 512
    }
)
result = response.json()
print(result["choices"][0]["message"]["content"])
JavaScript:

const response = await fetch(
  "https://taas.cloudsigma.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4",
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Explain quantum entanglement." }
      ],
      temperature: 0.7,
      max_tokens: 512,
    }),
  }
);
const data = await response.json();
console.log(data.choices[0].message.content);
Response · 200
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "claude-sonnet-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum entanglement is a phenomenon..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 142,
    "total_tokens": 170
  }
}
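
With stream set to true, tokens arrive as Server-Sent Events. A sketch of consuming the stream with requests, assuming the usual OpenAI-compatible SSE convention of data: lines terminated by data: [DONE] (the exact chunk shape may vary by model):

```python
import json
import requests

def stream_chat(payload, token):
    """Yield content deltas from a streaming chat completion.

    Assumes the OpenAI-compatible SSE format: each event is a
    `data: {...}` line and the stream ends with `data: [DONE]`.
    """
    response = requests.post(
        "https://taas.cloudsigma.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {token}"},
        json={**payload, "stream": True},
        stream=True,
    )
    for line in response.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            yield delta

# Usage:
# payload = {"model": "claude-sonnet-4",
#            "messages": [{"role": "user", "content": "Hi"}]}
# for piece in stream_chat(payload, "taas_xxxxxxxxxxxxxxxx"):
#     print(piece, end="", flush=True)
```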

POST Embeddings

/v1/embeddings

Generate vector embeddings for one or more text inputs. Use embeddings for semantic search, similarity comparison, clustering, and retrieval-augmented generation (RAG).

Request Body

Field             Type             Description
model (required)  string           Embedding model ID (e.g. bge-m3)
input (required)  string or array  Text string or array of strings to embed. Max 2048 tokens per string.

Response Fields

Field                Type     Description
data[].embedding     array    Dense float vector representing the input text
data[].index         integer  Index of the input string this embedding corresponds to
usage.prompt_tokens  integer  Total tokens processed
usage.total_tokens   integer  Same as prompt_tokens for embeddings

cURL:

# Generate embeddings
curl -X POST https://taas.cloudsigma.com/v1/embeddings \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": ["Hello world", "How are you?"]
  }'
Python:

import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/embeddings",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "bge-m3",
        "input": ["Hello world", "How are you?"]
    }
)
data = response.json()
vector = data["data"][0]["embedding"]
print(f"Dimensions: {len(vector)}")
JavaScript:

const response = await fetch(
  "https://taas.cloudsigma.com/v1/embeddings",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "bge-m3",
      input: ["Hello world", "How are you?"],
    }),
  }
);
const data = await response.json();
console.log(`Dims: ${data.data[0].embedding.length}`);
Response · 200
{
  "object": "list",
  "model": "bge-m3",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0234, -0.0871, 0.1203, ...]
    },
    {
      "object": "embedding",
      "index": 1,
      "embedding": [0.0512, -0.0344, 0.0987, ...]
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "total_tokens": 7
  }
}
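
A common use of the returned vectors is similarity comparison, e.g. cosine similarity between the two embeddings above. A plain-Python sketch (production code would typically use NumPy):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# With the response above:
# sim = cosine_similarity(data["data"][0]["embedding"],
#                         data["data"][1]["embedding"])
```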

POST Rerank

/v1/rerank

Rerank a list of documents by relevance to a query. Ideal for improving retrieval quality in RAG pipelines — pass candidate documents from a vector search and get them sorted by true semantic relevance.

Request Body

Field                 Type    Description
model (required)      string  Reranker model ID (e.g. bge-reranker-v2-m3)
query (required)      string  The search query to rank documents against
documents (required)  array   Array of document strings to score and rank

Response Fields

Field                      Type     Description
results[].index            integer  Original index of the document in the input array
results[].relevance_score  number   Relevance score (higher = more relevant). Results are sorted descending.

cURL:

# Rerank documents
curl -X POST https://taas.cloudsigma.com/v1/rerank \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is quantum computing?",
    "documents": [
      "Quantum computing uses qubits to process information.",
      "Classical computers use binary bits.",
      "The weather today is sunny."
    ]
  }'
Python:

import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/rerank",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "bge-reranker-v2-m3",
        "query": "What is quantum computing?",
        "documents": [
            "Quantum computing uses qubits to process information.",
            "Classical computers use binary bits.",
            "The weather today is sunny."
        ]
    }
)
results = response.json()["results"]
for r in results:
    print(f"idx={r['index']} score={r['relevance_score']:.4f}")
JavaScript:

const response = await fetch(
  "https://taas.cloudsigma.com/v1/rerank",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "bge-reranker-v2-m3",
      query: "What is quantum computing?",
      documents: [
        "Quantum computing uses qubits to process information.",
        "Classical computers use binary bits.",
        "The weather today is sunny.",
      ],
    }),
  }
);
const { results } = await response.json();
results.forEach(r => console.log(r.index, r.relevance_score));
Response · 200
{
  "model": "bge-reranker-v2-m3",
  "results": [
    {
      "index": 0,
      "relevance_score": 0.9823
    },
    {
      "index": 1,
      "relevance_score": 0.4512
    },
    {
      "index": 2,
      "relevance_score": 0.0031
    }
  ]
}
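
The results reference documents by index, so a typical RAG pipeline maps them back to the original strings and keeps only those above a relevance threshold. A sketch (the helper name and threshold are illustrative):

```python
def top_documents(documents, results, min_score=0.5):
    """Map reranker results back to document text, keeping only those
    above a relevance threshold. Results arrive already sorted by
    descending relevance_score, so order is preserved."""
    return [documents[r["index"]] for r in results
            if r["relevance_score"] >= min_score]

# With the response above, only the qubit document clears a 0.5 threshold.
```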

POST Audio Transcriptions

/v1/audio/transcriptions

Transcribe audio to text using Whisper or another speech-to-text model. Send the audio file as multipart/form-data. Supports WAV, MP3, M4A, FLAC, OGG, and WebM formats.

Request (multipart/form-data)

Field            Type    Description
file (required)  file    Audio file to transcribe. Max 25 MB.
model            string  Model to use (default: whisper)
language         string  ISO-639-1 language code (e.g. en, de). Auto-detected if omitted.

Response Fields

Field  Type    Description
text   string  The transcribed text from the audio file

cURL:

# Transcribe audio file
curl -X POST https://taas.cloudsigma.com/v1/audio/transcriptions \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -F "file=@recording.wav" \
  -F "model=whisper"
Python:

import requests

with open("recording.wav", "rb") as f:
    response = requests.post(
        "https://taas.cloudsigma.com/v1/audio/transcriptions",
        headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
        files={"file": ("recording.wav", f, "audio/wav")},
        data={"model": "whisper"}
    )
result = response.json()
print(result["text"])
JavaScript:

const formData = new FormData();
formData.append("file", audioBlob, "recording.wav");
formData.append("model", "whisper");

const response = await fetch(
  "https://taas.cloudsigma.com/v1/audio/transcriptions",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"
    },
    body: formData,
  }
);
const { text } = await response.json();
console.log(text);
Response · 200
{
  "text": "Hello, this is a test recording for the transcription API."
}

POST Text to Speech

/v1/audio/speech

Convert text to natural-sounding audio. The response is a binary audio stream (WAV or MP3). Use the Kokoro model for high-quality multilingual speech synthesis.

Request Body

Field             Type    Description
model (required)  string  TTS model ID (e.g. kokoro)
input (required)  string  Text to synthesize into speech. Max 4096 characters.
voice             string  Voice identifier (model-specific). Omit for model default.
response_format   string  Audio format: mp3 or wav (default: mp3)
speed             number  Speaking speed multiplier 0.25–4.0 (default: 1.0)

Response

Returns raw audio binary data with Content-Type: audio/mpeg (MP3) or audio/wav. Save the response body directly to a file.

cURL:

# Text to speech — save to file
curl -X POST https://taas.cloudsigma.com/v1/audio/speech \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Welcome to Token-as-a-Service.",
    "voice": "af_sarah"
  }' \
  --output speech.mp3
Python:

import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/audio/speech",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "kokoro",
        "input": "Welcome to Token-as-a-Service.",
        "voice": "af_sarah"
    }
)
with open("speech.mp3", "wb") as f:
    f.write(response.content)
print("Saved speech.mp3")
JavaScript:

const response = await fetch(
  "https://taas.cloudsigma.com/v1/audio/speech",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "kokoro",
      input: "Welcome to Token-as-a-Service.",
      voice: "af_sarah",
    }),
  }
);
const audioBuffer = await response.arrayBuffer();
// Play or save the audio binary
Response · 200
# Binary audio data (MP3 or WAV)
# Content-Type: audio/mpeg
# Save response body directly to speech.mp3

GET Health Check

/health

Returns the current health status of the TaaS API gateway. Use this endpoint to verify connectivity and confirm the service is operational before sending inference requests. No authentication required.

Response Fields

Field   Type    Description
status  string  Always "ok" when the service is healthy

cURL:

# Check API health
curl https://taas.cloudsigma.com/health
Python:

import requests

response = requests.get("https://taas.cloudsigma.com/health")
print(response.json())  # {"status": "ok"}
JavaScript:

const response = await fetch(
  "https://taas.cloudsigma.com/health"
);
const data = await response.json();
console.log(data.status); // "ok"
Response · 200
{
  "status": "ok"
}