The Token-as-a-Service (TaaS) API provides an OpenAI-compatible interface for LLM inference, embeddings, reranking, and audio processing. All endpoints use standard HTTP methods and return JSON, except audio/speech, which returns binary audio.
Base URL: https://taas.cloudsigma.com/v1
Codex native base URL: https://taas.cloudsigma.com/v1/codex
TaaS now also exposes a native Codex-compatible endpoint for clients such as Codex ACP and other Codex-native integrations. Use the standard /v1 OpenAI-compatible base for normal SDK traffic, and the native Codex base when a client expects Codex-specific request semantics.
Authenticate every request with a TaaS Bearer token in the Authorization header.
Tokens are prefixed with taas_ and are scoped per team or project.
Authorization: Bearer taas_xxxxxxxxxxxxxxxx
Generate API tokens from Settings → API Tokens in the TaaS console. Keep tokens secret — they grant full access to your account's quota.
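As a minimal sketch, the header can be built in Python from a token kept outside the source code. The environment variable name TAAS_API_TOKEN and the helper auth_headers are illustrative choices for this example, not part of the TaaS API.

```python
import os


def auth_headers(token: str) -> dict:
    """Build the Authorization header for a TaaS request."""
    # TaaS tokens are documented as starting with "taas_"; failing early
    # catches a key pasted from another provider.
    if not token.startswith("taas_"):
        raise ValueError("expected a token starting with 'taas_'")
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    # TAAS_API_TOKEN is a hypothetical variable name chosen for this sketch.
    token = os.environ.get("TAAS_API_TOKEN", "taas_xxxxxxxxxxxxxxxx")
    print(auth_headers(token))
```

Reading the token from the environment keeps it out of version control, which matters because a leaked token exposes the account's quota.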
If you want repeated requests to stay on the same routed session, you can send a stable session identifier. This improves continuation reliability and makes request-level debugging easier in the TaaS console.
| Input | Where to send it | Notes |
|---|---|---|
| X-Session-Id | HTTP header | Highest precedence. Best option when you control headers. |
| metadata.session_id | JSON body metadata | Stable explicit session id for OpenAI-compatible request bodies. |
| metadata.sticky_key | JSON body metadata | Compatibility fallback, especially useful for Anthropic-style continuation flows. |
Precedence is X-Session-Id → metadata.session_id → metadata.sticky_key. If none are supplied, TaaS can still infer continuity heuristically, but explicit IDs are more predictable.
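The precedence order can be illustrated with a small client-side helper. This is only a sketch of the documented order, useful for predicting which identifier a request will be keyed on; it is not the TaaS server implementation, and the function name resolve_session_id is hypothetical.

```python
from typing import Optional


def resolve_session_id(headers: dict, body: dict) -> Optional[str]:
    """Return the session id that the documented precedence would select."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    metadata = body.get("metadata") or {}
    candidates = (
        lowered.get("x-session-id"),   # 1. X-Session-Id header
        metadata.get("session_id"),    # 2. metadata.session_id
        metadata.get("sticky_key"),    # 3. metadata.sticky_key
    )
    for candidate in candidates:
        if candidate:
            return candidate
    # No explicit id: per the docs, TaaS falls back to heuristics.
    return None
```

For example, a request carrying both an X-Session-Id header and metadata.session_id resolves to the header value.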
All errors return a JSON object with a detail or error field. The HTTP status code indicates the error class:
| Status | Meaning |
|---|---|
| 400 | Bad Request: invalid parameters or malformed body |
| 401 | Unauthorized: missing or invalid Bearer token |
| 403 | Forbidden: token lacks permission for this action |
| 404 | Not Found: model or resource doesn't exist |
| 422 | Validation Error: check request body fields |
| 429 | Rate Limited: slow down or contact support for higher limits |
| 500 | Server Error: try again or contact support |
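A client might handle these responses along the following lines. This is a sketch under assumptions: the helper names are hypothetical, the detail or error field may hold either a string or a nested object, and treating only 429 and 500 as retryable is a policy choice for this example rather than something TaaS prescribes.

```python
import json

# Assumption: only rate limiting and server errors are worth retrying;
# the other statuses indicate a problem with the request itself.
RETRYABLE_STATUSES = {429, 500}


def error_message(body: dict) -> str:
    """Extract a readable message from a TaaS error body."""
    for key in ("detail", "error"):
        if key in body:
            value = body[key]
            # The field may be a plain string or a structured object.
            return value if isinstance(value, str) else json.dumps(value)
    return "unknown error"


def should_retry(status: int) -> bool:
    """Tell whether a request with this status may succeed if repeated."""
    return status in RETRYABLE_STATUSES
```

With requests, these would typically be called as error_message(response.json()) and should_retry(response.status_code) after checking that response.ok is false.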
Returns the list of all models available to your account. Each model object includes its identifier, type, and owner. Use the id field as the model parameter in inference requests.
| Field | Type | Description |
|---|---|---|
| id | string | Model identifier to use in API calls (e.g. claude-sonnet-4) |
| object | string | Always "model" |
| owned_by | string | Provider name (e.g. anthropic, openai) |
# List available models
curl https://taas.cloudsigma.com/v1/models \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx"

import requests

response = requests.get(
    "https://taas.cloudsigma.com/v1/models",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"}
)
models = response.json()
for m in models:
    print(m["id"])  # e.g. "claude-sonnet-4"

const response = await fetch(
  "https://taas.cloudsigma.com/v1/models",
  {
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"
    }
  }
);
const models = await response.json();
models.forEach(m => console.log(m.id));

[
{
"id": "claude-sonnet-4",
"object": "model",
"owned_by": "anthropic"
},
{
"id": "gpt-4o",
"object": "model",
"owned_by": "openai"
},
{
"id": "minimax-m2",
"object": "model",
"owned_by": "minimax"
}
]
Generate a chat completion response from a model. Supports streaming via Server-Sent Events. Compatible with OpenAI's chat/completions API — existing OpenAI clients work by changing the base URL and API key.
| Field | Type | Description |
|---|---|---|
| model (required) | string | Model ID from /v1/models (e.g. claude-sonnet-4) |
| messages (required) | array | Array of {role, content} objects. Roles: system, user, assistant |
| metadata.session_id | string | Optional explicit session id for continuity and easier per-request tracing. |
| metadata.sticky_key | string | Optional sticky routing key. Used after X-Session-Id and metadata.session_id. |
| stream | boolean | Stream tokens via SSE (default: false) |
| temperature | number | Sampling temperature 0–2 (default: 1.0). Higher = more creative |
| max_tokens | integer | Maximum tokens to generate |
| top_p | number | Nucleus sampling probability 0–1 (default: 1.0) |
| Field | Type | Description |
|---|---|---|
| choices[].message.role | string | Always "assistant" |
| choices[].message.content | string | Generated text |
| usage.prompt_tokens | integer | Tokens in the input messages |
| usage.completion_tokens | integer | Tokens generated |
| usage.total_tokens | integer | Sum of prompt + completion tokens |
# Chat completion with explicit session continuity
curl -X POST https://taas.cloudsigma.com/v1/chat/completions \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -H "X-Session-Id: user-42-chat-a" \
  -d '{
    "model": "claude-sonnet-4",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum entanglement."}
    ],
    "metadata": { "session_id": "user-42-chat-a" },
    "temperature": 0.7,
    "max_tokens": 512
  }'

import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/chat/completions",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "claude-sonnet-4",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum entanglement."}
        ],
        "temperature": 0.7,
        "max_tokens": 512
    }
)
result = response.json()
print(result["choices"][0]["message"]["content"])

const response = await fetch(
  "https://taas.cloudsigma.com/v1/chat/completions",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "claude-sonnet-4",
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Explain quantum entanglement." }
      ],
      temperature: 0.7,
      max_tokens: 512,
    }),
  }
);
const data = await response.json();
console.log(data.choices[0].message.content);

{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"model": "claude-sonnet-4",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum entanglement is a phenomenon..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 28,
"completion_tokens": 142,
"total_tokens": 170
}
}
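The examples above use non-streaming requests. For stream: true, the following sketch shows one way to collect text from the Server-Sent Events body. It assumes the stream follows the OpenAI chunk convention (incremental text in choices[].delta.content and a final data: [DONE] line), which this reference does not spell out; the function name iter_stream_text is hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the SSE lines of a streamed completion."""
    for raw in lines:
        line = raw.strip()
        # SSE payload lines start with "data:"; skip blanks and comments.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # Assumption: the stream ends with the OpenAI-style "[DONE]" sentinel.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Assumption: chunks carry incremental text in delta.content.
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

With the requests library, the lines can come from response.iter_lines(decode_unicode=True) on a request sent with stream=True and "stream": true in the JSON body.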
Native Codex-compatible passthrough endpoint for Codex-specific clients such as Codex ACP. Requests sent here are routed to Codex OAuth tokens only and are forwarded without the Chat Completions → Codex transformation used by the OpenAI-compatible proxy path.
Important: this endpoint expects Codex-native request semantics.
In particular, input must be a list and stream must be set to true.
| Field | Type | Description |
|---|---|---|
| model (required) | string | Codex-capable model id (for example gpt-5.4) |
| instructions | string | Optional system/instructions text |
| input (required) | array | Native Codex input message array |
| stream (required) | boolean | Must be true for native Codex requests |
| store | boolean | Passed through as provided |
Recommended for harnesses and agent runtimes that already speak Codex-native protocol. For ordinary OpenAI SDK traffic, keep using /v1/chat/completions and the standard OpenAI-compatible base URL.
curl -X POST https://taas.cloudsigma.com/v1/codex/responses \
-H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.4",
"store": false,
"stream": true,
"instructions": "Reply with exactly OK",
"input": [
{"role": "user", "content": "Reply with exactly OK"}
]
}'
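Because this endpoint is stricter than the chat completions path, a harness may want to check a request body before sending it. The sketch below covers only the constraints stated above (model present, input as a list, stream set to true); the helper name check_codex_payload is hypothetical, and the server may enforce further rules.

```python
def check_codex_payload(payload: dict) -> list:
    """Return a list of problems found in a native Codex request body."""
    problems = []
    model = payload.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model must be a non-empty string")
    if not isinstance(payload.get("input"), list):
        problems.append("input must be a list")
    # "is not True" also rejects truthy non-booleans such as 1 or "true".
    if payload.get("stream") is not True:
        problems.append("stream must be true")
    return problems
```

An empty list means the body satisfies the documented requirements; a chat-style body with a string input and no stream flag would report two problems.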
Generate vector embeddings for one or more text inputs. Use embeddings for semantic search, similarity comparison, clustering, and retrieval-augmented generation (RAG).
| Field | Type | Description |
|---|---|---|
| model (required) | string | Embedding model ID (e.g. bge-m3) |
| input (required) | string or array | Text string or array of strings to embed. Max 2048 tokens per string. |
| Field | Type | Description |
|---|---|---|
| data[].embedding | array | Dense float vector representing the input text |
| data[].index | integer | Index of the input string this embedding corresponds to |
| usage.prompt_tokens | integer | Total tokens processed |
| usage.total_tokens | integer | Same as prompt_tokens for embeddings |
# Generate embeddings
curl -X POST https://taas.cloudsigma.com/v1/embeddings \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-m3",
    "input": ["Hello world", "How are you?"]
  }'

import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/embeddings",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "bge-m3",
        "input": ["Hello world", "How are you?"]
    }
)
data = response.json()
vector = data["data"][0]["embedding"]
print(f"Dimensions: {len(vector)}")

const response = await fetch(
  "https://taas.cloudsigma.com/v1/embeddings",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "bge-m3",
      input: ["Hello world", "How are you?"],
    }),
  }
);
const data = await response.json();
console.log(`Dims: ${data.data[0].embedding.length}`);

{
"object": "list",
"model": "bge-m3",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.0234, -0.0871, 0.1203, ...]
},
{
"object": "embedding",
"index": 1,
"embedding": [0.0512, -0.0344, 0.0987, ...]
}
],
"usage": {
"prompt_tokens": 7,
"total_tokens": 7
}
}
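For similarity comparison, the vectors in a response like the one above are usually compared with cosine similarity. The sketch below is plain Python with no dependencies; the helper names are hypothetical, and cosine similarity is a common choice rather than something TaaS requires.

```python
import math


def vectors_by_index(response: dict) -> list:
    """Return embeddings ordered by data[].index, i.e. in input order."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Sorting by index first means the result lines up with the input array even if the response order ever differs.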
Rerank a list of documents by relevance to a query. Ideal for improving retrieval quality in RAG pipelines — pass candidate documents from a vector search and get them sorted by true semantic relevance.
| Field | Type | Description |
|---|---|---|
| model (required) | string | Reranker model ID (e.g. bge-reranker-v2-m3) |
| query (required) | string | The search query to rank documents against |
| documents (required) | array | Array of document strings to score and rank |
| Field | Type | Description |
|---|---|---|
| results[].index | integer | Original index of the document in the input array |
| results[].relevance_score | number | Relevance score (higher = more relevant). Results are sorted descending. |
# Rerank documents
curl -X POST https://taas.cloudsigma.com/v1/rerank \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is quantum computing?",
    "documents": [
      "Quantum computing uses qubits to process information.",
      "Classical computers use binary bits.",
      "The weather today is sunny."
    ]
  }'

import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/rerank",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "bge-reranker-v2-m3",
        "query": "What is quantum computing?",
        "documents": [
            "Quantum computing uses qubits to process information.",
            "Classical computers use binary bits.",
            "The weather today is sunny."
        ]
    }
)
results = response.json()["results"]
for r in results:
    print(f"idx={r['index']} score={r['relevance_score']:.4f}")

const response = await fetch(
  "https://taas.cloudsigma.com/v1/rerank",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "bge-reranker-v2-m3",
      query: "What is quantum computing?",
      documents: [
        "Quantum computing uses qubits to process information.",
        "Classical computers use binary bits.",
        "The weather today is sunny.",
      ],
    }),
  }
);
const { results } = await response.json();

{
"model": "bge-reranker-v2-m3",
"results": [
{
"index": 0,
"relevance_score": 0.9823
},
{
"index": 1,
"relevance_score": 0.4512
},
{
"index": 2,
"relevance_score": 0.0031
}
]
}
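The response carries indices rather than document text, so a RAG pipeline has to map each result back to its source document. A minimal sketch, with a hypothetical helper name and an illustrative top_n cutoff:

```python
def ranked_documents(documents: list, results: list, top_n: int = 3) -> list:
    """Pair each result with its source document, best match first."""
    # Results are documented as sorted descending; sorting again is a
    # cheap safeguard in case a caller passes them in another order.
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [
        (documents[r["index"]], r["relevance_score"])
        for r in ordered[:top_n]
    ]
```

Called with the documents from the request and the results array from the response, this returns (text, score) tuples ready to be placed into a prompt.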
Transcribe audio to text using Whisper or another speech-to-text model. Send the audio file as multipart/form-data. Supports WAV, MP3, M4A, FLAC, OGG, and WebM formats.
| Field | Type | Description |
|---|---|---|
| file (required) | file | Audio file to transcribe. Max 25 MB. |
| model | string | Model to use (default: whisper) |
| language | string | ISO-639-1 language code (e.g. en, de). Auto-detected if omitted. |
| Field | Type | Description |
|---|---|---|
| text | string | The transcribed text from the audio file |
# Transcribe audio file
curl -X POST https://taas.cloudsigma.com/v1/audio/transcriptions \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -F "file=@recording.wav" \
  -F "model=whisper"

import requests

with open("recording.wav", "rb") as f:
    response = requests.post(
        "https://taas.cloudsigma.com/v1/audio/transcriptions",
        headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
        files={"file": ("recording.wav", f, "audio/wav")},
        data={"model": "whisper"}
    )
result = response.json()
print(result["text"])

const formData = new FormData();
formData.append("file", audioBlob, "recording.wav");
formData.append("model", "whisper");

const response = await fetch(
  "https://taas.cloudsigma.com/v1/audio/transcriptions",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"
    },
    body: formData,
  }
);
const { text } = await response.json();
console.log(text);

{
"text": "Hello, this is a test recording for the transcription API."
}
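Uploads that exceed the size limit or use an unsupported format are rejected, so a client can save a round trip by checking the file first. The sketch below is an assumption-laden pre-flight check: it judges the format by file extension only, reads "25 MB" as 25 * 1024 * 1024 bytes (the reference does not say whether the limit is decimal or binary), and uses a hypothetical helper name.

```python
import os

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".webm"}
# Assumption: "25 MB" interpreted as binary megabytes.
MAX_BYTES = 25 * 1024 * 1024


def check_audio_file(path: str) -> None:
    """Raise ValueError if the file is unlikely to be accepted for upload."""
    # The extension is only a proxy for the real container format.
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {extension or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        raise ValueError(f"file is {size} bytes, limit is {MAX_BYTES}")
```

Calling check_audio_file("recording.wav") before opening the file for upload returns silently when both checks pass.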
Convert text to natural-sounding audio. The response is a binary audio stream (WAV or MP3). Use the Kokoro model for high-quality multilingual speech synthesis.
| Field | Type | Description |
|---|---|---|
| model (required) | string | TTS model ID (e.g. kokoro) |
| input (required) | string | Text to synthesize into speech. Max 4096 characters. |
| voice | string | Voice identifier (model-specific). Omit for model default. |
| response_format | string | Audio format: mp3 or wav (default: mp3) |
| speed | number | Speaking speed multiplier 0.25–4.0 (default: 1.0) |
Returns raw audio binary data with Content-Type: audio/mpeg (MP3) or audio/wav. Save the response body directly to a file.
# Text to speech: save to file
curl -X POST https://taas.cloudsigma.com/v1/audio/speech \
  -H "Authorization: Bearer taas_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kokoro",
    "input": "Welcome to Token-as-a-Service.",
    "voice": "af_sarah"
  }' \
  --output speech.mp3

import requests

response = requests.post(
    "https://taas.cloudsigma.com/v1/audio/speech",
    headers={"Authorization": "Bearer taas_xxxxxxxxxxxxxxxx"},
    json={
        "model": "kokoro",
        "input": "Welcome to Token-as-a-Service.",
        "voice": "af_sarah"
    }
)
with open("speech.mp3", "wb") as f:
    f.write(response.content)
print("Saved speech.mp3")

const response = await fetch(
  "https://taas.cloudsigma.com/v1/audio/speech",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer taas_xxxxxxxxxxxxxxxx",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "kokoro",
      input: "Welcome to Token-as-a-Service.",
      voice: "af_sarah",
    }),
  }
);
const audioBuffer = await response.arrayBuffer();
// Play or save the audio binary

# Binary audio data (MP3 or WAV)
# Content-Type: audio/mpeg
# Save response body directly to speech.mp3
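Since the output format depends on response_format, a client may prefer to validate its arguments against the documented limits and pick the file extension from the response header instead of hard-coding .mp3. The sketch below does both; the helper names are hypothetical, and the ".bin" fallback for unexpected content types is a choice made for this example.

```python
def build_speech_request(text: str, model: str = "kokoro",
                         response_format: str = "mp3",
                         speed: float = 1.0) -> dict:
    """Validate arguments against the documented limits and build the body."""
    if len(text) > 4096:
        raise ValueError("input exceeds 4096 characters")
    if response_format not in ("mp3", "wav"):
        raise ValueError("response_format must be 'mp3' or 'wav'")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {"model": model, "input": text,
            "response_format": response_format, "speed": speed}


def extension_for(content_type: str) -> str:
    """Pick a file extension from the response Content-Type header."""
    # Drop parameters such as "; charset=..." before matching.
    media_type = content_type.split(";")[0].strip().lower()
    return {"audio/mpeg": ".mp3", "audio/wav": ".wav"}.get(media_type, ".bin")
```

With requests, the extension would come from extension_for(response.headers.get("Content-Type", "")) before writing response.content to disk.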
Returns the current health status of the TaaS API gateway. Use this endpoint to verify connectivity and confirm the service is operational before sending inference requests. No authentication required.
| Field | Type | Description |
|---|---|---|
| status | string | Always "ok" when the service is healthy |
# Check API health
curl https://taas.cloudsigma.com/health
import requests

response = requests.get("https://taas.cloudsigma.com/health")
print(response.json())  # {"status": "ok"}

const response = await fetch(
  "https://taas.cloudsigma.com/health"
);
const data = await response.json();
console.log(data.status); // "ok"

{
"status": "ok"
}
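To gate inference traffic on this check, a startup script can poll the endpoint a few times before giving up. The sketch below takes the status lookup as a callable so it can be tested without network access; the function name, attempt count, and delay are illustrative assumptions.

```python
import time
from typing import Callable


def wait_until_healthy(fetch_status: Callable[[], str],
                       attempts: int = 5, delay: float = 1.0) -> bool:
    """Poll until fetch_status() returns "ok" or the attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch_status() == "ok":
                return True
        except Exception:
            # Treat connection errors like an unhealthy response and retry.
            pass
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

With requests, the callable could be lambda: requests.get("https://taas.cloudsigma.com/health", timeout=5).json()["status"].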