Multilingual proxy for LLMs and MCP servers

The same model. In every language.

Most popular LLMs are trained predominantly on English. Ask the same question in German or Japanese and the answer is measurably worse. Same model, same money, worse outcome.

AdaptiveAPI sits between your app and the vendor, so the model keeps thinking in English while your user keeps reading their language. JSON shapes, tool calls, code blocks, and streaming deltas survive the round trip.

  • Drop-in for OpenAI, Anthropic, and MCP
  • Streaming, tool calls, JSON shape preserved
  • Self-host. SQLite by default, Postgres at scale

Your LLM is fluent in English
and approximately fluent in everything else.

Translation pipelines exist. Most break tool calls, lose JSON shape, or stutter on streams. AdaptiveAPI is built around the wire format, not on top of it. Your SDK keeps working. Your streaming UI keeps animating. Your function arguments arrive shape-safe.

How it works

Three moving parts. One URL change.

01

Point your SDK at AdaptiveAPI

Change the base URL. Keep your keys, keep your retries, keep your streaming code. The OpenAI client points at /v1/<route>/, the Anthropic client at /anthropic/v1/<route>/, MCP clients at /mcp/<route>.

02

AdaptiveAPI translates both directions

Inbound text becomes the model's working language. The model answers. Outbound text becomes the user's language. IDs, URLs, code blocks, and tool argument keys are kept verbatim. Sentence-boundary streaming keeps first-token latency low.

03

Per route, you decide the policy

Bind a glossary, a DeepL v3 style rule, an allowlist, and a PII redactor to each route token. Override per request with X-AdaptiveApi-* headers. Audit every call without storing bodies.

Code

Same SDK. Same keys. New base URL.

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",
    base_url="https://adaptiveapi.example.com/v1/rt_yourtenant_xxxxx",
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": "Was ist der Unterschied zwischen Array und Liste?"}],
    extra_headers={"X-AdaptiveApi-Target-Lang": "de"},
)
# reply arrives in German. The upstream model saw English.

import anthropic

client = anthropic.Anthropic(
    api_key="sk-ant-...",
    base_url="https://adaptiveapi.example.com/anthropic/v1/rt_yourtenant_xxxxx",
)

message = client.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=1024,
    system="Tu es un assistant serviable.",
    messages=[{"role": "user",
               "content": "Explique-moi le théorème de Pythagore."}],
    extra_headers={"X-AdaptiveApi-Target-Lang": "fr"},
)

{
  "mcpServers": {
    "linear-de": {
      "url": "https://adaptiveapi.example.com/mcp/rt_yourtenant_xxxxx",
      "headers": {
        "Authorization": "Bearer <your-linear-oauth-token>"
      }
    }
  }
}
// Tool descriptions, arguments, and results all arrive translated.
// Your OAuth token rides upstream byte-identical. Never stored. Never logged.

{
  "upstream": {
    "urlTemplate": "https://api.cohere.com/v1/chat",
    "method": "POST"
  },
  "request": {
    "translateJsonPaths": [
      "$.message",
      "$.chat_history[*].message",
      "$.tools[*].description"
    ]
  },
  "response": {
    "streaming": "sse",
    "eventPath": "$.text",
    "finalPaths": ["$.text", "$.generations[*].text"]
  },
  "direction": "bidirectional"
}
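
Under the hood, a config like this drives a JSON walker. A minimal sketch of how `translateJsonPaths` could be applied, in Python — supporting only the `$.a` and `$.a[*].b` subset used above, with illustrative names, not AdaptiveAPI's actual resolver:

```python
import re


def tokenize(path):
    """'$.a[*].b' -> ['a', '[*]', 'b'] (supports only this subset)."""
    return re.findall(r"\[\*\]|[^.$\[\]]+", path)


def apply_path(node, tokens, translate):
    """Follow one parsed path into the body and translate the string leaf."""
    if not tokens:
        return translate(node) if isinstance(node, str) else node
    head, rest = tokens[0], tokens[1:]
    if head == "[*]":
        if isinstance(node, list):
            return [apply_path(item, rest, translate) for item in node]
        return node
    if isinstance(node, dict) and head in node:
        node = dict(node)  # copy-on-write: leave the caller's body intact
        node[head] = apply_path(node[head], rest, translate)
    return node


def translate_body(body, paths, translate):
    for path in paths:
        body = apply_path(body, tokenize(path), translate)
    return body
```

A toy translator makes the selectivity visible: `translate_body(body, ["$.message", "$.chat_history[*].message"], str.upper)` uppercases only the selected leaves and leaves `role`, `model`, and everything else untouched.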

What you get

Translation that respects the wire format.

Streaming that actually streams

Buffer per delta.content, flush at sentence boundaries, re-emit synthetic SSE deltas. Pick the default sentence-boundary mode or a progressive mode for lower first-token latency.
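
The buffering loop can be sketched in a few lines of Python. This is an illustration of the idea only — simplified boundary regex, illustrative names, no SSE framing:

```python
import re

# flush once the buffer holds at least this many characters
MIN_FLUSH = 80
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(deltas, min_flush=MIN_FLUSH):
    """Buffer streamed delta.content chunks and yield complete
    sentences, so each flushed segment can be translated whole
    and re-emitted as a synthetic SSE delta."""
    buf = ""
    for chunk in deltas:
        buf += chunk
        if len(buf) < min_flush:
            continue
        parts = SENTENCE_END.split(buf)
        if len(parts) > 1:
            # everything before the last boundary is complete
            yield from (s for s in parts[:-1] if s)
            buf = parts[-1]
    if buf.strip():
        yield buf.strip()  # final partial segment at stream end
```

Each yielded segment is a whole sentence, so the translator never sees a clause cut mid-word, and the client still receives incremental updates.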

Tool calls survive the trip

Function arguments are a JSON string nested inside JSON. AdaptiveAPI parses it, walks it with a key-aware denylist, translates the leaves that are actually human text, and re-serialises. IDs, slugs, and URLs stay byte-identical.

Glossaries and style rules

Tenant-scoped glossaries map 1:1 to DeepL v3. Style rules carry up to ten 300-character custom instructions. Bind to a route, override per request.

PII redaction, opt in

Regex detectors for email, IBAN, Luhn-validated cards, US SSN, IPv4, and phone. Plug in Microsoft Presidio for higher recall. The upstream model never sees the original spans, so it cannot emit them.

Routes, tokens, audit

Argon2id-hashed route tokens, never logged. Audit metadata covers status, language pair, char counts, and integrity failures. Bodies are off by default.

One proxy, many surfaces

OpenAI Chat and Responses APIs, Anthropic Messages, remote and stdio MCP servers, plus a declarative adapter for any HTTP+JSON API.

Quick start

Up in one command.

# clone, configure, run
git clone https://github.com/DeeJayTC/adaptiveapi.git
cd adaptiveapi/deploy
cp .env.example .env
docker compose up --build

# add the chat demo
docker compose --profile demo up --build

  • API on :8080, health at /healthz
  • Admin UI on :8000
  • Chat demo on :8100 (with the demo profile)
  • SQLite volume by default. Postgres + Redis blocks pre-wired in the compose file.

Deploy

Self-host it the way you already self-host things.

Docker compose

One file. API, admin UI, and a SQLite volume. Postgres, Redis, and OTLP collector blocks pre-wired and commented out.

  • SSE-friendly defaults
  • Optional chat demo profile
  • One .env covers everything

Helm chart

Two Deployments, an Ingress with SSE annotations, an HPA on CPU (2 to 10 replicas), a PDB with minAvailable: 1, and a PVC for SQLite when no external database is configured.

  • Production defaults
  • Existing-secret references
  • Values reference in the repo

From source

.NET 10 API and a Vue 3 admin UI. dotnet run on one side, npm run dev on the other. Nothing exotic.

  • .NET 10 minimal API
  • Vue 3 + Vite
  • Plugin loader for org, SCIM, billing

FAQ

The questions you would have asked.

Do you store my OpenAI, Anthropic, or MCP token?

No. Upstream credentials ride in request headers, are forwarded byte-identical, and are never persisted or logged. Only a SHA-256 fingerprint of the Authorization header goes into the audit log for abuse correlation. Never the value itself.
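
The fingerprinting idea fits in one function. This is a sketch of the concept only — the real digest may be salted or truncated:

```python
import hashlib


def auth_fingerprint(authorization_header: str) -> str:
    """One-way SHA-256 digest of the Authorization header: stable
    enough to correlate abuse across requests, useless for
    recovering the credential itself."""
    return hashlib.sha256(authorization_header.encode("utf-8")).hexdigest()
```

The same header always maps to the same fingerprint, so repeated abuse from one credential is linkable without the secret ever touching disk.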

What happens if a translation loses a placeholder?

Each round trip has a hard invariant. Every <adaptiveapi id="TAG_n"/> tag emitted into the source must reappear exactly once in the translator output. If it does not, the pipeline falls back to the source text and increments adaptiveapi_placeholder_integrity_failures_total. Alerts should fire above 0.5%.
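
The invariant reduces to a small check — a sketch with the metrics counter omitted and illustrative names:

```python
import re

TAG = re.compile(r'<adaptiveapi id="TAG_\d+"/>')


def check_integrity(source: str, translated: str) -> str:
    """Every placeholder tag in the source must reappear exactly
    as often in the translator output; otherwise fall back to the
    untranslated source text."""
    expected = TAG.findall(source)
    for tag in set(expected):
        if translated.count(tag) != expected.count(tag):
            return source  # integrity failure: fall back
    return translated
```

A dropped or duplicated tag means the translator mangled a protected span (an ID, URL, or code block), so the safe answer is the original text, not a guess.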

Does streaming actually stream?

Yes. The pipeline buffers per choices[i].delta.content, flushes at sentence boundaries once the buffer reaches 80 chars (configurable), translates the completed segment through the full placeholder round trip, then re-emits a synthetic SSE delta. Clients see token-by-token UI with a modest per-sentence latency cost.

What about tool calls?

OpenAI's tool_calls[*].function.arguments is a JSON string containing JSON. The pipeline parses the inner JSON, walks it with a key-aware denylist (id, uuid, email, slug, url, plus *_id and *_code patterns), translates the string leaves that are not identifiers, and re-serialises. Extend the denylist per tenant via proxy rules.
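
That walk can be sketched like so — a simplified denylist via fnmatch, illustrative names, not the actual implementation:

```python
import fnmatch
import json

DENYLIST = {"id", "uuid", "email", "slug", "url"}
DENY_PATTERNS = ("*_id", "*_code")


def key_denied(key: str) -> bool:
    return key in DENYLIST or any(
        fnmatch.fnmatch(key, pattern) for pattern in DENY_PATTERNS
    )


def translate_arguments(arguments: str, translate) -> str:
    """Parse the tool-call arguments string, translate only string
    leaves whose keys are not identifier-like, re-serialise."""
    def walk(node, key=None):
        if isinstance(node, dict):
            return {k: walk(v, k) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(item, key) for item in node]
        if isinstance(node, str) and not (key and key_denied(key)):
            return translate(node)
        return node

    return json.dumps(walk(json.loads(arguments)), ensure_ascii=False)
```

With a toy translator such as `str.upper`, a payload like `{"ticket_id": "ABC-1", "title": "Hello"}` keeps its identifier untouched while the human-readable title is translated.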

Can I run AdaptiveAPI without any translator?

Yes. The default passthrough translator leaves text unchanged, which makes the proxy a route-token-gated forwarder. Useful while getting routes set up, or for letting some routes through untranslated with X-AdaptiveApi-Mode: off.

What is the license?

GNU Affero General Public License v3.0. If you run a modified version as a network service, the AGPL requires you to make the corresponding source available to the users of that service. Unmodified deployments are not affected.

Self-host it. Fork it. Read the source.

One repo. AGPL. Everything described above is running today.