AI gateway

Using the free AI

The gateway is an OpenAI-shaped SSE proxy at api.xgoose.org/ai. You can drive it from a skill, a script, or any OpenAI SDK that lets you point at a custom base_url.

When to use it

The xgoose extension calls this gateway by default, so most users never touch the URL directly. You only need to call it yourself when you are:

writing a skill that needs an extra LLM completion outside the agent loop, or
experimenting from your terminal or a script.

For production workloads of any size, point the extension at your own provider key from /account — the gateway is best thought of as a generous trial, not a hosting plan.

Endpoints

Two routes, both requiring authentication:

Method & path	Shape
`POST /ai/v1/chat/completions`	OpenAI chat completions, SSE streaming.
`GET /ai/v1/models`	Allow-listed model ids; refreshed from the upstream every 5 min.

Both endpoints require a valid session — sign in at /login and pass either the cookie set by /auth/login or an Authorization: Bearer <access_token> header.

Request shape

Identical to OpenAI's chat completion schema, with a few caveats:

model: required; must be on the allow-list (see GET /ai/v1/models).
messages: required; the combined byte length is capped at 32 KB.
stream: currently ignored; the response is always emitted as SSE, so if you set it to false, buffer the stream client-side.
max_tokens: optional; the pre-charge estimate assumes 1024 when it is omitted, so supply your own to size requests accurately.
temperature, top_p, etc.: pass through unchanged.
user: optional; if you omit it, the gateway sets user: "xgoose:<your-user-id>" before forwarding so accounting attributes the call correctly.
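The caveats above can be checked client-side before a request is sent. The following is a minimal sketch, not the gateway's own validation: it assumes "32 KB" means 32 * 1024 bytes and that the combined length is measured on the JSON-encoded messages array, neither of which is confirmed here. The function names are hypothetical.

```python
import json

# Assumption: "32 KB" means 32 * 1024 bytes.
MAX_MESSAGES_BYTES = 32 * 1024


def messages_byte_length(messages):
    """Approximate the combined size as the UTF-8 length of the JSON-encoded list.

    How the gateway actually measures the size is an assumption.
    """
    return len(json.dumps(messages, ensure_ascii=False).encode("utf-8"))


def build_request_body(model, messages, max_tokens=1024, **passthrough):
    """Return a chat-completions body, or raise ValueError for an obviously bad request."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    size = messages_byte_length(messages)
    if size > MAX_MESSAGES_BYTES:
        raise ValueError(f"messages are {size} bytes; the cap is {MAX_MESSAGES_BYTES}")
    body = {
        "model": model,
        "messages": messages,
        "stream": True,  # the gateway streams regardless of this flag
        "max_tokens": max_tokens,
    }
    body.update(passthrough)  # temperature, top_p, user, ...
    return body
```

Passing max_tokens explicitly keeps the pre-charge estimate close to what the request really needs.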

Example

curl https://api.xgoose.org/ai/v1/chat/completions \
  -H "Authorization: Bearer $XGOOSE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "stream": true,
    "messages": [
      { "role": "system", "content": "You are helpful." },
      { "role": "user",   "content": "Hello in three words." }
    ]
  }'

The response is text/event-stream with the usual data: {...} frames; the final frame carries a usage object the gateway uses to settle your quota. Response headers include x-request-id (and x-upstream-request-id when the upstream supplies one) — keep them around when filing support requests.
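Since the gateway always streams, a caller that wants a single string has to fold the frames itself. The sketch below assumes OpenAI-style chunks (text under choices[].delta.content) and an OpenAI-style `data: [DONE]` terminator; whether the gateway emits that terminator is not stated here, so the code tolerates its absence. `collect_stream` is a hypothetical helper name.

```python
import json


def collect_stream(lines):
    """Fold SSE lines into (full_text, usage).

    `lines` is any iterable of decoded text lines, for example the lines of
    an HTTP response body read incrementally.
    """
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style terminator
            break
        frame = json.loads(payload)
        for choice in frame.get("choices") or []:
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
        if frame.get("usage"):
            usage = frame["usage"]  # the final frame carries the usage object
    return "".join(parts), usage
```

The returned usage object is the same data the gateway uses to settle quota, so logging it alongside x-request-id makes later support requests easier to trace.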

Models

The exact list shifts as we add and retire models. Always read it at runtime:

curl -H "Authorization: Bearer $XGOOSE_TOKEN" \
  https://api.xgoose.org/ai/v1/models

Models are fetched from the underlying LiteLLM proxy and cached in each worker isolate for 5 minutes. Appending ?refresh=1 to the request bypasses the cache for the isolate that serves it. A completion request for any model id not on the list returns 403.
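A script that makes many calls can mirror the same five-minute caching on its side rather than fetching the list before every request. This is an illustrative sketch only: `ModelListCache` and its `fetch` callable are hypothetical, and `fetch` stands in for whatever code calls GET /ai/v1/models and returns a list of ids.

```python
import time


class ModelListCache:
    """Cache a model list for a fixed TTL (300 s mirrors the 5-minute window)."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # hypothetical callable returning model ids
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested
        self._models = None
        self._fetched_at = None

    def get(self, refresh=False):
        """Return the cached list, refetching when stale or when refresh is set."""
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if refresh or stale:
            self._models = list(self._fetch())
            self._fetched_at = now
        return self._models

    def is_allowed(self, model_id):
        """True if the id is on the most recently cached list."""
        return model_id in self.get()
```

Checking is_allowed before sending a completion avoids spending a rate-limit slot on a request that would come back as 403.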

Quota and rate limit

Every request runs through three gates before it touches the upstream:

Per-minute rate limit. Default 20 requests per minute per user, counted in windows aligned to the wall-clock minute. Excess requests return 429 with {"window": "minute"}.
Model allow-list. Cached for 5 min as described above.
Pre-charge. A cheap estimate (prompt bytes ÷ 4 + requested max_tokens) is debited against your rolling 24-hour bucket. If it would exceed your tier limit, the call short-circuits with 429 and a payload with the keys "tier", "limit", and "used".
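The arithmetic behind the first and third gates is simple enough to reproduce when sizing requests. As a worked example, a 2,000-byte prompt with the default max_tokens is pre-charged 2000 ÷ 4 + 1024 = 1,524 tokens. The sketch below is not the gateway's code; the function names are hypothetical and the use of integer division is an assumption.

```python
def minute_window(epoch_seconds):
    """Start (in epoch seconds) of the wall-clock minute containing the timestamp."""
    seconds = int(epoch_seconds)
    return seconds - seconds % 60


def estimate_precharge(prompt_bytes, max_tokens=1024):
    """Prompt bytes / 4 plus requested max_tokens (integer division is assumed)."""
    return prompt_bytes // 4 + max_tokens


def would_exceed(used, limit, estimate):
    """True if debiting the estimate would push usage past the tier limit."""
    return used + estimate > limit
```

Two requests whose timestamps map to the same minute_window value count against the same 20-request allowance.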

When the stream completes, the gateway reconciles the pre-charge against the real usage frame from the upstream: it refunds the difference on under-use and debits a top-up on over-use. If the upstream hard-fails before returning a stream, the pre-charge is fully refunded.
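The settlement step can be expressed as a single signed adjustment. This is a sketch under stated assumptions: `reconcile` is a hypothetical name, and treating the usage frame's total token count as the "real usage" is an assumption about how the gateway accounts for a call.

```python
def reconcile(precharge, actual_tokens, upstream_failed=False):
    """Return the signed adjustment to the 24-hour bucket.

    A negative value is a refund, a positive value is a top-up debit.
    """
    if upstream_failed:
        return -precharge  # hard failure before any stream: full refund
    return actual_tokens - precharge
```

For a 1,524-token pre-charge, a call that actually used 300 tokens yields an adjustment of -1,224 (a refund), while one that used 1,600 tokens yields +76 (a top-up).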

Default tier limits are listed under Limitations.

Apple App Attest header (optional)

Requests from the Safari extension carry an x-xgoose-attestation header. The gateway verifies it against Apple's App Attest service and, on success, upgrades your tier to attested (default 500,000 tokens / 24h) for the duration of the request. There's nothing to do here from a userland skill — the header is added automatically by the Safari build, and the Chrome / Firefox builds don't send it.

Error shape

Every non-200 response is JSON with at minimum a message field. Common codes:

Status	When
`400`	Missing `model` or `messages`, or prompt over 32 KB.
`401`	No session, expired access token, revoked session.
`403`	Model not on the allow-list.
`429`	Rate-limited or daily quota exhausted.
`500`	Upstream LiteLLM is misconfigured or down. The pre-charge is refunded.
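A caller can branch on the status code and, for 429, on the payload to tell the per-minute limit apart from an exhausted quota. The sketch below is one possible mapping; the action labels and the `classify_error` name are hypothetical, and it relies only on the status codes and payload keys described on this page.

```python
import json


def classify_error(status, body_text):
    """Map a non-200 gateway response to a coarse action label."""
    try:
        body = json.loads(body_text)
    except ValueError:
        body = {}
    if not isinstance(body, dict):
        body = {}
    if status == 401:
        return "reauthenticate"        # no session, expired or revoked token
    if status == 403:
        return "pick_allowed_model"    # model id is not on the allow-list
    if status == 429:
        if body.get("window") == "minute":
            return "retry_next_minute"  # per-minute rate limit
        if "limit" in body and "used" in body:
            return "quota_exhausted"    # 24-hour bucket is full
        return "back_off"
    if status >= 500:
        return "retry_later"           # upstream problem; pre-charge is refunded
    return "fix_request"               # 400 and anything else client-side
```

Only the retry_next_minute and retry_later outcomes are worth retrying automatically; the others need a change to the request or the session.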

Bring your own key

From /account you can register your own provider key (OpenAI, Anthropic, Gemini, …). The extension will route directly to the upstream when you select that key, bypassing the gateway entirely — no quota, no rate limit, and the request never hits xgoose.org. We recommend this for any sustained use.