For developers
OpenAI-compatible inference, done properly.
Change the base URL. Keep the code. Open-weight models in your region.
The POST /v1/chat/completions endpoint accepts the same request
shape as the OpenAI Chat Completions API. Behind it: open-weight models running
on private AWS in London, with no prompt retention and a flat monthly rate.
Works in any tool or SDK that supports a custom OpenAI base URL.
Every OpenAI integration is one environment variable away from a private open-weight model running in your jurisdiction, with a predictable monthly cost instead of per-token billing.
One-line swap
Change the base URL. Keep the code.
The model parameter maps to a Marigold model name.
The rest of the request is unchanged.
Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Summarise this document: ..."}]
)
After
from openai import OpenAI
client = OpenAI(
    base_url="https://api.marigold.run/v1",
    api_key="your-api-key"
)
response = client.chat.completions.create(
    model="qwen/qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Summarise this document: ..."}]
)
Raw HTTP
curl https://api.marigold.run/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen2.5-7b-instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
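Embeddings
Embeddings go through the same interface. A minimal sketch; the model name is borrowed from the native-API example below, and whether registry names map one-to-one onto the OpenAI-compatible embeddings route is an assumption, not documented behaviour.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.marigold.run/v1",
    api_key="your-api-key"
)

# Assumption: registry embedding models keep their registry names
# on the OpenAI-compatible route.
embedding = client.embeddings.create(
    model="sentence-transformers/all-MiniLM-L6-v2",
    input="The quick brown fox"
)
print(len(embedding.data[0].embedding))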
Native async API
Beyond chat completions
The native API covers the full model registry: embeddings, image-to-text, text-to-speech, depth estimation, segmentation, and eval. Submit a job, poll for the result, retrieve binary outputs from storage.
Submit and poll
import requests, time

API = "https://api.marigold.run"
KEY = {"Authorization": "Bearer your-api-key"}

# Submit a job to the native async API.
job = requests.post(f"{API}/infer", headers=KEY, json={
    "model_type": "text-embedding",
    "model_name": "sentence-transformers/all-MiniLM-L6-v2",
    "input": "The quick brown fox"
}).json()

# Poll until the job completes, then read the output.
# (Production code would also bail out on a failed status.)
while True:
    result = requests.get(f"{API}/infer/{job['job_id']}", headers=KEY).json()
    if result["status"] == "complete":
        print(result["output"]["embedding"])
        break
    time.sleep(0.5)
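For model types with binary outputs, such as text-to-speech, the result is retrieved from storage rather than returned inline. A minimal sketch of that last step, continuing from the poll loop above; the presigned output_url field is an assumption, not a documented part of the payload.
# Hypothetical final step for a binary result: fetch it from storage.
# "output_url" is an assumed field name for the presigned link.
blob = requests.get(result["output_url"])
blob.raise_for_status()
with open("output.bin", "wb") as f:
    f.write(blob.content)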
IDE and editor support
Works wherever a custom OpenAI base URL is accepted
Cursor, Continue, Cline, and Aider all support a custom OpenAI-compatible
endpoint. Point the base URL at https://api.marigold.run/v1 and supply
your Marigold API key. Full configuration instructions are on the
IDE setup page.
What you get
01
OpenAI-compatible endpoint
Chat completions and embeddings via the same interface as the OpenAI API. Swap the base URL; the rest of your code is unchanged. Works with any SDK or tool that accepts a custom base URL.
02
Full model registry
Instruct, embedding, TTS, image-to-text, depth, segmentation, and eval model types. One API key gives access to all of them. See the full registry.
03
Flat monthly pricing
No per-token billing. No usage alerts. No end-of-month reconciliation. From £19/month for individual developer access.
04
Private AWS deployment
Models run on private AWS infrastructure in London. No shared multi-tenant pool. Your data does not leave your chosen region as part of inference.
05
LangChain and LlamaIndex compatible
Any framework that wraps the OpenAI SDK works. Set
openai_api_base and your key; the model name maps
to a Marigold model. No custom integration required; see the sketch after this list.
06
No prompt retention
Inference inputs are not retained for any downstream purpose. Outputs are stored briefly for retrieval and then deleted. UK and EU data residency by default.
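The LangChain point in 05 comes down to a few constructor arguments. A minimal sketch using the langchain-openai package; the model name and placeholder key are reused from the examples above.
from langchain_openai import ChatOpenAI

# Point the framework's OpenAI wrapper at the Marigold endpoint.
llm = ChatOpenAI(
    model="qwen/qwen2.5-7b-instruct",
    openai_api_base="https://api.marigold.run/v1",
    openai_api_key="your-api-key",
)
print(llm.invoke("Summarise this document: ...").content)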
One URL change. Your data stays in your region.
Paid plans are in limited release. Leave your email to be notified when developer access opens.