For developers
OpenAI-compatible inference, done properly.
Change the base URL. Keep the code. Open-weight models in your region.
The POST /v1/chat/completions endpoint accepts the same request
shape as the OpenAI Chat Completions API. Behind it: open-weight models running
on private AWS in London, with no prompt retention and a flat monthly rate.
Works in any tool or SDK that supports a custom OpenAI base URL.
Every OpenAI integration is one environment variable away from a private open-weight model running in your jurisdiction, with a predictable monthly cost instead of per-token billing.
One-line swap
Change the base URL. Keep the code.
The model parameter maps to a Marigold model name.
The rest of the request is unchanged.
Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")
response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Summarise this document: ..."}]
)
After
from openai import OpenAI
client = OpenAI(
    base_url="https://api.marigold.run/v1",
    api_key="your-api-key"
)
response = client.chat.completions.create(
    model="qwen/qwen2.5-7b-instruct",
    messages=[{"role": "user", "content": "Summarise this document: ..."}]
)
Raw HTTP
curl https://api.marigold.run/v1/chat/completions \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen2.5-7b-instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
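Embeddings
Embeddings go through the same interface. A minimal sketch; the model name is borrowed from the native-API example below, and whether registry names map one-to-one onto the OpenAI-compatible embeddings route is an assumption, not documented behaviour.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.marigold.run/v1",
    api_key="your-api-key"
)

# Assumption: registry embedding models keep their registry names
# on the OpenAI-compatible route.
embedding = client.embeddings.create(
    model="sentence-transformers/all-MiniLM-L6-v2",
    input="The quick brown fox"
)
print(len(embedding.data[0].embedding))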
Native async API
Beyond chat completions
The native API covers the full model registry: embeddings, image-to-text, text-to-speech, depth estimation, segmentation, and eval. Submit a job, poll for the result, retrieve binary outputs from storage.
Submit and poll
import requests, time

API = "https://api.marigold.run"
KEY = {"Authorization": "Bearer your-api-key"}

# Submit a job to the native async API.
job = requests.post(f"{API}/infer", headers=KEY, json={
    "model_type": "text-embedding",
    "model_name": "sentence-transformers/all-MiniLM-L6-v2",
    "input": "The quick brown fox"
}).json()

# Poll until the job completes, then read the output.
# (Production code would also bail out on a failed status.)
while True:
    result = requests.get(f"{API}/infer/{job['job_id']}", headers=KEY).json()
    if result["status"] == "complete":
        print(result["output"]["embedding"])
        break
    time.sleep(0.5)
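For model types with binary outputs, such as text-to-speech, the result is retrieved from storage rather than returned inline. A minimal sketch of that last step, continuing from the poll loop above; the presigned output_url field is an assumption, not a documented part of the payload.
# Hypothetical final step for a binary result: fetch it from storage.
# "output_url" is an assumed field name for the presigned link.
blob = requests.get(result["output_url"])
blob.raise_for_status()
with open("output.bin", "wb") as f:
    f.write(blob.content)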
IDE and editor support
Works wherever a custom OpenAI base URL is accepted
Cursor, Continue, Cline, and Aider all support a custom OpenAI-compatible
endpoint. Point the base URL at https://api.marigold.run/v1 and supply
your Marigold API key. Full configuration instructions are on the
IDE setup page.
What you get
01
OpenAI-compatible endpoint
Chat completions and embeddings via the same interface as the OpenAI API. Swap the base URL; the rest of your code is unchanged. Works with any SDK or tool that accepts a custom base URL.
02
Full model registry
Instruct, embedding, TTS, image-to-text, depth, segmentation, and eval model types. One API key gives access to all of them. See the full registry.
03
Flat monthly pricing
No per-token billing. No usage alerts. No end-of-month reconciliation. From £19/month for individual developer access.
04
Private AWS deployment
Models run on private AWS infrastructure in London. No shared multi-tenant pool. Your data does not leave your chosen region as part of inference.
05
LangChain and LlamaIndex compatible
Any framework that wraps the OpenAI SDK works. Set
openai_api_base and your key; the model name maps
to a Marigold model. No custom integration required; see the sketch after this list.
06
No prompt retention
Inference inputs are not retained for any downstream purpose. Outputs are stored briefly for retrieval and then deleted. UK and EU data residency by default.
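The LangChain point in 05 comes down to a few constructor arguments. A minimal sketch using the langchain-openai package; the model name and placeholder key are reused from the examples above.
from langchain_openai import ChatOpenAI

# Point the framework's OpenAI wrapper at the Marigold endpoint.
llm = ChatOpenAI(
    model="qwen/qwen2.5-7b-instruct",
    openai_api_base="https://api.marigold.run/v1",
    openai_api_key="your-api-key",
)
print(llm.invoke("Summarise this document: ...").content)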
One URL change. Your data stays in your region.
Paid plans are in limited release. Leave your email to be notified when developer access opens.