Pricing
Flat monthly inference. No token counting.
Three tiers. One fixed rate each. No per-token billing regardless of usage.
A single monthly amount covers all inference across the model registry. No metered billing, no usage alerts, no end-of-month reconciliation. Your data stays in your chosen region and is never used for training.
Per-token billing passes infrastructure cost variance directly to the caller. A flat monthly rate decouples your costs from your usage patterns and makes inference budgeting a one-line entry.
Plans
Developer
£19
per month
- Full model registry access
- CPU inference, rate-limited
- Single API key
- Output storage included
- Email support
Team
£79
per month
- Full model registry access
- CPU and GPU inference
- Up to five API keys
- Output storage included
- Eval surface included
- Priority support
Pro
£299
per month
- Everything in Team
- Dedicated capacity
- Custom model onboarding
- Unlimited API keys
- GDPR data processing agreement
- Dedicated account support
- SLA available
Guest access is available with no account required; rate limits apply. It is sufficient to evaluate the API and compare model outputs across the registry.
Request access

Comparison
What changes with flat-rate pricing
The differences between flat monthly pricing and per-token billing go beyond cost: model choice, data jurisdiction, and training policy differ as well.
| Criterion | Marigold | GPT-4.1-mini | Claude Haiku |
|---|---|---|---|
| Pricing model | Flat monthly | Per token | Per token |
| Cost predictability | Fixed | Variable | Variable |
| Model choice | Open-weight registry | GPT-4.1-mini only | Claude Haiku only |
| Data jurisdiction | UK / EU (your region) | US (OpenAI) | US / EU (Anthropic) |
| Training on prompts | Never | Opt-out required | No |
| Bring your own weights | Yes (Pro) | No | No |
Frequently asked
Does Marigold train on my data?
No. Inference requests are not retained for training, fine-tuning, or any other purpose. Outputs are stored briefly for retrieval and then deleted. The Pro tier includes a GDPR data processing agreement that sets this out contractually.
What models does Marigold support?
The hosted registry includes Qwen2.5 instruct variants (1.5B, 7B, 14B), Mistral 7B Instruct, PaliGemma 3B for image-to-text, CLIP for image and text embedding, the facebook/mms-tts family for text-to-speech in English, Welsh, French, German, Spanish, Finnish, and Dutch, plus depth estimation and segmentation models. Custom model onboarding is available on the Pro tier. See the full registry.
Is there a free tier?
Yes. Guest access requires no account. Rate limits apply per IP across text-embedding and instruct model types. Paid plans lift all rate limits and add GPU model access.
Where does my data go?
Inference runs on private AWS infrastructure in London (UK) by default. Data does not leave that region. No third-party model provider receives your inputs. Output references are purged after a configurable retention window.
What is the OpenAI-compatible endpoint?
POST /v1/chat/completions accepts the same request shape as the OpenAI Chat Completions API. Switch the base URL and API key; the model parameter maps to a Marigold model name.
The developer page has a before-and-after example.
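As a minimal sketch of what "same request shape" means, the snippet below assembles a standard Chat Completions request body. The base URL and model id are illustrative placeholders, not documented Marigold values; check your account and the registry for the real ones.

```python
import json
from urllib import request

def build_chat_request(base_url, api_key, model, messages):
    """Assemble an OpenAI-shaped Chat Completions request for a Marigold model.

    base_url, api_key, and model are placeholders: substitute the values
    from your Marigold account and the registry's model names.
    """
    url = base_url.rstrip("/") + "/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,      # maps to a Marigold model name
        "messages": messages,
    }
    return request.Request(
        url, data=json.dumps(body).encode(), headers=headers, method="POST"
    )

# Illustrative only: the host and model id below are assumptions.
req = build_chat_request(
    "https://api.marigold.example",   # placeholder base URL
    "YOUR_API_KEY",
    "qwen2.5-7b-instruct",            # see the registry for exact model names
    [{"role": "user", "content": "Say hello."}],
)
# urllib.request.urlopen(req) would send it; the JSON response mirrors
# the OpenAI Chat Completions response shape.
```

If you already use the official `openai` Python client, the same switch is typically just constructing the client with a custom `base_url` and your Marigold API key.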
Can I run Qwen or Mistral via Marigold?
Yes. Qwen2.5 instruct variants and Mistral 7B Instruct are in the hosted registry. Submit via the inference API or the OpenAI-compatible endpoint. See the IDE setup guide if you want to use these models inside Cursor, Continue, or Aider.
Is Marigold GDPR-compliant?
The infrastructure is designed for UK and EU data residency. Inference does not leave your chosen region. Pro tier accounts can obtain a signed data processing agreement. Marigold does not act as data controller for inference inputs; your organisation remains controller.
What is the difference between Marigold and self-hosting a model directly?
Self-hosting requires provisioning compute, managing model weights, implementing an inference API, and maintaining all of it. Marigold provides the typed async API, weight caching, and an eval surface as a managed layer on private AWS infrastructure. You control the deployment region and the model selection.
Know what inference costs before you build.
Paid plans are in limited release. Leave your email and we will reach out when capacity opens.
Join the waitlist
No spam. One email when your tier opens.