Pricing
Priced by how you use it.
Flat rate for human and developer use. Provisioned dedicated capacity for agents.
Flat-rate inference works when usage is human-paced. It does not work for production agent workloads running continuously at scale. Marigold offers both: a predictable monthly amount for interactive and automation use, and reserved dedicated infrastructure for agent workloads that need a capacity guarantee.
A human cannot send requests faster than they can think. An agent can. The tier you need depends on who is driving the requests, not how many people are on the account.
Plans
Human
£19
per month
- Interactive and IDE use
- CPU model access
- Shared infrastructure
- Full model registry
- Output storage included
- Email support
Developer
£79
per month
- Automation and pipeline use
- CPU and GPU model access
- Shared infrastructure
- Full model registry
- Eval surface included
- Output storage included
- Priority support
Agentic
from £299
per month
- Production agent workloads
- Dedicated GPU capacity
- No shared queue
- Scales with your workload
- Custom model onboarding
- GDPR data processing agreement
- Dedicated account support
- SLA available
Guest access is available with no account required. Rate limits apply. Sufficient to evaluate the API and test model outputs before committing to a plan.
Request accessHow the tiers compare
Usage pattern determines the right tier
The distinction is not about the number of people on an account. It is about whether there is a human in the loop deciding when requests are made.
| Usage pattern | Human | Developer | Agentic |
|---|---|---|---|
| IDE and interactive use | Yes | Yes | Yes |
| Scripted automation | Limited | Yes | Yes |
| Continuous agent loops | No | No | Yes |
| GPU model access | No | Yes | Yes |
| Dedicated capacity | No | No | Yes |
| Pricing model | Flat monthly | Flat monthly | Provisioned |
| Data processing agreement | No | No | Yes |
Frequently asked
What is the difference between the three tiers?
The tiers are priced by usage pattern, not seat count or key count. Human covers interactive use: IDE assistants, manual scripts, occasional API calls where a person decides when each request happens. Developer covers automation and pipelines: scheduled jobs, eval runs, batch processing initiated by a human but not requiring their presence. Agentic covers production systems running continuously without human initiation -- loops, pipelines, and workloads where the whole point is removing the human from the request path.
How does Agentic tier pricing work?
Agentic is provisioned capacity, not a flat rate. The monthly minimum is a retainer that reserves dedicated GPU infrastructure for your workloads. Usage scales with your agents up to an agreed ceiling. Your agents are never queued behind other customers' workloads. Accounts that consistently use significantly below their provisioned capacity are moved to the Developer tier -- the reservation only makes sense if the capacity is being used.
Why is there no flat rate for agent workloads?
A production agent running continuously can consume GPU capacity worth multiples of any reasonable flat monthly fee within days. Flat-rate pricing for agent workloads either caps the agent (defeating the purpose) or is not sustainable at the infrastructure cost. Provisioned capacity is honest about what it costs to run dedicated GPU infrastructure and gives you a guarantee in return.
Does Marigold train on my data?
No. Inference requests are not retained for training, fine-tuning, or any other purpose. Outputs are stored briefly for retrieval and then deleted. Agentic tier accounts can obtain a GDPR data processing agreement that sets this out contractually.
Where does my data go?
Inference runs on private AWS infrastructure in London (UK) by default. Data does not leave that region. No third-party model provider receives your inputs. Output references are purged after a configurable retention window.
What is the OpenAI-compatible endpoint?
POST /v1/chat/completions accepts the same request
shape as the OpenAI Chat Completions API. Switch the base URL
and API key; the model parameter maps to a Marigold
model name. The developer page
has a before-and-after example.
What is the difference between Marigold and self-hosting a model directly?
Self-hosting requires provisioning compute, managing model weights, implementing an inference API, and maintaining all of it. Marigold provides the typed async API, weight caching, and an eval surface as a managed layer on private AWS infrastructure. You control the deployment region and the model selection.
Know what inference costs before you build.
Human and Developer plans are in limited release. For Agentic capacity, get in touch directly.
Join the waitlist
No spam. One email when your tier opens.
Noted. We will be in touch.