Pricing

Priced by how you use it.

Flat rate for human and developer use. Provisioned dedicated capacity for agents.

Flat-rate inference works when usage is human-paced. It does not work for production agent workloads running continuously at scale. Marigold offers both: a predictable monthly amount for interactive and automation use, and reserved dedicated infrastructure for agent workloads that need a capacity guarantee.

A human cannot send requests faster than they can think. An agent can. The tier you need depends on who is driving the requests, not how many people are on the account.

Human

£19

per month

  • Interactive and IDE use
  • CPU model access
  • Shared infrastructure
  • Full model registry
  • Output storage included
  • Email support
Join waitlist

Agentic

from £299

per month

  • Production agent workloads
  • Dedicated GPU capacity
  • No shared queue
  • Scales with your workload
  • Custom model onboarding
  • GDPR data processing agreement
  • Dedicated account support
  • SLA available
Talk to us

Guest access is available with no account required. Rate limits apply. Sufficient to evaluate the API and test model outputs before committing to a plan.

Request access

Usage pattern determines the right tier

The distinction is not about the number of people on an account. It is about whether there is a human in the loop deciding when requests are made.

Usage pattern Human Developer Agentic
IDE and interactive use Yes Yes Yes
Scripted automation Limited Yes Yes
Continuous agent loops No No Yes
GPU model access No Yes Yes
Dedicated capacity No No Yes
Pricing model Flat monthly Flat monthly Provisioned
Data processing agreement No No Yes

What is the difference between the three tiers?

The tiers are priced by usage pattern, not seat count or key count. Human covers interactive use: IDE assistants, manual scripts, occasional API calls where a person decides when each request happens. Developer covers automation and pipelines: scheduled jobs, eval runs, batch processing initiated by a human but not requiring their presence. Agentic covers production systems running continuously without human initiation -- loops, pipelines, and workloads where the whole point is removing the human from the request path.

How does Agentic tier pricing work?

Agentic is provisioned capacity, not a flat rate. The monthly minimum is a retainer that reserves dedicated GPU infrastructure for your workloads. Usage scales with your agents up to an agreed ceiling. Your agents are never queued behind other customers' workloads. Accounts that consistently use significantly below their provisioned capacity are moved to the Developer tier -- the reservation only makes sense if the capacity is being used.

Why is there no flat rate for agent workloads?

A production agent running continuously can consume GPU capacity worth multiples of any reasonable flat monthly fee within days. Flat-rate pricing for agent workloads either caps the agent (defeating the purpose) or is not sustainable at the infrastructure cost. Provisioned capacity is honest about what it costs to run dedicated GPU infrastructure and gives you a guarantee in return.

Does Marigold train on my data?

No. Inference requests are not retained for training, fine-tuning, or any other purpose. Outputs are stored briefly for retrieval and then deleted. Agentic tier accounts can obtain a GDPR data processing agreement that sets this out contractually.

Where does my data go?

Inference runs on private AWS infrastructure in London (UK) by default. Data does not leave that region. No third-party model provider receives your inputs. Output references are purged after a configurable retention window.

What is the OpenAI-compatible endpoint?

POST /v1/chat/completions accepts the same request shape as the OpenAI Chat Completions API. Switch the base URL and API key; the model parameter maps to a Marigold model name. The developer page has a before-and-after example.

What is the difference between Marigold and self-hosting a model directly?

Self-hosting requires provisioning compute, managing model weights, implementing an inference API, and maintaining all of it. Marigold provides the typed async API, weight caching, and an eval surface as a managed layer on private AWS infrastructure. You control the deployment region and the model selection.

Know what inference costs before you build.

Human and Developer plans are in limited release. For Agentic capacity, get in touch directly.

Join the waitlist

No spam. One email when your tier opens.

Noted. We will be in touch.