Reliable AI inference

AI jobs that complete.
Even under load.

Most inference APIs are synchronous. Under load, they slow down, rate-limit, or drop requests. Marigold is queue-first: every job is durably stored and processed to completion, regardless of load.

Request access

The problem with synchronous inference

A synchronous HTTP call to an inference API holds a connection open for the entire duration of inference. For a 30-second image generation or a 60-second large-model response, that means a TCP connection held for 30 to 60 seconds: long enough for a mobile network to drop mid-request, or for a load spike to cascade into 429 errors and silent failures. Every dropped connection is a lost request.

Submit and disconnect

POST your request and receive a job ID immediately. The connection closes. Your process, browser tab, or mobile app does not need to stay open. Retrieve the result when convenient -- seconds, minutes, or hours later.

No dropped requests under load

Jobs queue durably. If a burst of 100 requests arrives while only one GPU is available, the jobs are processed one after another and none is lost. There are no 429 errors, no silent failures, and nothing to retry. The queue absorbs the burst.

Full pipeline visibility

Every job has a status: queued, provisioning, processing, complete, or error. Queue depth is observable. You know exactly where every request is in the pipeline at all times.

Decouple submission from retrieval

The system that submits a job does not need to be the system that retrieves the result. Share a job ID between services, pipelines, or teams. A batch process submits; a dashboard retrieves. A mobile app submits; a server collects and stores.

Webhook delivery

Provide a callback URL at submission time. When the job completes, Marigold POSTs the result directly to your endpoint. No polling loop required for event-driven architectures.

Full audit trail

Every submission, status transition, and completion is recorded with timestamps. Every result is stored and retrievable for up to seven days. Regulated industries require this. Marigold provides it by default.

How it works

1. Submit a job

POST /gen/instruct
{
  "model": "mistralai/mistral-small-24b-instruct-2501",
  "messages": [{"role": "user", "content": "Summarise this contract..."}],
  "callback_url": "https://your-system.com/webhooks/ai"
}

-> { "message_id": "a3f1c2..." }

Returns immediately. The job is durably queued. Your connection closes.
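
As a rough illustration, the submission step might look like this from Python using only the standard library. The base URL, the bearer-token header, and the function names are assumptions made for the sketch; this page does not specify the host or the authentication scheme.

```python
import json
import urllib.request

# Assumption: placeholder host. The real base URL is not given on this page.
BASE_URL = "https://api.example.com"


def build_submit_request(prompt, callback_url=None, api_key="YOUR_API_KEY"):
    """Build the POST /gen/instruct request without sending it."""
    payload = {
        "model": "mistralai/mistral-small-24b-instruct-2501",
        "messages": [{"role": "user", "content": prompt}],
    }
    # callback_url is only included when webhook delivery is wanted.
    if callback_url is not None:
        payload["callback_url"] = callback_url
    return urllib.request.Request(
        f"{BASE_URL}/gen/instruct",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth is a guess, not documented here.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def submit_job(prompt, callback_url=None, opener=urllib.request.urlopen):
    """Send the request and return the message_id from the JSON response."""
    request = build_submit_request(prompt, callback_url)
    with opener(request, timeout=10) as response:
        return json.load(response)["message_id"]
```

The `opener` parameter exists only so the network call can be swapped out in tests. The caller keeps nothing but the returned `message_id`, which is all that later retrieval needs.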

2. Poll for status (optional)

GET /output/gen/instruct/a3f1c2...

-> { "status": "processing" }
-> { "status": "complete", "result": { "choices": [...] } }

Poll from any process, any machine, any time. Or skip polling entirely and use a webhook.
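
A polling client could be sketched roughly as follows. The `fetch_status` callable is hypothetical: it stands for whatever code performs the GET request above and returns the parsed JSON body. The doubling interval and the one-hour timeout are illustrative choices, not documented Marigold behaviour.

```python
import time

# The two statuses after which a job will not change again.
TERMINAL_STATUSES = {"complete", "error"}


def poll_until_done(fetch_status, message_id, interval=1.0, max_interval=30.0,
                    timeout=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches a terminal status, then return the body.

    fetch_status(message_id) must return a dict with a "status" key.
    The wait between polls doubles each time, capped at max_interval.
    """
    deadline = clock() + timeout
    while True:
        body = fetch_status(message_id)
        if body["status"] in TERMINAL_STATUSES:
            return body
        if clock() >= deadline:
            raise TimeoutError(f"job {message_id} still {body['status']}")
        sleep(interval)
        interval = min(interval * 2, max_interval)
```

Because the job lives in the queue rather than in the connection, giving up on polling does not cancel anything. The same `message_id` can be polled again later, from this process or another one.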

3. Receive via webhook (optional)

POST https://your-system.com/webhooks/ai
{
  "message_id": "a3f1c2...",
  "status": "complete",
  "result": { "choices": [...], "usage": {...} }
}

Marigold delivers the result to your endpoint when the job completes. No polling required.
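
On the receiving side, a minimal handler might look like the sketch below, using Python's built-in `http.server`. The in-memory store and the rule of keeping only the first delivery per `message_id` are assumptions made for illustration; whether deliveries are ever repeated is not stated on this page, so the handler is simply written defensively.

```python
import json
from http.server import BaseHTTPRequestHandler

# Assumption: an in-memory dict stands in for a real database.
RESULTS = {}


def handle_webhook(raw_body, store=RESULTS):
    """Record one webhook delivery and return the HTTP status to reply with."""
    try:
        payload = json.loads(raw_body)
        message_id = payload["message_id"]
        status = payload["status"]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON or missing fields: reject the delivery.
        return 400
    # Keep the first delivery for a given job; ignore any later duplicates.
    store.setdefault(message_id, {"status": status, "result": payload.get("result")})
    return 200


class WebhookHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle_webhook."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        code = handle_webhook(self.rfile.read(length))
        self.send_response(code)
        self.end_headers()

# To try it locally:
#   from http.server import HTTPServer
#   HTTPServer(("", 8000), WebhookHandler).serve_forever()
```

Keeping the parsing logic in a plain function, separate from the HTTP class, makes it easy to test and to reuse inside whatever web framework the receiving system already runs.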

Built for production pipelines

Overnight batch processing

Submit thousands of document classification, embedding, or summarisation jobs before close of business. Retrieve results the next morning. No held connections, no timeouts, no babysitting.

Mobile and browser applications

Submit an image generation or long-form inference request from a mobile app. The user closes the app. When they return, the result is waiting. No background process required on the client.

Multi-step pipelines

Fan out a document to ten models in parallel. Each job has its own ID. A reconciliation process collects results as they complete and assembles the final output. No coordination infrastructure required.
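
The fan-out pattern could be sketched like this. Both `submit` and `wait_for_result` are hypothetical callables supplied by the caller: the first submits one job and returns its ID, the second blocks until that job finishes (by polling, for example) and returns its result.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fan_out(document, models, submit, wait_for_result):
    """Submit one job per model, then gather results as each job finishes.

    submit(model, document) -> job ID
    wait_for_result(job_id) -> result for that job
    Returns a dict mapping each model name to its result.
    """
    # Submission returns immediately, so a plain loop is enough here.
    job_ids = {model: submit(model, document) for model in models}

    results = {}
    # One waiting thread per job; "or 1" avoids max_workers=0 for an empty list.
    with ThreadPoolExecutor(max_workers=len(job_ids) or 1) as pool:
        futures = {
            pool.submit(wait_for_result, job_id): model
            for model, job_id in job_ids.items()
        }
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

The only state the reconciliation step needs is the mapping from model to job ID, so the waiting half could equally run in a different process or service from the submitting half.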

Cross-team workflows

A data team submits a batch of inference jobs and shares the job IDs with a compliance team. The compliance team retrieves and reviews results independently, on their own schedule. No shared credentials, no coupled systems.

Event-driven architectures

A webhook fires when each job completes. Your downstream system triggers on the event rather than polling. Marigold integrates with any event-driven architecture without additional infrastructure.

Regulated industries

Every job is logged with submission time, completion time, model used, token counts, and result. Seven-day result retention by default. The kind of audit trail that FCA-regulated firms, NHS organisations, and public sector procurement typically ask for is built in.

AI inference that finishes what it starts.

Paid plans are in limited release. Leave your email to be notified when developer access opens.

Join the waitlist

No spam. One email when access opens.