Private model hosting. Typed workflows. Evals.

Self-hosted open-weight inference, done properly.

Define your task. Build your pipeline. Measure and improve.

Most AI inference pipelines are built once per engagement and discarded. Marigold is the layer you bring once and keep: open-weight models on private AWS in London, a typed async inference API, declarative YAML workflows, and an eval surface that sharpens with production use. No request reaches OpenAI, Anthropic, or any other external provider. From £19/month, flat rate.

Inference runs on private AWS in London. No request leaves UK jurisdiction, no third-party model provider is in the request path, and no inference input is retained for training. For regulated sectors, that combination resolves the data-residency constraint before procurement begins. Architecture and compliance detail →

01

Private inference API

A unified async API over self-hosted HuggingFace models covering text, image, audio, and cross-modal operations. One container image, one weight cache, per-model isolation. The marginal cost of adding a model falls with each one added.

02

Workflow execution

Declarative YAML pipelines over the model registry. Steps declare typed inputs and outputs; the executor resolves the dependency graph, runs independent steps in parallel, and advances as each result arrives. Every step has an audit trail.
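
As a rough illustration of the execution model described above, here is a minimal Python sketch of a dependency-ordered executor built only on the standard library. It is not Marigold's executor and does not use the runfox API; the step names, the STEPS layout, and run_workflow are all hypothetical. Steps whose dependencies are satisfied run concurrently, and each one appends an entry to an audit trail.

```python
import asyncio
import time
from graphlib import TopologicalSorter

# Hypothetical workflow definition: each step lists the steps it needs
# and a callable that reads prior outputs from the shared state.
STEPS = {
    "caption": {"needs": [], "run": lambda state: "a red bridge"},
    "embed": {"needs": [], "run": lambda state: [0.1, 0.2]},
    "summary": {
        "needs": ["caption", "embed"],
        "run": lambda state: f"{state['caption']} ({len(state['embed'])} dims)",
    },
}


async def run_workflow(steps):
    state = {}  # shared outputs, keyed by step name
    audit = []  # one record per completed step
    sorter = TopologicalSorter({name: spec["needs"] for name, spec in steps.items()})
    sorter.prepare()

    async def run_step(name):
        started = time.perf_counter()
        # Run the (synchronous) step in a worker thread so that steps in
        # the same batch can overlap.
        state[name] = await asyncio.to_thread(steps[name]["run"], state)
        audit.append({
            "step": name,
            "status": "complete",
            "seconds": time.perf_counter() - started,
        })
        sorter.done(name)

    # Each pass runs every step whose dependencies are already complete.
    while sorter.is_active():
        await asyncio.gather(*(run_step(name) for name in sorter.get_ready()))
    return state, audit


if __name__ == "__main__":
    final_state, trail = asyncio.run(run_workflow(STEPS))
    print(final_state["summary"])
    print([record["step"] for record in trail])
```

Here caption and embed have no dependencies, so they run in the same batch; summary runs only once both have finished. The sketch advances in batches for simplicity; a production executor would more likely release each step the moment its own dependencies complete.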

03

Evals

Run any model or pipeline against a labelled dataset. Score outputs using the same handler registry. Build custom eval libraries from production runs and corrected outputs. The task specification sharpens with use.
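
To make the eval loop above concrete, the following Python sketch scores a stand-in model against a small labelled dataset using a scorer registry. Everything here is assumed for illustration: the SCORERS registry, the exact_match scorer, toy_model, and the dataset rows are hypothetical and do not reflect Marigold's handler registry or API.

```python
from typing import Callable

# Hypothetical scorer registry: name -> function(output, expected) -> score.
SCORERS: dict[str, Callable[[str, str], float]] = {}


def scorer(name: str):
    """Register a scoring function under a name."""
    def register(fn):
        SCORERS[name] = fn
        return fn
    return register


@scorer("exact_match")
def exact_match(output: str, expected: str) -> float:
    # Case-insensitive, whitespace-insensitive string equality.
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def evaluate(model: Callable[[str], str], dataset: list[dict], scorer_name: str) -> float:
    """Return the mean score of the model's outputs over the dataset."""
    score_fn = SCORERS[scorer_name]
    scores = [score_fn(model(row["input"]), row["expected"]) for row in dataset]
    return sum(scores) / len(scores)


def toy_model(text: str) -> str:
    # Stand-in for a real model call: a naive keyword classifier.
    return "positive" if "good" in text else "negative"


DATASET = [
    {"input": "good service", "expected": "positive"},
    {"input": "slow and rude", "expected": "negative"},
    {"input": "not good at all", "expected": "negative"},
]

if __name__ == "__main__":
    print(evaluate(toy_model, DATASET, "exact_match"))
```

The third row is the kind of case a corrected production output would contribute: the toy model gets it wrong, and once the row is in the dataset, every later model or pipeline is measured against it.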

Content ingestion and moderation

Extract text, generate embeddings, classify by category, and gate restricted content before it reaches an index. Text and image embeddings produced in the same pipeline enable cross-modal search.

img2txt text-embed instruct image-embed

Quality-gated image generation

Generate an image, then score it for safety, aesthetic quality, and prompt alignment. Reset and regenerate automatically until every threshold is met or the attempt limit is reached.

txt2img image-eval image-text-eval
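
The generate, score, and retry loop described above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the actual workflow definition: generate_with_gate, the threshold values, and the stub generator and scorer are all hypothetical, standing in for the txt2img and eval models.

```python
from dataclasses import dataclass

# Illustrative thresholds only; real values would be tuned per task.
THRESHOLDS = {"safety": 0.9, "aesthetic": 0.6, "alignment": 0.7}


@dataclass
class GateResult:
    accepted: bool
    attempts: int
    scores: dict


def generate_with_gate(generate, score, prompt, max_attempts=3, thresholds=THRESHOLDS):
    """Regenerate until every score meets its threshold or attempts run out."""
    scores = {}
    for attempt in range(1, max_attempts + 1):
        image = generate(prompt, seed=attempt)  # a fresh seed on each attempt
        scores = score(image, prompt)
        if all(scores[name] >= minimum for name, minimum in thresholds.items()):
            return GateResult(True, attempt, scores)
    # Attempt limit reached: report the last scores with a rejection.
    return GateResult(False, max_attempts, scores)


def stub_generate(prompt, seed):
    # Stand-in for a txt2img call.
    return {"prompt": prompt, "seed": seed}


def stub_score(image, prompt):
    # Stand-in for the eval models: aesthetic quality improves with the seed.
    return {"safety": 0.95, "aesthetic": 0.35 * image["seed"], "alignment": 0.8}


if __name__ == "__main__":
    print(generate_with_gate(stub_generate, stub_score, "a lighthouse at dusk"))
```

With these stubs the first attempt fails the aesthetic threshold and the second passes, so the result reports acceptance after two attempts. Returning the last scores on rejection keeps the failure inspectable.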

Visual conformance checking

Compare an observation image against a reference using structural embeddings. On deviation, segment both images, diff the masks, and produce a natural-language description of the discrepancy.

image-embed img2mask img2txt instruct
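
The first stage of the check above, comparing two embeddings and deciding whether to escalate, might look like the Python sketch below. Cosine similarity and the 0.95 threshold are assumptions chosen for illustration; the source does not state which distance measure or threshold is used, and the vectors here are toy values rather than real image embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def deviates(reference, observation, threshold=0.95):
    """True when the observation is too dissimilar from the reference."""
    return cosine_similarity(reference, observation) < threshold


if __name__ == "__main__":
    reference = [0.9, 0.1, 0.4]
    print(deviates(reference, [0.9, 0.1, 0.4]))  # identical vectors
    print(deviates(reference, [0.1, 0.9, 0.4]))  # clearly different vectors
```

Only observations flagged by a check like this would go on to the more expensive segmentation, mask diff, and description steps.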

Document briefing and audio delivery

Extract entities, classify, summarise, and convert to speech in multiple languages from a single workflow submission. Output is a written summary plus audio files, one per target language.

img2txt instruct tts

Batch tabular classification

Classify large volumes of unlabelled rows against a small labelled example set. Low-confidence predictions are passed to an instruct model for natural-language explanation. No retraining required.

tabular-classify instruct
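
The routing described above, where confident predictions are accepted and uncertain ones are sent on for explanation, can be sketched as follows. The classifier here is a plain k-nearest-neighbour vote, swapped in purely for illustration; the source does not say how the tabular model works. The function names, the 0.8 confidence cut-off, and the example rows are likewise hypothetical.

```python
import math
from collections import Counter


def knn_predict(row, examples, k=3):
    """Label a row by majority vote of its k nearest labelled examples.

    Returns (label, confidence), where confidence is the winning vote share.
    """
    nearest = sorted(examples, key=lambda ex: math.dist(row, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / k


def classify_batch(rows, examples, min_confidence=0.8):
    """Split predictions into accepted rows and rows needing an explanation."""
    accepted, needs_explanation = [], []
    for row in rows:
        label, confidence = knn_predict(row, examples)
        target = accepted if confidence >= min_confidence else needs_explanation
        target.append((row, label, confidence))
    return accepted, needs_explanation


# Small labelled example set: (features, label).
EXAMPLES = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]

if __name__ == "__main__":
    accepted, needs_explanation = classify_batch([(0.2, 0.2), (3.0, 2.4)], EXAMPLES)
    print(accepted)
    print(needs_explanation)
```

The first row sits inside cluster "a" and is accepted outright. The second falls between the clusters, wins only two of three votes, and lands in the queue that would be handed to the instruct model.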

Site monitoring with change detection

Embed and compare images from multiple monitored locations in parallel. On detected change, describe what changed and assemble a report. Deliver as text and audio on a schedule.

image-embed img2txt instruct tts

Model types

  • text-embedding, image-embedding
  • instruct (chat / instruction-following)
  • txt2img, img2txt, img2mask, depth
  • tts, txt2audio
  • text-eval, image-eval, image-text-eval
  • text-similarity

Workflow features

  • Shared state across steps
  • Sequential and parallel step execution
  • Conditional branching on model output values
  • Halt and reject with a structured result
  • Prompt interpolation from prior step outputs
  • Per-step audit trail: status, timing, output, run count

Workflow execution is built on runfox and json-logic-path, open-source libraries available on PyPI.

Production AI pipelines should not be built from scratch each time.

Paid plans are in limited release. Leave your email and we will reach out when access opens.

Join the waitlist

No spam. One email when your tier opens.

Questions? ed@bayis.co.uk