Morph Models

Fast general models for agent loops

Run the primary agent loop on fast, OpenAI-compatible coding models served on Morph's custom kernels. One API for chat, code generation, and reasoning.

Frontier coding models, served on custom kernels

Output speed

Codegen-specific optimizations and custom GPU kernels deliver up to 200 tokens/s of output on Qwen 3.5 397B.
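As a rough back-of-the-envelope illustration, the quoted peak rate gives a lower bound on generation time. The sketch below assumes a constant decode rate and ignores network latency, queueing, and time to first token; the name `minGenerationSeconds` is hypothetical and not part of any Morph API.

```typescript
// Peak output speed quoted for Qwen 3.5 397B ("up to" 200 tok/s).
const PEAK_TOKENS_PER_SECOND = 200;

// Lower bound on decode time: assumes the peak rate is sustained and
// ignores network latency, queueing, and time to first token.
function minGenerationSeconds(
  outputTokens: number,
  tokensPerSecond: number = PEAK_TOKENS_PER_SECOND,
): number {
  if (outputTokens < 0 || tokensPerSecond <= 0) {
    throw new RangeError("outputTokens must be >= 0 and tokensPerSecond > 0");
  }
  return outputTokens / tokensPerSecond;
}

// A 1,000-token patch takes at least 5 seconds at the peak rate.
console.log(minGenerationSeconds(1000)); // 5
```

Real-world throughput depends on the model, prompt length, and load, so treat the result as a best case rather than an estimate.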

One OpenAI-compatible API

Point your existing OpenAI client at https://api.morphllm.com/v1. Switch models by changing one string.

import OpenAI from "openai";

// Reuse the standard OpenAI client; only the base URL and key change.
const client = new OpenAI({
  baseURL: "https://api.morphllm.com/v1",
  apiKey: process.env.MORPH_API_KEY,
});

// Switch models by changing the `model` string.
const res = await client.chat.completions.create({
  model: "morph-qwen35-397b",
  messages: [{ role: "user", content: "Refactor this function..." }],
});
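Because the model is just a string in the request, model switching can be isolated in one small helper. The sketch below is a hypothetical wrapper (`buildChatRequest` and `DEFAULT_MODEL` are illustrative names, not part of any Morph SDK). It only builds a request body in the OpenAI chat-completions shape and makes no network call.

```typescript
// Minimal types mirroring the OpenAI chat-completions request shape.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type ChatRequest = { model: string; messages: ChatMessage[] };

// Hypothetical default; any model ID offered by the API could be used here.
const DEFAULT_MODEL = "morph-qwen35-397b";

// Build a request body. The model string is the only thing that changes
// when switching between models.
function buildChatRequest(prompt: string, model: string = DEFAULT_MODEL): ChatRequest {
  return { model, messages: [{ role: "user", content: prompt }] };
}

const big = buildChatRequest("Refactor this function...");
const small = buildChatRequest("Refactor this function...", "morph-qwen36-27b");
console.log(big.model, small.model);
```

The returned object could then be passed to `client.chat.completions.create(...)` on a client configured with the Morph base URL.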

The lineup

Open-weight frontier models with long context, served and billed per token. No per-seat fees.

// Available general models
morph-qwen35-397b      // 397B MoE, 262k context
morph-minimax27-230b   // 230B MoE, agentic workflows
morph-dsv4flash        // 393k context, fast
morph-qwen36-27b       // dense, low latency
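One way to use this list in an agent is to pick a model by how much context a request needs. The following is a minimal sketch under stated assumptions: "262k" and "393k" are read as 262,000 and 393,000 tokens, models with no listed context size are skipped rather than guessed, and `LINEUP` and `modelsFittingContext` are hypothetical names.

```typescript
// Lineup as listed above. Context sizes are only stated for two models;
// the exact token counts here are an assumption based on "262k" and "393k".
type ModelInfo = { id: string; note: string; contextTokens?: number };

const LINEUP: ModelInfo[] = [
  { id: "morph-qwen35-397b", note: "397B MoE", contextTokens: 262000 },
  { id: "morph-minimax27-230b", note: "230B MoE, agentic workflows" },
  { id: "morph-dsv4flash", note: "fast", contextTokens: 393000 },
  { id: "morph-qwen36-27b", note: "dense, low latency" },
];

// Return the IDs of models whose listed context window fits the prompt.
// Models without a listed context size are excluded.
function modelsFittingContext(promptTokens: number): string[] {
  return LINEUP
    .filter((m) => m.contextTokens !== undefined && m.contextTokens >= promptTokens)
    .map((m) => m.id);
}

console.log(modelsFittingContext(300000)); // ["morph-dsv4flash"]
```

Check the model documentation for exact context limits before relying on these numbers.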

Built for production agent workloads

Low latency

Custom GPU kernels and speculative decoding tuned for the code-generation workload.

High throughput

Batched serving across a GPU fleet for high-volume agent traffic.

Self-hosted

Run on your own infrastructure for enterprise security and air-gapped environments.

Monitor usage across every model

Usage dashboard

Track tokens, latency, and spend per model across your whole org.

Detailed logging

Inspect individual requests, prompts, and responses.

Run your first model call in minutes.

Free tier available. Pay only for what you use.
