OpenProse: Making AI Agent Behavior Repeatable and Trustworthy

OpenProse converts your best AI agent sessions into repeatable, reviewable programs — here's how it works and why it matters for production builds.

By NerdHeadz Team

The Real Problem With AI Agents Isn't Intelligence — It's Repeatability

The hardest part of building with AI agents today isn't finding a capable model. Most frontier models can write solid code, plan multi-step tasks, and recover from errors in real time. The hard part is that a brilliant agent session on Tuesday has no reliable relationship to what the same agent does on Wednesday. AI agent behavior is non-deterministic by default, and that gap between "it worked once" and "it works every time" is where production systems fall apart.

We've seen this pattern repeatedly in our AI agent development work: teams get impressive results in a demo, then spend weeks chasing consistency before anything ships. The root issue isn't model quality — it's the absence of a shared contract between the human, the agent, and the workflow itself. OpenProse, an open-source project, proposes an elegant answer to exactly this problem.

What OpenProse Actually Is

OpenProse is a programming language that is compiled and executed by an AI agent rather than by a conventional compiler or runtime. That distinction matters enormously.

Rather than building another orchestration layer that wraps your agents in external scaffolding, OpenProse injects a contract directly into the coding agent itself. You write a .prose.md file in structured, logical English — and the agent becomes the virtual machine that executes it. There's no middleware, no separate server, no proprietary runtime to maintain.

The core abstraction is deceptively simple: a ### Requires block declares what inputs the workflow needs before it starts, and a ### Ensures block declares what must be verifiably true when it's done. If you've ever written a function signature and wished an agent would actually honor it, that's the exact feeling OpenProse is designed to produce.
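
As a rough analogy in ordinary code, a Requires/Ensures pair behaves like a precondition and a postcondition wrapped around a step. The Python sketch below is not OpenProse syntax, and it is not how OpenProse is implemented; the `contract` decorator, the `summarize` step, and the key names `diff` and `summary` are hypothetical, chosen only to illustrate the idea.

```python
from functools import wraps
from typing import Any, Callable, Dict, Iterable


def contract(requires: Iterable[str], ensures: Iterable[str]) -> Callable:
    """Hypothetical decorator: check named inputs before a step and named outputs after it."""
    required, guaranteed = tuple(requires), tuple(ensures)

    def decorate(step: Callable[[Dict[str, Any]], Dict[str, Any]]):
        @wraps(step)
        def run(inputs: Dict[str, Any]) -> Dict[str, Any]:
            # "Requires": refuse to start if a declared input is absent.
            missing = [key for key in required if key not in inputs]
            if missing:
                raise ValueError(f"missing required inputs: {missing}")
            outputs = step(inputs)
            # "Ensures": refuse to report success if a declared output is absent.
            unmet = [key for key in guaranteed if key not in outputs]
            if unmet:
                raise ValueError(f"unmet guarantees: {unmet}")
            return outputs

        return run

    return decorate


@contract(requires=["diff"], ensures=["summary"])
def summarize(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Stand-in for the work an agent would do in this step.
    line_count = len(inputs["diff"].splitlines())
    return {"summary": f"{line_count} lines changed"}


print(summarize({"diff": "line one\nline two"}))  # {'summary': '2 lines changed'}
```

The point of the analogy is that the step's author states the contract once, and every run is checked against it, rather than trusting the step to report its own success.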

Your Best Agent Sessions Are Already Programs — You Just Can't See Them

Most developers who work with Claude Code, Codex, or similar tools have experienced a "golden session": every tool call lands, the agent makes smart decisions, the output is exactly right. Then the session ends and that tacit knowledge evaporates into chat history.

OpenProse's session-to-prose tool addresses this directly. It converts JSONL session logs from Claude Code, Codex, and similar agents into .prose.md programs. Critically, it doesn't summarize what happened — it extracts the reusable workflow: phases, decision gates, parallel work patterns, error recovery strategies, and validation checkpoints. The implicit workflow you didn't know you'd authored becomes an explicit, runnable asset.
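
To make the starting point concrete, here is a minimal sketch of reading a JSONL session log and pulling out the ordered sequence of tool calls. It is far simpler than extracting phases or decision gates, and it does not reproduce what session-to-prose does. The event schema (a `type` field and a `tool` field) and the tool names are assumptions made for illustration; real Claude Code or Codex logs use their own formats.

```python
import json
from typing import Iterable, List


def outline_tool_calls(jsonl_lines: Iterable[str]) -> List[str]:
    """Return tool names in order of use, collapsing consecutive repeats."""
    steps: List[str] = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the log
        event = json.loads(line)
        # Hypothetical schema: tool invocations are marked with type == "tool_call".
        if event.get("type") != "tool_call":
            continue
        tool = event.get("tool", "unknown")
        if not steps or steps[-1] != tool:
            steps.append(tool)
    return steps


# A made-up session log, one JSON object per line.
sample_log = [
    '{"type": "message", "text": "Fix the failing test"}',
    '{"type": "tool_call", "tool": "read_file"}',
    '{"type": "tool_call", "tool": "read_file"}',
    '{"type": "tool_call", "tool": "edit_file"}',
    '{"type": "tool_call", "tool": "run_tests"}',
]

print(outline_tool_calls(sample_log))  # ['read_file', 'edit_file', 'run_tests']
```

Even this crude outline shows why a log is a candidate program: the order of operations is already recorded, it just isn't named or reviewable yet.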

This is conceptually similar to what GitHub's agent-first development model is pushing toward — treating agent behavior as a versioned artifact that can be reviewed, improved, and shipped like any other piece of software. OpenProse gives that idea a concrete implementation.

Why Skill Declaration Changes Everything

One of the subtler insights embedded in OpenProse is about agent skills and when they get loaded. Standard agent setups surface skills through progressive disclosure — the agent "finds" a skill if it thinks to look for it. That's a coin flip, and in a multi-stage workflow, a missed skill doesn't just degrade quality; it changes the entire character of the output.

OpenProse lets you explicitly declare skills as dependencies at the workflow level. Before a sub-agent runs a stage that requires a specific capability, that skill is loaded deterministically. Each sub-agent runs in its own isolated session, which means scratch work and intermediate reasoning stay contained. The only output that propagates to the next stage is what the ### Ensures block explicitly declares — clean, named artifacts, not polluted context.
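
The isolation rule described above can be sketched in a few lines of Python. This is an illustration of the idea under stated assumptions, not OpenProse's implementation: the `Stage` class, `run_pipeline`, the `SKILLS` library, and the stage functions are all hypothetical. Each stage declares the skills it depends on and the outputs it guarantees; anything else it produces is dropped.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Stage:
    name: str
    skills: List[str]   # skills that must be loaded before the stage runs
    ensures: List[str]  # the only outputs allowed to reach later stages
    run: Callable[[Dict[str, Any], Dict[str, str]], Dict[str, Any]]


def run_pipeline(
    stages: List[Stage], skill_library: Dict[str, str], initial: Dict[str, Any]
) -> Dict[str, Any]:
    artifacts = dict(initial)
    for stage in stages:
        # Load declared skills up front instead of hoping the stage discovers them.
        missing = [s for s in stage.skills if s not in skill_library]
        if missing:
            raise KeyError(f"{stage.name}: missing skills {missing}")
        loaded = {s: skill_library[s] for s in stage.skills}
        # Each stage gets its own copy of the artifacts, so it cannot mutate shared state.
        result = stage.run(dict(artifacts), loaded)
        unmet = [k for k in stage.ensures if k not in result]
        if unmet:
            raise ValueError(f"{stage.name}: unmet guarantees {unmet}")
        # Only declared outputs propagate; scratch work is discarded here.
        artifacts.update({k: result[k] for k in stage.ensures})
    return artifacts


SKILLS = {"web-research": "How to search and cite sources."}


def research(ctx: Dict[str, Any], skills: Dict[str, str]) -> Dict[str, Any]:
    return {"scratch": "dead ends and notes", "findings": ["a", "b"]}


def report(ctx: Dict[str, Any], skills: Dict[str, str]) -> Dict[str, Any]:
    return {"report": f"{len(ctx['findings'])} findings on {ctx['topic']}"}


stages = [
    Stage("research", ["web-research"], ["findings"], research),
    Stage("report", [], ["report"], report),
]
final = run_pipeline(stages, SKILLS, {"topic": "caching"})
print(sorted(final))  # ['findings', 'report', 'topic']
```

Note that `scratch` never appears in `final`: the research stage produced it, but it was not declared, so it did not reach the report stage.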

This is a meaningful architectural difference from LangChain, CrewAI, and similar frameworks. Those systems abstract orchestration details away from you. OpenProse does the opposite: it makes execution structure visible and reviewable while leaving the actual compute running through the coding agent you already trust. For teams building production AI systems, that distinction between "abstraction hiding complexity" and "structure making complexity legible" is the difference between systems you can debug and systems you can't.

The Audit Trail Is Where Trust Comes From

Reliability in software isn't just about execution — it's about inspectability. OpenProse addresses this with a receipts system: every run deposits its inputs, outputs, logs, and artifacts under a runs/{run-id}/ directory. "The agent said it's done" stops being a vibe and becomes a verifiable claim you can inspect.

The file layout — src/ for authored programs, dist/ for compiled manifests, runs/ for receipts, state/ for durable cross-run memory, and a prose.lock — is deliberately structured to live in git and be reviewed like ordinary software. This is how AI infrastructure at scale actually has to work: not as magic that works until it doesn't, but as auditable systems with observable state.
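
A receipt of this kind is straightforward to picture. The sketch below writes one run's inputs, outputs, and log under `runs/{run-id}/`; the file names (`inputs.json`, `outputs.json`, `run.log`), the use of a random hex run id, and the `write_receipt` helper are assumptions for illustration, not OpenProse's actual receipt format.

```python
import json
import uuid
from pathlib import Path
from typing import Any, Dict, List


def write_receipt(
    root: Path, inputs: Dict[str, Any], outputs: Dict[str, Any], log_lines: List[str]
) -> Path:
    """Write one run's evidence under root/runs/<run-id>/ and return that directory."""
    run_id = uuid.uuid4().hex  # hypothetical id scheme
    run_dir = root / "runs" / run_id
    run_dir.mkdir(parents=True)
    # Sorted keys keep the files stable, which makes git diffs easier to review.
    (run_dir / "inputs.json").write_text(json.dumps(inputs, indent=2, sort_keys=True))
    (run_dir / "outputs.json").write_text(json.dumps(outputs, indent=2, sort_keys=True))
    (run_dir / "run.log").write_text("\n".join(log_lines) + "\n")
    return run_dir
```

Calling `write_receipt(Path("."), {"topic": "caching"}, {"report": "done"}, ["started", "finished"])` would leave a new directory under `runs/` that a reviewer can open, diff, and commit alongside the program that produced it.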

The Honest Caveats

OpenProse does not turn language models into deterministic infrastructure, and it's worth being direct about this. LLMs remain non-deterministic at execution time. A well-written prose program makes AI agent behavior dramatically more consistent and inspectable, but it cannot guarantee identical outputs on every run the way a deterministic program can.

The other critical caveat: a bad prose program will faithfully and repeatably do the wrong thing. OpenProse encodes whatever judgment the author puts into it. The tool amplifies clarity — it doesn't manufacture it. Teams that get the most from it are teams who already know what "good" looks like and need a mechanism to make that knowledge durable.

OpenProse works best on workflows that are genuinely worth making repeatable. Not every one-off prompt needs to become a program. Reserve it for the multi-stage, multi-agent work where consistency is the actual bottleneck.

OpenProse represents one of the more honest approaches to the AI agent reliability problem we've seen — it doesn't promise to make agents deterministic, it promises to make their behavior legible, auditable, and reproducible. For teams building real products on top of AI agents, that shift from "it worked once" to "here's the contract it runs against every time" is exactly the infrastructure that separates demos from production systems. The bottleneck in AI agent workflows is not intelligence — it is trust and repeatability.

Frequently asked questions

What is OpenProse and how does it make AI agent behavior repeatable?

OpenProse is an open-source programming language written in logical English and executed by AI coding agents like Claude Code or Codex. It uses a contract-based system with `Requires` and `Ensures` blocks that define exactly what inputs a workflow needs and what outputs it must produce, making agent behavior inspectable and reproducible across sessions.

How is OpenProse different from agent frameworks like LangChain or CrewAI?

Unlike LangChain or CrewAI, OpenProse does not wrap agents in external orchestration middleware. Instead, it embeds the workflow contract directly inside the coding agent, which acts as the compiler and virtual machine. This means you keep full control over execution context while gaining structured, reviewable workflow definitions.

Can OpenProse convert existing Claude Code or Codex sessions into reusable programs?

Yes. OpenProse includes a `session-to-prose` tool that converts JSONL session logs from Claude Code, Codex, and similar agents into `.prose.md` program files. Rather than summarizing the session, it extracts the underlying workflow structure (phases, decision gates, error recovery strategies, and validation steps) so the workflow can be run on demand in future sessions.