
Pipeline Spec Format

Pipelines don’t have to be Python. A YAML or JSON document can describe the full topology — nodes, modes, modifiers, wiring — and load_spec compiles it to the same IR that @node and ForwardConstruct produce. The spec format is the contract between AI intent and structured execution: an LLM generates the YAML; neograph validates and compiles it.

name: draft-pipeline
nodes:
  - name: generate
    mode: think
    prompt: "Write a first draft about the given topic."
    model: fast
    outputs: Draft
pipeline:
  nodes: [generate]

from neograph import load_spec, compile, run

construct = load_spec("pipeline.yaml")  # file path, YAML string, or dict
graph = compile(construct)
result = run(graph, input={"node_id": "demo"})

load_spec accepts a file path, a YAML/JSON string, or a pre-parsed dict. It returns a Construct — the same IR that @node and Construct(nodes=[...]) produce — ready for compile().

Every spec has three top-level keys:

Key         Required  Description
name        Yes       Pipeline name. Becomes the Construct name.
nodes       Yes       Array of node definitions.
constructs  No        Array of sub-construct definitions.
pipeline    Yes       Ordered list of node/construct names to execute.
nodes:
  - name: classify
    mode: think
    prompt: "Classify the input into categories."
    model: fast
    outputs: Classification
    tools: [search_codebase]  # for agent/act modes
    context: [raw_input]      # state fields injected into prompt
    llm_config:
      temperature: 0.2
    inputs:                   # explicit; usually inferred from pipeline order
      upstream_name: TypeName
Field        Required  Description
name         Yes       Unique node name within the pipeline.
mode         No        scripted, think, agent, act, or raw. Default: scripted.
outputs      Yes       Output type name (resolved from the type registry).
prompt       No        Prompt template or registered template name. Required for LLM modes.
model        No        Model tier name. Required for LLM modes.
tools        No        Tool names for agent/act modes.
scripted_fn  No        Registered function name for scripted mode.
inputs       No        Explicit {upstream: type} mapping. Usually inferred.
context      No        State field names injected verbatim into the prompt.
llm_config   No        Per-node LLM settings (temperature, max_tokens, etc.).

Modifiers are declared as sibling keys on the node, not as nested arrays:

nodes:
  - name: analyze
    mode: think
    prompt: "Analyze the input."
    model: reason
    outputs: Analysis
    oracle:
      models: [reason, fast, reason]
      merge_prompt: "Combine these analyses into a single report."

oracle:
  n: 3                    # number of parallel generators
  models: [reason, fast]  # model tiers (infers n from length)
  merge_fn: combine       # registered scripted merge function
  merge_prompt: "..."     # OR: LLM merge prompt
  merge_model: reason     # model for LLM merge (default: reason)

Exactly one of merge_fn or merge_prompt is required. models assigns tiers round-robin and infers n from len(models).
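The tier-assignment rule can be sketched in a few lines of plain Python. This is a hypothetical helper illustrating the documented semantics, not neograph's actual code; the fallback tier used when only `n` is given is an assumption:

```python
def assign_oracle_tiers(n=None, models=None, fallback="reason"):
    """Resolve generator count and per-generator model tiers for an
    oracle block: `models` infers n from its length; with both given,
    tiers are assigned round-robin. Sketch only, not neograph's code."""
    if n is None:
        if not models:
            raise ValueError("oracle requires `n` or `models`")
        n = len(models)
    if not models:
        models = [fallback]  # ASSUMPTION: tier used when only `n` is given
    return [models[i % len(models)] for i in range(n)]

def check_merge(merge_fn=None, merge_prompt=None):
    """Exactly one of merge_fn / merge_prompt must be set."""
    if (merge_fn is None) == (merge_prompt is None):
        raise ValueError("exactly one of merge_fn or merge_prompt is required")
```

For example, `assign_oracle_tiers(n=3, models=["reason", "fast"])` yields `["reason", "fast", "reason"]`.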

each:
  over: claims.items  # dotted path to collection in state
  key: label          # field on each item used as dispatch key
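Resolving `over` and `key` can be pictured with a small stand-alone sketch. Both helpers are hypothetical illustrations of the documented semantics, not neograph's implementation:

```python
def resolve_path(state, path):
    """Walk a dotted path like 'claims.items' through nested
    dicts or objects in pipeline state."""
    value = state
    for part in path.split("."):
        value = value[part] if isinstance(value, dict) else getattr(value, part)
    return value

def fan_out(state, over, key):
    """Return the resolved collection keyed by the dispatch field."""
    items = resolve_path(state, over)
    return {item[key]: item for item in items}
```

Here `fan_out(state, "claims.items", "label")` would produce one keyed work item per claim.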
loop:
  when: "score < 0.8"  # condition expression (field op literal)
  max_iterations: 10   # default: 10
  on_exhaust: error    # "error" (default) or "last"

The when expression supports <, >, <=, >=, ==, != with numeric, boolean (true/false), or quoted string literals. Dotted field access works: result.score < 0.8.
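A minimal evaluator for this `field op literal` grammar might look like the following. This is an illustrative sketch, not neograph's parser:

```python
import re

_OPS = {
    "<":  lambda a, b: a < b,   ">":  lambda a, b: a > b,
    "<=": lambda a, b: a <= b,  ">=": lambda a, b: a >= b,
    "==": lambda a, b: a == b,  "!=": lambda a, b: a != b,
}

def _literal(text):
    """Parse a numeric, boolean, or quoted-string literal."""
    text = text.strip()
    if text in ("true", "false"):
        return text == "true"
    if text.startswith(('"', "'")):
        return text[1:-1]
    return float(text)

def eval_when(expr, state):
    """Evaluate a condition like 'result.score < 0.8' against state.
    Sketch of the documented grammar, not neograph's implementation."""
    field, op, lit = re.match(r"\s*([\w.]+)\s*(<=|>=|==|!=|<|>)\s*(.+)", expr).groups()
    value = state
    for part in field.split("."):  # dotted field access
        value = value[part]
    return _OPS[op](value, _literal(lit))
```

For instance, `eval_when("result.score < 0.8", {"result": {"score": 0.9}})` evaluates to False.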

operator:
  when: needs_review  # registered condition name

Sub-constructs group nodes with isolated state and typed I/O boundaries:

constructs:
  - name: refine
    input: Draft
    output: Draft
    nodes: [review, revise]
    loop:
      when: "score < 0.8"
      max_iterations: 5
Field   Required  Description
name    Yes       Sub-construct name.
input   Yes       Input type (boundary port).
output  Yes       Output type (boundary port).
nodes   Yes       Node names (references to the top-level nodes array).

Sub-constructs support the same modifier blocks as nodes: oracle, each, loop, operator.

Full example: security analysis

A pipeline that decomposes a codebase description into security claims, verifies each claim against the codebase, and produces a report:

name: security-analysis
nodes:
  - name: decompose
    mode: think
    prompt: |
      Decompose this codebase description into discrete security claims.
      Each claim should be independently verifiable.
    model: reason
    outputs: Claims
    oracle:
      models: [reason, fast, reason]
      merge_prompt: |
        You have multiple decompositions. Combine them into a single,
        deduplicated list of security claims.
  - name: verify
    mode: agent
    prompt: |
      Verify this security claim against the codebase.
      Search for evidence that supports or refutes the claim.
    model: fast
    outputs: MatchResult
    tools: [search_codebase, read_file]
    each:
      over: decompose.items
      key: claim_id
  - name: report
    mode: think
    prompt: |
      Produce a security analysis report from the verification results.
      Flag critical findings and suggest remediation steps.
    model: reason
    outputs: Report
pipeline:
  nodes: [decompose, verify, report]

Full example: iterative refinement with sub-construct


A pipeline that drafts content, then iteratively reviews and revises it:

name: iterative-writer
nodes:
  - name: draft
    mode: think
    prompt: "Write a first draft about the given topic."
    model: fast
    outputs: Draft
  - name: review
    mode: think
    prompt: "Review this draft. Score 0-1 and provide feedback."
    model: reason
    outputs: ReviewResult
  - name: revise
    mode: think
    prompt: "Revise the draft based on the review feedback."
    model: fast
    outputs: Draft
constructs:
  - name: refine
    input: Draft
    output: Draft
    nodes: [review, revise]
    loop:
      when: "score < 0.8"
      max_iterations: 5
pipeline:
  nodes: [draft, refine]
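The refine loop's behavior can be simulated in isolation. This is a sketch of the documented loop semantics with a stubbed review/revise pass, not how neograph executes it:

```python
def run_loop(state, body, when, max_iterations=10, on_exhaust="error"):
    """Run `body`, repeating while `when(state)` holds, up to
    max_iterations; then raise (default) or keep the last result.
    Sketch of the documented loop modifier, not neograph's code."""
    for _ in range(max_iterations):
        state = body(state)
        if not when(state):
            return state
    if on_exhaust == "error":
        raise RuntimeError(f"loop exhausted after {max_iterations} iterations")
    return state  # on_exhaust == "last"

# stub standing in for the review -> revise pair: each pass lifts the score
def review_and_revise(state):
    return {**state, "score": state["score"] + 0.25,
            "passes": state.get("passes", 0) + 1}

final = run_loop({"score": 0.0}, review_and_revise,
                 when=lambda s: s["score"] < 0.8, max_iterations=5)
```

With this stub the loop exits once the score reaches 0.8; lowering max_iterations below the number of passes needed triggers the on_exhaust policy instead.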

Types referenced in the spec (like Draft, Claims, Report) must exist in the type registry. You have two options:

Option 1: Register types manually

from neograph import register_type
from myapp.schemas import Draft, Claims, Report
register_type("Draft", Draft)
register_type("Claims", Claims)
register_type("Report", Report)
construct = load_spec("pipeline.yaml")

Option 2: Auto-generate types from a project surface


Pass a project surface definition to load_spec. Types are generated as Pydantic models from JSON Schema definitions:

project.yaml:

types:
  Draft:
    properties:
      content: { type: string }
      score: { type: number }
      iteration: { type: integer }
    required: [content]
  ReviewResult:
    properties:
      score: { type: number }
      feedback: { type: string }
    required: [score, feedback]
  Claims:
    properties:
      items:
        type: array
        items: { $ref: Draft }
    required: [items]
construct = load_spec("pipeline.yaml", project="project.yaml")

The project surface uses JSON Schema conventions. Nested type references use $ref with the type name (not a JSON pointer). Types are processed in definition order, so a type can reference another type defined earlier in the file.

Supported field types:

JSON Schema         Python type
string              str
number              float
integer             int
boolean             bool
array (with items)  list[T]
$ref: TypeName      Registered Pydantic model
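As an illustration of this mapping, here is a sketch that builds Python classes from such a surface using stdlib dataclasses. neograph generates Pydantic models; `build_type` is a hypothetical stand-in showing only the type-resolution logic:

```python
from dataclasses import make_dataclass, field
from typing import Optional

# the documented JSON Schema -> Python scalar mapping
SCALARS = {"string": str, "number": float, "integer": int, "boolean": bool}

def build_type(name, schema, registry):
    """Build a class for one `types:` entry. $ref resolves against
    earlier-defined types, matching the definition-order rule above."""
    required = set(schema.get("required", []))
    fields = []
    for fname, fschema in schema["properties"].items():
        if "$ref" in fschema:
            ftype = registry[fschema["$ref"]]
        elif fschema["type"] == "array":
            item = fschema["items"]
            inner = registry[item["$ref"]] if "$ref" in item else SCALARS[item["type"]]
            ftype = list[inner]
        else:
            ftype = SCALARS[fschema["type"]]
        if fname in required:
            fields.append((fname, ftype))
        else:
            fields.append((fname, Optional[ftype], field(default=None)))
    fields.sort(key=len)  # fields without defaults must precede defaulted ones
    cls = make_dataclass(name, fields)
    registry[name] = cls
    return cls
```

Processing types in definition order keeps `registry` populated before any `$ref` lookup, which mirrors the earlier-definition rule in the surface format.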

The spec format is validated against a JSON Schema at load time (when jsonschema is installed). The schema is bundled at neograph/schemas/neograph-pipeline.schema.json.

The full schema is available at the GitHub repository.

After parsing, load_spec builds a Construct and the same validator that checks @node and programmatic pipelines runs on the result:

  • Every node’s inputs are type-checked against upstream outputs
  • Modifier chains are validated (Each paths resolve, Oracle has a merge strategy)
  • Fan-in parameters are type-checked across all upstreams
  • Cycles and self-dependencies raise ConstructError

If the spec is malformed, you get a clear error pointing at the broken edge — before anything executes. When an LLM generates the spec, surface the error back and let it revise.
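That feedback loop might be wired up like this. This is a hypothetical sketch: `generate` stands in for an LLM call and `load_spec` for neograph's loader; only the loop shape is the point:

```python
def load_with_revision(generate, load_spec, max_attempts=3):
    """Ask for a spec, try to load it, and on a validation error feed
    the message back for a revised attempt. Both callables are
    stand-ins; this is not part of neograph's API."""
    feedback = None
    for _ in range(max_attempts):
        spec = generate(feedback)
        try:
            return load_spec(spec)  # raises on a malformed spec
        except Exception as err:    # e.g. neograph's ConstructError
            feedback = str(err)     # surface the error back to the LLM
    raise RuntimeError(f"spec still invalid after {max_attempts} attempts: {feedback}")
```

On the first attempt `feedback` is None; after a failure the validator's message becomes part of the next generation prompt.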


Documentation © 2025-2026 Constantine Mirin, mirin.pro. Licensed under CC BY-ND 4.0.