
LLM Configuration

NeoGraph does not own the LLM client. You register a factory that creates LLM instances and a compiler that builds prompts. The framework calls them for every LLM invocation — produce, gather, execute, and Oracle merge.

configure_llm() is called once at application startup and registers two callbacks:

from neograph import configure_llm

configure_llm(
    llm_factory=my_factory,
    prompt_compiler=my_compiler,
)

If you call run() on a graph that has LLM nodes without calling configure_llm() first, you get a ValueError.

The factory creates a LangChain BaseChatModel instance for a given tier. NeoGraph calls it once per LLM invocation (not per node definition).

The minimal factory takes a tier string and returns a model:

from langchain_openai import ChatOpenAI

MODELS = {
    "fast": "gpt-4o-mini",
    "reason": "gpt-4o",
    "large": "o1",
}

configure_llm(
    llm_factory=lambda tier: ChatOpenAI(model=MODELS[tier]),
    prompt_compiler=my_compiler,
)

The advanced signature receives the node name and per-node LLM config:

def my_factory(tier, node_name=None, llm_config=None):
    config = llm_config or {}
    return ChatOpenAI(
        model=MODELS[tier],
        temperature=config.get("temperature", 0),
        max_tokens=config.get("max_tokens", 4096),
    )

configure_llm(llm_factory=my_factory, prompt_compiler=my_compiler)

The framework inspects your factory with inspect.signature at configure_llm() time and passes only the kwargs it declares on each call. Factories that use **kwargs receive everything. This provides backward compatibility without runtime try/except — you can start simple and add parameters later.
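The kwargs-filtering technique described above can be sketched in a few lines of plain Python. This is an illustration of the general pattern, not NeoGraph's actual implementation; the helper name filter_kwargs is hypothetical.

```python
import inspect

def filter_kwargs(fn, **available):
    """Pass a callable only the keyword arguments it declares."""
    params = inspect.signature(fn).parameters
    # A **kwargs parameter means the callable accepts everything
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return available
    return {k: v for k, v in available.items() if k in params}

def minimal(tier):
    return tier

def advanced(tier, node_name=None, llm_config=None):
    return (tier, node_name, llm_config)

# The minimal factory only ever sees "tier"; the advanced one sees all three
filter_kwargs(minimal, tier="fast", node_name="classify", llm_config=None)
filter_kwargs(advanced, tier="fast", node_name="classify", llm_config=None)
```

Because the signature is inspected once at registration time, each per-invocation call is a plain dict filter with no exception handling on the hot path.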

Each node can carry an llm_config dict that is passed to the factory. With @node, pass it as a keyword:

from neograph import node

@node(output=ClassifiedClaims, prompt='rw/classify', model='reason',
      llm_config={"temperature": 0.7, "max_tokens": 2048})
def classify(raw_claims: RawClaims) -> ClassifiedClaims: ...

The llm_config dict is opaque to NeoGraph. The framework passes it through to your factory; you decide what keys mean. Common uses: temperature, max_tokens, top_p, stop sequences.

Mode inference still works when llm_config is present — the decorator infers produce from the prompt= and model= kwargs, regardless of llm_config.
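The inference rule can be pictured as a small dispatch function. This is a hypothetical sketch of the behavior described above, not the decorator's real code; the function name and the exact mode strings it returns for the non-produce cases are assumptions.

```python
def infer_mode(prompt=None, model=None, tools=None, llm_config=None):
    """Sketch: prompt= and model= imply produce; llm_config is ignored
    for inference and only forwarded to the LLM factory."""
    if tools:
        return "gather"          # ReAct-style node with a tool loop
    if prompt and model:
        return "produce"         # plain LLM call
    return "scripted"            # ordinary Python function node

# llm_config present or absent, the inferred mode is the same
infer_mode(prompt='rw/classify', model='reason',
           llm_config={"temperature": 0.7})
```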

The compiler builds the message list that the LLM receives. NeoGraph calls it for every LLM invocation.

The minimal compiler takes a template name and the input data:

from langchain_core.messages import HumanMessage

configure_llm(
    llm_factory=my_factory,
    prompt_compiler=lambda template, data: [
        HumanMessage(content=f"Template: {template}\n\nData: {data}")
    ],
)

The full signature receives node_name, config, output_model, and llm_config:

import json

def my_compiler(template, data, *, node_name=None, config=None,
                output_model=None, llm_config=None):
    configurable = (config or {}).get("configurable", {})
    node_id = configurable.get("node_id", "")
    project_root = configurable.get("project_root", "")
    strategy = (llm_config or {}).get("output_strategy", "structured")

    # Load context files from disk based on pipeline metadata
    context = load_context(project_root, node_id)

    # Build prompt -- inject JSON schema for json_mode
    messages = get_prompt(
        template_name=template,
        node_id=node_id,
        context_files=context,
        analysis_notes=format_notes(data),
    )

    # For json_mode and text: tell the LLM what JSON shape to return
    if strategy in ("json_mode", "text") and output_model:
        schema = json.dumps(output_model.model_json_schema(), indent=2)
        messages.append({"role": "user",
                         "content": f"Return a JSON object matching this schema:\n{schema}"})
    return messages

configure_llm(llm_factory=my_factory, prompt_compiler=my_compiler)

The framework inspects your compiler with inspect.signature at configure_llm() time and passes only the kwargs it declares. Any of these work:

  • (template, data, node_name=, config=, output_model=, llm_config=) — full context
  • (template, data, node_name=, config=) — partial
  • (template, data) — minimal
  • (template, data, **kw) — accepts everything

No try/except at runtime. You can upgrade incrementally without breaking existing compilers.

When you call run(), all fields from the input dict are automatically injected into config["configurable"]:

from neograph import run

result = run(
    graph,
    input={"node_id": "BR-042", "project_root": "/repo"},
    config={"configurable": {"rate_limiter": my_limiter}},
)

Inside every node, config["configurable"] contains:

{
    "node_id": "BR-042",          # from input
    "project_root": "/repo",      # from input
    "rate_limiter": my_limiter,   # from explicit config
}

Input fields take precedence over existing configurable values if there is a key conflict. This means your prompt compiler and LLM factory can access pipeline metadata (node_id, project_root) and shared resources (rate limiters, database connections) without any node reaching into state.
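The merge rule described above amounts to a dict update where the input fields are applied last. A minimal sketch of that rule (not NeoGraph internals; the helper name merge_configurable is hypothetical):

```python
def merge_configurable(config, input_fields):
    """Inject input fields into config["configurable"]; input wins on conflicts."""
    config = dict(config or {})
    configurable = dict(config.get("configurable", {}))
    configurable.update(input_fields)  # input fields applied last, so they win
    config["configurable"] = configurable
    return config

merged = merge_configurable(
    {"configurable": {"rate_limiter": "limiter", "node_id": "stale"}},
    {"node_id": "BR-042", "project_root": "/repo"},
)
# merged["configurable"]["node_id"] is "BR-042": the input value wins
```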

Put expensive-to-create resources in config["configurable"] and access them from your factory, compiler, or @node functions via FromConfig[T]:

from neograph import node, FromConfig

# At the call site
db_pool = create_connection_pool()
rate_limiter = TokenBucketLimiter(tokens_per_minute=100_000)

result = run(
    graph,
    input={"node_id": "BR-042"},
    config={
        "configurable": {
            "db_pool": db_pool,
            "rate_limiter": rate_limiter,
        }
    },
)

# In a scripted @node function
@node(output=ContextData)
def load_context(
    claims: Claims,
    db_pool: FromConfig[ConnectionPool],
    rate_limiter: FromConfig[RateLimiter],
) -> ContextData:
    rate_limiter.acquire()
    rows = db_pool.query("SELECT * FROM context WHERE id = %s", claims.id)
    return ContextData(rows=rows)

Many models (DeepSeek-R1, o1, QwQ, local models) don’t support with_structured_output. NeoGraph provides three output strategies, selected per-node via llm_config["output_strategy"]:

  • "structured": llm.with_structured_output(model). Best for OpenAI, Anthropic, Gemini.
  • "json_mode": the LLM returns raw text; the framework strips fences and parses the JSON. Best for DeepSeek and local models.
  • "text": the LLM returns prose with embedded JSON; the framework extracts it. Best for reasoning models (o1, R1).

The framework calls llm.with_structured_output(output_model). Works with any LangChain model that supports native structured output. The framework tries include_raw=True to capture token counts, falling back without it.

@node(output=Claims, prompt='rw/classify', model='fast')
def classify(topic: RawText) -> Claims: ...
# No output_strategy needed — "structured" is the default

The framework calls llm.invoke() directly (no with_structured_output), then strips markdown code fences and parses the JSON into the Pydantic model. Works with any model that returns JSON in its text response.

@node(output=Claims, prompt='rw/decompose', model='reason',
      llm_config={"output_strategy": "json_mode"})
def decompose(topic: RawText) -> Claims: ...

The framework handles:

  • Markdown fence stripping (```json ... ```)
  • JSON object extraction from surrounding text
  • Pydantic model_validate_json for type-safe parsing
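The first two steps above are ordinary string processing. A minimal sketch of that post-processing, assuming the framework then hands the cleaned string to Pydantic's model_validate_json (the helper name extract_json is hypothetical):

```python
import json
import re

def extract_json(text):
    """Strip markdown fences, then pull the outermost JSON object
    out of any surrounding prose."""
    # 1. Strip markdown code fences if present (```json ... ```)
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # 2. Extract the outermost JSON object from surrounding text
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start:end + 1]
    return json.loads(text)

extract_json('Here you go:\n```json\n{"claims": ["a", "b"]}\n```')
```

Real responses fail in more ways than this (truncated output, trailing commas), which is why the framework, not each node, owns this step.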

Same as json_mode — the framework extracts JSON from the LLM’s plain text response. Use this name to signal intent when the model returns prose with embedded JSON rather than fenced code blocks.

@node(output=Analysis, prompt='rw/analyze', model='reason',
      llm_config={"output_strategy": "text"})
def analyze(topic: RawText) -> Analysis: ...

Different nodes can use different strategies. This is the production pattern for pipelines that use multiple model providers:

import sys

from neograph import node

# DeepSeek for creative decomposition (no structured output support)
@node(output=Claims, prompt='rw/decompose', model='reason',
      llm_config={"temperature": 0.9, "output_strategy": "json_mode"})
def decompose(topic: RawText) -> Claims: ...

# Gemini for precise classification (native structured output)
@node(output=ClassifiedClaims, prompt='rw/classify', model='fast',
      llm_config={"temperature": 0, "output_strategy": "structured"})
def classify(decompose: Claims) -> ClassifiedClaims: ...

pipeline = construct_from_module(sys.modules[__name__])

For ReAct modes, the output strategy applies to the final parsing step after the tool loop completes:

  • "structured": a separate with_structured_output call parses the final answer
  • "json_mode" / "text": the last message in the tool loop is parsed directly as JSON

@node(mode='gather', output=ResearchResult, prompt='rw/research', model='reason',
      tools=[Tool(name="search", budget=5)],
      llm_config={"output_strategy": "json_mode"})
def research(query: SearchQuery) -> ResearchResult: ...

Both the factory and compiler accept multiple signatures. The framework inspects your callable at configure_llm() time and passes only the kwargs it declares:

  • llm_factory: tier (required), plus any of: node_name, llm_config
  • prompt_compiler: template, data (required), plus any of: node_name, config, output_model, llm_config

Functions using **kwargs receive all parameters. This lets you start with a minimal lambda and add parameters later without breaking any node definitions.


Documentation © 2025-2026 Constantine Mirin, mirin.pro. Licensed under CC BY-ND 4.0.