LLM Configuration
NeoGraph does not own the LLM client. You register a factory that creates LLM instances and a compiler that builds prompts. The framework calls them for every LLM invocation — produce, gather, execute, and Oracle merge.
configure_llm()
Called once at application startup. Registers two callbacks:
```python
from neograph import configure_llm

configure_llm(
    llm_factory=my_factory,
    prompt_compiler=my_compiler,
)
```

If you call run() on a graph that has LLM nodes without calling configure_llm() first, you get a ValueError.
LLM factory
The factory creates a LangChain BaseChatModel instance for a given tier. NeoGraph calls it once per LLM invocation (not per node definition).
Simple factory
The minimal factory takes a tier string and returns a model:
```python
from langchain_openai import ChatOpenAI

MODELS = {
    "fast": "gpt-4o-mini",
    "reason": "gpt-4o",
    "large": "o1",
}

configure_llm(
    llm_factory=lambda tier: ChatOpenAI(model=MODELS[tier]),
    prompt_compiler=my_compiler,
)
```

Advanced factory
The advanced signature receives the node name and per-node LLM config:
```python
def my_factory(tier, node_name=None, llm_config=None):
    config = llm_config or {}
    return ChatOpenAI(
        model=MODELS[tier],
        temperature=config.get("temperature", 0),
        max_tokens=config.get("max_tokens", 4096),
    )

configure_llm(llm_factory=my_factory, prompt_compiler=my_compiler)
```

The framework inspects your factory with inspect.signature at configure_llm() time and passes only the kwargs it declares on each call. Factories that use **kwargs receive everything. This provides backward compatibility without runtime try/except — you can start simple and add parameters later.
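The signature-based filtering can be sketched roughly as follows. This is an illustrative stand-in, not NeoGraph's actual internals: call_with_supported_kwargs, simple_factory, and advanced_factory are hypothetical names.

```python
import inspect

def call_with_supported_kwargs(fn, *args, **kwargs):
    """Pass only the keyword arguments fn's signature declares (sketch)."""
    params = inspect.signature(fn).parameters
    # A **kwargs parameter means the callable accepts everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return fn(*args, **kwargs)
    return fn(*args, **{k: v for k, v in kwargs.items() if k in params})

def simple_factory(tier):
    return f"model-for-{tier}"

def advanced_factory(tier, node_name=None, llm_config=None):
    return (tier, node_name)

# Both factories can be called with the same full kwarg set:
call_with_supported_kwargs(simple_factory, "fast", node_name="classify")    # "model-for-fast"
call_with_supported_kwargs(advanced_factory, "fast", node_name="classify")  # ("fast", "classify")
```

Because the filtering happens once per call with no try/except, a minimal factory never sees kwargs it did not ask for, and an advanced one receives the full context.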
Per-node llm_config
Each node can carry an llm_config dict that is passed to the factory. With @node, pass it as a keyword:
```python
from neograph import node

@node(
    output=ClassifiedClaims,
    prompt='rw/classify',
    model='reason',
    llm_config={"temperature": 0.7, "max_tokens": 2048},
)
def classify(raw_claims: RawClaims) -> ClassifiedClaims: ...
```

The llm_config dict is opaque to NeoGraph. The framework passes it through to your factory; you decide what keys mean. Common uses: temperature, max_tokens, top_p, stop sequences.
Mode inference still works when llm_config is present — the decorator infers produce from the prompt= and model= kwargs, regardless of llm_config.
Prompt compiler
The compiler builds the message list that the LLM receives. NeoGraph calls it for every LLM invocation.
Simple compiler
The minimal compiler takes a template name and the input data:
```python
from langchain_core.messages import HumanMessage

configure_llm(
    llm_factory=my_factory,
    prompt_compiler=lambda template, data: [
        HumanMessage(content=f"Template: {template}\n\nData: {data}")
    ],
)
```

Advanced compiler
The full signature receives node_name, config, output_model, and llm_config:
```python
def my_compiler(template, data, *, node_name=None, config=None,
                output_model=None, llm_config=None):
    configurable = (config or {}).get("configurable", {})
    node_id = configurable.get("node_id", "")
    project_root = configurable.get("project_root", "")
    strategy = (llm_config or {}).get("output_strategy", "structured")

    # Load context files from disk based on pipeline metadata
    context = load_context(project_root, node_id)

    # Build prompt -- inject JSON schema for json_mode
    messages = get_prompt(
        template_name=template,
        node_id=node_id,
        context_files=context,
        analysis_notes=format_notes(data),
    )

    # For json_mode: tell the LLM what JSON shape to return
    if strategy in ("json_mode", "text") and output_model:
        import json
        schema = json.dumps(output_model.model_json_schema(), indent=2)
        messages.append({
            "role": "user",
            "content": f"Return a JSON object matching this schema:\n{schema}",
        })

    return messages

configure_llm(llm_factory=my_factory, prompt_compiler=my_compiler)
```

The framework inspects your compiler with inspect.signature at configure_llm() time and passes only the kwargs it declares. Any of these work:
- (template, data, node_name=, config=, output_model=, llm_config=) — full context
- (template, data, node_name=, config=) — partial
- (template, data) — minimal
- (template, data, **kw) — accepts everything
No try/except at runtime. You can upgrade incrementally without breaking existing compilers.
Config injection
When you call run(), all fields from the input dict are automatically injected into config["configurable"]:
```python
from neograph import run

result = run(
    graph,
    input={"node_id": "BR-042", "project_root": "/repo"},
    config={"configurable": {"rate_limiter": my_limiter}},
)
```

Inside every node, config["configurable"] contains:

```python
{
    "node_id": "BR-042",         # from input
    "project_root": "/repo",     # from input
    "rate_limiter": my_limiter,  # from explicit config
}
```

Input fields take precedence over existing configurable values if there is a key conflict. This means your prompt compiler and LLM factory can access pipeline metadata (node_id, project_root) and shared resources (rate limiters, database connections) without any node reaching into state.
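The merge and its precedence rule can be sketched as follows; merge_configurable is a hypothetical helper for illustration, not the framework's actual code.

```python
def merge_configurable(run_input, config=None):
    """Sketch: copy input fields into config["configurable"],
    letting input values win on key conflicts."""
    config = dict(config or {})
    configurable = dict(config.get("configurable", {}))
    configurable.update(run_input)  # input fields take precedence
    config["configurable"] = configurable
    return config

cfg = merge_configurable(
    {"node_id": "BR-042", "project_root": "/repo"},
    {"configurable": {"rate_limiter": "limiter", "node_id": "stale"}},
)
cfg["configurable"]["node_id"]  # "BR-042" — the input value, not "stale"
```

The key point is the direction of the update: the input dict is applied last, so stale metadata left in an explicit config cannot shadow the values of the current run.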
Shared resources via config
Put expensive-to-create resources in config["configurable"] and access them from your factory, compiler, or @node functions via FromConfig[T]:
```python
from neograph import node, FromConfig

# At the call site
db_pool = create_connection_pool()
rate_limiter = TokenBucketLimiter(tokens_per_minute=100_000)

result = run(
    graph,
    input={"node_id": "BR-042"},
    config={
        "configurable": {
            "db_pool": db_pool,
            "rate_limiter": rate_limiter,
        }
    },
)

# In a scripted @node function
@node(output=ContextData)
def load_context(
    claims: Claims,
    db_pool: FromConfig[ConnectionPool],
    rate_limiter: FromConfig[RateLimiter],
) -> ContextData:
    rate_limiter.acquire()
    rows = db_pool.query("SELECT * FROM context WHERE id = %s", claims.id)
    return ContextData(rows=rows)
```

Output strategies
Many models (DeepSeek-R1, o1, QwQ, local models) don’t support with_structured_output. NeoGraph provides three output strategies, selected per-node via llm_config["output_strategy"]:
| Strategy | How it works | Best for |
|---|---|---|
| "structured" | llm.with_structured_output(model) | OpenAI, Anthropic, Gemini |
| "json_mode" | LLM returns raw text, framework strips fences + parses JSON | DeepSeek, local models |
| "text" | LLM returns prose with embedded JSON, framework extracts it | Reasoning models (o1, R1) |
structured (default)
The framework calls llm.with_structured_output(output_model). Works with any LangChain model that supports native structured output. The framework tries include_raw=True to capture token counts, falling back without it.
```python
@node(output=Claims, prompt='rw/classify', model='fast')
def classify(topic: RawText) -> Claims: ...

# No output_strategy needed — "structured" is the default
```

json_mode
The framework calls llm.invoke() directly (no with_structured_output), then strips markdown code fences and parses the JSON into the Pydantic model. Works with any model that returns JSON in its text response.
```python
@node(
    output=Claims,
    prompt='rw/decompose',
    model='reason',
    llm_config={"output_strategy": "json_mode"},
)
def decompose(topic: RawText) -> Claims: ...
```

The framework handles:
- Markdown fence stripping (```json ... ```)
- JSON object extraction from surrounding text
- Pydantic model_validate_json for type-safe parsing
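The first two steps can be sketched roughly as follows; extract_json is a hypothetical helper for illustration, and the framework's actual parser may be more robust.

```python
import json
import re

def extract_json(text):
    """Sketch of json_mode parsing: strip markdown fences, then
    pull the outermost JSON object out of surrounding prose."""
    # 1. Prefer the contents of a ```json ... ``` fence if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # 2. Otherwise fall back to the outermost {...} span.
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start:end + 1]
    # 3. Parse; NeoGraph would hand this string to Pydantic's
    #    model_validate_json instead of plain json.loads.
    return json.loads(text)

extract_json('Here you go:\n```json\n{"claims": ["a", "b"]}\n```')
# → {'claims': ['a', 'b']}
```

The same fallback path also handles models that skip the fence and embed the object directly in prose, which is what the "text" strategy below relies on.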
text
Same as json_mode — the framework extracts JSON from the LLM’s plain text response. Use this name to signal intent when the model returns prose with embedded JSON rather than fenced code blocks.
```python
@node(
    output=Analysis,
    prompt='rw/analyze',
    model='reason',
    llm_config={"output_strategy": "text"},
)
def analyze(topic: RawText) -> Analysis: ...
```

Mixing strategies in one pipeline
Different nodes can use different strategies. This is the production pattern for pipelines that use multiple model providers:
```python
from neograph import node

# DeepSeek for creative decomposition (no structured output support)
@node(
    output=Claims,
    prompt='rw/decompose',
    model='reason',
    llm_config={"temperature": 0.9, "output_strategy": "json_mode"},
)
def decompose(topic: RawText) -> Claims: ...

# Gemini for precise classification (native structured output)
@node(
    output=ClassifiedClaims,
    prompt='rw/classify',
    model='fast',
    llm_config={"temperature": 0, "output_strategy": "structured"},
)
def classify(decompose: Claims) -> ClassifiedClaims: ...

pipeline = construct_from_module(sys.modules[__name__])
```

Strategies in gather/execute modes
For ReAct modes, the output strategy applies to the final parsing step after the tool loop completes:
- "structured": a separate with_structured_output call parses the final answer
- "json_mode" / "text": the last message in the tool loop is parsed directly as JSON
```python
@node(
    mode='gather',
    output=ResearchResult,
    prompt='rw/research',
    model='reason',
    tools=[Tool(name="search", budget=5)],
    llm_config={"output_strategy": "json_mode"},
)
def research(query: SearchQuery) -> ResearchResult: ...
```

Backward compatibility
Both the factory and compiler accept multiple signatures. The framework inspects your callable at configure_llm() time and passes only the kwargs it declares:
| Callback | Accepted parameters |
|---|---|
| llm_factory | tier (required) — plus any of: node_name, llm_config |
| prompt_compiler | template, data (required) — plus any of: node_name, config, output_model, llm_config |
Functions using **kwargs receive all parameters. This lets you start with a minimal lambda and add parameters later without breaking any node definitions.
Documentation © 2025-2026 Constantine Mirin, mirin.pro. Licensed under CC BY-ND 4.0.