
LLM Configuration

NeoGraph does not own the LLM client. You register a factory that creates LLM instances and a compiler that builds prompts. The framework calls them for every LLM invocation — produce, gather, execute, and Oracle merge.

configure_llm() is called once at application startup and registers two callbacks:

from neograph import configure_llm

configure_llm(
    llm_factory=my_factory,
    prompt_compiler=my_compiler,
)

If you call run() on a graph that has LLM nodes without calling configure_llm() first, you get a ValueError.

The factory creates a LangChain BaseChatModel instance for a given tier. NeoGraph calls it once per LLM invocation (not per node definition).

The minimal factory takes a tier string and returns a model:

from langchain_openai import ChatOpenAI

MODELS = {
    "fast": "gpt-4o-mini",
    "reason": "gpt-4o",
    "large": "o1",
}

configure_llm(
    llm_factory=lambda tier: ChatOpenAI(model=MODELS[tier]),
    prompt_compiler=my_compiler,
)

The advanced signature receives the node name and per-node LLM config:

def my_factory(tier, node_name=None, llm_config=None):
    config = llm_config or {}
    return ChatOpenAI(
        model=MODELS[tier],
        temperature=config.get("temperature", 0),
        max_tokens=config.get("max_tokens", 4096),
    )

configure_llm(llm_factory=my_factory, prompt_compiler=my_compiler)

The framework inspects your factory with inspect.signature at configure_llm() time and passes only the kwargs it declares on each call. Factories that use **kwargs receive everything. This provides backward compatibility without runtime try/except — you can start simple and add parameters later.
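The kwargs-filtering technique described above can be sketched in a few lines of plain Python. This is an illustration of the general pattern, not NeoGraph's actual implementation; the helper name filter_kwargs is hypothetical.

```python
import inspect

def filter_kwargs(fn, **available):
    """Pass a callable only the keyword arguments it declares."""
    params = inspect.signature(fn).parameters
    # A **kwargs parameter means the callable accepts everything
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return available
    return {k: v for k, v in available.items() if k in params}

def minimal(tier):
    return tier

def advanced(tier, node_name=None, llm_config=None):
    return (tier, node_name, llm_config)

# The minimal factory only ever sees "tier"; the advanced one sees all three
filter_kwargs(minimal, tier="fast", node_name="classify", llm_config=None)
filter_kwargs(advanced, tier="fast", node_name="classify", llm_config=None)
```

Because the signature is inspected once at registration time, each per-invocation call is a plain dict filter with no exception handling on the hot path.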

Each node can carry an llm_config dict that is passed to the factory. With @node, pass it as a keyword:

from neograph import node

@node(output=ClassifiedClaims, prompt='rw/classify', model='reason',
      llm_config={"temperature": 0.7, "max_tokens": 2048})
def classify(raw_claims: RawClaims) -> ClassifiedClaims: ...

The llm_config dict is opaque to NeoGraph. The framework passes it through to your factory; you decide what keys mean. Common uses: temperature, max_tokens, top_p, stop sequences.

Mode inference still works when llm_config is present — the decorator infers produce from the prompt= and model= kwargs, regardless of llm_config.
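The inference rule can be pictured as a small dispatch function. This is a hypothetical sketch of the behavior described above, not the decorator's real code; the function name and the exact mode strings it returns for the non-produce cases are assumptions.

```python
def infer_mode(prompt=None, model=None, tools=None, llm_config=None):
    """Sketch: prompt= and model= imply produce; llm_config is ignored
    for inference and only forwarded to the LLM factory."""
    if tools:
        return "gather"          # ReAct-style node with a tool loop
    if prompt and model:
        return "produce"         # plain LLM call
    return "scripted"            # ordinary Python function node

# llm_config present or absent, the inferred mode is the same
infer_mode(prompt='rw/classify', model='reason',
           llm_config={"temperature": 0.7})
```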

The compiler builds the message list that the LLM receives. NeoGraph calls it for every LLM invocation.

The minimal compiler takes a template name and the input data:

from langchain_core.messages import HumanMessage

configure_llm(
    llm_factory=my_factory,
    prompt_compiler=lambda template, data: [
        HumanMessage(content=f"Template: {template}\n\nData: {data}")
    ],
)

The full signature receives node_name, config, output_model, and llm_config:

import json

def my_compiler(template, data, *, node_name=None, config=None,
                output_model=None, llm_config=None):
    configurable = (config or {}).get("configurable", {})
    node_id = configurable.get("node_id", "")
    project_root = configurable.get("project_root", "")
    strategy = (llm_config or {}).get("output_strategy", "structured")

    # Load context files from disk based on pipeline metadata
    context = load_context(project_root, node_id)

    # Build prompt -- inject JSON schema for json_mode
    messages = get_prompt(
        template_name=template,
        node_id=node_id,
        context_files=context,
        analysis_notes=format_notes(data),
    )

    # For json_mode and text: tell the LLM what JSON shape to return
    if strategy in ("json_mode", "text") and output_model:
        schema = json.dumps(output_model.model_json_schema(), indent=2)
        messages.append({"role": "user",
                         "content": f"Return a JSON object matching this schema:\n{schema}"})
    return messages

configure_llm(llm_factory=my_factory, prompt_compiler=my_compiler)

The framework inspects your compiler with inspect.signature at configure_llm() time and passes only the kwargs it declares. Any of these work:

  • (template, data, node_name=, config=, output_model=, llm_config=) — full context
  • (template, data, node_name=, config=) — partial
  • (template, data) — minimal
  • (template, data, **kw) — accepts everything

No try/except at runtime. You can upgrade incrementally without breaking existing compilers.

When you call run(), all fields from the input dict are automatically injected into config["configurable"]:

from neograph import run

result = run(
    graph,
    input={"node_id": "BR-042", "project_root": "/repo"},
    config={"configurable": {"rate_limiter": my_limiter}},
)

Inside every node, config["configurable"] contains:

{
    "node_id": "BR-042",          # from input
    "project_root": "/repo",      # from input
    "rate_limiter": my_limiter,   # from explicit config
}

Input fields take precedence over existing configurable values if there is a key conflict. This means your prompt compiler and LLM factory can access pipeline metadata (node_id, project_root) and shared resources (rate limiters, database connections) without any node reaching into state.
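The merge rule described above amounts to a dict update where the input fields are applied last. A minimal sketch of that rule (not NeoGraph internals; the helper name merge_configurable is hypothetical):

```python
def merge_configurable(config, input_fields):
    """Inject input fields into config["configurable"]; input wins on conflicts."""
    config = dict(config or {})
    configurable = dict(config.get("configurable", {}))
    configurable.update(input_fields)  # input fields applied last, so they win
    config["configurable"] = configurable
    return config

merged = merge_configurable(
    {"configurable": {"rate_limiter": "limiter", "node_id": "stale"}},
    {"node_id": "BR-042", "project_root": "/repo"},
)
# merged["configurable"]["node_id"] is "BR-042": the input value wins
```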

Put expensive-to-create resources in config["configurable"] and access them from your factory, compiler, or @node functions via FromConfig[T]:

from neograph import node, FromConfig

# At the call site
db_pool = create_connection_pool()
rate_limiter = TokenBucketLimiter(tokens_per_minute=100_000)

result = run(
    graph,
    input={"node_id": "BR-042"},
    config={
        "configurable": {
            "db_pool": db_pool,
            "rate_limiter": rate_limiter,
        }
    },
)

# In a scripted @node function
@node(output=ContextData)
def load_context(
    claims: Claims,
    db_pool: FromConfig[ConnectionPool],
    rate_limiter: FromConfig[RateLimiter],
) -> ContextData:
    rate_limiter.acquire()
    rows = db_pool.query("SELECT * FROM context WHERE id = %s", claims.id)
    return ContextData(rows=rows)

Many models (DeepSeek-R1, o1, QwQ, local models) don’t support with_structured_output. NeoGraph provides three output strategies, selected per-node via llm_config["output_strategy"]:

  • "structured": llm.with_structured_output(model). Best for OpenAI, Anthropic, Gemini.
  • "json_mode": the LLM returns raw text; the framework strips fences and parses the JSON. Best for DeepSeek and local models.
  • "text": the LLM returns prose with embedded JSON; the framework extracts it. Best for reasoning models (o1, R1).

The framework calls llm.with_structured_output(output_model). Works with any LangChain model that supports native structured output. The framework tries include_raw=True to capture token counts, falling back without it.

@node(output=Claims, prompt='rw/classify', model='fast')
def classify(topic: RawText) -> Claims: ...
# No output_strategy needed — "structured" is the default

The framework calls llm.invoke() directly (no with_structured_output), then strips markdown code fences and parses the JSON into the Pydantic model. Works with any model that returns JSON in its text response.

@node(output=Claims, prompt='rw/decompose', model='reason',
      llm_config={"output_strategy": "json_mode"})
def decompose(topic: RawText) -> Claims: ...

The framework handles:

  • Markdown fence stripping (```json ... ```)
  • JSON object extraction from surrounding text
  • Pydantic model_validate_json for type-safe parsing
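The first two steps above are ordinary string processing. A minimal sketch of that post-processing, assuming the framework then hands the cleaned string to Pydantic's model_validate_json (the helper name extract_json is hypothetical):

```python
import json
import re

def extract_json(text):
    """Strip markdown fences, then pull the outermost JSON object
    out of any surrounding prose."""
    # 1. Strip markdown code fences if present (```json ... ```)
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # 2. Extract the outermost JSON object from surrounding text
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start:end + 1]
    return json.loads(text)

extract_json('Here you go:\n```json\n{"claims": ["a", "b"]}\n```')
```

Real responses fail in more ways than this (truncated output, trailing commas), which is why the framework, not each node, owns this step.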

Same as json_mode — the framework extracts JSON from the LLM’s plain text response. Use this name to signal intent when the model returns prose with embedded JSON rather than fenced code blocks.

@node(output=Analysis, prompt='rw/analyze', model='reason',
      llm_config={"output_strategy": "text"})
def analyze(topic: RawText) -> Analysis: ...

Different nodes can use different strategies. This is the production pattern for pipelines that use multiple model providers:

import sys

from neograph import node

# DeepSeek for creative decomposition (no structured output support)
@node(output=Claims, prompt='rw/decompose', model='reason',
      llm_config={"temperature": 0.9, "output_strategy": "json_mode"})
def decompose(topic: RawText) -> Claims: ...

# Gemini for precise classification (native structured output)
@node(output=ClassifiedClaims, prompt='rw/classify', model='fast',
      llm_config={"temperature": 0, "output_strategy": "structured"})
def classify(decompose: Claims) -> ClassifiedClaims: ...

pipeline = construct_from_module(sys.modules[__name__])

For ReAct modes, the output strategy applies to the final parsing step after the tool loop completes:

  • "structured": a separate with_structured_output call parses the final answer
  • "json_mode" / "text": the last message in the tool loop is parsed directly as JSON

@node(mode='gather', output=ResearchResult, prompt='rw/research', model='reason',
      tools=[Tool(name="search", budget=5)],
      llm_config={"output_strategy": "json_mode"})
def research(query: SearchQuery) -> ResearchResult: ...

Both the factory and compiler accept multiple signatures. The framework inspects your callable at configure_llm() time and passes only the kwargs it declares:

  • llm_factory: tier (required), plus any of: node_name, llm_config
  • prompt_compiler: template, data (required), plus any of: node_name, config, output_model, llm_config

Functions using **kwargs receive all parameters. This lets you start with a minimal lambda and add parameters later without breaking any node definitions.


Documentation © 2025-2026 Constantine Mirin, mirin.pro. Licensed under CC BY-ND 4.0.