
2. LLM Think + Agent

This walkthrough introduces NeoGraph’s two primary LLM modes: think (single structured call) and agent (ReAct tool loop with budget enforcement). Together they form the most common two-step pattern: decompose a problem, then research it.

The pipeline is built with @node-decorated functions and construct_from_module. The mode is inferred from the kwargs: prompt= plus model= means a think node (a single LLM call). Add tools= and the node becomes an agent loop.

The example uses fake LLM and tool implementations so you can run it without API keys. Replace them with real ones for production.

Pipeline diagram: decompose is a think node (single LLM call) that feeds research, an agent node (ReAct tool loop with budget). The dotted self-edge on research represents the tool-calling cycle inside the agent.

This walkthrough covers:

  • Configuring the LLM layer with configure_llm()
  • Using @node with prompt= and model= for LLM think mode
  • Using @node with mode="agent" and tools= for ReAct tool loops
  • Declaring tools with Tool(name, budget)
  • Registering tool factories with register_tool_factory()
  • Parameter-name wiring between LLM nodes

The example defines three frozen Pydantic schemas:

from pydantic import BaseModel


class Requirement(BaseModel, frozen=True):
    text: str


class Claims(BaseModel, frozen=True):
    items: list[str]


class ResearchResult(BaseModel, frozen=True):
    findings: list[dict[str, str]]

The decompose node has prompt= and model= but no mode= — NeoGraph infers mode="think" (single LLM call, structured output). The function body is ... because the LLM handles execution.

The research node explicitly sets mode="agent" and adds tools=. It runs a ReAct loop: call the LLM, execute tool calls, feed results back, and repeat until the LLM responds without tool calls or the tool budget is exhausted.

from neograph import node, Tool


@node(outputs=Claims, model="fast", prompt="req/decompose")
def decompose() -> Claims:
    ...


@node(
    mode="agent",
    outputs=ResearchResult,
    model="reason",
    prompt="req/research",
    tools=[Tool(name="search_codebase", budget=2)],
)
def research(decompose: Claims) -> ResearchResult:
    ...

The parameter name decompose on research wires it to the upstream decompose node.
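One way to picture parameter-name wiring is a small sketch that derives edges from function signatures. This is not NeoGraph's implementation: the helper edges_from_parameter_names and the extra summarize function are hypothetical, and the sketch only assumes that a parameter named after another function denotes an upstream dependency.

```python
import inspect


def decompose():
    ...


def research(decompose):
    ...


def summarize(decompose, research):  # hypothetical third node, for illustration
    ...


def edges_from_parameter_names(functions):
    """Return (upstream, downstream) pairs wherever a parameter name
    matches the name of another function in the list."""
    names = {fn.__name__ for fn in functions}
    edges = []
    for fn in functions:
        for param in inspect.signature(fn).parameters:
            if param in names:
                edges.append((param, fn.__name__))
    return edges


edges = edges_from_parameter_names([decompose, research, summarize])
print(edges)
# [('decompose', 'research'), ('decompose', 'summarize'), ('research', 'summarize')]
```

Renaming a function in such a scheme silently changes the wiring, so parameter names are effectively part of the pipeline's public contract.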

"""Think + Agent: decompose a requirement, then research with tools.
Run:
    python 02_produce_and_gather.py
"""

from __future__ import annotations

import sys

from langchain_core.messages import AIMessage
from pydantic import BaseModel

from neograph import (
    Tool,
    compile,
    configure_llm,
    construct_from_module,
    node,
    register_tool_factory,
    run,
)

# -- Schemas ----------------------------------------------------------------
class Requirement(BaseModel, frozen=True):
    text: str


class Claims(BaseModel, frozen=True):
    items: list[str]


class ResearchResult(BaseModel, frozen=True):
    findings: list[dict[str, str]]

# -- Fake LLM (replace with real OpenRouter/OpenAI in production) -----------
class FakeDecomposeLLM:
    """Simulates an LLM that decomposes a requirement into claims."""

    def with_structured_output(self, model):
        self._model = model
        return self

    def invoke(self, messages, **kwargs):
        return self._model(items=[
            "system shall authenticate users",
            "system shall log failed attempts",
            "system shall rate-limit login",
        ])


class FakeResearchLLM:
    """Simulates an LLM that calls a search tool, then responds."""

    def __init__(self):
        self._call_count = 0

    def bind_tools(self, tools):
        clone = FakeResearchLLM()
        clone._call_count = self._call_count
        clone._has_tools = len(tools) > 0
        return clone

    def invoke(self, messages, **kwargs):
        self._call_count += 1
        if getattr(self, "_has_tools", True) and self._call_count <= 3:
            msg = AIMessage(content="")
            msg.tool_calls = [{
                "name": "search_codebase",
                "args": {"query": f"claim-{self._call_count}"},
                "id": f"call-{self._call_count}",
            }]
            return msg
        return AIMessage(content="research complete")

    def with_structured_output(self, model):
        self._model = model
        return self

# -- Fake tool --------------------------------------------------------------
search_count = {"n": 0}


class FakeSearchTool:
    name = "search_codebase"

    def invoke(self, args):
        search_count["n"] += 1
        return f"Found 3 references for: {args.get('query', '?')}"


register_tool_factory("search_codebase", lambda config, tool_config: FakeSearchTool())

# -- Configure LLM layer ---------------------------------------------------
def llm_factory(tier):
    if tier == "fast":
        return FakeDecomposeLLM()
    return FakeResearchLLM()


configure_llm(
    llm_factory=llm_factory,
    prompt_compiler=lambda template, data: [{"role": "user", "content": "analyze"}],
)

# -- Pipeline nodes ---------------------------------------------------------
@node(outputs=Claims, model="fast", prompt="req/decompose")
def decompose() -> Claims:
    ...


@node(
    mode="agent",
    outputs=ResearchResult,
    model="reason",
    prompt="req/research",
    tools=[Tool(name="search_codebase", budget=2)],
)
def research(decompose: Claims) -> ResearchResult:
    ...


pipeline = construct_from_module(sys.modules[__name__], name="requirement-analysis")

# -- Run --------------------------------------------------------------------
if __name__ == "__main__":
    graph = compile(pipeline)
    result = run(graph, input={"node_id": "REQ-042"})

    print(f"Decomposed into {len(result['decompose'].items)} claims:")
    for claim in result["decompose"].items:
        print(f" - {claim}")
    print(f"\nSearch tool called {search_count['n']} times (budget was 2)")
    print(f"Research complete: {result['research'] is not None}")

Running the script prints:

Decomposed into 3 claims:
 - system shall authenticate users
 - system shall log failed attempts
 - system shall rate-limit login

Search tool called 2 times (budget was 2)
Research complete: True

NeoGraph infers the execution mode from the kwargs you pass to @node:

| Present kwargs | Inferred mode | Behavior |
|---|---|---|
| prompt= + model= | think | Single LLM call, structured JSON output |
| prompt= + model= + tools= | agent | ReAct tool loop |
| Neither | scripted | Function body runs as-is |

You can also set mode= explicitly. The research node in this example does so with mode="agent", which makes the distinction from think visible at a glance even though tools= alone would be enough for inference.
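The inference rules can be restated as a small function. This is a hypothetical helper that mirrors the rules described here, not NeoGraph's own code. It assumes that an explicit mode= always wins and that a node missing either prompt= or model= falls back to scripted.

```python
def infer_mode(*, prompt=None, model=None, tools=None, mode=None):
    """Hypothetical restatement of the mode-inference rules."""
    if mode is not None:
        return mode  # assumption: an explicit mode= overrides inference
    if prompt is not None and model is not None:
        # A non-empty tools list turns a think node into an agent node.
        return "agent" if tools else "think"
    return "scripted"  # assumption: anything else runs the function body


print(infer_mode(prompt="req/decompose", model="fast"))                  # think
print(infer_mode(prompt="req/research", model="reason", tools=["t"]))    # agent
print(infer_mode())                                                      # scripted
```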

A think node makes a single LLM call and expects structured JSON back. The framework calls llm.with_structured_output(output_model) and parses the response into your Pydantic schema automatically.

@node(outputs=Claims, model="fast", prompt="req/decompose")
def decompose() -> Claims:
    ...

No tool loop. No message history management. One call, one typed result.
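As a rough sketch of what "one call, one typed result" means, the snippet below runs a think-style step against a stand-in LLM. The names run_think and StubLLM are hypothetical, and a frozen dataclass stands in for the Pydantic schema so the example runs with the standard library alone. Only the with_structured_output(...) then invoke(...) call shape is taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claims:
    """Stand-in for the Pydantic Claims schema."""
    items: list


class StubLLM:
    """Stand-in for a chat model that supports structured output."""

    def with_structured_output(self, schema):
        self._schema = schema
        return self

    def invoke(self, messages):
        # A real model would generate this; the stub returns fixed claims.
        return self._schema(items=["claim A", "claim B"])


def run_think(llm, output_model, messages):
    """Hypothetical think step: bind the schema, make one call, return the result."""
    structured = llm.with_structured_output(output_model)
    return structured.invoke(messages)


result = run_think(StubLLM(), Claims, [{"role": "user", "content": "analyze"}])
print(result.items)  # ['claim A', 'claim B']
```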

An agent node runs a ReAct loop: call the LLM; if it requests tool calls, execute them, feed the results back, and repeat. The loop ends when the LLM responds without tool calls or when all tool budgets are exhausted.

@node(
    mode="agent",
    outputs=ResearchResult,
    model="reason",
    prompt="req/research",
    tools=[Tool(name="search_codebase", budget=2)],
)
def research(decompose: Claims) -> ResearchResult:
    ...

Each Tool has a budget — the maximum number of calls allowed. When a tool’s budget is exhausted, the framework removes it from the LLM’s available tools. When all budgeted tools are spent, the LLM is forced to produce a final response.

This prevents runaway loops and controls API costs. Set budget=0 for unlimited calls.
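The budget rules can be illustrated with a simplified, standard-library-only loop. This is a sketch under stated assumptions, not NeoGraph's agent implementation: run_agent_loop, eager_llm, the action dictionaries, and the max_turns safety cap are all hypothetical. The parts taken from the text are that a spent tool is withdrawn from the LLM and that budget=0 means unlimited.

```python
def run_agent_loop(llm_step, tools, budgets, max_turns=20):
    """tools: name -> callable(args); budgets: name -> max calls (0 = unlimited)."""
    calls = {name: 0 for name in tools}
    history = []
    for _ in range(max_turns):
        # Offer only the tools that still have budget left.
        available = [
            name for name in tools
            if budgets.get(name, 0) == 0 or calls[name] < budgets[name]
        ]
        action = llm_step(history, available)
        if action["type"] == "final":
            return action["content"], calls
        name = action["tool"]
        if name not in available:
            raise ValueError(f"tool {name!r} is not available")
        calls[name] += 1
        history.append((name, tools[name](action["args"])))
    raise RuntimeError("agent loop did not finish within max_turns")


def eager_llm(history, available):
    """Stub LLM that wants three searches but answers once the tool is gone."""
    if "search_codebase" in available and len(history) < 3:
        query = f"claim-{len(history) + 1}"
        return {"type": "call", "tool": "search_codebase", "args": {"query": query}}
    return {"type": "final", "content": "research complete"}


tools = {"search_codebase": lambda args: f"Found 3 references for: {args['query']}"}

answer, calls = run_agent_loop(eager_llm, tools, {"search_codebase": 2})
print(answer, calls)  # research complete {'search_codebase': 2}

_, unlimited_calls = run_agent_loop(eager_llm, tools, {"search_codebase": 0})
print(unlimited_calls)  # {'search_codebase': 3}
```

With a budget of 2 the stub is cut off after two searches even though it wanted three, which matches the "Search tool called 2 times" line in the example output.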

Before using any LLM node, you must call configure_llm() with two functions:

configure_llm(
    llm_factory=llm_factory,          # (tier) -> BaseChatModel
    prompt_compiler=prompt_compiler,  # (template, input_data) -> list[BaseMessage]
)
  • llm_factory receives a tier string ("fast", "reason", etc.) and returns a LangChain chat model. In production, map tiers to real models (e.g., "fast" -> GPT-4o-mini, "reason" -> Claude Sonnet).
  • prompt_compiler receives a template name and the typed input data, and returns a message list. This is where you build your prompts.
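The two callables described above might be organized along these lines. This is a minimal sketch with assumptions: the model names, the template strings, and the helper model_name_for_tier are placeholders invented for illustration, and the input data is treated as a plain dict rather than a typed object. A real llm_factory would construct and return a chat model for the resolved name.

```python
# Hypothetical tier table; replace the values with real model identifiers.
TIER_TO_MODEL = {"fast": "small-model-name", "reason": "large-model-name"}

# Hypothetical prompt templates keyed by the names used in prompt=.
TEMPLATES = {
    "req/decompose": "Split this requirement into atomic claims: {text}",
    "req/research": "Research each claim in the codebase: {claims}",
}


def model_name_for_tier(tier):
    """Resolve a tier string; an llm_factory would build a chat model from it."""
    try:
        return TIER_TO_MODEL[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None


def prompt_compiler(template, data):
    """Render a named template into a role/content message list."""
    body = TEMPLATES[template].format(**data)
    return [
        {"role": "system", "content": "You are a requirements analyst."},
        {"role": "user", "content": body},
    ]


messages = prompt_compiler("req/decompose", {"text": "Users must log in"})
print(model_name_for_tier("fast"))
print(messages[1]["content"])
```

Failing loudly on an unknown tier keeps a typo in model= from silently routing a node to the wrong model.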

Tool factories create tool instances at runtime. The factory receives the pipeline config and any per-tool config:

register_tool_factory(
    "search_codebase",
    lambda config, tool_config: MySearchTool(api_key=config["configurable"]["api_key"]),
)

This lets you create tools that depend on runtime context (API keys, rate limiters, database connections) without hardcoding them.
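The factory pattern itself can be sketched without the framework. The registry below is a hypothetical stand-in for whatever register_tool_factory does internally: register_factory, build_tool, and this MySearchTool class are illustrative names, and the demo API key is made up. Only the (config, tool_config) factory signature and the config["configurable"]["api_key"] lookup come from the snippet above.

```python
_FACTORIES = {}  # hypothetical registry: tool name -> factory callable


def register_factory(name, factory):
    _FACTORIES[name] = factory


def build_tool(name, config, tool_config=None):
    """Create a tool instance at run time from the registered factory."""
    return _FACTORIES[name](config, tool_config or {})


class MySearchTool:
    name = "search_codebase"

    def __init__(self, api_key):
        self.api_key = api_key

    def invoke(self, args):
        return f"searched {args['query']!r} with key ending {self.api_key[-4:]}"


register_factory(
    "search_codebase",
    lambda config, tool_config: MySearchTool(api_key=config["configurable"]["api_key"]),
)

tool = build_tool("search_codebase", {"configurable": {"api_key": "sk-demo-1234"}})
print(tool.invoke({"query": "login"}))  # searched 'login' with key ending 1234
```

Because the tool is built only when the pipeline runs, the same registered name can yield differently configured instances for different runs.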


Documentation © 2025-2026 Constantine Mirin, mirin.pro. Licensed under CC BY-ND 4.0.