Built by Postindustria. We help teams build agentic production systems.

Observability

NeoGraph provides observability at two layers. The framework layer (structlog) is always on and captures every node execution, LLM call, tool invocation, and budget event. The tracing layer (Langfuse or any LangChain callback) is opt-in and provides deep LLM call traces with token-level detail.

NeoGraph does not own the observability backend. It emits structured logs and threads callback configuration through the LangGraph runtime. You choose where the data goes.

Every NeoGraph operation emits structured log events via structlog. This happens automatically — no configuration required.

Event | Fields | When
--- | --- | ---
compile_start | construct, nodes, node_names, modifiers | compile() is called
compile_complete | construct, state_fields | Graph compilation finishes
subgraph_compile | subgraph, input, output | Sub-Construct compiled
node_start | node, mode, model, prompt, tools, budgets | Node execution begins
node_complete | node, mode, duration_s | Node execution finishes
llm_call | tier, prompt, mode, duration_s, input_tokens, output_tokens, total_tokens | LLM call completes
tool_call | tool, call_num, duration_s | Individual tool invocation
tool_budget_exhausted | tool | Tool hits its budget cap
all_tools_exhausted | exhausted, forcing_response | All budgeted tools spent
react_final_response | loop | ReAct loop produces final answer
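Because each event is a flat key/value dict, a custom structlog processor can aggregate framework-level metrics straight from the stream. A minimal sketch — the `TokenAccumulator` class is a hypothetical helper, not part of NeoGraph:

```python
class TokenAccumulator:
    """Hypothetical structlog processor: tallies token usage across
    NeoGraph llm_call events. Processors are plain callables that
    receive and return the event dict."""

    def __init__(self):
        self.total_tokens = 0
        self.calls = 0

    def __call__(self, logger, method_name, event_dict):
        if event_dict.get("event") == "llm_call":
            self.calls += 1
            self.total_tokens += event_dict.get("total_tokens", 0)
        return event_dict  # pass the event through unchanged

tokens = TokenAccumulator()
# structlog.configure(processors=[tokens, structlog.processors.JSONRenderer()])
```

Insert it ahead of the renderer in your processor chain; it observes every event without altering the output.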

A produce node emits something like this (formatted as JSON for clarity):

{
  "node": "classify",
  "mode": "produce",
  "model": "reason",
  "prompt": "rw/classify",
  "event": "node_start",
  "output_type": "ClassifiedClaims"
}
{
  "tier": "reason",
  "prompt": "rw/classify",
  "mode": "produce",
  "duration_s": 1.847,
  "input_tokens": 2340,
  "output_tokens": 512,
  "total_tokens": 2852,
  "event": "llm_call"
}
{
  "node": "classify",
  "mode": "produce",
  "duration_s": 1.851,
  "event": "node_complete"
}

A gather node with tools adds tool-level detail:

{
  "node": "explore",
  "mode": "gather",
  "model": "reason",
  "prompt": "rw/explore",
  "tools": ["search_nodes", "read_artifact"],
  "budgets": {"search_nodes": 5, "read_artifact": 10},
  "event": "node_start"
}
{
  "tool": "search_nodes",
  "call_num": 1,
  "duration_s": 0.234,
  "event": "tool_call"
}
{
  "tool": "search_nodes",
  "event": "tool_budget_exhausted"
}
{
  "tier": "reason",
  "prompt": "rw/explore",
  "mode": "react",
  "loops": 4,
  "tool_calls": 7,
  "duration_s": 8.312,
  "input_tokens": 12400,
  "output_tokens": 1890,
  "total_tokens": 14290,
  "output": "ResearchFindings",
  "event": "llm_call"
}

NeoGraph calls structlog.get_logger() — it does not configure structlog itself. You configure structlog in your application the way you normally would:

import structlog

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.dev.ConsoleRenderer(),  # pretty for dev
        # or structlog.processors.JSONRenderer() for prod
    ],
)

If you do not configure structlog, it uses its defaults (which are reasonable for development).

Use FromConfig[T] to inject trace providers, span managers, or any shared observability resource into scripted nodes without threading them through state:

from neograph import node, FromConfig

@node(output=Report)
def summarize(
    claims: Claims,
    tracer: FromConfig[LangfuseTracer],
) -> Report:
    with tracer.span("summarize-claims"):
        result = Report(summary=f"{len(claims.items)} claims")
    return result

At runtime, tracer is resolved from config["configurable"]["tracer"]. Pass it when calling run():

from langfuse import Langfuse

langfuse = Langfuse()
tracer = langfuse.trace(name="ingestion-pipeline")

result = run(
    graph,
    input={"node_id": "BR-042"},
    config={"configurable": {"tracer": tracer}},
)

This pattern works for any shared resource — rate limiters, database pools, metrics collectors — not just tracing. The key is that FromConfig[T] reads from config["configurable"], which is propagated to every node by the LangGraph runtime.
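The lookup itself is simple. A sketch of what `FromConfig[T]` resolution amounts to — `resolve_from_config` is an illustrative stand-in, not NeoGraph's actual internals:

```python
# Illustrative sketch: a parameter annotated FromConfig[T] is filled from
# config["configurable"] under the parameter's name. Hypothetical helper,
# not NeoGraph's real implementation.
def resolve_from_config(param_name: str, config: dict):
    try:
        return config["configurable"][param_name]
    except KeyError:
        raise KeyError(
            f"no {param_name!r} in config['configurable'] -- "
            "pass it via run(..., config={'configurable': {...}})"
        )

config = {"configurable": {"tracer": object()}}
tracer = resolve_from_config("tracer", config)
```

A missing resource surfaces as an error at resolution time rather than a silent `None` inside the node.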

All shared resources and runtime metadata flow through config["configurable"]. When you call run(), input fields are automatically injected there too:

result = run(
    graph,
    input={"node_id": "BR-042"},
    config={
        "configurable": {
            "tracer": langfuse_tracer,
            "rate_limiter": my_limiter,
        }
    },
)

# Inside every node, config["configurable"] contains:
# {
#     "node_id": "BR-042",         # from input
#     "tracer": <LangfuseTracer>,  # from config
#     "rate_limiter": <Limiter>,   # from config
# }

This means your prompt compiler, LLM factory, and scripted functions can all access pipeline metadata and shared resources uniformly.
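The merge described above can be pictured as a plain dict update: input fields and explicitly passed resources end up in one namespace. A sketch of the documented behavior, not NeoGraph's code (collision precedence is an assumption here):

```python
def merged_configurable(input: dict, config: dict) -> dict:
    """Sketch of what config["configurable"] contains inside a node:
    the caller's configurable resources plus the run() input fields.
    Hypothetical helper illustrating the documented merge."""
    merged = dict(config.get("configurable", {}))
    merged.update(input)  # input fields injected alongside shared resources
    return merged

merged = merged_configurable(
    input={"node_id": "BR-042"},
    config={"configurable": {"tracer": "tracer", "rate_limiter": "limiter"}},
)
# merged now holds node_id, tracer, and rate_limiter under one dict
```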

Langfuse provides deep LLM call tracing — prompt/completion pairs, token counts, latency waterfalls, cost tracking. NeoGraph supports it (and any LangChain-compatible callback) through the standard config["callbacks"] mechanism.

from langfuse.langchain import CallbackHandler
from neograph import compile, run

# Reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST from env
langfuse_handler = CallbackHandler()

graph = compile(pipeline)
result = run(
    graph,
    input={"node_id": "BR-042"},
    config={"callbacks": [langfuse_handler]},
)

Every LLM call inside every node — produce, gather, execute, and even Oracle merge calls — will appear in your Langfuse dashboard with full prompt/completion detail, token counts, and timing.

NeoGraph does not integrate with Langfuse directly. The mechanism is LangGraph’s standard callback threading:

  1. You pass callbacks=[langfuse_handler] in the config dict to run().
  2. LangGraph propagates that config to every node invocation.
  3. NeoGraph’s LLM layer calls llm.invoke(messages, config=config), which passes the callbacks to the underlying LangChain model.
  4. The LangChain model fires callback events that Langfuse (or any other handler) captures.

This means any LangChain-compatible callback works. If you use LangSmith, Weights & Biases, or a custom handler, pass it the same way:

config = {"callbacks": [my_langsmith_handler, my_custom_handler]}
result = run(graph, input={...}, config=config)
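A custom handler only needs to implement the LangChain callback hooks it cares about. Here is a minimal timing handler, sketched standalone so the logic is visible — in a real project you would subclass `langchain_core.callbacks.BaseCallbackHandler`; the class itself is a hypothetical example:

```python
import time

class TimingHandler:
    """Hypothetical custom handler: records wall-clock duration of each
    LLM call using LangChain's on_llm_start / on_llm_end hook names."""

    def __init__(self):
        self._starts = {}       # run_id -> start time
        self.durations = []     # completed call durations, in seconds

    def on_llm_start(self, serialized, prompts, *, run_id=None, **kwargs):
        self._starts[run_id] = time.monotonic()

    def on_llm_end(self, response, *, run_id=None, **kwargs):
        start = self._starts.pop(run_id, None)
        if start is not None:
            self.durations.append(time.monotonic() - start)
```

Passed via `config = {"callbacks": [TimingHandler()]}`, it would receive the same events Langfuse does.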

The two layers complement each other:

  • structlog gives you the framework-level view: which node ran, how long it took, how many tool calls, total tokens. Useful for pipeline debugging and operational monitoring.
  • Langfuse/callbacks give you the LLM-level view: exact prompts, completions, per-call token breakdown, cost. Useful for prompt engineering and model evaluation.

Both run simultaneously. structlog is always on; callbacks activate only when you pass them.


Documentation © 2025-2026 Constantine Mirin, mirin.pro. Licensed under CC BY-ND 4.0.