Built by Postindustria. We help teams build agentic production systems.

Observability

NeoGraph provides observability at two layers. The framework layer (structlog) is always on and captures every node execution, LLM call, tool invocation, and budget event. The tracing layer (Langfuse or any LangChain callback) is opt-in and provides deep LLM call traces with token-level detail.

NeoGraph does not own the observability backend. It emits structured logs and threads callback configuration through the LangGraph runtime. You choose where the data goes.

Every NeoGraph operation emits structured log events via structlog. This happens automatically — no configuration required.

Event | Fields | When
--- | --- | ---
compile_start | construct, nodes, node_names, modifiers | compile() is called
compile_complete | construct, state_fields | Graph compilation finishes
subgraph_compile | subgraph, input, output | Sub-Construct compiled
node_start | node, mode, model, prompt, tools, budgets | Node execution begins
node_complete | node, mode, duration_s | Node execution finishes
llm_call | tier, prompt, mode, duration_s, input_tokens, output_tokens, total_tokens | LLM call completes
tool_call | tool, call_num, duration_s | Individual tool invocation
tool_budget_exhausted | tool | Tool hits its budget cap
all_tools_exhausted | exhausted, forcing_response | All budgeted tools spent
react_final_response | loop | ReAct loop produces final answer
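Because each event is a flat key/value dict, a custom structlog processor can aggregate framework-level metrics straight from the stream. A minimal sketch — the `TokenAccumulator` class is a hypothetical helper, not part of NeoGraph:

```python
class TokenAccumulator:
    """Hypothetical structlog processor: tallies token usage across
    NeoGraph llm_call events. Processors are plain callables that
    receive and return the event dict."""

    def __init__(self):
        self.total_tokens = 0
        self.calls = 0

    def __call__(self, logger, method_name, event_dict):
        if event_dict.get("event") == "llm_call":
            self.calls += 1
            self.total_tokens += event_dict.get("total_tokens", 0)
        return event_dict  # pass the event through unchanged

tokens = TokenAccumulator()
# structlog.configure(processors=[tokens, structlog.processors.JSONRenderer()])
```

Insert it ahead of the renderer in your processor chain; it observes every event without altering the output.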

A produce node emits something like this (formatted as JSON for clarity):

{
  "node": "classify",
  "mode": "produce",
  "model": "reason",
  "prompt": "rw/classify",
  "event": "node_start",
  "output_type": "ClassifiedClaims"
}
{
  "tier": "reason",
  "prompt": "rw/classify",
  "mode": "produce",
  "duration_s": 1.847,
  "input_tokens": 2340,
  "output_tokens": 512,
  "total_tokens": 2852,
  "event": "llm_call"
}
{
  "node": "classify",
  "mode": "produce",
  "duration_s": 1.851,
  "event": "node_complete"
}

A gather node with tools adds tool-level detail:

{
  "node": "explore",
  "mode": "gather",
  "model": "reason",
  "prompt": "rw/explore",
  "tools": ["search_nodes", "read_artifact"],
  "budgets": {"search_nodes": 5, "read_artifact": 10},
  "event": "node_start"
}
{
  "tool": "search_nodes",
  "call_num": 1,
  "duration_s": 0.234,
  "event": "tool_call"
}
{
  "tool": "search_nodes",
  "event": "tool_budget_exhausted"
}
{
  "tier": "reason",
  "prompt": "rw/explore",
  "mode": "react",
  "loops": 4,
  "tool_calls": 7,
  "duration_s": 8.312,
  "input_tokens": 12400,
  "output_tokens": 1890,
  "total_tokens": 14290,
  "output": "ResearchFindings",
  "event": "llm_call"
}

NeoGraph calls structlog.get_logger() — it does not configure structlog itself. You configure structlog in your application the way you normally would:

import structlog

structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.processors.add_log_level,
        structlog.dev.ConsoleRenderer(),  # pretty for dev
        # or structlog.processors.JSONRenderer() for prod
    ],
)

If you do not configure structlog, it uses its defaults (which are reasonable for development).

Use FromConfig[T] to inject trace providers, span managers, or any shared observability resource into scripted nodes without threading them through state:

from neograph import node, FromConfig

@node(output=Report)
def summarize(
    claims: Claims,
    tracer: FromConfig[LangfuseTracer],
) -> Report:
    with tracer.span("summarize-claims"):
        result = Report(summary=f"{len(claims.items)} claims")
    return result

At runtime, tracer is resolved from config["configurable"]["tracer"]. Pass it when calling run():

from langfuse import Langfuse

langfuse = Langfuse()
tracer = langfuse.trace(name="ingestion-pipeline")

result = run(
    graph,
    input={"node_id": "BR-042"},
    config={"configurable": {"tracer": tracer}},
)

This pattern works for any shared resource — rate limiters, database pools, metrics collectors — not just tracing. The key is that FromConfig[T] reads from config["configurable"], which is propagated to every node by the LangGraph runtime.
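The lookup itself is simple. A sketch of what `FromConfig[T]` resolution amounts to — `resolve_from_config` is an illustrative stand-in, not NeoGraph's actual internals:

```python
# Illustrative sketch: a parameter annotated FromConfig[T] is filled from
# config["configurable"] under the parameter's name. Hypothetical helper,
# not NeoGraph's real implementation.
def resolve_from_config(param_name: str, config: dict):
    try:
        return config["configurable"][param_name]
    except KeyError:
        raise KeyError(
            f"no {param_name!r} in config['configurable'] -- "
            "pass it via run(..., config={'configurable': {...}})"
        )

config = {"configurable": {"tracer": object()}}
tracer = resolve_from_config("tracer", config)
```

A missing resource surfaces as an error at resolution time rather than a silent `None` inside the node.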

All shared resources and runtime metadata flow through config["configurable"]. When you call run(), input fields are automatically injected there too:

result = run(
    graph,
    input={"node_id": "BR-042"},
    config={
        "configurable": {
            "tracer": langfuse_tracer,
            "rate_limiter": my_limiter,
        }
    },
)

# Inside every node, config["configurable"] contains:
# {
#     "node_id": "BR-042",         # from input
#     "tracer": <LangfuseTracer>,  # from config
#     "rate_limiter": <Limiter>,   # from config
# }

This means your prompt compiler, LLM factory, and scripted functions can all access pipeline metadata and shared resources uniformly.
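The merge described above can be pictured as a plain dict update: input fields and explicitly passed resources end up in one namespace. A sketch of the documented behavior, not NeoGraph's code (collision precedence is an assumption here):

```python
def merged_configurable(input: dict, config: dict) -> dict:
    """Sketch of what config["configurable"] contains inside a node:
    the caller's configurable resources plus the run() input fields.
    Hypothetical helper illustrating the documented merge."""
    merged = dict(config.get("configurable", {}))
    merged.update(input)  # input fields injected alongside shared resources
    return merged

merged = merged_configurable(
    input={"node_id": "BR-042"},
    config={"configurable": {"tracer": "tracer", "rate_limiter": "limiter"}},
)
# merged now holds node_id, tracer, and rate_limiter under one dict
```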

Langfuse provides deep LLM call tracing — prompt/completion pairs, token counts, latency waterfalls, cost tracking. NeoGraph supports it (and any LangChain-compatible callback) through the standard config["callbacks"] mechanism.

from langfuse.langchain import CallbackHandler
from neograph import compile, run

# Reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST from env
langfuse_handler = CallbackHandler()

graph = compile(pipeline)
result = run(
    graph,
    input={"node_id": "BR-042"},
    config={"callbacks": [langfuse_handler]},
)

Every LLM call inside every node — produce, gather, execute, and even Oracle merge calls — will appear in your Langfuse dashboard with full prompt/completion detail, token counts, and timing.

NeoGraph does not integrate with Langfuse directly. The mechanism is LangGraph’s standard callback threading:

  1. You pass callbacks=[langfuse_handler] in the config dict to run().
  2. LangGraph propagates that config to every node invocation.
  3. NeoGraph’s LLM layer calls llm.invoke(messages, config=config), which passes the callbacks to the underlying LangChain model.
  4. The LangChain model fires callback events that Langfuse (or any other handler) captures.

This means any LangChain-compatible callback works. If you use LangSmith, Weights & Biases, or a custom handler, pass it the same way:

config = {"callbacks": [my_langsmith_handler, my_custom_handler]}
result = run(graph, input={...}, config=config)
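A custom handler only needs to implement the LangChain callback hooks it cares about. Here is a minimal timing handler, sketched standalone so the logic is visible — in a real project you would subclass `langchain_core.callbacks.BaseCallbackHandler`; the class itself is a hypothetical example:

```python
import time

class TimingHandler:
    """Hypothetical custom handler: records wall-clock duration of each
    LLM call using LangChain's on_llm_start / on_llm_end hook names."""

    def __init__(self):
        self._starts = {}       # run_id -> start time
        self.durations = []     # completed call durations, in seconds

    def on_llm_start(self, serialized, prompts, *, run_id=None, **kwargs):
        self._starts[run_id] = time.monotonic()

    def on_llm_end(self, response, *, run_id=None, **kwargs):
        start = self._starts.pop(run_id, None)
        if start is not None:
            self.durations.append(time.monotonic() - start)
```

Passed via `config = {"callbacks": [TimingHandler()]}`, it would receive the same events Langfuse does.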

The two layers complement each other:

  • structlog gives you the framework-level view: which node ran, how long it took, how many tool calls, total tokens. Useful for pipeline debugging and operational monitoring.
  • Langfuse/callbacks give you the LLM-level view: exact prompts, completions, per-call token breakdown, cost. Useful for prompt engineering and model evaluation.

Both run simultaneously. structlog is always on; callbacks activate only when you pass them.


Documentation © 2025-2026 Constantine Mirin, mirin.pro. Licensed under CC BY-ND 4.0.