ADR-0001 DAG Workflow Support

Status

Accepted — Phases 1–4 implemented. Expert-validated via 5-agent fleet review, rubber duck critique, and expert consultation.

Implementation Status

| Phase | Status | Notes |
|---|---|---|
| 1 | ✅ Complete | Attributes, enums, IWorkflowFactory.CreateGraphWorkflow, reflection-based factory |
| 2 | ✅ Complete | Source generator discovers graph attributes, emits Create{Name}GraphWorkflow() extensions, Mermaid subgraph. AgentGraphReducerAttribute defined. |
| 3 | ✅ Complete | 12 diagnostics (NDLRMAF016–022, NDLRMAF024–025, NDLRMAF027–029) across 9 analyzer classes. 137 analyzer tests. Release tracking entries. |
| 4 | ✅ Complete | IDagRunResult, IDagNodeResult, NodeKind, DagRunResult, DagNodeResult, ReducerNodeInvokedEvent. Docs updated. Runtime wiring for RoutingMode, JoinMode, reducers. End-to-end example app (GraphWorkflowApp). Per-node NodeRoutingMode override on AgentGraphEdgeAttribute. 12 analyzer doc pages. GraphTopologyRegistration DTO + AgentGraphTopologyRegistry source-gen emission + bootstrap registration for AOT-safe topology discovery. Run{Name}GraphWorkflowAsync generated extension method on IGraphWorkflowRunner. GraphNames constants class. |

Known Limitations

  • WaitAny requires RunGraphAsync: GraphJoinMode.WaitAny is fully implemented via the RunGraphAsync extension method in NexusLabs.Needlr.AgentFramework.Workflows. However, it is not compatible with CreateGraphWorkflow (which returns a MAF Workflow using BSP execution). CreateGraphWorkflow throws NotSupportedException when WaitAny nodes are detected, directing users to RunGraphAsync. Analyzer NDLRMAF025 catches this at compile time — users see an error in their IDE before they ever run the code.
  • LlmChoice requires RunGraphAsync: GraphRoutingMode.LlmChoice uses the Needlr-native executor (RunGraphAsync) because MAF's BSP engine cannot handle LLM-driven edge selection. CreateGraphWorkflow does not support LlmChoice. When the graph-wide or any per-node routing mode is LlmChoice, RunGraphAsync automatically selects the Needlr-native executor.

Runtime Feature Coverage

All attribute properties are wired at runtime in the Needlr-native executor (RunGraphAsync) with test coverage:

| Feature | Status | Tests |
|---|---|---|
| Condition-based routing | ✅ Implemented | Conditional edge skips branch, unconditional always fires, mixed behavior |
| IsRequired failure propagation | ✅ Implemented | Required failure kills graph, optional failure continues, mixed scenario |
| Reducer invocation | ✅ Implemented | Reducer called with branch outputs, return value passed downstream |
| WaitAny join mode | ✅ Implemented | Proceeds on first-to-complete, verified by timing |
| RoutingMode enforcement | ✅ Implemented | FirstMatching fires only first, ExclusiveChoice rejects zero/multiple matches |
| NodeRoutingMode override | ✅ Implemented | Per-node override takes precedence over graph-wide mode |
| Progress events | ✅ Implemented | AgentInvokedEvent, WorkflowStarted/CompletedEvent emitted during execution |
| LlmChoice routing | ✅ Implemented | LLM picks route from condition strings, unchosen branch skipped, no-match fallback |

Context

Needlr's agent framework wraps Microsoft Agent Framework (MAF) with an attribute-driven, source-generated workflow model. Three workflow types exist today:

| Workflow Type | Attribute | Factory Method Pattern |
|---|---|---|
| Handoff | [AgentHandoffsTo] | Create{Agent}HandoffWorkflow() |
| Group Chat | [AgentGroupChatMember] | Create{GroupName}GroupChatWorkflow() |
| Sequential | [AgentSequenceMember] | Create{PipelineName}SequentialWorkflow() |

Each follows a consistent pattern: attributes on agent classes declare topology, a source generator (AgentFrameworkFunctionRegistryGenerator) emits IWorkflowFactory extension methods, and Roslyn analyzers (NDLRMAF001–NDLRMAF015) provide compile-time validation.

MAF 1.1.0 introduced a graph-based execution model (WorkflowBuilder, Edge, Executor, FunctionExecutor<T>, DirectEdgeData, FanOutEdgeData, FanInEdgeData, SwitchBuilder) using superstep-based Bulk Synchronous Parallel (BSP) execution. This model enables directed acyclic graph (DAG) orchestration: conditional routing, fan-out/fan-in parallelism, and non-linear agent coordination. Needlr does not yet wrap this API.

Without DAG support, users who need branching, parallel, or conditional workflows must drop down to raw MAF APIs, losing Needlr's compile-time validation, diagnostics integration, and attribute-driven developer experience.

Existing infrastructure that this decision builds on:

  • Progress reporting: IProgressReporter with events for workflow lifecycle, agent invocation, LLM calls, tool calls, budget tracking, and superstep progression (SuperStepStartedProgressEvent, SuperStepCompletedProgressEvent).
  • Diagnostics: IPipelineRunResult with per-stage IAgentStageResult entries, token usage tracking via IAgentMetrics, and hierarchical scoping via BeginChildScope.
  • Source generator: Incremental generator with Models/ (metadata types like HandoffEntry, GroupChatEntry, SequenceEntry) and CodeGen/ (emitters like RegistryCodeGenerator, ExtensionsCodeGenerator, BootstrapCodeGenerator, TopologyGraphCodeGenerator).
  • Analyzer infrastructure: MafDiagnosticIds and MafDiagnosticDescriptors centralize diagnostic metadata. Existing analyzers cover cyclic handoffs, orphan detection, group chat validation, sequence ordering, and topology correctness.

Forces

  • Consistency: A 4th workflow type should follow the established attribute → generator → analyzer → factory pattern.
  • Determinism: Orchestration edges encode control flow. Nondeterministic routing by default undermines testability, replayability, and debuggability.
  • Expressiveness: DAGs introduce topological ambiguity that linear and group-chat workflows do not have. Three outgoing conditional edges could mean fan-out, switch-case, or priority routing — the topology alone does not disambiguate.
  • Progressive disclosure: Simple DAGs (linear chains, basic fan-out) should be simple to declare. Complex DAGs (conditional routing, LLM-driven choice, mixed fan-out/fan-in) should be possible without escaping to raw MAF.
  • MAF stability: MAF's graph API is relatively new (v1.1.0). The attribute model should insulate users from breaking changes in the underlying API.

Decision

Add a 4th workflow type, DAG/graph workflows, via [AgentGraphEdge] and [AgentGraphEntry] attributes, an [AgentGraphReducer] attribute for fan-in aggregation, source-generated Create{Name}GraphWorkflow() factory methods, and Roslyn analyzers (NDLRMAF016–NDLRMAF022, NDLRMAF024–NDLRMAF025, NDLRMAF027–NDLRMAF029).

Core Attribute Design

Edges are declared on the source node (edge-on-source), consistent with [AgentHandoffsTo]:

[AgentGraphEntry("ResearchPipeline")]
[AgentGraphEdge("ResearchPipeline", typeof(WebResearchAgent), Condition = "needs-web")]
[AgentGraphEdge("ResearchPipeline", typeof(SummarizerAgent))]
public class AnalyzerAgent { }

  • [AgentGraphEntry] marks an agent as the entry point for a named graph. It is required and must be explicit: entry points are not inferred from topology.
  • [AgentGraphEdge] declares a directed edge from the decorated class to the specified target agent type, within a named graph.
  • The source generator emits Create{Name}GraphWorkflow() as an extension method on IWorkflowFactory.

Routing: Deterministic Default, LLM Opt-In

Deterministic routing is the default. Condition strings on [AgentGraphEdge] reference a named predicate method on the agent class by convention. The Roslyn analyzer (NDLRMAF028) validates the method exists with the correct signature at compile time. At runtime, the GraphEdgeRouter binds the method via reflection and evaluates it as a Func<object?, bool> predicate.
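The reflection binding described above can be sketched as follows. This is a minimal illustration, not the actual GraphEdgeRouter implementation: AnalyzerAgentSketch, NeedsWeb, and ConditionBinderSketch are hypothetical names, and the sketch assumes the `public static bool Method(object?)` shape and .NET 5 or later for `MethodInfo.CreateDelegate<T>()`.

```csharp
using System;
using System.Reflection;

// Hypothetical agent class with a condition method of the assumed shape:
// public static bool Method(object?).
public class AnalyzerAgentSketch
{
    public static bool NeedsWeb(object? input)
        => input is string text
           && text.Contains("http", StringComparison.OrdinalIgnoreCase);
}

// Hypothetical helper; not part of Needlr or MAF.
public static class ConditionBinderSketch
{
    // Looks up a public static method by name and wraps it in a
    // Func<object?, bool>. Throws when the method is missing or does
    // not return bool.
    public static Func<object?, bool> Bind(Type agentType, string methodName)
    {
        MethodInfo method = agentType.GetMethod(
            methodName,
            BindingFlags.Public | BindingFlags.Static,
            binder: null,
            types: new[] { typeof(object) },
            modifiers: null)
            ?? throw new InvalidOperationException(
                $"No public static {methodName}(object?) on {agentType.Name}.");

        if (method.ReturnType != typeof(bool))
        {
            throw new InvalidOperationException($"{methodName} must return bool.");
        }

        return method.CreateDelegate<Func<object?, bool>>();
    }
}
```

Calling `ConditionBinderSketch.Bind(typeof(AnalyzerAgentSketch), nameof(AnalyzerAgentSketch.NeedsWeb))` yields a delegate that can be evaluated once per edge. Binding once and caching the delegate avoids repeated reflection lookups on the hot path.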

LLM-driven routing is an explicit opt-in via RoutingMode = GraphRoutingMode.LlmChoice on the entry point attribute. When active, condition strings become handoff-style tool descriptions — the agent's LLM selects which edge to follow. The routing decision is recorded in diagnostics for auditability.

Routing mode enum (GraphRoutingMode):

| Mode | Behavior | MAF Mapping |
|---|---|---|
| Deterministic (default) | Condition methods evaluated as boolean predicates | Individual AddEdge with Func<object?, bool> |
| LlmChoice | Condition strings as tool descriptions; model selects | Handoff-style tool mapping |
| AllMatching | All edges whose condition is true are followed (parallel fan-out) | MAF's default behavior when multiple edge conditions pass |
| FirstMatching | First edge whose condition is true is followed (priority order) | Needlr abstraction: MAF has no ordered-priority routing; generator emits a composite condition wrapper |
| ExclusiveChoice | Exactly one edge must match | Needlr abstraction: composite condition validates exactly one match at runtime |

RoutingMode is a property on [AgentGraphEntry] as the graph-wide default. Per-node routing overrides (via NodeRoutingMode on [AgentGraphEdge]) are enabled by the source generator's edge-grouping logic and the runtime's per-node effective routing mode computation.
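To make the difference between the three predicate-based modes concrete, here is a small sketch of edge selection. It is an illustration under stated assumptions, not the generated composite condition wrapper: RoutingModeSketch, EdgeSketch, and EdgeSelectorSketch are hypothetical, LlmChoice is left out, and edge order is assumed to be declaration order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the predicate-based subset of GraphRoutingMode.
public enum RoutingModeSketch { AllMatching, FirstMatching, ExclusiveChoice }

// Hypothetical edge model: a target name plus an optional condition.
public sealed record EdgeSketch(string Target, Func<object?, bool>? Condition = null);

public static class EdgeSelectorSketch
{
    public static IReadOnlyList<string> Select(
        RoutingModeSketch mode, IReadOnlyList<EdgeSketch> edges, object? input)
    {
        // An edge without a condition is treated as unconditional.
        List<string> matches = edges
            .Where(e => e.Condition is null || e.Condition(input))
            .Select(e => e.Target)
            .ToList();

        switch (mode)
        {
            case RoutingModeSketch.FirstMatching:
                // Priority order: keep only the first matching edge.
                return matches.Take(1).ToList();
            case RoutingModeSketch.ExclusiveChoice:
                // Zero or multiple matches are both rejected.
                if (matches.Count != 1)
                {
                    throw new InvalidOperationException(
                        $"ExclusiveChoice expected 1 matching edge, found {matches.Count}.");
                }
                return matches;
            default:
                // AllMatching: follow every matching edge (parallel fan-out).
                return matches;
        }
    }
}
```

With one conditional edge that passes and one unconditional edge, AllMatching returns both targets, FirstMatching returns only the first, and ExclusiveChoice throws because two edges match.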

Rationale: DAG edges encode orchestration rules. Making control flow nondeterministic by default undermines testability, replayability, and determinism. LLM routing is powerful but should be a conscious architectural choice.

Fan-Out/Fan-In: Inferred Shape, Explicit Semantics

Fan-out and fan-in shapes are inferred from topology (multiple outgoing edges = fan-out, multiple incoming edges = fan-in). However, the semantics are explicitly declared because topology alone is ambiguous.
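Shape inference from topology amounts to counting edges per node. The sketch below shows one way to do that over a plain edge list; TopologyShapeSketch and the tuple-based edge representation are hypothetical and are not the generator's actual model types.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; edges are (From, To) pairs of node names.
public static class TopologyShapeSketch
{
    // A node with more than one outgoing edge is a fan-out node.
    public static ISet<string> FanOutNodes(IEnumerable<(string From, string To)> edges)
        => edges.GroupBy(e => e.From)
                .Where(g => g.Count() > 1)
                .Select(g => g.Key)
                .ToHashSet();

    // A node with more than one incoming edge is a fan-in node.
    public static ISet<string> FanInNodes(IEnumerable<(string From, string To)> edges)
        => edges.GroupBy(e => e.To)
                .Where(g => g.Count() > 1)
                .Select(g => g.Key)
                .ToHashSet();
}
```

For the edges Analyzer→Web, Analyzer→Summarizer, and Web→Summarizer, Analyzer is the only fan-out node and Summarizer the only fan-in node. The counts say nothing about whether all, one, or the first matching edge should run, which is why the semantics are declared separately.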

Source-side routing (on [AgentGraphEntry] or per-node):

  • AllMatching — run all edges whose condition passes
  • FirstMatching — run the first edge whose condition passes
  • ExclusiveChoice — exactly one edge must match (analyzer error if ambiguous at compile time)
  • LlmChoice — LLM selects

Target-side join is declared via [AgentGraphNode] on the target agent:

[NeedlrAiAgent(Instructions = "Synthesize all findings.")]
[AgentGraphNode("ResearchPipeline", JoinMode = GraphJoinMode.WaitAll)]
public class SummarizerAgent { }

  • WaitAll (default): barrier; wait for all incoming edges before proceeding
  • WaitAny: proceed when any incoming edge completes

[AgentGraphNode] is a lightweight per-node attribute separate from [AgentGraphEdge] to avoid overloading edge attributes with target-side semantics.
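The two join modes map naturally onto Task.WhenAll and Task.WhenAny. The following sketch illustrates the semantics only; JoinModeSketch and JoinSketch are hypothetical, and the Needlr-native executor may implement joins differently.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for GraphJoinMode.
public enum JoinModeSketch { WaitAll, WaitAny }

public static class JoinSketch
{
    // Returns the branch outputs that the target node would receive.
    public static async Task<IReadOnlyList<string>> JoinAsync(
        JoinModeSketch mode, IReadOnlyList<Task<string>> incoming)
    {
        if (incoming.Count == 0)
        {
            throw new ArgumentException("At least one incoming branch is required.", nameof(incoming));
        }

        if (mode == JoinModeSketch.WaitAny)
        {
            // Proceed as soon as any branch completes. Awaiting the
            // winner rethrows if that branch faulted.
            Task<string> first = await Task.WhenAny(incoming);
            return new[] { await first };
        }

        // WaitAll: barrier over every incoming branch; results keep
        // the order of the input list.
        return await Task.WhenAll(incoming);
    }
}
```

Note that in this sketch WaitAny does not cancel the remaining branches; a real executor would have to decide whether to cancel them or let them finish in the background.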

Rationale: Three conditional outgoing edges could mean fan-out, switch-case, or priority routing. The topology says "3 edges" but not what they mean. Explicit semantics prevent runtime ambiguity without requiring users to understand MAF's FanOutEdgeData vs DirectEdgeData distinction.

Non-Agent Nodes: Minimal Phase-1 Reducer

Full FunctionExecutor<T> support is deferred to phase 2. However, phase 1 includes a minimal deterministic reducer node for fan-in convergence, because fan-in without a reducer forces users to wrap pure aggregation logic in agent classes (paying LLM cost and latency for what should be a deterministic function).

[AgentGraphReducer("ResearchPipeline", ReducerMethod = nameof(MergeResults))]
public static class ResearchReducer
{
    public static string MergeResults(IReadOnlyList<string> branchOutputs)
        => string.Join("\n---\n", branchOutputs);
}

The reducer covers the most common fan-in pattern (aggregation of branch outputs) without requiring a full function-executor attribute model. If ReducerMethod is not specified, the convention default "Reduce" is used — the source generator and runtime will look for a public static string Reduce(IReadOnlyList<string>) method on the decorated class.
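The convention default can be illustrated with a small resolver that falls back to "Reduce" when no method name is given. This is a sketch under assumptions: ReducerResolverSketch and ConventionReducerSketch are hypothetical names, and the real source generator and runtime may resolve the method differently. It requires .NET 5 or later for `MethodInfo.CreateDelegate<T>()`.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical resolver; not the actual Needlr runtime code.
public static class ReducerResolverSketch
{
    public const string DefaultMethodName = "Reduce";

    // Finds public static string <name>(IReadOnlyList<string>) on the
    // reducer type, using "Reduce" when no name is supplied.
    public static Func<IReadOnlyList<string>, string> Resolve(
        Type reducerType, string? reducerMethod = null)
    {
        string name = reducerMethod ?? DefaultMethodName;

        MethodInfo method = reducerType.GetMethod(
            name,
            BindingFlags.Public | BindingFlags.Static,
            binder: null,
            types: new[] { typeof(IReadOnlyList<string>) },
            modifiers: null)
            ?? throw new InvalidOperationException(
                $"No public static {name}(IReadOnlyList<string>) on {reducerType.Name}.");

        if (method.ReturnType != typeof(string))
        {
            throw new InvalidOperationException($"{name} must return string.");
        }

        return method.CreateDelegate<Func<IReadOnlyList<string>, string>>();
    }
}

// Hypothetical reducer that relies on the convention default name.
public static class ConventionReducerSketch
{
    public static string Reduce(IReadOnlyList<string> branchOutputs)
        => string.Join("\n---\n", branchOutputs);
}
```

Resolving `typeof(ConventionReducerSketch)` without a method name and invoking the result on two branch outputs joins them with the `---` separator, with no LLM call involved.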

Diagnostics: Reducer nodes do NOT go through IChatClient pipelines (no LLM calls, no token usage). IDagNodeResult carries a NodeKind discriminator (Agent vs Reducer) so downstream tooling can distinguish agent nodes from pure-function nodes. A dedicated ReducerNodeInvokedEvent progress event carries NodeId, InputBranchCount, and Duration without the LLM-specific metadata of AgentInvokedEvent.

Streaming: Terminal Content, Full Progress

Content streaming is terminal-node only by default. Intermediate node output is internal processing — not user-facing.

Progress and observability use the existing IProgressReporter infrastructure, which already supports AgentInvokedEvent, LlmCallStartedEvent, ToolCallCompletedEvent, and SuperStepStartedProgressEvent. DAG-specific metadata is added as nullable properties to existing event types for backward compatibility:

| Field | Type | Added To | Purpose |
|---|---|---|---|
| GraphName | string? | All DAG progress events | Identifies which named graph is executing |
| NodeId | string? | AgentInvokedEvent, ReducerNodeInvokedEvent | Identifies the node within the graph |
| BranchId | string? | AgentInvokedEvent, ReducerNodeInvokedEvent | Identifies the parallel branch |
| IncomingEdgeLabel | string? | AgentInvokedEvent | The condition label of the activating edge |
| ParallelBranchCount | int? | SuperStepStartedProgressEvent | Active parallel branches in this superstep |

One new event type: ReducerNodeInvokedEvent for non-LLM reducer nodes (carries NodeId, BranchId, InputBranchCount, Duration without LLM-specific fields).

Rationale: Users need progress visibility for long-running DAGs, but that is a progress concern, not a streaming concern. The existing IProgressReporter and IProgressSink pipeline is extensible and already consumed by downstream tooling.

Failure Propagation: Per-Edge Required/Optional

Graph-wide failure modes (fail-fast vs continue-parallel) are too coarse for DAGs with heterogeneous branches. Instead, failure semantics are declared per-edge:

  • IsRequired = true (default): if this edge's target node fails, the entire graph fails.
  • IsRequired = false: if this edge's target node fails, the branch is marked degraded but parallel branches continue.

Completed branch outputs are always preserved in IDagRunResult.NodeResults — even when the graph fails, work already done is accessible. This avoids discarding expensive computation (e.g., a completed research branch) when an optional enrichment branch fails.
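The per-edge rule can be expressed as a small evaluation over branch results. The sketch below is illustrative only: BranchResultSketch, GraphOutcomeSketch, and FailurePolicySketch are hypothetical, and the graph-level Degraded outcome is an assumption layered on the "branch is marked degraded" wording rather than a documented IDagRunResult state.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical overall outcome; Degraded at graph level is an assumption.
public enum GraphOutcomeSketch { Succeeded, Degraded, Failed }

// Hypothetical per-branch record: IsRequired mirrors the edge setting.
public sealed record BranchResultSketch(
    string NodeId, bool IsRequired, bool Succeeded, string? Output);

public static class FailurePolicySketch
{
    public static GraphOutcomeSketch Evaluate(IReadOnlyList<BranchResultSketch> branches)
    {
        // Any failed required branch fails the whole graph.
        if (branches.Any(b => b.IsRequired && !b.Succeeded))
        {
            return GraphOutcomeSketch.Failed;
        }

        // Only optional branches failed: the run continues, degraded.
        if (branches.Any(b => !b.Succeeded))
        {
            return GraphOutcomeSketch.Degraded;
        }

        return GraphOutcomeSketch.Succeeded;
    }

    // Completed outputs stay available whatever the overall outcome is.
    public static IReadOnlyDictionary<string, string> CompletedOutputs(
        IReadOnlyList<BranchResultSketch> branches)
        => branches
            .Where(b => b.Succeeded && b.Output is not null)
            .ToDictionary(b => b.NodeId, b => b.Output!);
}
```

In the research pipeline scenario, a successful required web-research branch combined with a failed optional enrichment branch evaluates to Degraded, and the research output remains retrievable.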

Diagnostics: IDagRunResult

IDagRunResult extends IPipelineRunResult with:

  • NodeResults — per-node diagnostics with edge metadata and timing offsets
  • BranchResults — parallel branch grouping with degraded/failed status
  • Flat Stages preserved for backward compatibility with existing IPipelineRunResult consumers
  • Token budget tracking via existing hierarchical scoping (BeginChildScope)

Analyzers

Twelve new diagnostics across nine analyzer classes, extending the existing NDLRMAF series:

| ID | Title | Severity |
|---|---|---|
| NDLRMAF016 | Cycle detected in agent graph | Error |
| NDLRMAF017 | Graph has no entry point | Error |
| NDLRMAF018 | Graph has multiple entry points | Error |
| NDLRMAF019 | Graph edge target is not a declared agent | Error |
| NDLRMAF020 | Graph edge source is not a declared agent | Warning |
| NDLRMAF021 | Graph entry point is not a declared agent | Warning |
| NDLRMAF022 | Graph contains unreachable agents | Warning |
| NDLRMAF023 (retired) | MaxSupersteps value is invalid | Retired (property removed) |
| NDLRMAF024 | All edges from fan-out node are optional | Warning |
| NDLRMAF025 | CreateGraphWorkflow incompatible with WaitAny | Error |
| NDLRMAF026 | (reserved; do not reuse) | Reserved |
| NDLRMAF027 | Terminal node has outgoing edges | Error |
| NDLRMAF028 | Condition method has invalid signature | Error |
| NDLRMAF029 | Reducer method has invalid signature | Error |

NDLRMAF024 catches the pattern where all outgoing edges from a fan-out node have IsRequired = false — the graph could produce empty results if all optional branches fail. NDLRMAF025 catches calls to CreateGraphWorkflow on graphs that contain WaitAny nodes — CreateGraphWorkflow returns a MAF Workflow using BSP execution which only supports WaitAll; the fix is to use RunGraphAsync instead. NDLRMAF027 catches topology errors where a terminal node still has outgoing edges. NDLRMAF028 validates that Condition references on [AgentGraphEdge] point to a public static bool Method(object?) method. NDLRMAF029 validates that ReducerMethod references on [AgentGraphReducer] point to a public static string Method(IReadOnlyList<string>) method.

These follow the existing pattern in MafDiagnosticIds and MafDiagnosticDescriptors, extending the ID range from the current ceiling of NDLRMAF015.
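As an illustration of the kind of check behind NDLRMAF016, the sketch below detects cycles with a depth-first search over an adjacency map. It is a generic algorithm sketch, not the analyzer's code: GraphCycleSketch is hypothetical, and it works on plain strings rather than Roslyn symbols.

```csharp
using System.Collections.Generic;

// Hypothetical helper operating on node names instead of Roslyn symbols.
public static class GraphCycleSketch
{
    private enum Mark { Unvisited, InProgress, Done }

    // adjacency maps each source node to its edge targets.
    public static bool HasCycle(
        IReadOnlyDictionary<string, IReadOnlyList<string>> adjacency)
    {
        var marks = new Dictionary<string, Mark>();
        foreach (string node in adjacency.Keys)
        {
            if (Visit(node, adjacency, marks))
            {
                return true;
            }
        }
        return false;
    }

    private static bool Visit(
        string node,
        IReadOnlyDictionary<string, IReadOnlyList<string>> adjacency,
        Dictionary<string, Mark> marks)
    {
        // A missing entry leaves mark at its default, Unvisited.
        marks.TryGetValue(node, out Mark mark);
        if (mark == Mark.InProgress) return true;  // back edge: cycle
        if (mark == Mark.Done) return false;       // already cleared

        marks[node] = Mark.InProgress;
        if (adjacency.TryGetValue(node, out var targets))
        {
            foreach (string target in targets)
            {
                if (Visit(target, adjacency, marks))
                {
                    return true;
                }
            }
        }
        marks[node] = Mark.Done;
        return false;
    }
}
```

Reaching a node that is still marked InProgress means the current path loops back on itself, which is exactly the condition a DAG must rule out. The recursion is fine for attribute-declared graphs of modest size; a very deep graph would call for an explicit stack.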

Implementation Phases

| Phase | Scope | Deliverables |
|---|---|---|
| 1 | Attributes + Runtime Factory | AgentGraphEdgeAttribute, AgentGraphEntryAttribute, AgentGraphNodeAttribute, GraphRoutingMode/GraphJoinMode enums, IWorkflowFactory.CreateGraphWorkflow, WorkflowFactory graph support |
| 2 | Source Generator + Reducer + Mermaid Diagrams | AgentGraphReducerAttribute, per-node RoutingMode override, GraphEdgeEntry/GraphEntryPointEntry/GraphNodeEntry/GraphReducerEntry models, TopologyGraphCodeGenerator Mermaid output, RegistryCodeGenerator.GenerateGraphTopologyRegistrySource, BootstrapCodeGenerator graph registration, GraphTopologyRegistration DTO, Run{Name}GraphWorkflowAsync extension, GraphNames constants |
| 3 | Analyzers + Release Tracking + Docs | NDLRMAF016–NDLRMAF025, NDLRMAF027–NDLRMAF029, analyzer tests, XML doc comments, README updates |
| 4 | Diagnostics + Progress Events + Example App | IDagRunResult, NodeKind discriminator, DAG-specific progress metadata, ReducerNodeInvokedEvent, RunGraphAsync extension (WaitAny via Needlr-native executor), Examples/ project demonstrating a research pipeline |

Consequences

Positive

  • POS-001: Needlr covers all four MAF workflow patterns (Handoff, Group Chat, Sequential, DAG), eliminating the need for users to drop to raw MAF APIs for non-linear orchestration.
  • POS-002: Deterministic-first routing gives users testable, replayable, and debuggable orchestration by default. LLM-driven routing is available when intelligence-based decisions are genuinely needed.
  • POS-003: Compile-time validation via 9 new analyzer classes (12 diagnostics) catches topology errors (cycles, missing entry points, unreachable nodes) and method signature errors (condition/reducer methods) before runtime, consistent with the existing analyzer experience for other workflow types.
  • POS-004: Per-edge failure semantics (IsRequired) preserve expensive parallel work when optional branches fail, avoiding the all-or-nothing tradeoff of graph-wide failure modes.
  • POS-005: The attribute model insulates users from MAF's graph API surface (WorkflowBuilder, Edge, DirectEdgeData, FanOutEdgeData, etc.), providing a stable abstraction layer if MAF's API evolves.

Negative

  • NEG-001: Significant implementation surface across 4 phases: 4 new attributes, 2 enums, factory method updates, 9 analyzer classes (12 diagnostics), generator updates, a new diagnostics interface, progress event extensions, and a compile-time topology registry.
  • NEG-002: Routing mode (Deterministic, LlmChoice, AllMatching, FirstMatching, ExclusiveChoice) and join mode (WaitAll, WaitAny) add cognitive overhead compared to the simpler "just declare edges" model. Users must understand when each mode applies.
  • NEG-003: The phase-1 reducer ([AgentGraphReducer]) is a narrow solution covering only the aggregation pattern. Full FunctionExecutor<T> support is still needed for arbitrary non-agent computation nodes.
  • NEG-004: MAF's graph execution model uses superstep-based BSP, which may surprise users expecting event-driven or streaming DAG execution. Documentation must set clear expectations.

Alternatives Considered

Infer Entry Points from Topology

  • Description: Automatically identify the entry point as the node with zero incoming edges, rather than requiring an explicit [AgentGraphEntry] attribute.
  • Rejection Reason: Topological inference is fragile — a graph with multiple roots (e.g., parallel starting branches) would require disambiguation logic in the generator. Explicit declaration is consistent with the existing pattern where [AgentHandoffsTo] requires the user to mark the initial agent via the generic type parameter on CreateHandoffWorkflow<TInitialAgent>(). Explicit entry points also serve as documentation.

LLM Routing as the Default

  • Description: Default to LLM-driven routing where the agent's model decides which edge to follow based on condition strings as tool descriptions.
  • Rejection Reason: DAG edges encode orchestration control flow. Nondeterministic routing by default undermines testability, replayability, and determinism. Teams that need deterministic pipelines (compliance, financial, safety-critical) would be forced to opt out of the default. Deterministic-default with LLM opt-in respects both camps.

Graph-Wide Failure Mode Instead of Per-Edge

  • Description: Provide a single FailureMode property on [AgentGraphEntry] with FailFast and ContinueParallel options, rather than per-edge IsRequired semantics.
  • Rejection Reason: Graph-wide modes are too coarse. A research pipeline may have a required web-research branch and an optional sentiment-enrichment branch. FailFast would discard completed research if sentiment fails. ContinueParallel would silently ignore failures in the required branch. Per-edge semantics let users express "this branch is optional" without sacrificing safety for required paths.

Defer Entirely Until MAF Stabilizes

  • Description: Wait for MAF's graph API to mature beyond v1.1.0 before adding Needlr support, avoiding rework if the API changes.
  • Rejection Reason: Users need DAG support now for multi-agent research pipelines, content generation workflows, and conditional processing. The attribute model provides an insulation layer — if MAF's underlying API changes, only the factory/generator internals need updating, not the user-facing attribute surface. The risk of rework is contained.

Separate Package for Graph Workflows

  • Description: Create a new NexusLabs.Needlr.AgentFramework.Graph package rather than adding graph support to the existing AgentFramework, Generators, and Analyzers projects.
  • Rejection Reason: The three existing workflow types live in the core AgentFramework package with shared infrastructure (IWorkflowFactory, WorkflowFactory, generator, analyzers). A separate package would duplicate shared types, require cross-package analyzer coordination, and fragment the developer experience. Graph workflows should be a first-class citizen alongside the other three types.

Implementation Notes

  • IMP-001: New attributes (AgentGraphEdgeAttribute, AgentGraphEntryAttribute, AgentGraphReducerAttribute) should follow the established pattern: [AttributeUsage(AttributeTargets.Class, AllowMultiple = true, Inherited = false)], placed in src/NexusLabs.Needlr.AgentFramework/.
  • IMP-002: The source generator requires new model types (GraphEdgeEntry, GraphEntryPointEntry, GraphNodeEntry, GraphReducerEntry) in Generators/Models/ and Mermaid emission in TopologyGraphCodeGenerator in Generators/CodeGen/, following the existing HandoffEntry/GroupChatEntry/SequenceEntry pattern. The registry emitter (RegistryCodeGenerator.GenerateGraphTopologyRegistrySource) produces AgentGraphTopologyRegistry which is registered in the bootstrap ModuleInitializer.
  • IMP-003: Analyzer IDs NDLRMAF016–NDLRMAF029 (with 023 retired and 026 reserved) must be registered in MafDiagnosticIds.cs and MafDiagnosticDescriptors.cs. Cycle detection (NDLRMAF016) requires a topological sort or DFS-based algorithm operating on the Roslyn syntax/semantic model. Condition method validation (NDLRMAF028) and reducer method validation (NDLRMAF029) walk the type hierarchy for inherited method resolution.
  • IMP-004: The WorkflowFactory needs a new code path for graph construction that maps [AgentGraphEdge] topology to MAF's WorkflowBuilder API, translating routing modes to the appropriate edge data types (DirectEdgeData, FanOutEdgeData, FanInEdgeData).
  • IMP-005: Success criteria — the feature is correct when: (a) a DAG declared entirely via attributes compiles, runs, and produces the expected output; (b) all 9 analyzer classes (12 diagnostics) fire on invalid topologies with no false positives on valid ones; (c) IDagRunResult captures per-node diagnostics with timing and token usage; (d) existing workflow types are unaffected (no regressions in handoff, group chat, or sequential tests).

References

  • REF-001: MAF graph API source (v1.1.0): WorkflowBuilder, Edge, Executor, FunctionExecutor<T>, DirectEdgeData, FanOutEdgeData, FanInEdgeData
  • REF-002: Existing workflow attributes: src/NexusLabs.Needlr.AgentFramework/AgentHandoffsToAttribute.cs, AgentGroupChatMemberAttribute.cs, AgentSequenceMemberAttribute.cs
  • REF-003: Existing workflow factory: src/NexusLabs.Needlr.AgentFramework/IWorkflowFactory.cs, WorkflowFactory.cs
  • REF-004: Existing source generator: src/NexusLabs.Needlr.AgentFramework.Generators/AgentFrameworkFunctionRegistryGenerator.cs, CodeGen/, Models/
  • REF-005: Existing analyzers: src/NexusLabs.Needlr.AgentFramework.Analyzers/MafDiagnosticIds.cs (IDs NDLRMAF001–NDLRMAF015)
  • REF-006: Progress infrastructure: src/NexusLabs.Needlr.AgentFramework/Progress/IProgressReporter.cs, IProgressEvent.cs
  • REF-007: Diagnostics infrastructure: src/NexusLabs.Needlr.AgentFramework/Diagnostics/IPipelineRunResult.cs, IAgentStageResult.cs
  • REF-008: MAF package version: src/Directory.Packages.props pins Microsoft.Agents.AI.Workflows at 1.1.0