
NexusLabs.Needlr.AgentFramework.Evaluation Namespace

Classes
AgentRunDiagnosticsContext Carries a NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot through the Microsoft.Extensions.AI.Evaluation evaluator pipeline so that Needlr-native deterministic evaluators can score execution mode, tool-call trajectory, and termination behavior without re-invoking the LLM.
AgentRunDiagnosticsEvaluationExtensions Extensions that convert NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics into the input shape expected by Microsoft.Extensions.AI.Evaluation evaluators.
EfficiencyEvaluator Deterministic evaluator that scores the token efficiency and cost profile of an agent run from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.
EvaluationCaptureChatClient Microsoft.Extensions.AI.DelegatingChatClient that persists every LLM request/response pair to an IEvaluationCaptureStore and replays cached responses on subsequent calls with an identical request. Intended to make evaluator runs deterministic and cheap to re-execute.
EvaluationCaptureChatClientExtensions Extension methods for opting in to EvaluationCaptureChatClient capture/replay behavior.
EvaluationQualityGate Configurable quality gate that asserts evaluation metrics meet defined thresholds. Designed for CI pipelines — call Assert(EvaluationResult[]) after running evaluators to fail the build when metrics regress.
FileEvaluationCaptureStore Disk-backed IEvaluationCaptureStore that persists each response as a single JSON file under a caller-supplied directory. File names are the request hash plus a .json extension.
IterationCoherenceEvaluator Deterministic evaluator that scores the iteration coherence of an iterative-loop agent run from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.
PipelineCostEvaluator Deterministic evaluator that scores token usage and cost breakdown per stage of a pipeline run from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IPipelineRunResult snapshot carried in a PipelineEvaluationContext.
PipelineEvaluationContext Carries a NexusLabs.Needlr.AgentFramework.Diagnostics.IPipelineRunResult through the Microsoft.Extensions.AI.Evaluation evaluator pipeline so that pipeline-aware evaluators can score per-stage and aggregate metrics without re-invoking the model.
PipelineEvaluationExtensions Extensions that convert NexusLabs.Needlr.AgentFramework.Diagnostics.IPipelineRunResult into the input shape expected by Microsoft.Extensions.AI.Evaluation evaluators.
PipelineStageEvaluator Deterministic evaluator that scores per-stage success/failure and overall pipeline health from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IPipelineRunResult snapshot carried in a PipelineEvaluationContext.
QualityGateFailedException Thrown by Assert(EvaluationResult[]) when one or more evaluation metrics violate their configured thresholds.
TaskCompletionEvaluator LLM-judged evaluator that assesses whether an agent actually accomplished the task it was given. Unlike MEAI's TaskAdherenceEvaluator (which checks instruction following), this evaluator checks *task success*: did the agent produce output that satisfies the original request?
TerminationAppropriatenessEvaluator Deterministic evaluator that scores whether an agent run terminated appropriately, using the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.
ToolCallTrajectoryEvaluator Deterministic evaluator that scores the tool-call trajectory of an agent run from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.
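A typical CI usage of the classes above runs a deterministic evaluator over a captured diagnostics snapshot and then gates the build with EvaluationQualityGate. The sketch below is illustrative: the `IEvaluator.EvaluateAsync` signature comes from Microsoft.Extensions.AI.Evaluation, but the `AgentRunDiagnosticsContext` and `EvaluationQualityGate` constructor shapes, and the `inputs`/`diagnostics` variables, are assumptions based on the summaries in this namespace.

```csharp
using Microsoft.Extensions.AI.Evaluation;
using NexusLabs.Needlr.AgentFramework.Evaluation;

// `inputs` is an EvaluationInputs produced from the captured run (see
// AgentRunDiagnosticsEvaluationExtensions); `diagnostics` is the
// IAgentRunDiagnostics snapshot. Both are assumed to exist already.
IEvaluator evaluator = new ToolCallTrajectoryEvaluator();

// The diagnostics context rides along as additional evaluator context,
// so the deterministic evaluator never re-invokes the LLM.
EvaluationResult result = await evaluator.EvaluateAsync(
    inputs.Messages,
    inputs.ModelResponse,
    additionalContext: new[] { new AgentRunDiagnosticsContext(diagnostics) });

// Fail the build when any metric falls below its configured threshold;
// Assert throws QualityGateFailedException on violation.
var gate = new EvaluationQualityGate(/* thresholds */);
gate.Assert(new[] { result });
```

In a CI pipeline the QualityGateFailedException surfaces as a failed test, turning metric regressions into build failures.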
Structs
EvaluationInputs Inputs shaped for Microsoft.Extensions.AI.Evaluation evaluators, derived from a captured agent run. Consumers pass Messages and ModelResponse to IEvaluator.EvaluateAsync (or to a ScenarioRun obtained via ReportingConfiguration.CreateScenarioRunAsync).
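Per the EvaluationInputs summary, the struct's Messages and ModelResponse feed either IEvaluator.EvaluateAsync directly or a ScenarioRun for report generation. A minimal sketch, assuming a `ToEvaluationInputs()` conversion provided by the extensions above and an already-configured `reportingConfiguration`:

```csharp
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Reporting;
using NexusLabs.Needlr.AgentFramework.Evaluation;

// Convert the captured agent run into evaluator-shaped inputs.
// (Extension method name assumed from the summaries above.)
EvaluationInputs inputs = diagnostics.ToEvaluationInputs();

// Score the run inside a named scenario so results land in the report.
await using ScenarioRun scenario =
    await reportingConfiguration.CreateScenarioRunAsync("checkout-agent");
EvaluationResult result =
    await scenario.EvaluateAsync(inputs.Messages, inputs.ModelResponse);
```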
Interfaces
IEvaluationCaptureStore Persists captured Microsoft.Extensions.AI.ChatResponse payloads keyed by a deterministic request hash so that evaluator runs can replay previously observed LLM responses without re-invoking the underlying model.
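Wiring a capture store into a chat client pipeline might look like the following. The `AsBuilder`/`Use` calls are standard Microsoft.Extensions.AI composition; the FileEvaluationCaptureStore constructor argument and the EvaluationCaptureChatClient constructor shape are assumptions inferred from the class summaries above.

```csharp
using Microsoft.Extensions.AI;
using NexusLabs.Needlr.AgentFramework.Evaluation;

// Disk-backed store: each response is saved as <request-hash>.json
// under the supplied directory (directory argument assumed).
IEvaluationCaptureStore store = new FileEvaluationCaptureStore("./captures");

// Decorate an existing IChatClient so that the first call with a given
// request hits the model and persists the response, while an identical
// later request replays the cached ChatResponse instead.
IChatClient client = innerClient
    .AsBuilder()
    .Use(inner => new EvaluationCaptureChatClient(inner, store))
    .Build();
```

This makes evaluator runs deterministic and cheap to re-execute, as the EvaluationCaptureChatClient summary describes; the opt-in extension methods in EvaluationCaptureChatClientExtensions presumably offer a shorter path to the same wiring.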