
NexusLabs.Needlr.AgentFramework.Evaluation Namespace

Classes
AgentRunDiagnosticsContext Carries a NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot through the Microsoft.Extensions.AI.Evaluation evaluator pipeline so that Needlr-native deterministic evaluators can score execution mode, tool-call trajectory, and termination behavior without re-invoking the LLM.
AgentRunDiagnosticsEvaluationExtensions Extensions that convert NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics into the input shape expected by Microsoft.Extensions.AI.Evaluation evaluators.
EfficiencyEvaluator Deterministic evaluator that scores the token efficiency and cost profile of an agent run from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.
EvaluationCaptureChatClient Microsoft.Extensions.AI.DelegatingChatClient that persists every LLM request/response pair to an IEvaluationCaptureStore and replays cached responses on subsequent calls with an identical request. Intended to make evaluator runs deterministic and cheap to re-execute.
EvaluationCaptureChatClientExtensions Extension methods for opting in to EvaluationCaptureChatClient capture/replay behavior.
EvaluationQualityGate Configurable quality gate that asserts evaluation metrics meet defined thresholds. Designed for CI pipelines — call Assert(EvaluationResult[]) after running evaluators to fail the build when metrics regress.
FileEvaluationCaptureStore Disk-backed IEvaluationCaptureStore that persists each response as a single JSON file under a caller-supplied directory. File names are the request hash plus a .json extension.
IterationCoherenceEvaluator Deterministic evaluator that scores the iteration coherence of an iterative-loop agent run from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.
PipelineCostEvaluator Deterministic evaluator that scores token usage and cost breakdown per stage of a pipeline run from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IPipelineRunResult snapshot carried in a PipelineEvaluationContext.
PipelineEvaluationContext Carries a NexusLabs.Needlr.AgentFramework.Diagnostics.IPipelineRunResult through the Microsoft.Extensions.AI.Evaluation evaluator pipeline so that pipeline-aware evaluators can score per-stage and aggregate metrics without re-invoking the model.
PipelineEvaluationExtensions Extensions that convert NexusLabs.Needlr.AgentFramework.Diagnostics.IPipelineRunResult into the input shape expected by Microsoft.Extensions.AI.Evaluation evaluators.
PipelineStageEvaluator Deterministic evaluator that scores per-stage success/failure and overall pipeline health from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IPipelineRunResult snapshot carried in a PipelineEvaluationContext.
QualityGateFailedException Thrown by Assert(EvaluationResult[]) when one or more evaluation metrics violate their configured thresholds.
TaskCompletionEvaluator LLM-judged evaluator that assesses whether an agent actually accomplished the task it was given. Unlike MEAI's TaskAdherenceEvaluator (which checks instruction following), this evaluator checks *task success*: did the agent produce output that satisfies the original request?
TerminationAppropriatenessEvaluator Deterministic evaluator that scores whether an agent run terminated appropriately, using the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.
ToolCallTrajectoryEvaluator Deterministic evaluator that scores the tool-call trajectory of an agent run from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.
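A typical CI usage of the classes above runs a deterministic evaluator over a captured diagnostics snapshot and then gates the build with EvaluationQualityGate. The sketch below is illustrative: the `IEvaluator.EvaluateAsync` signature comes from Microsoft.Extensions.AI.Evaluation, but the `AgentRunDiagnosticsContext` and `EvaluationQualityGate` constructor shapes, and the `inputs`/`diagnostics` variables, are assumptions based on the summaries in this namespace.

```csharp
using Microsoft.Extensions.AI.Evaluation;
using NexusLabs.Needlr.AgentFramework.Evaluation;

// `inputs` is an EvaluationInputs produced from the captured run (see
// AgentRunDiagnosticsEvaluationExtensions); `diagnostics` is the
// IAgentRunDiagnostics snapshot. Both are assumed to exist already.
IEvaluator evaluator = new ToolCallTrajectoryEvaluator();

// The diagnostics context rides along as additional evaluator context,
// so the deterministic evaluator never re-invokes the LLM.
EvaluationResult result = await evaluator.EvaluateAsync(
    inputs.Messages,
    inputs.ModelResponse,
    additionalContext: new[] { new AgentRunDiagnosticsContext(diagnostics) });

// Fail the build when any metric falls below its configured threshold;
// Assert throws QualityGateFailedException on violation.
var gate = new EvaluationQualityGate(/* thresholds */);
gate.Assert(new[] { result });
```

In a CI pipeline the QualityGateFailedException surfaces as a failed test, turning metric regressions into build failures.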
Structs
EvaluationInputs Inputs shaped for Microsoft.Extensions.AI.Evaluation evaluators, derived from a captured agent run. Consumers pass Messages and ModelResponse to IEvaluator.EvaluateAsync (or to a ScenarioRun obtained via ReportingConfiguration.CreateScenarioRunAsync).
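Per the EvaluationInputs summary, the struct's Messages and ModelResponse feed either IEvaluator.EvaluateAsync directly or a ScenarioRun for report generation. A minimal sketch, assuming a `ToEvaluationInputs()` conversion provided by the extensions above and an already-configured `reportingConfiguration`:

```csharp
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Reporting;
using NexusLabs.Needlr.AgentFramework.Evaluation;

// Convert the captured agent run into evaluator-shaped inputs.
// (Extension method name assumed from the summaries above.)
EvaluationInputs inputs = diagnostics.ToEvaluationInputs();

// Score the run inside a named scenario so results land in the report.
await using ScenarioRun scenario =
    await reportingConfiguration.CreateScenarioRunAsync("checkout-agent");
EvaluationResult result =
    await scenario.EvaluateAsync(inputs.Messages, inputs.ModelResponse);
```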
Interfaces
IEvaluationCaptureStore Persists captured Microsoft.Extensions.AI.ChatResponse payloads keyed by a deterministic request hash so that evaluator runs can replay previously observed LLM responses without re-invoking the underlying model.
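Wiring a capture store into a chat client pipeline might look like the following. The `AsBuilder`/`Use` calls are standard Microsoft.Extensions.AI composition; the FileEvaluationCaptureStore constructor argument and the EvaluationCaptureChatClient constructor shape are assumptions inferred from the class summaries above.

```csharp
using Microsoft.Extensions.AI;
using NexusLabs.Needlr.AgentFramework.Evaluation;

// Disk-backed store: each response is saved as <request-hash>.json
// under the supplied directory (directory argument assumed).
IEvaluationCaptureStore store = new FileEvaluationCaptureStore("./captures");

// Decorate an existing IChatClient so that the first call with a given
// request hits the model and persists the response, while an identical
// later request replays the cached ChatResponse instead.
IChatClient client = innerClient
    .AsBuilder()
    .Use(inner => new EvaluationCaptureChatClient(inner, store))
    .Build();
```

This makes evaluator runs deterministic and cheap to re-execute, as the EvaluationCaptureChatClient summary describes; the opt-in extension methods in EvaluationCaptureChatClientExtensions presumably offer a shorter path to the same wiring.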