

NexusLabs.Needlr.AgentFramework.Evaluation Namespace

Classes
AgentRunDiagnosticsContext: Carries an NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot through the Microsoft.Extensions.AI.Evaluation evaluator pipeline so that Needlr-native deterministic evaluators can score execution mode, tool-call trajectory, and termination behavior without re-invoking the LLM.
AgentRunDiagnosticsEvaluationExtensions: Extensions that convert NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics into the input shape expected by Microsoft.Extensions.AI.Evaluation evaluators.
EvaluationCaptureChatClient: A Microsoft.Extensions.AI.DelegatingChatClient that persists every LLM request/response pair to an IEvaluationCaptureStore and replays the cached response on subsequent calls with an identical request. Intended to make evaluator runs deterministic and cheap to re-execute.
EvaluationCaptureChatClientExtensions: Extension methods for opting in to EvaluationCaptureChatClient capture/replay behavior.
FileEvaluationCaptureStore: Disk-backed IEvaluationCaptureStore that persists each response as a single JSON file under a caller-supplied directory. File names are the request hash plus a .json extension.
IterationCoherenceEvaluator: Deterministic evaluator that scores the iteration coherence of an iterative-loop agent run from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.
TerminationAppropriatenessEvaluator: Deterministic evaluator that scores whether an agent run terminated appropriately, using the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.
ToolCallTrajectoryEvaluator: Deterministic evaluator that scores the tool-call trajectory of an agent run from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.
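The capture/replay classes above can be wired together roughly as follows. This is a minimal sketch, not the library's confirmed usage: the EvaluationCaptureChatClient constructor shape and the GetChatClient helper are assumptions made for illustration.

```csharp
using Microsoft.Extensions.AI;
using NexusLabs.Needlr.AgentFramework.Evaluation;

// Hypothetical wiring; the constructor signature below is an assumption.
IChatClient innerClient = GetChatClient(); // however the application normally builds one

// Disk-backed store: each captured response lands as "<request-hash>.json"
// under the supplied directory.
IEvaluationCaptureStore store = new FileEvaluationCaptureStore("./eval-captures");

// Wrap the client so that an identical request replays the cached response
// instead of re-invoking the LLM, making evaluator runs deterministic.
IChatClient capturingClient = new EvaluationCaptureChatClient(innerClient, store);
```

Because DelegatingChatClient implementations compose, the wrapped client can be passed anywhere an IChatClient is expected, including to a ChatConfiguration used by Microsoft.Extensions.AI.Evaluation evaluators.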
Structs
EvaluationInputs: Inputs shaped for Microsoft.Extensions.AI.Evaluation evaluators, derived from a captured agent run. Consumers pass Messages and ModelResponse to IEvaluator.EvaluateAsync (or to a ScenarioRun obtained via ReportingConfiguration.CreateScenarioRunAsync).
Interfaces
IEvaluationCaptureStore: Persists captured Microsoft.Extensions.AI.ChatResponse payloads keyed by a deterministic request hash so that evaluator runs can replay previously observed LLM responses without re-invoking the underlying model.
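The "deterministic request hash" that keys the store could, for example, be derived by hashing a canonical serialization of the request. The sketch below is illustrative only; the library's actual hashing scheme is not documented here, and the HashRequest helper is a hypothetical name.

```csharp
using System.Security.Cryptography;
using System.Text;
using System.Text.Json;

// Illustrative sketch: one way to derive a stable key from a request so
// that equal requests always map to the same cached response file.
static string HashRequest(object request)
{
    // Serialize to JSON first so structurally equal requests hash equally.
    string json = JsonSerializer.Serialize(request);
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(json));
    // FileEvaluationCaptureStore names files "<hash>.json", so a hex
    // string like this would be filesystem-safe.
    return Convert.ToHexString(digest);
}
```

A scheme along these lines is what makes replay safe: any change to the request (messages, options, model) produces a different key, so only a byte-identical request hits the cache.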