ToolCallTrajectoryEvaluator
NexusLabs.Needlr.AgentFramework.Evaluation
ToolCallTrajectoryEvaluator Class
Deterministic evaluator that scores the tool-call trajectory of an agent run from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.
Inheritance System.Object 🡒 ToolCallTrajectoryEvaluator
Implements Microsoft.Extensions.AI.Evaluation.IEvaluator
Remarks
This evaluator never contacts a language model. It reads the ordered
NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics.ToolCalls collection and produces:
- Tool Calls Total — total number of tool invocations.
- Tool Calls Failed — count of tool invocations whose NexusLabs.Needlr.AgentFramework.Diagnostics.ToolCallDiagnostics.Succeeded is false.
- Tool Call Sequence Gaps — number of missing slots in the NexusLabs.Needlr.AgentFramework.Diagnostics.ToolCallDiagnostics.Sequence stream (a contiguous sequence that starts at 0 and increases by exactly 1 per call has zero gaps).
- All Tool Calls Succeeded — boolean rollup. true when every tool invocation succeeded (or when no tool calls occurred).
- Consecutive Same-Tool Calls — count of consecutive tool invocations with the same NexusLabs.Needlr.AgentFramework.Diagnostics.ToolCallDiagnostics.ToolName. Useful as a heuristic for stuck or looping agents. Note: parallel fan-out to the same tool (valid usage) will also increment this counter.
- Per-Tool Failure Rate — JSON string mapping each tool name to its failure rate (0.0–1.0), sorted alphabetically.
- Tool Call Latency P50 — 50th percentile of tool call durations in milliseconds (nearest-rank method).
- Tool Call Latency P95 — 95th percentile of tool call durations in milliseconds (nearest-rank method).
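The counting metrics above (total, failed, all-succeeded and consecutive same-tool calls) can be sketched in a few lines of LINQ. This is a minimal, hypothetical illustration, not the evaluator's source: ToolCall is an assumed stand-in for ToolCallDiagnostics that carries only ToolName and Succeeded, and the consecutive-same-tool rule used here (count each call whose tool name matches the call immediately before it) is one plausible reading of the description rather than a confirmed one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ToolCallDiagnostics: only the members this sketch reads.
public sealed record ToolCall(string ToolName, bool Succeeded);

public static class TrajectoryCounts
{
    // Tool Calls Total: every recorded invocation.
    public static int Total(IReadOnlyList<ToolCall> calls) => calls.Count;

    // Tool Calls Failed: invocations whose Succeeded flag is false.
    public static int Failed(IReadOnlyList<ToolCall> calls) => calls.Count(c => !c.Succeeded);

    // All Tool Calls Succeeded: All() is vacuously true for an empty list,
    // which matches "true when no tool calls occurred".
    public static bool AllSucceeded(IReadOnlyList<ToolCall> calls) => calls.All(c => c.Succeeded);

    // Consecutive Same-Tool Calls (assumed interpretation): count each call whose
    // ToolName equals the ToolName of the call immediately before it (ordinal comparison).
    public static int ConsecutiveSameTool(IReadOnlyList<ToolCall> calls)
    {
        var count = 0;
        for (var i = 1; i < calls.Count; i++)
        {
            if (string.Equals(calls[i].ToolName, calls[i - 1].ToolName, StringComparison.Ordinal))
            {
                count++;
            }
        }
        return count;
    }
}
```

For the trajectory search, search, search, fetch, ConsecutiveSameTool returns 2 under this reading. A legitimate three-way parallel fan-out to search would produce the same value, which is why the metric is only a heuristic for looping agents.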
When no AgentRunDiagnosticsContext is present in the
additionalContext collection, the evaluator returns an empty
Microsoft.Extensions.AI.Evaluation.EvaluationResult — callers should treat that as "not applicable".
Fields
ToolCallTrajectoryEvaluator.AllSucceededMetricName Field
Metric name for the boolean rollup indicating every tool call succeeded.
Field Value
System.String
ToolCallTrajectoryEvaluator.ConsecutiveSameToolMetricName Field
Metric name for the count of consecutive tool calls with the same tool name.
Field Value
System.String
ToolCallTrajectoryEvaluator.FailedMetricName Field
Metric name for the failed tool-call count.
Field Value
System.String
ToolCallTrajectoryEvaluator.LatencyP50MetricName Field
Metric name for the 50th percentile tool-call latency in milliseconds.
Field Value
System.String
ToolCallTrajectoryEvaluator.LatencyP95MetricName Field
Metric name for the 95th percentile tool-call latency in milliseconds.
Field Value
System.String
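The two latency metrics use the nearest-rank method: sort the durations ascending and take the value at 1-based rank ceil(p / 100 × N). The sketch below is a hypothetical illustration of that method over plain millisecond values; returning null for an empty input is an assumption, since the evaluator's behavior when no tool calls occurred is not documented on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NearestRank
{
    // Nearest-rank percentile over durations in milliseconds.
    // Returns null when there are no samples (assumed behavior for this sketch).
    public static double? Percentile(IEnumerable<double> durationsMs, double p)
    {
        if (p <= 0 || p > 100)
        {
            throw new ArgumentOutOfRangeException(nameof(p), "Percentile must be in (0, 100].");
        }

        var sorted = durationsMs.OrderBy(d => d).ToArray();
        if (sorted.Length == 0)
        {
            return null;
        }

        // Multiply before dividing so integer-valued p and N give an exact product,
        // which keeps Math.Ceiling from being nudged by floating-point rounding.
        var rank = (int)Math.Ceiling(p * sorted.Length / 100.0);

        // rank is 1-based and at least 1 because p > 0 and Length >= 1.
        return sorted[rank - 1];
    }
}
```

With ten samples, P50 is the 5th smallest duration and P95 is the 10th, that is, the maximum. In small runs a single slow call therefore determines P95 on its own.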
ToolCallTrajectoryEvaluator.PerToolFailureRateMetricName Field
Metric name for the JSON-formatted per-tool failure rate breakdown.
Field Value
System.String
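The per-tool failure rate is, for each tool name, the number of failed invocations divided by that tool's total invocations, serialized as a JSON object with keys in alphabetical order. The following is a hypothetical sketch using System.Text.Json; ToolCallOutcome is an assumed stand-in for ToolCallDiagnostics, and ordinal key ordering is an assumption (the evaluator may sort differently).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical stand-in for ToolCallDiagnostics: only the members this sketch reads.
public sealed record ToolCallOutcome(string ToolName, bool Succeeded);

public static class PerToolFailureRate
{
    public static string ToJson(IEnumerable<ToolCallOutcome> calls)
    {
        // SortedDictionary enumerates keys in comparer order, and JsonSerializer writes
        // dictionary entries in enumeration order, so the JSON keys come out sorted.
        var rates = new SortedDictionary<string, double>(StringComparer.Ordinal);

        foreach (var group in calls.GroupBy(c => c.ToolName, StringComparer.Ordinal))
        {
            var total = group.Count();
            var failed = group.Count(c => !c.Succeeded);
            rates[group.Key] = (double)failed / total; // always within 0.0 to 1.0
        }

        return JsonSerializer.Serialize(rates);
    }
}
```

For two search calls (one failed), one successful fetch and one failed apply, the sketch emits the keys apply, fetch and search in that order, with rates 1, 0 and 0.5.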
ToolCallTrajectoryEvaluator.SequenceGapsMetricName Field
Metric name for the number of gaps in the recorded tool-call sequence.
Field Value
System.String
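One way to count the missing slots in the Sequence stream is to walk the recorded sequence numbers in ascending order and add up every integer skipped between the expected next value and the one actually seen. The sketch below is hypothetical: it assumes the stream is expected to start at 0 and ignores duplicate sequence numbers, neither of which is confirmed by this page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceGaps
{
    // Counts integers missing from the recorded sequence numbers, assuming the
    // stream should run 0, 1, 2, ... with no skipped values. Duplicates are ignored.
    public static int Count(IEnumerable<int> sequenceNumbers)
    {
        var gaps = 0;
        var expected = 0;

        foreach (var seq in sequenceNumbers.Distinct().OrderBy(s => s))
        {
            if (seq > expected)
            {
                // Every value between expected and seq (exclusive of seq) is a missing slot.
                gaps += seq - expected;
            }
            expected = seq + 1;
        }

        return gaps;
    }
}
```

Under these assumptions the sequence 0, 2, 5 is strictly increasing yet reports 3 gaps (slots 1, 3 and 4), while 0, 1, 2, 3 reports none.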
ToolCallTrajectoryEvaluator.TotalMetricName Field
Metric name for the total tool-call count.
Field Value
System.String
Properties
ToolCallTrajectoryEvaluator.EvaluationMetricNames Property
Gets the Microsoft.Extensions.AI.Evaluation.EvaluationMetric.Name values of the Microsoft.Extensions.AI.Evaluation.EvaluationMetric instances produced by this Microsoft.Extensions.AI.Evaluation.IEvaluator.
Implements EvaluationMetricNames
Property Value
System.Collections.Generic.IReadOnlyCollection<System.String>
Methods
ToolCallTrajectoryEvaluator.EvaluateAsync(IEnumerable<ChatMessage>, ChatResponse, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken) Method
Evaluates the supplied modelResponse and returns a Microsoft.Extensions.AI.Evaluation.EvaluationResult containing the Microsoft.Extensions.AI.Evaluation.EvaluationMetric instances computed for it.
public System.Threading.Tasks.ValueTask<Microsoft.Extensions.AI.Evaluation.EvaluationResult> EvaluateAsync(System.Collections.Generic.IEnumerable<Microsoft.Extensions.AI.ChatMessage> messages, Microsoft.Extensions.AI.ChatResponse modelResponse, Microsoft.Extensions.AI.Evaluation.ChatConfiguration? chatConfiguration=null, System.Collections.Generic.IEnumerable<Microsoft.Extensions.AI.Evaluation.EvaluationContext>? additionalContext=null, System.Threading.CancellationToken cancellationToken=default(System.Threading.CancellationToken));
Parameters
messages System.Collections.Generic.IEnumerable<Microsoft.Extensions.AI.ChatMessage>
The conversation history including the request that produced the supplied modelResponse.
modelResponse Microsoft.Extensions.AI.ChatResponse
The response that is to be evaluated.
chatConfiguration Microsoft.Extensions.AI.Evaluation.ChatConfiguration
A Microsoft.Extensions.AI.Evaluation.ChatConfiguration that specifies the Microsoft.Extensions.AI.IChatClient that should be used if one or more composed Microsoft.Extensions.AI.Evaluation.IEvaluator instances use an AI model to perform evaluation.
additionalContext System.Collections.Generic.IEnumerable<Microsoft.Extensions.AI.Evaluation.EvaluationContext>
Additional contextual information (beyond that which is available in messages) that the Microsoft.Extensions.AI.Evaluation.IEvaluator may need to accurately evaluate the supplied modelResponse.
cancellationToken System.Threading.CancellationToken
A System.Threading.CancellationToken that can cancel the evaluation operation.
Returns
System.Threading.Tasks.ValueTask<Microsoft.Extensions.AI.Evaluation.EvaluationResult>
A Microsoft.Extensions.AI.Evaluation.EvaluationResult containing the Microsoft.Extensions.AI.Evaluation.EvaluationMetric instances computed for the run, or an empty result when no AgentRunDiagnosticsContext is present in additionalContext.
Remarks
The Microsoft.Extensions.AI.Evaluation.EvaluationMetric.Name values of the Microsoft.Extensions.AI.Evaluation.EvaluationMetric instances contained in the returned Microsoft.Extensions.AI.Evaluation.EvaluationResult should match Microsoft.Extensions.AI.Evaluation.IEvaluator.EvaluationMetricNames.
Also note that chatConfiguration must not be omitted if the evaluation is performed using an AI model. ToolCallTrajectoryEvaluator never contacts a language model, so it does not need a chatConfiguration.