ToolCallTrajectoryEvaluator

NexusLabs.Needlr.AgentFramework.Evaluation

ToolCallTrajectoryEvaluator Class

Deterministic evaluator that scores the tool-call trajectory of an agent run from the captured NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics snapshot carried in an AgentRunDiagnosticsContext.

public sealed class ToolCallTrajectoryEvaluator : Microsoft.Extensions.AI.Evaluation.IEvaluator

Inheritance System.Object 🡒 ToolCallTrajectoryEvaluator

Implements Microsoft.Extensions.AI.Evaluation.IEvaluator

Remarks

This evaluator never contacts a language model. It reads the ordered NexusLabs.Needlr.AgentFramework.Diagnostics.IAgentRunDiagnostics.ToolCalls collection and produces:

- Tool Calls Total: total number of tool invocations.
- Tool Calls Failed: count of tool invocations whose NexusLabs.Needlr.AgentFramework.Diagnostics.ToolCallDiagnostics.Succeeded is false.
- Tool Call Sequence Gaps: number of missing slots in the NexusLabs.Needlr.AgentFramework.Diagnostics.ToolCallDiagnostics.Sequence stream (a strictly increasing sequence starting at 0 has zero gaps).
- All Tool Calls Succeeded: boolean rollup. true when every tool invocation succeeded (or when no tool calls occurred).
- Consecutive Same-Tool Calls: count of consecutive tool invocations with the same NexusLabs.Needlr.AgentFramework.Diagnostics.ToolCallDiagnostics.ToolName. Useful as a heuristic for stuck or looping agents. Note: parallel fan-out to the same tool (valid usage) will also increment this counter.
- Per-Tool Failure Rate: JSON string mapping each tool name to its failure rate (0.0–1.0), sorted alphabetically.
- Tool Call Latency P50: 50th percentile of tool call durations in milliseconds (nearest-rank method).
- Tool Call Latency P95: 95th percentile of tool call durations in milliseconds (nearest-rank method).
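The metric definitions above can be illustrated with a small standalone sketch. It is not the evaluator's source: the ToolCall record and the TrajectoryMath helpers are hypothetical stand-ins, and details the documentation leaves open (how an empty input is reported, whether the consecutive counter counts adjacent pairs) are assumptions marked in the comments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one recorded tool call; not the library's ToolCallDiagnostics type.
public sealed record ToolCall(int Sequence, string ToolName, bool Succeeded, double DurationMs);

public static class TrajectoryMath
{
    // Nearest-rank percentile: the value at 1-based rank ceil(p / 100 * n) in sorted order.
    public static double NearestRank(IEnumerable<double> values, int percentile)
    {
        var sorted = values.OrderBy(v => v).ToArray();
        if (sorted.Length == 0) return 0; // assumption: an empty input reports 0
        var rank = (percentile * sorted.Length + 99) / 100; // integer ceiling
        return sorted[Math.Clamp(rank, 1, sorted.Length) - 1];
    }

    // Missing slots, assuming sequence numbers are expected to run 0, 1, 2, ... without holes.
    public static int SequenceGaps(IEnumerable<ToolCall> calls)
    {
        var gaps = 0;
        var expected = 0;
        foreach (var seq in calls.Select(c => c.Sequence).OrderBy(s => s))
        {
            if (seq > expected) gaps += seq - expected;
            expected = Math.Max(expected, seq + 1);
        }
        return gaps;
    }

    // Assumption: the counter counts adjacent pairs that share a tool name.
    public static int ConsecutiveSameTool(IReadOnlyList<ToolCall> calls)
    {
        var count = 0;
        for (var i = 1; i < calls.Count; i++)
        {
            if (calls[i].ToolName == calls[i - 1].ToolName) count++;
        }
        return count;
    }

    // Failure rate (0.0 to 1.0) per tool name, ordered alphabetically by ordinal comparison.
    public static SortedDictionary<string, double> FailureRates(IEnumerable<ToolCall> calls)
    {
        var rates = new SortedDictionary<string, double>(StringComparer.Ordinal);
        foreach (var group in calls.GroupBy(c => c.ToolName))
        {
            rates[group.Key] = group.Count(c => !c.Succeeded) / (double)group.Count();
        }
        return rates;
    }
}
```

Under these assumptions, four calls with durations of 80, 120, 200 and 300 ms give a nearest-rank P50 of 120 and a P95 of 300, and the sequence numbers 0, 1, 3, 4 contain one gap.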

When no AgentRunDiagnosticsContext is present in the additionalContext collection, the evaluator returns an empty Microsoft.Extensions.AI.Evaluation.EvaluationResult — callers should treat that as "not applicable".
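A caller might handle the "not applicable" case along the following lines. This is a sketch under two assumptions that this page does not confirm: that ToolCallTrajectoryEvaluator has a public parameterless constructor, and that an "empty" result means the Metrics dictionary has no entries. TrajectoryGuard and TryEvaluateAsync are hypothetical names.

```csharp
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using NexusLabs.Needlr.AgentFramework.Evaluation;

public static class TrajectoryGuard
{
    // Returns null when the evaluator produced no metrics, i.e. "not applicable".
    public static async Task<EvaluationResult?> TryEvaluateAsync(
        ChatMessage[] messages,
        ChatResponse response,
        EvaluationContext[]? additionalContext = null)
    {
        // Assumption: the class exposes a public parameterless constructor.
        var evaluator = new ToolCallTrajectoryEvaluator();

        var result = await evaluator.EvaluateAsync(
            messages, response, additionalContext: additionalContext);

        // Assumption: "empty" means the Metrics dictionary contains no entries.
        return result.Metrics.Count == 0 ? null : result;
    }
}
```

Returning null keeps the "no diagnostics captured" case distinct from a run that was evaluated and made zero tool calls.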

Fields

ToolCallTrajectoryEvaluator.AllSucceededMetricName Field

Metric name for the boolean rollup indicating every tool call succeeded.

public const string AllSucceededMetricName = "All Tool Calls Succeeded";

Field Value

System.String

ToolCallTrajectoryEvaluator.ConsecutiveSameToolMetricName Field

Metric name for the count of consecutive tool calls with the same tool name.

public const string ConsecutiveSameToolMetricName = "Consecutive Same-Tool Calls";

Field Value

System.String

ToolCallTrajectoryEvaluator.FailedMetricName Field

Metric name for the failed tool-call count.

public const string FailedMetricName = "Tool Calls Failed";

Field Value

System.String

ToolCallTrajectoryEvaluator.LatencyP50MetricName Field

Metric name for the 50th percentile tool-call latency in milliseconds.

public const string LatencyP50MetricName = "Tool Call Latency P50";

Field Value

System.String

ToolCallTrajectoryEvaluator.LatencyP95MetricName Field

Metric name for the 95th percentile tool-call latency in milliseconds.

public const string LatencyP95MetricName = "Tool Call Latency P95";

Field Value

System.String

ToolCallTrajectoryEvaluator.PerToolFailureRateMetricName Field

Metric name for the JSON-formatted per-tool failure rate breakdown.

public const string PerToolFailureRateMetricName = "Per-Tool Failure Rate";

Field Value

System.String

ToolCallTrajectoryEvaluator.SequenceGapsMetricName Field

Metric name for the number of gaps in the recorded tool-call sequence.

public const string SequenceGapsMetricName = "Tool Call Sequence Gaps";

Field Value

System.String

ToolCallTrajectoryEvaluator.TotalMetricName Field

Metric name for the total tool-call count.

public const string TotalMetricName = "Tool Calls Total";

Field Value

System.String

Properties

ToolCallTrajectoryEvaluator.EvaluationMetricNames Property

Gets the Microsoft.Extensions.AI.Evaluation.EvaluationMetric.Names of the Microsoft.Extensions.AI.Evaluation.EvaluationMetrics produced by this Microsoft.Extensions.AI.Evaluation.IEvaluator.

public System.Collections.Generic.IReadOnlyCollection<string> EvaluationMetricNames { get; }

Implements EvaluationMetricNames

Property Value

System.Collections.Generic.IReadOnlyCollection<System.String>

Methods

ToolCallTrajectoryEvaluator.EvaluateAsync(IEnumerable<ChatMessage>, ChatResponse, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken) Method

Evaluates the supplied modelResponse and returns a Microsoft.Extensions.AI.Evaluation.EvaluationResult containing one or more Microsoft.Extensions.AI.Evaluation.EvaluationMetrics.

public System.Threading.Tasks.ValueTask<Microsoft.Extensions.AI.Evaluation.EvaluationResult> EvaluateAsync(
    System.Collections.Generic.IEnumerable<Microsoft.Extensions.AI.ChatMessage> messages,
    Microsoft.Extensions.AI.ChatResponse modelResponse,
    Microsoft.Extensions.AI.Evaluation.ChatConfiguration? chatConfiguration = null,
    System.Collections.Generic.IEnumerable<Microsoft.Extensions.AI.Evaluation.EvaluationContext>? additionalContext = null,
    System.Threading.CancellationToken cancellationToken = default(System.Threading.CancellationToken));

Parameters

messages System.Collections.Generic.IEnumerable<Microsoft.Extensions.AI.ChatMessage>

The conversation history including the request that produced the supplied modelResponse.

modelResponse Microsoft.Extensions.AI.ChatResponse

The response that is to be evaluated.

chatConfiguration Microsoft.Extensions.AI.Evaluation.ChatConfiguration

A Microsoft.Extensions.AI.Evaluation.ChatConfiguration that specifies the Microsoft.Extensions.AI.IChatClient that should be used if one or more composed Microsoft.Extensions.AI.Evaluation.IEvaluators use an AI model to perform evaluation.

additionalContext System.Collections.Generic.IEnumerable<Microsoft.Extensions.AI.Evaluation.EvaluationContext>

Additional contextual information (beyond that which is available in messages) that the Microsoft.Extensions.AI.Evaluation.IEvaluator may need to accurately evaluate the supplied modelResponse.

cancellationToken System.Threading.CancellationToken

A System.Threading.CancellationToken that can cancel the evaluation operation.

Implements EvaluateAsync(IEnumerable<ChatMessage>, ChatResponse, ChatConfiguration, IEnumerable<EvaluationContext>, CancellationToken)

Returns

System.Threading.Tasks.ValueTask<Microsoft.Extensions.AI.Evaluation.EvaluationResult>
A Microsoft.Extensions.AI.Evaluation.EvaluationResult containing one or more Microsoft.Extensions.AI.Evaluation.EvaluationMetrics.

Remarks

The Microsoft.Extensions.AI.Evaluation.EvaluationMetric.Names of the Microsoft.Extensions.AI.Evaluation.EvaluationMetrics contained in the returned Microsoft.Extensions.AI.Evaluation.EvaluationResult should match Microsoft.Extensions.AI.Evaluation.IEvaluator.EvaluationMetricNames.

Also note that chatConfiguration must not be omitted if the evaluation is performed using an AI model.
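The naming contract described in these remarks can be checked with a small helper. The sketch below works against any IEvaluator and uses only members of the Microsoft.Extensions.AI.Evaluation abstractions. MetricNameCheck and FindUndeclaredMetricsAsync are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;

public static class MetricNameCheck
{
    // Returns the names of produced metrics that the evaluator did not declare
    // in EvaluationMetricNames; an empty list means the contract holds for this run.
    public static async Task<IReadOnlyList<string>> FindUndeclaredMetricsAsync(
        IEvaluator evaluator,
        IEnumerable<ChatMessage> messages,
        ChatResponse response,
        IEnumerable<EvaluationContext>? additionalContext = null)
    {
        // chatConfiguration is omitted here, which suits evaluators that do not call a model.
        var result = await evaluator.EvaluateAsync(
            messages, response, additionalContext: additionalContext);

        var declared = new HashSet<string>(evaluator.EvaluationMetricNames, StringComparer.Ordinal);
        return result.Metrics.Keys.Where(name => !declared.Contains(name)).ToList();
    }
}
```

Passing a ToolCallTrajectoryEvaluator together with an AgentRunDiagnosticsContext in additionalContext would be the expected way to exercise it. How that context is constructed is not covered on this page, so it is left to the caller.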