This guide helps you write effective prompts for LLM-based adherence detection in the By Question metric. Well-structured prompts are specific, measurable, and account for real-world conversation variations.
Prompt Architecture
Every effective AutoQA prompt has three components:
| Component | Description |
|---|---|
| Context | Define the evaluation scenario |
| Pass Criteria | Specific behaviors that constitute success |
| Fail Criteria | Clear indicators of non-compliance |
Template Structure
CONTEXT: [Conversation type and evaluation scope]
PASS CRITERIA: [Specific behaviors indicating adherence]
* Look for: [required elements]
* Acceptable variations: [alternative approaches]
FAIL CRITERIA: [Behaviors indicating non-adherence]
* Missing: [critical elements]
* Inadequate: [insufficient attempts]
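The template above can be assembled programmatically when you manage many evaluation questions. The sketch below is illustrative only; the function name and argument structure are assumptions, not part of any AutoQA API.

```python
# Hypothetical helper that renders the three-part template
# (CONTEXT / PASS CRITERIA / FAIL CRITERIA) into a single prompt string.
def build_prompt(context, pass_criteria, fail_criteria):
    """Assemble an evaluation prompt from the three template components."""
    lines = [f"CONTEXT: {context}", "PASS CRITERIA:"]
    lines += [f"* {item}" for item in pass_criteria]
    lines.append("FAIL CRITERIA:")
    lines += [f"* {item}" for item in fail_criteria]
    return "\n".join(lines)

prompt = build_prompt(
    context="Evaluate whether the agent provided a complete greeting.",
    pass_criteria=[
        "Look for: welcome phrase, company name, agent identification",
        "Acceptable variations: elements may appear in any order",
    ],
    fail_criteria=[
        "Missing: any required element",
        "Inadequate: generic greeting with no identification",
    ],
)
print(prompt)
```

Keeping criteria as lists (rather than free text) makes it easy to review and version each element independently.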
Prompt Development Process
Step 1: Define Success Criteria
Ask yourself:
- What specific actions or words indicate adherence?
- What variations are acceptable?
- What constitutes a clear failure?
- How granular should the evaluation be?
Step 2: Create Measurable Standards
Criteria should be:
- Observable in conversation transcripts
- Objective rather than subjective
- Specific enough to avoid interpretation gaps
- Comprehensive enough to cover typical scenarios
Step 3: Test Specificity
Validate your prompt by asking:
- Could two evaluators reach different conclusions from this prompt?
- Are there ambiguous terms that need clarification?
- Does it clearly differentiate between pass and fail scenarios?
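One way to make this validation concrete is to run the prompt over a small set of hand-labeled transcripts and compare the LLM's verdicts against your labels. A minimal sketch, assuming the judge's verdicts have already been collected as "pass"/"fail" strings (no specific LLM client is implied):

```python
# Compare LLM verdicts against human labels to estimate prompt quality.
# Treating "pass" as the positive class:
#   precision = of the transcripts the prompt passed, how many should pass
#   recall    = of the transcripts that should pass, how many the prompt passed
def precision_recall(predictions, labels):
    tp = sum(p == "pass" and l == "pass" for p, l in zip(predictions, labels))
    fp = sum(p == "pass" and l == "fail" for p, l in zip(predictions, labels))
    fn = sum(p == "fail" and l == "pass" for p, l in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

verdicts = ["pass", "pass", "fail", "fail"]  # from the LLM judge
labels = ["pass", "fail", "pass", "fail"]    # human ground truth
print(precision_recall(verdicts, labels))
```

Low precision suggests the prompt's pass criteria are too loose; low recall suggests the fail criteria are catching acceptable variations.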
Example Prompts
Greeting Evaluation
Poor Prompts (Low Precision/Recall)
- “Check if the agent was polite when greeting the customer.”
- “Did the agent say hello properly?”
- “Evaluate the quality of the agent’s opening statement.”
Why these fail: “Polite,” “properly,” and “quality” are subjective — they provide no actionable evaluation framework.
Robust Prompt (High Precision/Recall)
CONTEXT: Evaluate whether the agent provided a complete and professional greeting.
PASS CRITERIA:
Look for all four elements:
* Acknowledgement/welcome phrase (for example, "Hello", "Good morning", "Thank you for calling")
* Company/department identification (for example, "ABC Company", "Technical Support")
* Agent identification (name, employee ID, or role)
* Offer of assistance (for example, "How can I help you?")
Acceptable variations:
* Elements may appear in any order
* Casual but professional tone is acceptable
* Abbreviated company name if commonly recognized
* Combined elements (for example, "This is Sarah from Tech Support, how can I help?")
FAIL CRITERIA:
* Any of the four elements is missing
* Unprofessional language, slang, or inappropriate tone
* Generic greeting without company/agent identification
* No clear offer of assistance
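A rough keyword heuristic can approximate the four greeting elements for spot-checking the LLM's verdicts. The phrase lists below are illustrative assumptions, not a substitute for the prompt above; a production check should rely on the LLM evaluation, since keyword matching misses paraphrases.

```python
import re

# Illustrative patterns for the four required greeting elements.
GREETING_ELEMENTS = {
    "welcome": r"\b(hello|hi|good (morning|afternoon|evening)|thank you for calling)\b",
    "company": r"\b(abc company|technical support|tech support)\b",
    "agent_id": r"\b(this is|my name is)\s+\w+",
    "assist_offer": r"\bhow (can|may) i help\b",
}

def missing_elements(greeting: str) -> list[str]:
    """Return the names of required elements not found in the greeting."""
    text = greeting.lower()
    return [name for name, pattern in GREETING_ELEMENTS.items()
            if not re.search(pattern, text)]

g = "Thank you for calling ABC Company, this is Sarah. How can I help you?"
print(missing_elements(g))  # → []
```

An empty result means all four elements were detected; a non-empty result names exactly which elements triggered the fail criteria.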
Call Closing Evaluation
Poor Prompts (Low Precision/Recall)
- “Check if the agent ended the call nicely.”
- “Did the agent close the call professionally and make sure the customer was satisfied?”
- “Evaluate whether the call conclusion was appropriate.”
Why these fail: “Nicely” and “appropriate” are subjective, and combining multiple criteria without definitions makes consistent evaluation impossible.
Robust Prompt (High Precision/Recall)
CONTEXT: Evaluate whether the agent provided a complete and professional closing.
PASS CRITERIA:
Look for at least 3 of these 4 elements:
* Issue resolution summary or confirmation of next steps
* Satisfaction verification (for example, "Does that resolve your concern?", "Anything else I can help with?")
* Appreciation statement (for example, "Thank you for calling", "I appreciate your patience")
* Professional sign-off (for example, "Have a great day", company-specific closing phrase)
Acceptable variations:
* Elements may be naturally integrated into the conversation flow
* Customer satisfaction can be implied if the customer explicitly expresses satisfaction first
* Concise closings are acceptable for straightforward resolutions
* Personal touches that maintain professionalism
FAIL CRITERIA:
* Fewer than 3 required elements present
* Customer left with unresolved questions or unclear next steps
* Abrupt disconnection without a closure attempt
* Unprofessional final statements or dismissive tone
* No confirmation of customer understanding when complex solutions are provided
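The "at least 3 of 4" rule above reduces to a simple threshold once each element has been detected (by the LLM judge or a heuristic). A minimal sketch, with hypothetical element names:

```python
# Apply the 3-of-4 threshold from the closing-evaluation pass criteria.
def closing_verdict(elements_found: dict) -> str:
    """Pass if at least 3 of the 4 closing elements are present."""
    return "pass" if sum(elements_found.values()) >= 3 else "fail"

found = {
    "resolution_summary": True,
    "satisfaction_check": True,
    "appreciation": True,
    "sign_off": False,
}
print(closing_verdict(found))  # → pass
```

Making the threshold explicit in code (and in the prompt) avoids the binary-oversimplification pitfall: a closing missing one element still passes, while one missing two does not.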
Pre-Deployment Checklist
Before deploying a prompt, verify:
| Check | Question |
|---|---|
| Measurable Criteria | Can each element be objectively identified? |
| Complete Coverage | Are all success and failure scenarios addressed? |
| Unambiguous Language | Would different evaluators reach consistent conclusions? |
| Realistic Expectations | Are the standards achievable for your agent population? |
| Clear Boundaries | Is the distinction between pass and fail evident? |
| Consistent Scoring | Does it align with your overall evaluation framework? |
Common Pitfalls
| Pitfall | Poor Example | Better Alternative |
|---|---|---|
| Vague descriptors | “Professional manner” | “Uses courteous language and acknowledges customer concerns” |
| Subjective judgments | “Friendly tone” | “Uses positive language markers and avoids negative phrasing” |
| Compound criteria | Mixing multiple criteria without weighting | Separate each criterion with clear pass/fail definitions |
| Cultural assumptions | Assuming universal communication styles | Define acceptable expressions for each context |
| Perfectionist standards | Criteria that exclude natural conversation variations | Define acceptable alternatives upfront |
| Missing specificity | No definition of successful completion | Define what counts as success for each element |
| Implicit requirements | Unstated evaluator expectations | Make all expectations explicit in the prompt |
| Binary oversimplification | Not accounting for partial completion | Account for partial completion and contextual appropriateness |
Key Success Factors
Specificity Over Generality
Replace broad concepts with concrete, observable behaviors:
- Instead of: “Agent was helpful”
- Use: “Agent acknowledged the customer’s concern and provided specific action steps”
Observable Actions Over Intentions
Focus on what can be measured in the transcript:
- Instead of: “Agent showed empathy”
- Use: “Agent used acknowledgement phrases such as ‘I understand’ or ‘That must be frustrating’”
Inclusive Criteria Design
Account for natural conversation variations:
- Allow multiple ways to meet the same requirement
- Define acceptable alternatives upfront
- Consider different communication styles while maintaining standards
Clear Failure Definition
Be explicit about non-compliance:
- Define both missing elements and inadequate attempts
- Specify unacceptable alternatives
- Address common failure modes directly
Effective prompts balance specificity with flexibility — ensuring consistent evaluation while accommodating natural variations in human communication.