This guide helps you write effective prompts for LLM-based adherence detection in the By Question metric. Well-structured prompts are specific, measurable, and account for real-world conversation variations.

Prompt Architecture

Every effective AutoQA prompt has three components:
Component | Description
Context | Define the evaluation scenario
Pass Criteria | Specific behaviors that constitute success
Fail Criteria | Clear indicators of non-compliance

Template Structure

CONTEXT: [Conversation type and evaluation scope]

PASS CRITERIA: [Specific behaviors indicating adherence]
* Look for: [required elements]
* Acceptable variations: [alternative approaches]

FAIL CRITERIA: [Behaviors indicating non-adherence]
* Missing: [critical elements]
* Inadequate: [insufficient attempts]
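If you maintain many prompts, generating them from structured fields keeps the three components consistent. The sketch below uses a hypothetical helper (`AutoQAPrompt` is not part of AutoQA). It refuses to build a prompt that lacks context, pass criteria, or fail criteria, and renders the rest in the template layout above.

```python
from dataclasses import dataclass, field


@dataclass
class AutoQAPrompt:
    """Hypothetical container for the three components of an AutoQA prompt."""

    context: str
    look_for: list[str]
    acceptable_variations: list[str] = field(default_factory=list)
    missing: list[str] = field(default_factory=list)
    inadequate: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the three-component architecture: context, pass criteria, fail criteria.
        if not self.context.strip():
            raise ValueError("CONTEXT must not be empty")
        if not self.look_for:
            raise ValueError("PASS CRITERIA need at least one required element")
        if not (self.missing or self.inadequate):
            raise ValueError("FAIL CRITERIA need at least one entry")

    def render(self) -> str:
        """Render the components in the CONTEXT / PASS CRITERIA / FAIL CRITERIA layout."""

        def bullets(label: str, items: list[str]) -> list[str]:
            # One "* label: item" line per item.
            return [f"* {label}: {item}" for item in items]

        lines = [f"CONTEXT: {self.context}", "", "PASS CRITERIA:"]
        lines += bullets("Look for", self.look_for)
        lines += bullets("Acceptable variations", self.acceptable_variations)
        lines += ["", "FAIL CRITERIA:"]
        lines += bullets("Missing", self.missing)
        lines += bullets("Inadequate", self.inadequate)
        return "\n".join(lines)


greeting = AutoQAPrompt(
    context="Evaluate whether the agent provided a complete and professional greeting.",
    look_for=["Acknowledgement/welcome phrase", "Company/department identification",
              "Agent identification", "Offer of assistance"],
    acceptable_variations=["Elements may appear in any order"],
    missing=["Any of the four elements"],
    inadequate=["Generic greeting without company/agent identification"],
)
rendered = greeting.render()
print(rendered)
```

Validating at construction time means an incomplete prompt fails loudly before it ever reaches the evaluator.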

Prompt Development Process

Step 1: Define Success Criteria

Ask yourself:
  • What specific actions or words indicate adherence?
  • What variations are acceptable?
  • What constitutes a clear failure?
  • How granular should the evaluation be?

Step 2: Create Measurable Standards

Criteria should be:
  • Observable in conversation transcripts
  • Objective rather than subjective
  • Specific enough to avoid interpretation gaps
  • Comprehensive enough to cover typical scenarios

Step 3: Test Specificity

Validate your prompt by asking:
  • Could two evaluators reach different conclusions from this prompt?
  • Are there ambiguous terms that need clarification?
  • Does it clearly differentiate between pass and fail scenarios?
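You can answer the first question empirically. Suppose two reviewers (or two runs of the evaluator) label the same sample of transcripts as pass or fail. The minimal sketch below computes raw agreement and Cohen's kappa, which discounts the agreement expected by chance. Using kappa is our suggestion rather than an AutoQA feature, and the function names are illustrative.

```python
from collections import Counter


def percent_agreement(a: list[str], b: list[str]) -> float:
    """Fraction of transcripts where both evaluators gave the same verdict."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two raters: observed agreement corrected for chance."""
    p_o = percent_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both raters pick the same label independently.
    p_e = sum(counts_a[label] * counts_b[label] for label in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined, treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


rater_1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater_2 = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(percent_agreement(rater_1, rater_2), cohens_kappa(rater_1, rater_2))  # about 0.83 and 0.67
```

Low agreement on a question is a signal to revisit its prompt for ambiguous terms.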

Example Prompts

Greeting Evaluation

Poor Prompts (Low Precision/Recall)

  • “Check if the agent was polite when greeting the customer.”
  • “Did the agent say hello properly?”
  • “Evaluate the quality of the agent’s opening statement.”
Why these fail: “Polite,” “properly,” and “quality” are subjective terms that give the evaluator nothing observable to check in the transcript.
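You can measure the “Low Precision/Recall” and “High Precision/Recall” labels directly instead of guessing. The minimal sketch below assumes you have human-reviewed verdicts for a sample of transcripts and treats “pass” (adherence detected) as the positive class. Both of those choices are assumptions, and the function is illustrative.

```python
def precision_recall(predicted: list[str], actual: list[str],
                     positive: str = "pass") -> tuple[float, float]:
    """Precision and recall of prompt verdicts against human-reviewed labels."""
    if len(predicted) != len(actual):
        raise ValueError("label lists must be the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    # Precision: of the calls the prompt marked "pass", how many were truly adherent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the truly adherent calls, how many the prompt marked "pass".
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


prompt_verdicts = ["pass", "pass", "pass", "pass", "fail"]
human_verdicts = ["pass", "fail", "pass", "pass", "fail"]
print(precision_recall(prompt_verdicts, human_verdicts))  # (0.75, 1.0)
```

A lenient prompt that passes almost everything can score high recall but low precision, so check both numbers.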

Robust Prompt (High Precision/Recall)

CONTEXT: Evaluate whether the agent provided a complete and professional greeting.

PASS CRITERIA:
Look for all four elements:
* Acknowledgement/welcome phrase (for example, "Hello", "Good morning", "Thank you for calling")
* Company/department identification (for example, "ABC Company", "Technical Support")
* Agent identification (name, employee ID, or role)
* Offer of assistance (for example, "How can I help you?")

Acceptable variations:
* Elements may appear in any order
* Casual but professional tone is acceptable
* Abbreviated company name if commonly recognized
* Combined elements (for example, "This is Sarah from Tech Support, how can I help?")

FAIL CRITERIA:
* Any of the four elements is missing
* Unprofessional language, slang, or inappropriate tone
* Generic greeting without company/agent identification
* No clear offer of assistance
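You can ask the model to report each element separately (for example, as JSON) instead of a single yes/no. The all-four rule then becomes a deterministic check you can audit. The JSON shape and key names below are hypothetical and are not an AutoQA output format.

```python
import json

# Hypothetical keys: one boolean per required greeting element.
GREETING_ELEMENTS = (
    "welcome_phrase",
    "company_identification",
    "agent_identification",
    "offer_of_assistance",
)


def greeting_verdict(model_output: str) -> tuple[str, list[str]]:
    """Apply the all-four rule to per-element findings reported as JSON."""
    findings = json.loads(model_output)
    # An element the model did not report is treated as missing.
    missing = [name for name in GREETING_ELEMENTS if not findings.get(name, False)]
    # Unprofessional language fails the greeting even if all four elements are present.
    if findings.get("unprofessional_language", False):
        return "fail", missing
    return ("pass" if not missing else "fail"), missing


sample = json.dumps({
    "welcome_phrase": True,
    "company_identification": True,
    "agent_identification": True,
    "offer_of_assistance": False,
    "unprofessional_language": False,
})
print(greeting_verdict(sample))  # ('fail', ['offer_of_assistance'])
```

Returning the missing elements alongside the verdict gives reviewers a concrete reason for every fail.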

Call Closing Evaluation

Poor Prompts (Low Precision/Recall)

  • “Check if the agent ended the call nicely.”
  • “Did the agent close the call professionally and make sure the customer was satisfied?”
  • “Evaluate whether the call conclusion was appropriate.”
Why these fail: “Nicely” and “appropriate” are subjective, and the second prompt combines two criteria (a professional close and customer satisfaction) without defining either, which makes consistent evaluation impossible.

Robust Prompt (High Precision/Recall)

CONTEXT: Evaluate whether the agent provided a complete and professional closing.

PASS CRITERIA:
Look for at least 3 of these 4 elements:
* Issue resolution summary or confirmation of next steps
* Satisfaction verification (for example, "Does that resolve your concern?", "Anything else I can help with?")
* Appreciation statement (for example, "Thank you for calling", "I appreciate your patience")
* Professional sign-off (for example, "Have a great day", company-specific closing phrase)

Acceptable variations:
* Elements may be naturally integrated into the conversation flow
* Satisfaction verification can be treated as met if the customer explicitly expresses satisfaction before the agent asks
* Concise closings are acceptable for straightforward resolutions
* Personal touches that maintain professionalism

FAIL CRITERIA:
* Fewer than 3 required elements present
* Customer left with unresolved questions or unclear next steps
* Abrupt disconnection without a closure attempt
* Unprofessional final statements or dismissive tone
* No confirmation of customer understanding when complex solutions are provided
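The closing prompt combines a count threshold (3 of 4 elements) with fail conditions that apply on their own. Suppose the evaluator reports which elements and fail conditions it found. The verdict logic can then be sketched as follows. The element and flag names are hypothetical.

```python
# Hypothetical names for the four closing elements.
CLOSING_ELEMENTS = {"resolution_summary", "satisfaction_check", "appreciation", "sign_off"}
# Hypothetical flags for fail criteria that apply regardless of how many elements are present.
HARD_FAILS = {
    "unresolved_questions",
    "abrupt_disconnection",
    "unprofessional_final_statement",
    "no_understanding_check_on_complex_fix",
}


def closing_verdict(elements_found: set[str], flags: set[str], threshold: int = 3) -> str:
    """Pass when at least `threshold` of the four elements are present and no hard-fail flag is set."""
    unknown = (elements_found - CLOSING_ELEMENTS) | (flags - HARD_FAILS)
    if unknown:
        # Catch typos in element or flag names instead of silently miscounting.
        raise ValueError(f"unknown names: {sorted(unknown)}")
    if flags:
        return "fail"
    return "pass" if len(elements_found) >= threshold else "fail"


print(closing_verdict({"resolution_summary", "appreciation", "sign_off"}, set()))  # pass
print(closing_verdict({"appreciation", "sign_off"}, set()))  # fail
```

Keeping the threshold as a parameter makes it easy to tighten the rule (for example, to all four elements) without rewriting the logic.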

Pre-Deployment Checklist

Before deploying a prompt, verify:
Check | Question
Measurable Criteria | Can each element be objectively identified?
Complete Coverage | Are all success and failure scenarios addressed?
Unambiguous Language | Would different evaluators reach consistent conclusions?
Realistic Expectations | Are the standards achievable for your agent population?
Clear Boundaries | Is the distinction between pass and fail evident?
Consistent Scoring | Does it align with your overall evaluation framework?
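If prompts go through review, you can record the checklist per prompt and use it as a deployment gate. The sketch below assumes a reviewer marks each check true or false. The workflow and function name are hypothetical.

```python
CHECKLIST = (
    "Measurable Criteria",
    "Complete Coverage",
    "Unambiguous Language",
    "Realistic Expectations",
    "Clear Boundaries",
    "Consistent Scoring",
)


def ready_to_deploy(review: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return whether every check passed, plus the checks that are still open."""
    # A check the reviewer never recorded counts as open, not as passed.
    open_checks = [check for check in CHECKLIST if not review.get(check, False)]
    return not open_checks, open_checks


review = {check: True for check in CHECKLIST}
review["Clear Boundaries"] = False
print(ready_to_deploy(review))  # (False, ['Clear Boundaries'])
```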

Common Pitfalls

Pitfall | Poor Example | Better Alternative
Vague descriptors | “Professional manner” | “Uses courteous language and acknowledges customer concerns”
Subjective judgments | “Friendly tone” | “Uses positive language markers and avoids negative phrasing”
Compound criteria | Mixing multiple criteria without weighting | Separate each criterion with clear pass/fail definitions
Cultural assumptions | Assuming universal communication styles | Define acceptable expressions for each context
Perfectionist standards | Criteria that exclude natural conversation variations | Define acceptable alternatives upfront
Missing specificity | No definition of successful completion | Define what counts as success for each element
Implicit requirements | Unstated evaluator expectations | Make all expectations explicit in the prompt
Binary oversimplification | Not accounting for partial completion | Account for partial completion and contextual appropriateness
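A simple word-list check can catch some of the pitfalls in the table above before review, especially vague descriptors and subjective judgments. The sketch below draws its term list from this guide's poor examples, which is an assumption. A match only marks a candidate for a human to rewrite; it does not prove the prompt is vague.

```python
import re

# Hypothetical starter list drawn from the poor examples in this guide; extend it for your own prompts.
VAGUE_TERMS = (
    "polite", "properly", "quality", "nicely", "appropriate",
    "professional manner", "friendly", "helpful", "empathy",
)


def flag_vague_terms(prompt: str, terms: tuple[str, ...] = VAGUE_TERMS) -> list[str]:
    """Return the listed terms that appear in the prompt, in list order."""
    found = []
    for term in terms:
        # Leading word boundary only, so "polite" also catches "politely"
        # but "appropriate" does not fire inside "inappropriate".
        if re.search(rf"\b{re.escape(term)}", prompt, flags=re.IGNORECASE):
            found.append(term)
    return found


print(flag_vague_terms("Check if the agent was polite when greeting the customer."))  # ['polite']
```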

Key Success Factors

Specificity Over Generality

Replace broad concepts with concrete, observable behaviors:
  • Instead of: “Agent was helpful”
  • Use: “Agent acknowledged the customer’s concern and provided specific action steps”

Observable Actions Over Intentions

Focus on what can be measured in the transcript:
  • Instead of: “Agent showed empathy”
  • Use: “Agent used acknowledgement phrases such as ‘I understand’ or ‘That must be frustrating’”
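Because the improved criterion names concrete phrases, you can tie compliance to specific agent turns in the transcript. The sketch below is a deterministic spot-check that could help when auditing evaluator verdicts. It is not how the LLM evaluator works, and it will miss paraphrases that the prompt's “such as” wording allows. The pattern list and function name are hypothetical.

```python
import re

# Hypothetical patterns built from the example phrases above; a real list would be longer and reviewed.
ACKNOWLEDGEMENT_PATTERNS = [
    r"\bI understand\b",
    r"\bthat must be frustrating\b",
]


def acknowledgement_turns(agent_turns: list[str]) -> list[int]:
    """Return the indexes of agent turns that contain at least one acknowledgement phrase."""
    compiled = [re.compile(p, re.IGNORECASE) for p in ACKNOWLEDGEMENT_PATTERNS]
    return [i for i, turn in enumerate(agent_turns) if any(c.search(turn) for c in compiled)]


turns = [
    "Thanks for calling, this is Sam.",
    "I understand, let me check that for you.",
    "That must be frustrating. I'll reset it now.",
]
print(acknowledgement_turns(turns))  # [1, 2]
```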

Inclusive Criteria Design

Account for natural conversation variations:
  • Allow multiple ways to meet the same requirement
  • Define acceptable alternatives upfront
  • Consider different communication styles while maintaining standards

Clear Failure Definition

Be explicit about non-compliance:
  • Define both missing elements and inadequate attempts
  • Specify unacceptable alternatives
  • Address common failure modes directly
Effective prompts balance specificity with flexibility — ensuring consistent evaluation while accommodating natural variations in human communication.