This guide helps you write effective prompts for LLM-based adherence detection in the By Question metric. Well-structured prompts are specific, measurable, and account for real-world conversation variations.
Prompt Architecture
Every effective AutoQA prompt has three components:
| Component | Description |
|---|---|
| Context | Define the evaluation scenario |
| Pass Criteria | Specific behaviors that constitute success |
| Fail Criteria | Clear indicators of non-compliance |
Template Structure
CONTEXT: [Conversation type and evaluation scope]
PASS CRITERIA: [Specific behaviors indicating adherence]
* Look for: [required elements]
* Acceptable variations: [alternative approaches]
FAIL CRITERIA: [Behaviors indicating non-adherence]
* Missing: [critical elements]
* Inadequate: [insufficient attempts]
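The template above can be assembled programmatically when you manage many evaluation questions. The sketch below is illustrative only; the function name and argument structure are assumptions, not part of any AutoQA API.

```python
# Hypothetical helper that renders the three-part template
# (CONTEXT / PASS CRITERIA / FAIL CRITERIA) into a single prompt string.
def build_prompt(context, pass_criteria, fail_criteria):
    """Assemble an evaluation prompt from the three template components."""
    lines = [f"CONTEXT: {context}", "PASS CRITERIA:"]
    lines += [f"* {item}" for item in pass_criteria]
    lines.append("FAIL CRITERIA:")
    lines += [f"* {item}" for item in fail_criteria]
    return "\n".join(lines)

prompt = build_prompt(
    context="Evaluate whether the agent provided a complete greeting.",
    pass_criteria=[
        "Look for: welcome phrase, company name, agent identification",
        "Acceptable variations: elements may appear in any order",
    ],
    fail_criteria=[
        "Missing: any required element",
        "Inadequate: generic greeting with no identification",
    ],
)
print(prompt)
```

Keeping criteria as lists (rather than free text) makes it easy to review and version each element independently.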
Prompt Development Process
Step 1: Define Success Criteria
Ask yourself:
- What specific actions or words indicate adherence?
- What variations are acceptable?
- What constitutes a clear failure?
- How granular should the evaluation be?
Step 2: Create Measurable Standards
Criteria should be:
- Observable in conversation transcripts
- Objective rather than subjective
- Specific enough to avoid interpretation gaps
- Comprehensive enough to cover typical scenarios
Step 3: Test Specificity
Validate your prompt by asking:
- Could two evaluators reach different conclusions from this prompt?
- Are there ambiguous terms that need clarification?
- Does it clearly differentiate between pass and fail scenarios?
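One way to make this validation concrete is to run the prompt over a small set of hand-labeled transcripts and compare the LLM's verdicts against your labels. A minimal sketch, assuming the judge's verdicts have already been collected as "pass"/"fail" strings (no specific LLM client is implied):

```python
# Compare LLM verdicts against human labels to estimate prompt quality.
# Treating "pass" as the positive class:
#   precision = of the transcripts the prompt passed, how many should pass
#   recall    = of the transcripts that should pass, how many the prompt passed
def precision_recall(predictions, labels):
    tp = sum(p == "pass" and l == "pass" for p, l in zip(predictions, labels))
    fp = sum(p == "pass" and l == "fail" for p, l in zip(predictions, labels))
    fn = sum(p == "fail" and l == "pass" for p, l in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

verdicts = ["pass", "pass", "fail", "fail"]  # from the LLM judge
labels = ["pass", "fail", "pass", "fail"]    # human ground truth
print(precision_recall(verdicts, labels))
```

Low precision suggests the prompt's pass criteria are too loose; low recall suggests the fail criteria are catching acceptable variations.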
Example Prompts
Greeting Evaluation
Poor Prompts (Low Precision/Recall)
- “Check if the agent was polite when greeting the customer.”
- “Did the agent say hello properly?”
- “Evaluate the quality of the agent’s opening statement.”
Why these fail: “Polite,” “properly,” and “quality” are subjective — they provide no actionable evaluation framework.
Robust Prompt (High Precision/Recall)
CONTEXT: Evaluate whether the agent provided a complete and professional greeting.
PASS CRITERIA:
Look for all four elements:
* Acknowledgement/welcome phrase (for example, "Hello", "Good morning", "Thank you for calling")
* Company/department identification (for example, "ABC Company", "Technical Support")
* Agent identification (name, employee ID, or role)
* Offer of assistance (for example, "How can I help you?")
Acceptable variations:
* Elements may appear in any order
* Casual but professional tone is acceptable
* Abbreviated company name if commonly recognized
* Combined elements (for example, "This is Sarah from Tech Support, how can I help?")
FAIL CRITERIA:
* Any of the four elements is missing
* Unprofessional language, slang, or inappropriate tone
* Generic greeting without company/agent identification
* No clear offer of assistance
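A rough keyword heuristic can approximate the four greeting elements for spot-checking the LLM's verdicts. The phrase lists below are illustrative assumptions, not a substitute for the prompt above; a production check should rely on the LLM evaluation, since keyword matching misses paraphrases.

```python
import re

# Illustrative patterns for the four required greeting elements.
GREETING_ELEMENTS = {
    "welcome": r"\b(hello|hi|good (morning|afternoon|evening)|thank you for calling)\b",
    "company": r"\b(abc company|technical support|tech support)\b",
    "agent_id": r"\b(this is|my name is)\s+\w+",
    "assist_offer": r"\bhow (can|may) i help\b",
}

def missing_elements(greeting: str) -> list[str]:
    """Return the names of required elements not found in the greeting."""
    text = greeting.lower()
    return [name for name, pattern in GREETING_ELEMENTS.items()
            if not re.search(pattern, text)]

g = "Thank you for calling ABC Company, this is Sarah. How can I help you?"
print(missing_elements(g))  # → []
```

An empty result means all four elements were detected; a non-empty result names exactly which elements triggered the fail criteria.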
Call Closing Evaluation
Poor Prompts (Low Precision/Recall)
- “Check if the agent ended the call nicely.”
- “Did the agent close the call professionally and make sure the customer was satisfied?”
- “Evaluate whether the call conclusion was appropriate.”
Why these fail: “Nicely” and “appropriate” are subjective, and combining multiple criteria without definitions makes consistent evaluation impossible.
Robust Prompt (High Precision/Recall)
CONTEXT: Evaluate whether the agent provided a complete and professional closing.
PASS CRITERIA:
Look for at least 3 of these 4 elements:
* Issue resolution summary or confirmation of next steps
* Satisfaction verification (for example, "Does that resolve your concern?", "Anything else I can help with?")
* Appreciation statement (for example, "Thank you for calling", "I appreciate your patience")
* Professional sign-off (for example, "Have a great day", company-specific closing phrase)
Acceptable variations:
* Elements may be naturally integrated into the conversation flow
* Customer satisfaction can be implied if the customer explicitly expresses satisfaction first
* Concise closings are acceptable for straightforward resolutions
* Personal touches that maintain professionalism
FAIL CRITERIA:
* Fewer than 3 required elements present
* Customer left with unresolved questions or unclear next steps
* Abrupt disconnection without a closure attempt
* Unprofessional final statements or dismissive tone
* No confirmation of customer understanding when complex solutions are provided
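The "at least 3 of 4" rule above reduces to a simple threshold once each element has been detected (by the LLM judge or a heuristic). A minimal sketch, with hypothetical element names:

```python
# Apply the 3-of-4 threshold from the closing-evaluation pass criteria.
def closing_verdict(elements_found: dict) -> str:
    """Pass if at least 3 of the 4 closing elements are present."""
    return "pass" if sum(elements_found.values()) >= 3 else "fail"

found = {
    "resolution_summary": True,
    "satisfaction_check": True,
    "appreciation": True,
    "sign_off": False,
}
print(closing_verdict(found))  # → pass
```

Making the threshold explicit in code (and in the prompt) avoids the binary-oversimplification pitfall: a closing missing one element still passes, while one missing two does not.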
Pre-Deployment Checklist
Before deploying a prompt, verify:
| Check | Question |
|---|---|
| Measurable Criteria | Can each element be objectively identified? |
| Complete Coverage | Are all success and failure scenarios addressed? |
| Unambiguous Language | Would different evaluators reach consistent conclusions? |
| Realistic Expectations | Are the standards achievable for your agent population? |
| Clear Boundaries | Is the distinction between pass and fail evident? |
| Consistent Scoring | Does it align with your overall evaluation framework? |
Common Pitfalls
| Pitfall | Poor Example | Better Alternative |
|---|---|---|
| Vague descriptors | “Professional manner” | “Uses courteous language and acknowledges customer concerns” |
| Subjective judgments | “Friendly tone” | “Uses positive language markers and avoids negative phrasing” |
| Compound criteria | Mixing multiple criteria without weighting | Separate each criterion with clear pass/fail definitions |
| Cultural assumptions | Assuming universal communication styles | Define acceptable expressions for each context |
| Perfectionist standards | Criteria that exclude natural conversation variations | Define acceptable alternatives upfront |
| Missing specificity | No definition of successful completion | Define what counts as success for each element |
| Implicit requirements | Unstated evaluator expectations | Make all expectations explicit in the prompt |
| Binary oversimplification | Not accounting for partial completion | Account for partial completion and contextual appropriateness |
Key Success Factors
Specificity Over Generality
Replace broad concepts with concrete, observable behaviors:
- Instead of: “Agent was helpful”
- Use: “Agent acknowledged the customer’s concern and provided specific action steps”
Observable Actions Over Intentions
Focus on what can be measured in the transcript:
- Instead of: “Agent showed empathy”
- Use: “Agent used acknowledgement phrases such as ‘I understand’ or ‘That must be frustrating’”
Inclusive Criteria Design
Account for natural conversation variations:
- Allow multiple ways to meet the same requirement
- Define acceptable alternatives upfront
- Consider different communication styles while maintaining standards
Clear Failure Definition
Be explicit about non-compliance:
- Define both missing elements and inadequate attempts
- Specify unacceptable alternatives
- Address common failure modes directly
Effective prompts balance specificity with flexibility — ensuring consistent evaluation while accommodating natural variations in human communication.