The AWS S3 Connector lets you pull conversation recordings and chat transcripts from an S3 bucket into Quality AI Express on a configurable schedule. Use this connector to analyze interactions from third-party Contact Center as a Service (CCaaS) solutions.

Prerequisites

Complete the following before you start.

AWS Requirements

| Requirement | Details |
| --- | --- |
| S3 bucket | Created in your preferred region with an organized folder structure |
| IAM permissions | Read-only access (s3:GetObject, s3:ListBucket) via access keys or IAM role |
| Audio files | WAV or MP3 format, maximum 50 MB each, accessible via HTTPS |
| Chat files | JSON format |
| Timestamps | ISO 8601 format with UTC timezone (YYYY-MM-DDTHH:MM:SSZ) |
| Test file | A test.csv file with sample data in each configured S3 folder |
Required IAM policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::your-bucket-name",
                "arn:aws:s3:::your-bucket-name/*"
            ]
        }
    ]
}
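If you provision buckets for several environments, the policy above can be generated rather than hand-edited. A minimal Python sketch (the bucket name is a placeholder; substitute your own):

```python
import json

def make_readonly_policy(bucket: str) -> dict:
    # Read-only access: ListBucket on the bucket itself,
    # GetObject on every object inside it.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }

print(json.dumps(make_readonly_policy("your-bucket-name"), indent=4))
```

Note that ListBucket applies to the bucket ARN while GetObject applies to the object ARN (the `/*` suffix); both resources are needed.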

Platform Requirements

| Requirement | Details |
| --- | --- |
| Quality AI Express | Feature enabled in platform settings |
| Agents | All agents onboarded with valid, matching email addresses |
| Queues | Service queues configured and ready for mapping |
| Permissions | You have Integrations & Extensions access |

Supported Recording Types

| Type | Format | Files per Conversation | Channel Assignment | Analytics |
| --- | --- | --- | --- | --- |
| Stereo Voice | WAV/MP3 | 1 | Left = Agent, Right = Customer | Full Analytics |
| Mono Voice | WAV/MP3 | 2 (separate agent/customer files) | N/A | Enhanced Analytics |
| Voice Transcripts | JSON | 1 | Pre-transcribed audio | Text Analytics |
| Chat Scripts | JSON | 1 | Message-level attribution | Full Text Analytics |

Mono Recording Requirement

Mono recordings require two separate audio files — one for the agent and one for the customer. A single mixed mono file is not supported.
| Supported | Not Supported |
| --- | --- |
| conv-123456-agent.wav (agent only) + conv-123456-customer.wav (customer only) | conv-123456-mixed.wav (both speakers combined) |
Using a single mixed mono file significantly reduces transcription accuracy.
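You can verify a recording's channel layout locally before uploading. A minimal sketch for WAV files using Python's standard wave module (the file name is a placeholder):

```python
import wave

def classify_wav(path: str) -> str:
    # A stereo recording carries 2 channels; anything else is treated as mono.
    with wave.open(path, "rb") as wf:
        return "stereo" if wf.getnchannels() == 2 else "mono"

# Example: write a tiny one-channel file and classify it.
with wave.open("sample.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)   # 16-bit samples
    wf.setframerate(8000)
    wf.writeframes(b"\x00\x00" * 8)

print(classify_wav("sample.wav"))  # mono
```

For MP3 files the standard library has no equivalent reader, so use a media tool such as ffprobe to inspect the channel count.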

Data Flow

Architecture

CSV Metadata Formats

Each recording type requires specific CSV fields. The core fields are the same across all types; only the recording-specific fields differ.

Stereo Voice Recordings

Configuration: recordingType = stereo, channelType = voice
| Field | Required | Type | Example | Notes |
| --- | --- | --- | --- | --- |
| conversationId | Required | String | conv-123456 | Unique identifier, max 50 chars |
| agentEmail | Required | String | john.smith@company.com | Must match a platform user account |
| conversationStartTime | Required | String | 2025-04-10T14:30:00Z | ISO 8601, UTC timezone |
| conversationEndTime | Required | String | 2025-04-10T14:32:45Z | Must be after start time |
| channelType | Required | String | voice | Always voice for audio |
| recordingType | Required | String | stereo | Always stereo for this format |
| chatScriptUrl | Required | String | https://your-bucket.s3.amazonaws.com/transcripts/chat-123.json | Full HTTPS URL to JSON transcript file |
| recordingUrl | Required | String | https://s3.amazonaws.com/bucket/conv-123456.wav | HTTPS URL |
| transcriptUrl | Required | String | https://your-bucket.s3.amazonaws.com/transcripts/voice-123.json | Full HTTPS URL to JSON transcript file |
| queueId | Required | String | support-tier1 | Must exist in queue mapping |
| agentChannel | Required | Integer | 0 | Agent audio channel (0 = left, 1 = right) |
| customerChannel | Required | Integer | 1 | Customer audio channel (0 = left, 1 = right) |
| language | Optional | String | en | ISO 639-1 format, defaults to en |
| asProvider | Optional | String | microsoft | Audio service provider |
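To illustrate the expected layout, the example values from the table can be assembled into a CSV with Python's csv module (the URLs and IDs below are the sample values from the table, not real resources):

```python
import csv
import io

# Required columns for stereo voice recordings, in the order listed above.
FIELDS = [
    "conversationId", "agentEmail", "conversationStartTime",
    "conversationEndTime", "channelType", "recordingType",
    "chatScriptUrl", "recordingUrl", "transcriptUrl",
    "queueId", "agentChannel", "customerChannel",
]

row = {
    "conversationId": "conv-123456",
    "agentEmail": "john.smith@company.com",
    "conversationStartTime": "2025-04-10T14:30:00Z",
    "conversationEndTime": "2025-04-10T14:32:45Z",
    "channelType": "voice",
    "recordingType": "stereo",
    "chatScriptUrl": "https://your-bucket.s3.amazonaws.com/transcripts/chat-123.json",
    "recordingUrl": "https://s3.amazonaws.com/bucket/conv-123456.wav",
    "transcriptUrl": "https://your-bucket.s3.amazonaws.com/transcripts/voice-123.json",
    "queueId": "support-tier1",
    "agentChannel": 0,
    "customerChannel": 1,
}

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()   # column headers must match the field names exactly
writer.writerow(row)
print(buf.getvalue())
```

Using DictWriter keeps the header row and the data columns in sync, which matters because the connector validates column headers during ingestion.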

Mono Voice Recordings

Configuration: recordingType = mono, channelType = voice
Mono recordings require two separate CSV rows and two audio files per conversation — one for the agent, one for the customer. Use the same conversationId for both rows.
| Field | Required | Type | Example | Notes |
| --- | --- | --- | --- | --- |
| conversationId | Required | String | conv-123456 | Same ID for both agent and customer rows |
| agentEmail | Required | String | john.smith@company.com | Must match a platform user account |
| conversationStartTime | Required | String | 2025-04-10T14:30:00Z | ISO 8601, UTC timezone |
| conversationEndTime | Required | String | 2025-04-10T14:32:45Z | Must be after start time |
| channelType | Required | String | voice | Always voice for audio |
| recordingType | Required | String | mono | Always mono for this format |
| agentRecordings | Required | String | https://s3.amazonaws.com/bucket/conv-123456-agent.wav | URL to agent audio file |
| customerRecordings | Required | String | https://s3.amazonaws.com/bucket/conv-123456-customer.wav | URL to customer audio file |
| queueId | Required | String | support-tier1 | Must exist in queue mapping |
| agentId | Optional | String | agent-789 | Internal agent identifier |
| language | Optional | String | en | ISO 639-1 format, defaults to en |
| asProvider | Optional | String | microsoft | Transcription provider |
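A quick sanity check for mono rows, enforcing the separate-file requirement described earlier (a sketch; the field names follow the table above):

```python
def check_mono_row(row: dict) -> list:
    """Return a list of problems found in a mono-recording CSV row."""
    problems = []
    agent = row.get("agentRecordings")
    customer = row.get("customerRecordings")
    if not agent or not customer:
        problems.append("agentRecordings and customerRecordings are both required")
    elif agent == customer:
        # A single mixed file is not supported for mono recordings.
        problems.append("agent and customer must point to separate files")
    if row.get("recordingType") != "mono":
        problems.append("recordingType must be 'mono'")
    return problems

print(check_mono_row({
    "recordingType": "mono",
    "agentRecordings": "https://s3.amazonaws.com/bucket/conv-123456-agent.wav",
    "customerRecordings": "https://s3.amazonaws.com/bucket/conv-123456-customer.wav",
}))  # []
```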

Voice Transcripts (Pre-transcribed Audio)

Configuration: recordingType = transcription, channelType = voice
Use this format when you have already transcribed your voice recordings and want to import the text for analysis without reprocessing the audio.
| Field | Required | Type | Example | Notes |
| --- | --- | --- | --- | --- |
| conversationId | Required | String | conv-123456 | Unique identifier, max 50 chars |
| agentEmail | Required | String | john.smith@company.com | Must match a platform user account |
| conversationStartTime | Required | String | 2025-04-10T14:30:00Z | ISO 8601, UTC timezone |
| conversationEndTime | Required | String | 2025-04-10T14:32:45Z | Must be after start time |
| channelType | Required | String | voice | Always voice for audio transcripts |
| recordingType | Required | String | transcription | Always transcription for this format |
| transcriptPath | Required | String | transcripts/voice-123.json | Path to JSON transcript file |
| queueId | Required | String | support-tier1 | Must exist in queue mapping |
| language | Optional | String | en | ISO 639-1 format, defaults to en |
| asProvider | Optional | String | microsoft | Original audio service provider |

Chat Scripts (Live Chat Interactions)

Configuration: recordingType = transcription, channelType = chat
Use this format for live chat interactions from web chat, messaging platforms, or chat-based customer service.
Chat scripts support interactions from platforms including web chat, WhatsApp, and Facebook Messenger.
| Field | Required | Type | Example | Notes |
| --- | --- | --- | --- | --- |
| conversationId | Required | String | conv-123456 | Unique identifier, max 50 chars |
| agentEmail | Required | String | john.smith@company.com | Must match a platform user account |
| conversationStartTime | Required | String | 2025-04-10T14:30:00Z | ISO 8601, UTC timezone |
| conversationEndTime | Required | String | 2025-04-10T14:45:00Z | Must be after start time |
| channelType | Required | String | chat | Always chat for text interactions |
| recordingType | Required | String | transcription | Always transcription for chat |
| transcriptPath | Required | String | transcripts/chat-123.json | Path to JSON transcript file |
| queueId | Required | String | support-tier1 | Must exist in queue mapping |
| language | Optional | String | en-US | Defaults to en if not specified |
For conversations involving agent or queue transfers, use the queueId of the queue where the conversation ended, and the agentEmail of the agent who closed the conversation.

JSON Transcript Schemas

Voice Transcript Format

Full example:
{
  "recognizedPhrases": [
    {
      "recognitionStatus": "Success",
      "channel": 0,
      "offset": "PT14S",
      "duration": "PT2.4S",
      "offsetInTicks": 140000000.0,
      "durationInTicks": 24000000.0,
      "durationMilliseconds": 2400,
      "offsetMilliseconds": 14000,
      "nBest": [
        {
          "confidence": 0.8205426,
          "lexical": "yes one four three four two six",
          "itn": "yes 143426",
          "maskedITN": "yes one four three four two six",
          "display": "Yes, 143426.",
          "words": [
            {
              "word": "yes",
              "offset": "PT14S",
              "duration": "PT0.32S",
              "offsetInTicks": 140000000.0,
              "durationInTicks": 3200000.0,
              "durationMilliseconds": 320,
              "offsetMilliseconds": 14000,
              "confidence": 0.51653963
            }
          ]
        }
      ]
    }
  ]
}
Required fields only:
{
  "recognizedPhrases": [
    {
      "channel": 0,
      "offsetInTicks": 140000000.0,
      "nBest": [
        {
          "lexical": "yes one four three four two six",
          "words": [
            {
              "word": "yes",
              "offsetInTicks": 140000000.0,
              "durationInTicks": 3200000.0,
              "confidence": 0.51653963
            }
          ]
        }
      ]
    }
  ]
}
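As a consumer-side illustration, a transcript in this schema can be flattened into time-ordered (channel, text) pairs by sorting on offsetInTicks and taking the top nBest hypothesis. A sketch assuming only the required fields above:

```python
def phrases(transcript: dict):
    """Yield (channel, text) pairs in time order from a voice transcript."""
    ordered = sorted(transcript["recognizedPhrases"],
                     key=lambda p: p["offsetInTicks"])
    for phrase in ordered:
        best = phrase["nBest"][0]  # hypotheses are ranked; take the first
        # Prefer the formatted 'display' text, falling back to 'lexical'.
        yield phrase["channel"], best.get("display", best["lexical"])

# The required-fields example from above:
transcript = {
    "recognizedPhrases": [
        {
            "channel": 0,
            "offsetInTicks": 140000000.0,
            "nBest": [
                {"lexical": "yes one four three four two six", "words": []}
            ],
        }
    ]
}

print(list(phrases(transcript)))  # [(0, 'yes one four three four two six')]
```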

Chat Transcript Format

Example:
{
  "1": {
    "type": "AGENT",
    "text": "Good afternoon, how can I help you today?",
    "timestamp": 1749562206000,
    "userId": "john.smith@company.com"
  },
  "2": {
    "type": "USER",
    "text": "I need help with my account balance.",
    "timestamp": 1749562253142,
    "userId": "customer_12345"
  }
}
Required fields:
| Field | Values | Notes |
| --- | --- | --- |
| type | AGENT, USER, or SYSTEM | Identifies the speaker |
| text | Message content | The message text |
| timestamp | Unix timestamp in milliseconds | Message time |
| userId | Participant identifier | Agent email or customer ID |
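Chat transcripts can be checked against these required fields before upload. A minimal validation sketch:

```python
REQUIRED = {"type", "text", "timestamp", "userId"}
VALID_TYPES = {"AGENT", "USER", "SYSTEM"}

def validate_chat(transcript: dict) -> list:
    """Return a list of problems found in a chat transcript dict."""
    problems = []
    for key, message in transcript.items():
        missing = REQUIRED - set(message)
        if missing:
            problems.append(f"message {key}: missing {sorted(missing)}")
        elif message["type"] not in VALID_TYPES:
            problems.append(f"message {key}: bad type {message['type']!r}")
    return problems

# The example transcript from above:
sample = {
    "1": {"type": "AGENT", "text": "Good afternoon, how can I help you today?",
          "timestamp": 1749562206000, "userId": "john.smith@company.com"},
    "2": {"type": "USER", "text": "I need help with my account balance.",
          "timestamp": 1749562253142, "userId": "customer_12345"},
}

print(validate_chat(sample))  # []
```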

Configuration Steps

Step 1: Prepare Your S3 Environment

Choose a folder structure for your S3 bucket:
  • Option 1: Unified Path (voice and chat in one folder)
  • Option 2: Separate Paths (voice and chat in separate folders)
Before moving on, verify:
  • All audio files are accessible via HTTPS URLs.
  • CSV files contain the required fields with correct column headers.
  • Mono recordings have separate agent and customer files.
  • A test.csv file exists in each configured folder.
  • All file sizes are under 50 MB.
As a best practice, avoid spaces in file names.
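The checklist above can be run against a local staging copy of a folder before uploading it to S3. A minimal sketch (the folder path is a placeholder; URL accessibility and CSV column checks are out of scope here):

```python
import os

MAX_BYTES = 50 * 1024 * 1024  # 50 MB per-file limit

def preflight(folder: str) -> list:
    """Check a local staging folder against the upload checklist."""
    issues = []
    names = os.listdir(folder)
    if "test.csv" not in names:
        issues.append("missing test.csv")
    for name in names:
        if " " in name:
            issues.append(f"space in file name: {name}")
        if os.path.getsize(os.path.join(folder, name)) > MAX_BYTES:
            issues.append(f"over 50 MB: {name}")
    return issues
```

An empty list means the folder passed these local checks; anything returned should be fixed before configuring the connector.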

Step 2: Add the Connector

  1. Navigate to Quality AI > Configure > Connectors.
  2. Select + Add Connector > Amazon S3 > Connect.
  3. Enter a Name for the connector.
  4. Select your AWS Region.
  5. Choose an Auth Type and enter your credentials:
    • Access Keys: Enter your Access Key and Secret Key.
    • IAM Role: Enter the IAM Role ARN.
  6. Set the folder path:
    • Unified Path: Enter a single path for both voice and chat (for example, s3://your-bucket/conversations/).
    • Separate Paths: Enter a Voice Path and a Chat Path separately.

Step 3: Test the Connection

  1. Select the Test tab in the connector configuration.
  2. Confirm the following checks pass:
    | Check | Expected Result |
    | --- | --- |
    | Authentication | Connected successfully |
    | File Path Access | S3 bucket accessible |
    | File Format | CSV format validated |
    | Metadata Validation | Required fields confirmed |
  3. If a check fails:
    | Check | Resolution |
    | --- | --- |
    | Authentication | Verify credentials and IAM permissions; ensure they have not expired |
    | File Access | Check bucket name, region, and folder paths; confirm file URLs are accessible |
    | Format/Metadata | Ensure test.csv exists with correct structure, column headers, and timestamps |

Step 4: Map Queues and Set a Schedule

  1. Navigate to the Queue tab.
  2. Map each queueId value from your CSV files to a queue in Quality AI Express. Values must match exactly.
  3. Navigate to the Schedule tab.
  4. Set the Interval (minutes, hours, or days) and the Start Time (UTC).
  5. Select Save to activate the connector.
Verify the setup is complete:
  • Queue mappings are saved and validated.
  • The processing schedule is active.
  • The first ingestion job appears in the Log tab.
  • No errors appear in the processing logs.
Success indicators:
  • Conversations appear in Quality AI Express dashboards.
  • Analytics data populates for ingested interactions.

Troubleshooting

Authentication Issues

| Problem | Symptom | Resolution |
| --- | --- | --- |
| Invalid Credentials | Authentication failed error | Verify access key and secret key; check IAM role ARN format; ensure credentials haven't expired |
| Permission Denied | Access denied to S3 bucket | Add S3 read permissions to the IAM user or role; verify the bucket policy; confirm the bucket region matches the configuration |

Data Processing Issues

| Problem | Symptom | Resolution |
| --- | --- | --- |
| Timestamp Errors | Invalid timestamp format | Use ISO 8601 format (YYYY-MM-DDTHH:MM:SSZ); include UTC timezone; verify end time is after start time |
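Timestamp problems are easy to catch before upload. A sketch that validates the exact YYYY-MM-DDTHH:MM:SSZ shape and the start/end ordering:

```python
from datetime import datetime, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601 with an explicit UTC 'Z' suffix

def parse_ts(value: str) -> datetime:
    """Parse a YYYY-MM-DDTHH:MM:SSZ timestamp; raises ValueError if malformed."""
    return datetime.strptime(value, FMT).replace(tzinfo=timezone.utc)

def valid_range(start: str, end: str) -> bool:
    """End time must be strictly after start time."""
    return parse_ts(end) > parse_ts(start)

print(valid_range("2025-04-10T14:30:00Z", "2025-04-10T14:32:45Z"))  # True
```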

Performance

Processing time is approximately 3–5 minutes per conversation, depending on conversation length, ASR transcription latency (for voice), and LLM response latency.