Diarized Transcription

Assign a speaker to each word

The Argmax Pro SDK provides utility functions for merging speaker diarization results from SpeakerKitPro with transcription results from WhisperKitPro, or from any other transcription engine whose output conforms to the TranscriptionResultPro protocol.

Please review Speaker Diarization and File Transcription first. Then, you may combine the results as follows:

// Word timestamps are required to assign a speaker to each word
let decodingOptions = DecodingOptions(wordTimestamps: true, chunkingStrategy: .vad)
 
// Produce transcription with word timestamps
let transcribeResult = try await whisperKitPro.transcribe(audioArray: audioArray, decodeOptions: decodingOptions)
let mergedResult = WhisperKitProUtils.mergeTranscriptionResults(transcribeResult)
 
// Produce speaker diarization
let diarizationResult = try await speakerKitPro.diarize()
 
// Merge transcription and diarization to add speakers to words
let updatedSegmentsArray = diarizationResult.addSpeakerInfo(to: [mergedResult])
 
for segments in updatedSegmentsArray {
    for segment in segments {
        print(segment)
    }
}
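
Once each word carries a speaker label, a common next step is to collapse consecutive words from the same speaker into readable turns. The following is a minimal sketch of that idea. `LabeledWord`, `SpeakerTurn`, and `groupIntoTurns` are hypothetical names introduced for illustration and are not part of the Argmax Pro SDK; adapt the field access to whatever the merged segments actually expose.

```swift
// Hypothetical stand-in for a speaker-labeled word; NOT an Argmax SDK type.
struct LabeledWord {
    let text: String
    let speaker: Int?  // nil when no speaker could be assigned
}

// Hypothetical container for a run of words spoken by one speaker.
struct SpeakerTurn {
    let speaker: Int?
    var text: String
}

/// Collapses consecutive words that share a speaker into a single turn.
func groupIntoTurns(_ words: [LabeledWord]) -> [SpeakerTurn] {
    var turns: [SpeakerTurn] = []
    for word in words {
        if let last = turns.last, last.speaker == word.speaker {
            // Same speaker as the previous word: extend the current turn.
            turns[turns.count - 1].text += " " + word.text
        } else {
            // Speaker changed (or first word): start a new turn.
            turns.append(SpeakerTurn(speaker: word.speaker, text: word.text))
        }
    }
    return turns
}

let words = [
    LabeledWord(text: "Hello", speaker: 0),
    LabeledWord(text: "there.", speaker: 0),
    LabeledWord(text: "Hi!", speaker: 1),
]
let turns = groupIntoTurns(words)
for turn in turns {
    let label = turn.speaker.map { "Speaker \($0)" } ?? "Unknown"
    print("\(label): \(turn.text)")
}
```

Grouping by consecutive runs, rather than by speaker overall, preserves the order of the conversation.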

Merging results may introduce new errors. Diarization and transcription results are produced by two independent systems, so independent errors in each may compound in unexpected ways when the results are combined. This page demonstrates a naive method for merging these results. We will share documentation for an advanced technique that minimizes the errors introduced during merging.
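
To illustrate how a naive merge can go wrong, consider one simple strategy: assign each word to the speaker whose diarization segments overlap it the most in time. This is an assumption made for illustration and is not necessarily how `addSpeakerInfo(to:)` works internally. `WordTiming`, `SpeakerSegment`, and `assignSpeaker` are hypothetical names, not SDK types.

```swift
// Hypothetical types for illustration only; NOT Argmax SDK types.
struct WordTiming {
    let text: String
    let start: Double  // seconds
    let end: Double    // seconds
}

struct SpeakerSegment {
    let speaker: Int
    let start: Double  // seconds
    let end: Double    // seconds
}

/// Returns the speaker whose segments overlap the word the most,
/// or nil if no segment overlaps the word at all.
/// Ties are not broken deterministically in this sketch.
func assignSpeaker(to word: WordTiming, from segments: [SpeakerSegment]) -> Int? {
    var overlapBySpeaker: [Int: Double] = [:]
    for segment in segments {
        // Length of the intersection of the two time intervals.
        let overlap = min(word.end, segment.end) - max(word.start, segment.start)
        if overlap > 0 {
            overlapBySpeaker[segment.speaker, default: 0] += overlap
        }
    }
    return overlapBySpeaker.max { $0.value < $1.value }?.key
}

let segments = [
    SpeakerSegment(speaker: 0, start: 0.0, end: 1.0),
    SpeakerSegment(speaker: 1, start: 1.0, end: 2.0),
]

// Entirely inside speaker 0's segment.
let inside = WordTiming(text: "hello", start: 0.25, end: 0.5)
// Straddles the speaker change at 1.0 s; most of it falls in speaker 1's segment.
let straddling = WordTiming(text: "okay", start: 0.75, end: 1.5)
// Falls outside every diarization segment.
let unmatched = WordTiming(text: "um", start: 2.5, end: 2.75)

let insideSpeaker = assignSpeaker(to: inside, from: segments)
let straddlingSpeaker = assignSpeaker(to: straddling, from: segments)
let unmatchedSpeaker = assignSpeaker(to: unmatched, from: segments)

for (word, speaker) in [(inside, insideSpeaker), (straddling, straddlingSpeaker), (unmatched, unmatchedSpeaker)] {
    print(word.text, speaker.map { "speaker \($0)" } ?? "no speaker")
}
```

The straddling word shows where independent errors compound: a small shift in either the word timestamp or the diarization boundary can flip the assignment, and a word that falls outside every diarization segment receives no speaker at all.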