Diarized Transcription
Assign a speaker to each word
The Argmax Pro SDK provides utility functions that merge speaker diarization results from SpeakerKitPro with transcription results from WhisperKitPro, or from any other transcription engine whose output conforms to the TranscriptionResultPro protocol.
Please review Speaker Diarization and File Transcription first. Then, you may combine the results as follows:
// Word timestamps are required to assign a speaker to each word
let decodingOptions = DecodingOptions(wordTimestamps: true, chunkingStrategy: .vad)
// Produce transcription with word timestamps
let transcribeResult = try await whisperKitPro.transcribe(audioArray: audioArray, decodeOptions: decodingOptions)
let mergedResult = WhisperKitProUtils.mergeTranscriptionResults(transcribeResult)
// Produce speaker diarization
let diarizationOptions = DiarizationOptions(useExclusiveReconciliation: true)
let diarizationResult = try await speakerKitPro.diarize(options: diarizationOptions)
// Merge transcription and diarization to add speakers to words
let updatedSegmentsArray = diarizationResult.addSpeakerInfo(to: [mergedResult])
for segments in updatedSegmentsArray {
    for segment in segments {
        print(segment)
    }
}
useExclusiveReconciliation is an option that reduces errors when merging transcripts with diarization results. For more information, please see this blog post by pyannoteAI.
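This page does not describe how addSpeakerInfo(to:) matches words to speakers. As a rough mental model only, the sketch below assigns each word to the speaker whose turns overlap it the most in time. The Word and SpeakerTurn types and the assignSpeakers function are hypothetical, written for this illustration; they are not part of the Argmax Pro SDK, and the SDK's actual merging logic may differ.

```swift
// Hypothetical types for illustration only; these are NOT Argmax Pro SDK types.
struct Word {
    let text: String
    let start: Double  // seconds
    let end: Double    // seconds
}

struct SpeakerTurn {
    let speaker: Int
    let start: Double  // seconds
    let end: Double    // seconds
}

/// Length in seconds of the intersection of two intervals (0 if they are disjoint).
func overlap(_ aStart: Double, _ aEnd: Double, _ bStart: Double, _ bEnd: Double) -> Double {
    max(0, min(aEnd, bEnd) - max(aStart, bStart))
}

/// For each word, returns the speaker whose turns overlap it the most,
/// or nil when no turn overlaps the word at all.
func assignSpeakers(words: [Word], turns: [SpeakerTurn]) -> [(word: Word, speaker: Int?)] {
    words.map { word in
        // Accumulate total overlap per speaker across all of that speaker's turns.
        var totals: [Int: Double] = [:]
        for turn in turns {
            let shared = overlap(word.start, word.end, turn.start, turn.end)
            if shared > 0 { totals[turn.speaker, default: 0] += shared }
        }
        // Largest overlap wins; ties go to the lowest speaker id so results are deterministic.
        let best = totals.max { a, b in
            a.value != b.value ? a.value < b.value : a.key > b.key
        }
        return (word: word, speaker: best?.key)
    }
}

// Example input: three words and two speaker turns (made-up values).
let words = [
    Word(text: "Hello", start: 0.0, end: 0.4),
    Word(text: "there", start: 0.5, end: 0.9),
    Word(text: "Hi", start: 1.1, end: 1.3),
]
let turns = [
    SpeakerTurn(speaker: 0, start: 0.0, end: 1.0),
    SpeakerTurn(speaker: 1, start: 1.0, end: 2.0),
]

for (word, speaker) in assignSpeakers(words: words, turns: turns) {
    let label = speaker.map { "Speaker \($0)" } ?? "Unknown"
    print(label, word.text)
}
```

In this sketch, a word that straddles two turns goes to whichever speaker covers more of it, and a word that falls in a gap between turns gets no speaker. Cases like these are where merge errors tend to arise, which is presumably the kind of situation options such as useExclusiveReconciliation are meant to help with.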