File Transcription
Implementing file-based speech-to-text in your applications
File transcription processes complete audio files or pre-recorded audio buffers, unlike real-time transcription, which processes audio as a stream.
Basic Example
Pro SDK
Argmax Pro SDK includes the WhisperKitPro framework that implements file transcription:
```swift
import Argmax

// Initialize Argmax SDK to enable Pro access
await ArgmaxSDK.with(ArgmaxConfig(apiKey: "ax_*****"))

let config = WhisperKitProConfig(model: "large-v3-v20240930_626MB")
let whisperKitPro = try await WhisperKitPro(config)

let results = try? await whisperKitPro.transcribe(audioPath: "path/to/audio.m4a")
let transcript = WhisperKitProUtils.mergeTranscriptionResults(results)
```
Open-source SDK
Argmax Open-source SDK includes the WhisperKit framework that implements file transcription:
```swift
import WhisperKit

let config = WhisperKitConfig(model: "large-v3-v20240930_626MB")
let whisperKit = try await WhisperKit(config)

let results = try? await whisperKit.transcribe(audioPath: "path/to/audio.m4a")
let transcript = TranscriptionUtilities.mergeTranscriptionResults(results)
```
With Speakers
Argmax Pro SDK offers utility functions to merge speaker diarization results from SpeakerKitPro with transcription results from WhisperKitPro, or from any other transcription engine whose output conforms to the TranscriptionResultPro protocol.
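To build intuition for the merge, here is a simplified sketch of timestamp-based speaker assignment: each word is given the speaker whose diarization segment contains the word's midpoint. The `Word` and `SpeakerSegment` types and the assignment rule below are illustrative stand-ins, not the SDK's actual types or its `.subsegment` strategy implementation.

```swift
// Toy types standing in for word-level transcription output and diarization segments.
struct Word { let text: String; let start: Double; let end: Double }
struct SpeakerSegment { let speaker: String; let start: Double; let end: Double }

// Assign each word the speaker whose segment contains the word's midpoint.
func assignSpeakers(words: [Word], segments: [SpeakerSegment]) -> [(String, String)] {
    words.map { word in
        let mid = (word.start + word.end) / 2
        let speaker = segments.first { mid >= $0.start && mid < $0.end }?.speaker ?? "UNKNOWN"
        return (word.text, speaker)
    }
}

let words = [Word(text: "Hello", start: 0.0, end: 0.4),
             Word(text: "there", start: 0.5, end: 0.9),
             Word(text: "hi", start: 1.2, end: 1.4)]
let segments = [SpeakerSegment(speaker: "Speaker 1", start: 0.0, end: 1.0),
                SpeakerSegment(speaker: "Speaker 2", start: 1.0, end: 2.0)]
for (text, speaker) in assignSpeakers(words: words, segments: segments) {
    print("\(speaker): \(text)")
}
```

This is why word timestamps must be enabled during decoding: without per-word timing, there is nothing to intersect with the diarization segments.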
Basic Example
```swift
import Argmax

// Initialize Argmax SDK to enable Pro access
await ArgmaxSDK.with(ArgmaxConfig(apiKey: "ax_*****"))

// Initialize WhisperKitPro
let whisperConfig = WhisperKitProConfig(model: "large-v3-v20240930_626MB")
let whisperKitPro = try await WhisperKitPro(whisperConfig)

// Initialize SpeakerKitPro
let speakerConfig = SpeakerKitProConfig()
let speakerKit = try await SpeakerKitPro(speakerConfig)

// Word timestamps are required for merging transcripts with diarization
let decodingOptions = DecodingOptions(wordTimestamps: true, chunkingStrategy: .vad)

// Produce transcription with word timestamps
let transcribeResult = try await whisperKitPro.transcribe(audioArray: audioArray, decodeOptions: decodingOptions)
let mergedResult = WhisperKitProUtils.mergeTranscriptionResults(transcribeResult)

// Produce speaker diarization
let diarizationOptions = DiarizationOptions()
let diarizationResult = try await speakerKit.diarize(options: diarizationOptions)

// Merge transcription and diarization, adding speakers to words
let updatedSegmentsArray = diarizationResult.addSpeakerInfo(to: [mergedResult], strategy: .subsegment)
for segments in updatedSegmentsArray {
    for segment in segments {
        print(segment)
    }
}
```
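Once speakers are attached to segments, a common follow-up step is collapsing consecutive segments from the same speaker into single lines of dialogue for display. A minimal sketch, using a toy `LabeledSegment` type as a hypothetical stand-in for the SDK's speaker-annotated segments:

```swift
// Toy type standing in for a speaker-annotated transcription segment.
struct LabeledSegment { let speaker: String; let text: String }

// Merge consecutive segments from the same speaker into one line of dialogue.
func groupBySpeaker(_ segments: [LabeledSegment]) -> [LabeledSegment] {
    var grouped: [LabeledSegment] = []
    for seg in segments {
        if let last = grouped.last, last.speaker == seg.speaker {
            grouped[grouped.count - 1] = LabeledSegment(speaker: last.speaker,
                                                        text: last.text + " " + seg.text)
        } else {
            grouped.append(seg)
        }
    }
    return grouped
}

let segments = [LabeledSegment(speaker: "Speaker 1", text: "Hello"),
                LabeledSegment(speaker: "Speaker 1", text: "there."),
                LabeledSegment(speaker: "Speaker 2", text: "Hi!")]
for line in groupBySpeaker(segments) {
    print("\(line.speaker): \(line.text)")
}
```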