
File Transcription

Implementing file-based speech-to-text in your applications

File transcription processes complete audio files or pre-recorded audio buffers, unlike real-time transcription, which processes audio as a continuous stream.

Basic Example

Pro SDK

Argmax Pro SDK includes the WhisperKitPro framework, which implements file transcription:

import Argmax
 
// Initialize Argmax SDK to enable Pro access
await ArgmaxSDK.with(ArgmaxConfig(apiKey: "ax_*****"))
 
let config = WhisperKitProConfig(model: "large-v3-v20240930_626MB")
let whisperKitPro = try await WhisperKitPro(config)
let results = try? await whisperKitPro.transcribe(audioPath: "path/to/audio.m4a")
let transcript = WhisperKitProUtils.mergeTranscriptionResults(results)

Open-source SDK

Argmax Open-source SDK includes the WhisperKit framework, which implements file transcription:

import WhisperKit
 
let config = WhisperKitConfig(model: "large-v3-v20240930_626MB")
let whisperKit = try await WhisperKit(config)
let results = try? await whisperKit.transcribe(audioPath: "path/to/audio.m4a")
let transcript = TranscriptionUtilities.mergeTranscriptionResults(results)
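Each element of `results` is a `TranscriptionResult` containing timestamped segments, so you can inspect per-segment timing in addition to the merged transcript. A minimal sketch, assuming the `segments`, `start`, `end`, and `text` properties available in recent WhisperKit releases:

```swift
import WhisperKit

// Print per-segment timestamps from a file transcription.
// Assumes TranscriptionResult.segments with start/end/text properties,
// as in recent WhisperKit releases; verify against your SDK version.
let config = WhisperKitConfig(model: "large-v3-v20240930_626MB")
let whisperKit = try await WhisperKit(config)
if let results = try? await whisperKit.transcribe(audioPath: "path/to/audio.m4a") {
    for result in results {
        for segment in result.segments {
            print(String(format: "[%.2f -> %.2f] %@", segment.start, segment.end, segment.text))
        }
    }
}
```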

With Speakers

Argmax Pro SDK offers utility functions to merge speaker diarization results from SpeakerKitPro and transcription results from WhisperKitPro or any other transcription engine output that conforms to the TranscriptionResultPro protocol.

Basic Example

import Argmax
 
// Initialize Argmax SDK to enable Pro access
await ArgmaxSDK.with(ArgmaxConfig(apiKey: "ax_*****"))
 
// Initialize WhisperKitPro
let config = WhisperKitProConfig(model: "large-v3-v20240930_626MB")
let whisperKitPro = try await WhisperKitPro(config)
 
// Initialize SpeakerKitPro
let speakerConfig = SpeakerKitProConfig()
let speakerKitPro = try await SpeakerKitPro(speakerConfig)
 
// Word timestamps are required for merging transcripts with diarization
let decodingOptions = DecodingOptions(wordTimestamps: true, chunkingStrategy: .vad)
 
// Produce transcription with word timestamps
// (audioArray is a [Float] of 16 kHz mono samples, loaded elsewhere)
let transcribeResult = try await whisperKitPro.transcribe(audioArray: audioArray, decodeOptions: decodingOptions)
let mergedResult = WhisperKitProUtils.mergeTranscriptionResults(transcribeResult)
 
// Produce speaker diarization
let diarizationOptions = DiarizationOptions()
let diarizationResult = try await speakerKitPro.diarize(options: diarizationOptions)
 
// Merge transcription and diarization, adding speakers to words
let updatedSegmentsArray = diarizationResult.addSpeakerInfo(to: [mergedResult], strategy: .subsegment)
 
for segments in updatedSegmentsArray {
    for segment in segments {
        print(segment)
    }
}
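The example above assumes `audioArray` already holds the audio samples. One way to produce it from a file is a sketch like the following, assuming the `AudioProcessor.loadAudio(fromPath:)` and `AudioProcessor.convertBufferToArray(buffer:)` helpers found in recent WhisperKit releases:

```swift
import AVFoundation
import WhisperKit

// Load a file into the 16 kHz mono [Float] expected by transcribe(audioArray:).
// Assumes AudioProcessor.loadAudio(fromPath:) -> AVAudioPCMBuffer and
// AudioProcessor.convertBufferToArray(buffer:) -> [Float] from recent
// WhisperKit releases; verify against your SDK version.
let buffer = try AudioProcessor.loadAudio(fromPath: "path/to/audio.m4a")
let audioArray = AudioProcessor.convertBufferToArray(buffer: buffer)
```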