Open-source vs Pro SDK

Feature Set

Feature | Open-source SDK | Pro SDK | Pro is
WhisperKit Features | | |
File Transcription | | | 30% faster
Language Detection | | |
Word Timestamps | | | higher accuracy
Custom Keywords | | |
SRT & VTT Output Formats | | |
Real-time Transcription | ⚠️ | | higher accuracy
Fast Model Load | | |
SpeakerKit Features | | |
Voice Activity Detection | ⚠️ | | higher accuracy
Speaker Diarization | | |
RTTM Output Format | | |
Diarized Transcription | | |

⚠️ Rough Feature Matches

Some Pro SDK features have only rough counterparts in the Open-source SDK. These are marked with ⚠️.

  • The Voice Activity Detection feature in the Open-source SDK is implemented as a simple audio energy thresholding algorithm (called EnergyVAD). This implementation works well for separating silence from non-silence in an audio stream or file. However, it cannot distinguish voice from non-voice audio such as microphone noise or music. In the Pro SDK, by contrast, the same feature is implemented as a high-accuracy deep learning model capable of separating voice from non-voice.
  • The Real-time Transcription feature is not included in the Open-source SDK itself. Instead, it needs to be implemented in the app code, and we share an example implementation in WhisperKit/Examples/WhisperAX. The Pro SDK, by contrast, implements real-time transcription in the WhisperKitPro framework as a unified streaming API called transcribeWhileRecording. This implementation has a more robust core algorithm that matches offline transcription accuracy and is battle-tested with Enterprise customers. It also fixes several known error modes that the Open-source example app is susceptible to.
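
To make the Voice Activity Detection limitation concrete, here is a minimal Python sketch of audio energy thresholding. It is not the SDK's EnergyVAD code. The function names, the 160-sample frame size, and the 0.02 threshold are assumptions chosen purely for illustration.

```python
import math


def frame_energies(samples, frame_size):
    """Return the RMS energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return energies


def energy_vad(samples, frame_size=160, threshold=0.02):
    """Flag a frame as active (True) when its RMS energy exceeds the threshold.

    Both defaults are illustrative assumptions, not values taken from any SDK.
    """
    return [energy > threshold for energy in frame_energies(samples, frame_size)]


# Synthetic input: three frames of silence followed by three frames of a
# loud 440 Hz tone. The tone contains no speech at all.
sample_rate = 16000
silence = [0.0] * 480
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(480)]

flags = energy_vad(silence + tone)
print(flags)  # [False, False, False, True, True, True]
```

The tone frames are flagged as active even though nobody is speaking. Energy alone separates silence from non-silence, but separating voice from music or noise requires a model that looks at more than loudness.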

We are continuously improving both SDKs, and we intend to smooth out the rough edges of our Open-source SDK over time. However, our current focus is ensuring that the Argmax Pro SDK is best-in-class.