Open-source vs Pro SDK
Feature Set
| | Open-source SDK | Pro SDK | Pro is |
|---|---|---|---|
| WhisperKit Features | | | |
| File Transcription | ✅ | ✅ | 30% faster |
| Language Detection | ✅ | ✅ | |
| Word Timestamps | ✅ | ✅ | higher accuracy |
| Custom Keywords | ✅ | ✅ | |
| SRT & VTT Output Formats | ✅ | ✅ | |
| Real-time Transcription | ⚠️ | ✅ | higher accuracy |
| Fast Model Load | | ✅ | |
| SpeakerKit Features | | | |
| Voice Activity Detection | ⚠️ | ✅ | higher accuracy |
| Speaker Diarization | | ✅ | |
| RTTM Output Format | | ✅ | |
| Diarized Transcription | | ✅ | |
⚠️ Rough Feature Matches
Some Pro SDK features have only rough counterparts in the Open-source SDK; these are marked with ⚠️.
- The Voice Activity Detection feature in the Open-source SDK is implemented as a simple audio energy thresholding algorithm (called EnergyVAD). This implementation works well for separating silence from non-silence in an audio stream or file. However, it cannot distinguish voice from non-voice audio such as microphone noise or music. In the Pro SDK, the same feature is implemented as a high-accuracy deep learning model capable of separating voice from non-voice.
- The Real-time Transcription feature is not included in the Open-source SDK. Instead, it must be implemented in app code, and we share an example implementation in WhisperKit/Examples/WhisperAX. The Pro SDK, on the other hand, implements real-time transcription in the `WhisperKitPro` framework as a unified streaming API called `transcribeWhileRecording`. This implementation has a more robust core algorithm that matches offline transcription accuracy and is battle-tested with Enterprise customers. The new algorithm also fixes several known error modes that the Open-source app example is susceptible to.
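
To make the energy-thresholding limitation concrete, here is a minimal sketch of the general technique. It is a language-neutral Python illustration and not the SDK's EnergyVAD code: the function names, the 100 ms frame size, and the 0.02 threshold are all assumptions chosen for the example.

```python
import math


def frame_energies(samples, frame_size):
    """Return the RMS energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return energies


def energy_vad(samples, frame_size=1600, threshold=0.02):
    """Flag a frame as active when its RMS energy exceeds the threshold.

    Both defaults are hypothetical values picked for this sketch.
    """
    return [e > threshold for e in frame_energies(samples, frame_size)]


# One second of digital silence followed by one second of a 440 Hz tone,
# both at an assumed 16 kHz sample rate.
sample_rate = 16000
silence = [0.0] * sample_rate
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate)]

flags = energy_vad(silence + tone)
print(flags)
```

The tone contains no speech, yet every one of its frames is flagged as active. Energy alone separates silence from non-silence, but it says nothing about whether the non-silent audio is a voice.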
We are continuously improving both SDKs, and we intend to smooth out the rough edges of our Open-source SDK over time. However, our current focus is ensuring that the Argmax Pro SDK is best-in-class.