Open-source vs Pro SDK

Feature Set

Feature	Open-source SDK	Pro SDK	Pro advantage
WhisperKit Features
File Transcription	✅	✅	~9x faster
Language Detection	✅	✅
Word Timestamps	✅	✅
Custom Vocabulary	✅	✅	10x more keywords allowed
Real-time Transcription	⚠️	✅	~9x faster
Fast Model Load		✅
SpeakerKit Features
Voice Activity Detection	⚠️	✅
Speaker Diarization		✅
Diarized Transcription		✅

Rough Feature Matches

Some Pro SDK features have only rough counterparts in the Open-source SDK. These are marked with ⚠️ in the table above.

The Voice Activity Detection feature in the Open-source SDK is implemented as a simple audio energy thresholding algorithm (called EnergyVAD). This implementation works well for separating silence from non-silence in an audio stream or file. However, it cannot distinguish voice from non-voice sounds such as microphone noise or music. In the Pro SDK, the same feature is implemented as a high-accuracy deep learning model capable of separating voice from non-voice.
The Real-time Transcription feature is not included in the Open-source SDK itself. Instead, it must be implemented in app code, and we share an example implementation in WhisperKit/Examples/WhisperAX. The Pro SDK, by contrast, implements real-time transcription in the WhisperKitPro framework as a unified streaming API called transcribeWhileRecording. This implementation has a more robust core algorithm that matches offline transcription accuracy and is battle-tested with Enterprise customers. It also fixes several known error modes that the Open-source example app is susceptible to.
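
To make the limitation of energy thresholding concrete, here is a minimal Python sketch of the general technique. It is not the EnergyVAD implementation from the Open-source SDK: the function names, the 160-sample frame length, and the 0.02 threshold are all assumptions chosen for illustration.

```python
import math


def frame_energies(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def energy_vad(samples, frame_len=160, threshold=0.02):
    """Flag a frame as active when its RMS energy exceeds the threshold.

    frame_len and threshold are illustrative values, not SDK defaults.
    """
    return [e > threshold for e in frame_energies(samples, frame_len)]


# Synthetic input: silence, then a pure 440 Hz tone (not speech), then silence.
sample_rate = 16000
silence = [0.0] * 1600
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]

flags = energy_vad(silence + tone + silence)
print(flags)
```

The tone frames are flagged as active even though they contain no voice. An energy threshold can only separate loud from quiet, which is why it cannot tell speech apart from music or microphone noise.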

We are continuously improving both SDKs and intend to smooth out the rough edges of the Open-source SDK over time. However, our current focus is ensuring that the Argmax Pro SDK is best-in-class.
