Models
Browse ready-made models in Argmax SDK.
Performance claims on this page are reproducible with OpenBench.
openai/whisper
A family of models for high-accuracy speech-to-text and language detection. Model sizes range from 0.06 GB to 3.1 GB. Multilingual variants support 99 languages. English-only variants are suffixed with .en.
Speech-to-text
Language Detection
Multilingual
Open Source SDK
Pro SDK
tiny
tiny.en
base
base.en
small
small.en
large-v2
large-v3
large-v3-turbo
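The naming convention above (English-only variants end in .en) can be checked mechanically. The following is a minimal sketch in plain Python, not part of the Argmax SDK: the variant names are copied from the list above, while `is_english_only` and `split_by_language_support` are hypothetical helpers introduced only for illustration.

```python
# Variant names copied from the openai/whisper listing on this page.
WHISPER_VARIANTS = [
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "large-v2", "large-v3", "large-v3-turbo",
]


def is_english_only(variant: str) -> bool:
    """Return True when the variant name carries the '.en' suffix."""
    return variant.endswith(".en")


def split_by_language_support(variants):
    """Hypothetical helper: partition names into (multilingual, english_only)."""
    multilingual = [v for v in variants if not is_english_only(v)]
    english_only = [v for v in variants if is_english_only(v)]
    return multilingual, english_only


multilingual, english_only = split_by_language_support(WHISPER_VARIANTS)
print("Multilingual:", multilingual)
print("English-only:", english_only)
```

A suffix check is enough here because the listing uses .en only as a trailing marker; a different naming scheme would need a different rule.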
nvidia/parakeet
A frontier model that surpasses OpenAI Whisper Large V3 Turbo on English speech-to-text accuracy while being ~9x faster.
Speech-to-text
English Only
Pro SDK
tdt-0.6b-v2
pyannote/commercial
A frontier model for speaker diarization ("who spoke when") with state-of-the-art accuracy, measured as diarization error rate (DER), across 13 datasets on OpenBench.
Speaker Diarization
Pro SDK
Multilingual
flagship-0.1b
pyannote/open-source
An open-source model for speaker diarization ("who spoke when") with second-best accuracy, measured as diarization error rate (DER), across 13 datasets on OpenBench.
Speaker Diarization
Open Source SDK
Pro SDK
Multilingual
v3.1
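Both diarization entries above report accuracy as DER. As a rough illustration of how that metric is commonly defined (missed speech, false-alarm speech, and speaker confusion, divided by total reference speech time), here is a minimal Python sketch. The function name and the durations are made up for this example; they are not OpenBench results, and OpenBench's exact scoring settings (for example, collar or overlap handling) are not stated on this page.

```python
def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_speech: float) -> float:
    """Common DER definition: summed error durations over reference speech time.

    All arguments are durations in seconds. Hypothetical helper for illustration.
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Illustrative, made-up durations (seconds); not measurements from any benchmark.
der = diarization_error_rate(
    missed=3.0, false_alarm=2.0, confusion=5.0, total_speech=100.0
)
print(f"DER = {der:.1%}")
```

Lower is better. Because false alarms count against the system, DER can exceed 100% on very poor output.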