Models

Browse ready-made models in the Argmax SDK.

Performance claims on this page are reproducible with OpenBench.


openai/whisper

A family of models for high-accuracy speech-to-text and language detection. Model sizes range from 0.06 GB to 3.1 GB. Multilingual variants support 99 languages. English-only variants are suffixed with .en (for example, tiny.en).

Speech-to-text
Language Detection
Multilingual
Open Source SDK
Pro SDK
tiny
tiny.en
base
base.en
small
small.en
large-v2
large-v3
large-v3-turbo
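
The .en naming convention described above can be checked mechanically. The following is a minimal sketch in Python that splits the variant names listed on this page into multilingual and English-only groups. The helper names (is_english_only, split_by_language_support) are hypothetical and are not part of the Argmax SDK; only the variant names come from this page.

```python
# Variant names as listed for openai/whisper on this page.
WHISPER_VARIANTS = [
    "tiny", "tiny.en",
    "base", "base.en",
    "small", "small.en",
    "large-v2", "large-v3", "large-v3-turbo",
]


def is_english_only(variant: str) -> bool:
    """Return True if the variant name carries the .en suffix.

    Hypothetical helper for illustration; not an Argmax SDK function.
    """
    return variant.endswith(".en")


def split_by_language_support(variants):
    """Split variant names into (multilingual, english_only) lists."""
    multilingual = [v for v in variants if not is_english_only(v)]
    english_only = [v for v in variants if is_english_only(v)]
    return multilingual, english_only


multilingual, english_only = split_by_language_support(WHISPER_VARIANTS)
print("Multilingual:", multilingual)
print("English-only:", english_only)
```

Under this convention, every large variant on the list is multilingual, and only the tiny, base, and small sizes have an English-only counterpart.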

nvidia/parakeet

A frontier model that surpasses OpenAI Whisper Large V3 Turbo on English speech-to-text accuracy while being ~9x faster.

Speech-to-text
English Only
Pro SDK
tdt-0.6b-v2

pyannote/commercial

A frontier model for speaker diarization ("who spoke when") with state-of-the-art accuracy (DER) on 13 datasets on OpenBench.

Speaker Diarization
Pro SDK
Multilingual
flagship-0.1b

pyannote/open-source

An open-source model for speaker diarization ("who spoke when") with second-best accuracy (DER) on 13 datasets on OpenBench.

Speaker Diarization
Open Source SDK
Pro SDK
Multilingual
v3.1
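
Both pyannote entries above report accuracy as DER (diarization error rate), where lower is better. As a rough illustration of what that number measures, here is a minimal sketch of the standard definition: the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total reference speech time. The function name and the example durations are hypothetical, and this is not necessarily how OpenBench computes the metric (details such as forgiveness collars and overlap handling are not stated on this page).

```python
def diarization_error_rate(missed: float, false_alarm: float,
                           confusion: float, total_reference: float) -> float:
    """Standard DER: (missed + false alarm + confusion) / total reference speech.

    All arguments are durations in seconds. Hypothetical helper for
    illustration only; not an OpenBench or Argmax SDK function.
    """
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


# Made-up durations: 120 s of reference speech, with 6 s missed,
# 3 s of false alarms, and 3 s attributed to the wrong speaker.
der = diarization_error_rate(missed=6.0, false_alarm=3.0,
                             confusion=3.0, total_reference=120.0)
print(f"DER = {der:.1%}")
```

Because false alarms count toward the numerator but not the denominator, DER can exceed 100% on recordings where the system hallucinates a lot of speech.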