Models

Browse ready-made models for Argmax Pro SDK.

Please refer to OpenBench for open-source, reproducible benchmarks that compare latency and accuracy against competing solutions.

The size listed with each model is its maximum RAM usage in MB.

openai/whisper-large-v3-turbo

The most recent iteration of Whisper.

626 MB

Pre-recorded Transcription

Real-time Transcription

Language Detection

multilingual

nvidia/parakeet-v2

A frontier model that surpasses OpenAI Whisper Large V3 Turbo on English speech-to-text accuracy while being ~9x faster.

476 MB

Real-time Transcription

Pre-recorded Transcription

nvidia/sortformer-v2-1

A frontier real-time speaker diarization model that surpasses top cloud APIs on accuracy.

94 MB

Real-time Speaker Diarization

Pre-recorded Speaker Diarization

language-agnostic

nvidia/parakeet-v3

The most recent iteration of Parakeet. Same size and speed as Parakeet V2, but supports 24 more languages.

494 MB

Real-time Transcription

Pre-recorded Transcription

pyannote/precision

A frontier model for speaker diarization ("who spoke when") with state-of-the-art accuracy (diarization error rate, DER) across 13 datasets on OpenBench.

90 MB

Pre-recorded Speaker Diarization

language-agnostic

pyannote/community

An open-source model for speaker diarization ("who spoke when") with the second-best accuracy (DER) across 13 datasets on OpenBench.

15 MB

Pre-recorded Speaker Diarization

language-agnostic

qwen/qwen3-tts-0.6b

A multilingual text-to-speech model with voice cloning and low-latency streaming support.

1054 MB

Text-to-Speech

qwen/qwen3-tts-1.7b

A frontier multilingual text-to-speech model with voice cloning and voice design from text instructions.

2217 MB

Text-to-Speech