Introduction

Argmax SDK is a collection of turn-key on-device inference frameworks:

  • WhisperKit Pro

    • File Transcription
    • Real-time Transcription
    • Language Detection
    • Word Timestamps
    • Custom Vocabulary
  • SpeakerKit Pro

    • Voice Activity Detection
    • Speaker Diarization
    • Real-time Diarization
  • WhisperKit Pro + SpeakerKit Pro

    • File Transcription with Speakers
    • Real-time Transcription with Speakers
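
As a taste of the Swift API surface, transcribing an audio file might look like the sketch below. It is based on the open-source WhisperKit API (`WhisperKit()` and `transcribe(audioPath:)`); Pro SDK names and signatures may differ, and the file path is a placeholder.

```swift
import WhisperKit

// Minimal sketch based on the open-source WhisperKit API;
// Pro SDK signatures may differ. The audio path is a placeholder.
Task {
    // Loads a default model on first use.
    let pipe = try await WhisperKit()
    let results = try await pipe.transcribe(audioPath: "path/to/recording.wav")
    // Each result carries text plus timing metadata such as word timestamps.
    print(results.map(\.text).joined(separator: " "))
}
```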

Architecture

Argmax SDK follows an open-core architecture in which the Pro SDK extends the open-source SDK.

Please see Open-source vs Pro SDK for a detailed feature set comparison.

Getting Started

Native Apps

Argmax Pro SDK is built with platform-native languages and tooling for iOS, macOS, and Android:

  • Swift SDK for iOS and macOS is delivered via Swift Package Manager
  • Kotlin SDK for Android is delivered via Maven

Please see Installation to get started.
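
For the Swift SDK, adding the dependency in a `Package.swift` manifest typically looks like the fragment below. The URL shown is the open-source WhisperKit repository and the version is a placeholder; the Pro SDK package URL and version are provided with your license.

```swift
// Package.swift fragment (sketch): the URL points at the open-source
// WhisperKit repository and the version is a placeholder. Substitute
// the Pro package URL and version provided with your license.
dependencies: [
    .package(url: "https://github.com/argmaxinc/WhisperKit.git", from: "0.9.0")
],
targets: [
    .executableTarget(
        name: "MyApp",
        dependencies: [.product(name: "WhisperKit", package: "WhisperKit")]
    )
]
```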

Other Apps

Argmax Local Server is built using Argmax Pro SDK and currently offers Real-time Transcription.

Key features include:

  • Node and Python client packages
  • API-compatible with Deepgram
  • macOS only

Please see Using Local Server to get started.

Use Cases

Ambient AI for Healthcare

  • Real-time streaming transcription of doctor-patient conversations
  • Medically-tuned custom model support
  • Speaker diarization to attribute statements to doctor and patient
  • Example product built with Argmax SDK: ModMed Scribe

AI Meeting Notes

  • Real-time streaming transcription of work meetings
  • Custom vocabulary for accurate person and company names
  • Speaker diarization to attribute statements to each meeting attendee
  • Example product built with Argmax SDK: Macwhisper

Personal Dictation

  • Ultra low-latency dictation
  • Custom vocabulary for accurate person and company names
  • Example product built with Argmax SDK: superwhisper

Video content creation

  • Offline captioning (Word timestamps, SRT and VTT output formats)
  • Live captioning (Real-time transcription)
  • Silence removal (Voice Activity Detection)
  • Text-based video editing (Word timestamps)
  • Example product built with Argmax SDK: Detail

Why on-device?

Accuracy

On-device inference does not mean settling for smaller, less accurate models. Argmax builds systems that match or exceed the accuracy of cloud-based APIs using top-tier models. See Model Gallery for ready-made models.

For the shrinking fraction of users on older devices, Argmax offers hybrid deployment that falls back to server-side inference, preserving a user experience with uniform accuracy.

Upholding accuracy is our top priority, even ahead of speed. We continuously benchmark our products on industry-standard test sets.

Low Latency

Applications built on real-time inference see lower latency when deployed on device rather than in the cloud, because on-device inference is:

  • Optimized for minimum latency for a single user, rather than for maximum throughput (at the cost of higher latency) across many concurrent users
  • Decoupled from global inference traffic spikes, which occasionally make cloud services unavailable or unexpectedly slow
  • Not subject to internet round-trip latency

Everything Else

Concern               On-device (with Argmax)             Cloud-based
Availability          100% by definition                  < 100% uptime
Scalability (Usage)   Unlimited                           Rate-limited & concurrency-limited
Scalability (Cost)    Fixed                               Unlimited (usage-based)
Transparency          Open-core, transparent versioning   Proprietary, silent versioning
Data Privacy          Processed locally                   Upload required