Managing Model Files

Picking, downloading, and compiling models in your application

Picking the Model

Model Gallery lists all models supported by Argmax SDK. The following sections summarize considerations when picking the best speech-to-text model for your application.

Nvidia Parakeet v2

This model is 9x faster than Whisper Large v3 Turbo on English speech-to-text and achieves slightly higher accuracy.

We recommend using this model for all applications that are English-only.

let config = WhisperKitProConfig(
    model: "parakeet-v2_476MB",
    modelRepo: .parakeetRepo
)

Nvidia Parakeet v3

This model achieves the same speed as Nvidia Parakeet v2 but supports 25 European languages with dynamic switching: en, de, es, fr, nl, it, da, et, fi, el, hu, lv, lt, mt, pl, pt, ro, sk, sl, sv, ru, uk, bg, hr, cs.

We recommend this model for any application that needs one or more of the 25 languages above. The exception is English-only applications, which are better served by Nvidia Parakeet v2.

let config = WhisperKitProConfig(
    model: "parakeet-v3_494MB",
    modelRepo: .parakeetRepo
)

OpenAI Whisper Large v3 Turbo

The original models for WhisperKit (now called Argmax OSS) are hosted under the .openSourceRepo repository.

Argmax Pro SDK hosts a second set of Whisper models under the .proRepo repository that are further optimized for speed and energy efficiency compared to their .openSourceRepo counterparts. Accuracy is identical between the two sets, while the .proRepo models are significantly faster and more energy-efficient.

Usage:

let config = WhisperKitProConfig(
    model: "large-v3-v20240930_626MB",
    modelRepo: .proRepo // or .openSourceRepo
)
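
The recommendations in this section can be condensed into a small selection helper. The following is a minimal sketch, not part of Argmax SDK: the function name recommendedModelName(for:) is hypothetical, and the fallback to Whisper for languages outside the Parakeet v3 list is an assumption rather than something stated above. The model names and language codes are the ones used on this page.

```swift
import Foundation

/// Language codes listed for Nvidia Parakeet v3 in the section above.
let parakeetV3Languages: Set<String> = [
    "en", "de", "es", "fr", "nl", "it", "da", "et", "fi", "el", "hu", "lv", "lt",
    "mt", "pl", "pt", "ro", "sk", "sl", "sv", "ru", "uk", "bg", "hr", "cs",
]

/// Hypothetical helper (not an SDK API): maps the languages an app must
/// transcribe to one of the model names shown on this page.
func recommendedModelName(for languages: Set<String>) -> String {
    if languages == ["en"] {
        return "parakeet-v2_476MB"  // English-only
    }
    if languages.isSubset(of: parakeetV3Languages) {
        return "parakeet-v3_494MB"  // every language is covered by Parakeet v3
    }
    // Assumption: fall back to Whisper when a language is not in the list above.
    return "large-v3-v20240930_626MB"
}
```

The returned name can then be passed as the model argument of WhisperKitProConfig, together with the matching repository (.parakeetRepo for the Parakeet models, .proRepo or .openSourceRepo for Whisper).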

Downloading the Model

Initialize Argmax SDK

Argmax SDK requires initialization with an Argmax API key (starts with ax_***, not axst_***) to unlock Pro models and features.

We recommend fetching your API key securely from your backend in production. For cases where the key ships inside the app, we provide a simple obfuscator that protects against casual inspection and static analysis tools.

Assuming you use ObfuscatedKeyProvider.generateCodeExample to obfuscate your API key, you may initialize Argmax SDK as follows:

var keyProvider = ObfuscatedKeyProvider(mask: 37)  // placeholder values
keyProvider.apiKeyObfuscated = [4, 5, 6]  // placeholder values
 
guard let apiKey = keyProvider.apiKey else {
    fatalError("Missing API key")
}
 
await ArgmaxSDK.with(ArgmaxConfig(apiKey: apiKey))

Note that ArgmaxSDK.with(ArgmaxConfig(apiKey: apiKey)) requires an internet connection during first use and at least once every 30 days to maintain an active license. See this documentation page to learn more.

Initiate Download

ModelStore implements a robust model downloader. On iOS and iPadOS, this model downloader persists progress across foreground-to-background and background-to-foreground app transitions. It even persists the download progress after the app is killed.

Forward background URLSession events from your UIApplicationDelegate so the SDK can drain them before iOS re-suspends the app:

func application(
    _ application: UIApplication,
    handleEventsForBackgroundURLSession identifier: String,
    completionHandler: @escaping () -> Void
) {
    modelStore.backgroundDownloader.handleEventsForBackgroundSession(
        identifier: identifier,
        completionHandler: completionHandler
    )
}

For SwiftUI apps without an AppDelegate, wire one in with @UIApplicationDelegateAdaptor.
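
As a rough sketch of that wiring, the example below declares an app delegate and attaches it to a SwiftUI app. The type names AppDelegate and ExampleApp are placeholders, and the forwarding call is left as a comment because how your app shares its modelStore instance with the delegate is up to you.

```swift
import SwiftUI
import UIKit

final class AppDelegate: NSObject, UIApplicationDelegate {
    func application(
        _ application: UIApplication,
        handleEventsForBackgroundURLSession identifier: String,
        completionHandler: @escaping () -> Void
    ) {
        // Forward to the SDK here, exactly as in the UIKit snippet above:
        // modelStore.backgroundDownloader.handleEventsForBackgroundSession(
        //     identifier: identifier,
        //     completionHandler: completionHandler
        // )
    }
}

@main
struct ExampleApp: App {
    // SwiftUI instantiates the delegate and routes UIApplicationDelegate
    // callbacks, including background URLSession events, to it.
    @UIApplicationDelegateAdaptor(AppDelegate.self) private var appDelegate

    var body: some Scene {
        WindowGroup {
            Text("Model download demo")
        }
    }
}
```

With the delegate in place, the download itself can be started as follows: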

import Network  // for NWInterface.InterfaceType
 
let modelStore = ModelStore(config: config)  // `config` from "Picking the Model"
 
do {
    let result = try await modelStore.downloadModelInBackground(
        name: "large-v3-v20240930_626MB",
        repo: RepoType.proRepo,
        disabledNetworkTypes: [.cellular]   // optional: Wi-Fi only
    )
 
    switch result {
    case .started(let downloadId):
        print("Download started: \(downloadId)")
    case .resumed(let downloadId):
        print("Resuming existing download: \(downloadId)")
    case .alreadyInProgress(let downloadId):
        print("Download already running: \(downloadId)")
    case .waitingForNetwork(let downloadId):
        print("Queued, will resume when Wi-Fi is available: \(downloadId)")
    case .alreadyComplete(let modelPath):
        print("Model already downloaded at: \(modelPath)")
    }
} catch {
    print("Failed to start download: \(error)")
}

Query State

modelStore.getBackgroundDownloadState returns the following state:

public struct BackgroundDownloadState {
    let downloadId: String
    let modelVariant: String
    let repoId: String
    var files: [BackgroundFileDownload]
    var status: BackgroundDownloadStatus  // .pending, .downloading, .paused, .completed, .failed
    let startedAt: Date
    var completedAt: Date?
    var overallProgress: Double  // 0.0 to 1.0
}

Here is a simple way to query download state:

// Get all active downloads
let downloads = modelStore.activeBackgroundDownloads
 
// Get state for first active download
if let download = downloads.first {
    print("Progress: \(download.overallProgress)")
    print("Files completed: \(download.completedFileCount)/\(download.totalFileCount)")
}
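
Since overallProgress is a fraction between 0.0 and 1.0, it usually needs formatting before it is shown to users. Here is a minimal sketch of one way to do that. DownloadSnapshot is a hypothetical stand-in that copies only two fields of BackgroundDownloadState so the example runs without the SDK, and progressLine(for:barWidth:) is likewise an illustrative name.

```swift
import Foundation

/// Hypothetical stand-in for the two BackgroundDownloadState fields used here.
struct DownloadSnapshot {
    let modelVariant: String
    let overallProgress: Double  // 0.0 to 1.0
}

/// Renders a one-line text progress bar such as "model [#####-----] 50%".
func progressLine(for snapshot: DownloadSnapshot, barWidth: Int = 20) -> String {
    // Clamp defensively so an out-of-range value cannot produce a negative count.
    let fraction = min(max(snapshot.overallProgress, 0.0), 1.0)
    let filled = Int((fraction * Double(barWidth)).rounded(.down))
    let bar = String(repeating: "#", count: filled)
        + String(repeating: "-", count: barWidth - filled)
    let percent = Int((fraction * 100).rounded(.down))
    return "\(snapshot.modelVariant) [\(bar)] \(percent)%"
}
```

In an app, the same clamped fraction could drive a SwiftUI ProgressView instead of a text bar.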

For reactive updates, AsyncStream is the preferred API for new code. Each subscriber receives the current snapshot immediately, then every subsequent change:

Task {
    for await downloads in modelStore.backgroundDownloader.activeDownloadsUpdates {
        for download in downloads {
            let pct = Int(download.overallProgress * 100)
            print("\(download.modelVariant): \(pct)%  status=\(download.status.rawValue)")
        }
    }
}

A Combine projection (modelStore.backgroundDownloadsPublisher) is also available for backward compatibility:

import Combine
 
var cancellables = Set<AnyCancellable>()
 
modelStore.backgroundDownloadsPublisher
    .receive(on: DispatchQueue.main)
    .sink { downloads in
        for download in downloads {
            print("\(download.modelVariant): \(Int(download.overallProgress * 100))%")
            print("Status: \(download.status)")
 
            // Individual file progress
            for file in download.files {
                print("  \(file.destinationURL.lastPathComponent): \(file.status)")
            }
        }
    }
    .store(in: &cancellables)

Pause and Resume

modelStore.pauseBackgroundDownload(downloadId)
do {
    try await modelStore.resumeBackgroundDownload(downloadId)
} catch {
    print("Failed to resume: \(error)")
}

Cancel

modelStore.cancelBackgroundDownload(downloadId, deleteProgress: false)

Network Interface Restrictions

disabledNetworkTypes accepts a list of NWInterface.InterfaceType values that the download is not allowed to use. The SDK auto-pauses when the active path uses a disabled interface and auto-resumes when an allowed one returns. The restriction is persisted with the download record and survives app relaunches.

Change a restriction at any time without disturbing the in-flight transfer:

// Lift the restriction (e.g. user tapped "Allow on cellular"):
modelStore.setDisabledBackgroundDownloadNetworkTypes(nil, for: downloadId)
 
// Add a restriction mid-flight:
modelStore.setDisabledBackgroundDownloadNetworkTypes([.cellular], for: downloadId)
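
The auto-pause rule described above comes down to a membership check. The sketch below illustrates that rule for use in your own UI logic, for example to explain to the user why a download is waiting. It is not the SDK's implementation, and isTransferAllowed(on:disabled:) is a hypothetical helper name.

```swift
import Network

/// Hypothetical helper: returns true when a download restricted by `disabled`
/// may use `activeInterface`. A nil list means no restriction, matching the
/// nil passed above to lift a restriction.
func isTransferAllowed(
    on activeInterface: NWInterface.InterfaceType,
    disabled: [NWInterface.InterfaceType]?
) -> Bool {
    guard let disabled else { return true }
    return !disabled.contains(activeInterface)
}
```

To find out which interface is currently active, an app can observe NWPathMonitor and call usesInterfaceType(_:) on the reported path.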

Crash Recovery

Downloads persisted as in-flight but with no live URLSession tasks on relaunch (the signature of an abnormal exit such as a crash, force-quit, or device reboot) auto-resume after ModelStore initializes. No manual user action is required.

Downloads explicitly paused by the user stay paused across relaunches; only crash-paused downloads auto-resume.
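
The two rules above can be restated as a single predicate. This is only an illustration of the documented behavior under assumed names: RelaunchRecord and its fields are hypothetical and do not correspond to SDK types.

```swift
/// Hypothetical snapshot of what is known about a download at relaunch.
struct RelaunchRecord {
    let wasInFlight: Bool   // persisted as in-flight before the app exited
    let pausedByUser: Bool  // the user explicitly paused the download
    let liveTaskCount: Int  // URLSession tasks still alive after relaunch
}

/// Mirrors the rule described above: only downloads that were interrupted
/// abnormally resume on their own; user-paused downloads stay paused.
func shouldAutoResume(_ record: RelaunchRecord) -> Bool {
    if record.pausedByUser { return false }
    return record.wasInFlight && record.liveTaskCount == 0
}
```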

Loading the Model

After downloading, call loadModels() to load the MelSpectrogram, AudioEncoder, and TextDecoder models from the model folder into memory. The first load after a download may take 15–90 seconds because Core ML compiles the models on-device. Subsequent loads are near-instant thanks to the OS-level compiled model cache.

let whisperKitPro = try await WhisperKitPro(config) // same config used during "Downloading the Model"
 
// loadModels() is called automatically during WhisperKitPro initialization.
// You can also call it explicitly if you initialized with `load: false`:
try await whisperKitPro.loadModels()

loadModels() supports a prewarmMode parameter. When prewarmMode: true, models are loaded but not fully initialized, allowing you to defer the final initialization to a later point.
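
Because the first load can take far longer than later ones, it can help to measure load time during development and decide whether a loading indicator is needed. The helper below is a generic sketch built only on Foundation. The name timed is hypothetical and the helper is not part of Argmax SDK.

```swift
import Foundation

/// Hypothetical helper: runs an async operation and reports how long it took.
func timed<T>(
    _ operation: () async throws -> T
) async rethrows -> (value: T, seconds: Double) {
    let start = Date()
    let value = try await operation()
    return (value, Date().timeIntervalSince(start))
}
```

For example, wrapping the explicit call as try await timed { try await whisperKitPro.loadModels() } returns the elapsed seconds alongside the result, which makes the difference between the first load and later loads easy to log.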

Bring Your Own Model

Versions argmax-sdk-swift-2.1.2 and newer support downloading model files directly from a custom HTTPS URL, such as an S3 presigned URL, a GCP signed URL, an Azure SAS URL, or your own CDN.

The model folder must be archived as an Apple Archive (.aar). The archive should preserve the directory structure of the corresponding ready-made Argmax model hosted on Hugging Face.

For example, the .aar file hosted at https://models.example.com/whisperkit-coreml/openai_whisper-tiny.aar should match this directory structure:

let archiveURL = URL(string:
    "https://models.example.com/whisperkit-coreml/openai_whisper-tiny.aar?X-Amz-Signature=..."
)!
 
// One-step: download + extract + register with the model cache.
let modelFolder = try await modelStore.downloadAndExtractInBackground(
    remoteURL: archiveURL,
    destinationRoot: URL.documentsDirectory.appendingPathComponent("Models"),
    disabledNetworkTypes: [.cellular]  // optional: Wi-Fi only
)
 
let config = WhisperKitProConfig(modelFolder: modelFolder.path)
let whisperKit = try await WhisperKitPro(config)
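
Before starting a download from a custom URL, it can be worth checking the two requirements stated above: the URL uses HTTPS and points to an .aar file. The following is a minimal sketch using only Foundation. ArchiveURLIssue and archiveURLIssues(_:) are hypothetical names, and the check does not verify the archive's contents or directory structure.

```swift
import Foundation

/// Hypothetical result type for the pre-flight check below.
enum ArchiveURLIssue: Equatable {
    case notHTTPS
    case notAppleArchive
}

/// Returns the problems found with a custom model URL (empty when none).
/// Query items such as signed-URL parameters do not affect the extension check.
func archiveURLIssues(_ url: URL) -> [ArchiveURLIssue] {
    var issues: [ArchiveURLIssue] = []
    if url.scheme?.lowercased() != "https" {
        issues.append(.notHTTPS)
    }
    if url.pathExtension.lowercased() != "aar" {
        issues.append(.notAppleArchive)
    }
    return issues
}
```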