Changelog
Notable changes in Argmax Pro SDK
1.5.0
Added
- Supports Parakeet v2 models
Fixed
- Fixed a crash where telemetry would fail to get a lock when writing to storage
1.4.0
Changed
- Adopt WhisperKit Open Source changes from v0.13.0
Added
- Includes sortable discovered segments (via `SegmentDiscoveryCallback`) with VAD transcription
- Includes `voiceActivityAsync(in:)` method for `VoiceActivityDetector` (see the sketch below)
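A minimal sketch of the new async VAD call, assuming `voiceActivityAsync(in:)` mirrors the existing `voiceActivity(in:)` API (one flag per analysis frame), that a 16 kHz `audioArray: [Float]` is already available, and that the `try`/`await` surface shown is correct; these are assumptions, not the confirmed signature:

```swift
// Sketch (assumed API shape): async voice activity detection.
let vad = try await VoiceActivityDetector.modelVAD()
let voiceActivity = try await vad.voiceActivityAsync(in: audioArray)
for (index, isVoice) in voiceActivity.enumerated() where isVoice {
    // Convert the frame index to seconds for logging.
    print("Voice detected at \(vad.voiceActivityIndexToSeconds(index)) seconds")
}
```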
1.3.3
Changed
- Updated swift-transformers dependency to use the `.upToNextMinor` versioning scheme
- This allows importing higher versions of swift-transformers when other libraries depend on them, while remaining on 0.1.8 by default via `Package.resolved` (see the sketch below)
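For context, this is how an `.upToNextMinor` requirement is expressed in a SwiftPM manifest; the sketch below is a generic consumer-side illustration with a hypothetical package name, not the SDK's actual Package.swift:

```swift
// swift-tools-version:5.9
import PackageDescription

// Sketch: .upToNextMinor(from: "0.1.8") resolves any 0.1.x >= 0.1.8 but never 0.2.0,
// so other packages can pull in newer 0.1.x releases while Package.resolved
// keeps 0.1.8 as the default.
let package = Package(
    name: "ExampleApp",  // hypothetical package
    dependencies: [
        .package(
            url: "https://github.com/huggingface/swift-transformers",
            .upToNextMinor(from: "0.1.8")
        )
    ]
)
```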
1.3.2
Changed
- SpeakerKitPro is now much faster, especially for large audio files. On average, diarization for 1h+ audio files is 8.9x faster.
1.3.1
Added
- Added `VoiceActivityDetector.modelVAD()` for high-quality voice activity detection using CoreML
  - Can be used standalone for VAD:
```swift
let vad = try await VoiceActivityDetector.modelVAD()
let voiceActivity = vad.voiceActivity(in: audioArray)
for (index, isVoice) in voiceActivity.enumerated() {
    let seconds = vad.voiceActivityIndexToSeconds(index)
    print("\(seconds) seconds: \(isVoice ? "Voice" : "Silence")")
}
```
  - Or integrated with WhisperKitPro for transcription:
```swift
let vad = try await VoiceActivityDetector.modelVAD()
let config = WhisperKitProConfig(
    // ... other config options ...
    voiceActivityDetector: vad
)
let whisperKitPro = try await WhisperKitPro(config)

// Use VAD for chunking
let options = DecodingOptions(
    // ... other options ...
    chunkingStrategy: .vad
)
let result = try await whisperKitPro.transcribe(audioArray: audioArray, decodeOptions: options)
```
1.2.0
Changed
- Updated to import the latest version of WhisperKit, v0.12.0
- SpeakerKitPro is now faster and more accurate, especially for large audio files
Fixed
- Fixed a rare race condition when adding events to the telemetry queue
1.1.0
Added
SpeakerKitPro now supports:
- Lower diarization error rate with more audio context, but overall improved performance with new `pyannote-v3-pro` models
- Deprecated `SpeakerKitPro.clusterSpeakers(targetClusters:)` in favor of `SpeakerKitPro.diarize(options:)` (see the sketch below)
- New `DiarizationOptions` struct provides additional configuration options for number of speakers, minimum active offset, and more in the future
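A brief sketch of the migration, assuming a configured `speakerKit: SpeakerKitPro` instance and that `DiarizationOptions` exposes a speaker-count field; the property name `numberOfSpeakers` is an assumption:

```swift
// Sketch (assumed option names): moving from the deprecated API to diarize(options:).
// Before (deprecated):
// let clusters = try await speakerKit.clusterSpeakers(targetClusters: 2)

// After:
let options = DiarizationOptions(
    numberOfSpeakers: 2  // assumed name for the speaker-count option
)
let diarization = try await speakerKit.diarize(options: options)
```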
Changed
- Default models for SpeakerKitPro changed to higher performing `pyannote-v3-pro`
20250219.1.1
Added
- Added this CHANGELOG and associated README
- New `ArgmaxSDK.licenseInfo()` helper method that returns a `LicenseInfo` struct for better license information handling (see the sketch below)
  - Provides typed access to license details
  - Includes license id, status, expiration dates, and enabled features
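A short sketch of the new typed accessor alongside the deprecated one; the `LicenseInfo` property names and the async/throws surface shown here are assumptions based on the fields listed above:

```swift
// Sketch (assumed property names): typed license info via the new helper.
let license = try await ArgmaxSDK.licenseInfo()
print("License status: \(license.status), expires: \(license.expirationDate)")

// Deprecated path, kept for backward compatibility:
// let info: [String: String] = await ArgmaxSDK.getDiagnosticInfo()
```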
Changed
- License refresh request now includes additional body parameters to help track updates to the device and app between license creation requests:
  - `app_bundle_id`
  - `device_sku`
  - `os_version`
  - `fwk_version`
- Deprecated `ArgmaxSDK.getDiagnosticInfo()` in favor of `ArgmaxSDK.licenseInfo()`
  - Returns a `[String: String]` dictionary with the license info for backward compatibility
  - Will be removed in a future release
- The new `TranscriptionResultPro` struct for realtime transcription was missing the `mergeTranscriptionResults` helper method, which has now been restored. However, it is marked as deprecated in favor of the static class method `WhisperKitProUtils.mergeTranscriptionResults(_:confirmedWords:)`
Fixed
- Fast load was not occurring in some cases due to aggressive cache clearing; this has been fixed
20250219.0.4
Changed
- Default to using `wordTimestamps` for `DecodingOptionsPro` (required for realtime transcription)
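For context, a sketch of what this default corresponds to; the `DecodingOptionsPro` initializer is assumed to mirror WhisperKit's `DecodingOptions`, where all other parameters have default values:

```swift
// Sketch: word timestamps are now enabled by default; equivalent to setting
// the flag explicitly (initializer shape assumed to mirror DecodingOptions).
let options = DecodingOptionsPro(
    wordTimestamps: true  // required for realtime transcription
)
```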
Fixed
- Fixed diarization in SpeakerKit requiring a full reset between runs
20250219.0.3
Added
- Added hypothesis text in `TranscribeRecordingTask` result callback, e.g.:
```swift
let recordingTask = whisperKitPro.transcribeWhileRecording(
    options: options,
    audioCallback: {
        // Return the latest audio samples to be appended to the running buffer
        // Note: these should only include new audio samples
        // and exclude audio since the last call to this callback
        return AudioSamples(samples: /* Your audio samples here */, offset: .append)
        // or, use a buffer:
        // try AudioSamples(buffer: /* Your AVAudioPCMBuffer here */, offset: .append)
        // Optionally, you can specify the offset: .at(/* Your time offset here */) or .append
    },
    resultCallback: { result in
        // Handle each transcription result
        transcription += result.text
        print("Transcribed: \(result.text)")
        print("Hypothesis for next result: \(result.hypothesisText)")
        return true // Continue transcribing
    }
)
```