Changelog
Notable changes in Argmax Pro SDK
1.5.0
Added
- Supports Parakeet v2 models
Fixed
- Fixed a crash where telemetry would fail to get a lock when writing to storage
1.4.0
Changed
- Adopt WhisperKit Open Source changes from v0.13.0
Added
- Includes sortable discovered segments (via `SegmentDiscoveryCallback`) with VAD transcription
- Includes `voiceActivityAsync(in:)` method for `VoiceActivityDetector` (see the sketch below)
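A minimal sketch of the new async VAD call, assuming `voiceActivityAsync(in:)` mirrors the existing `voiceActivity(in:)` API (one flag per analysis frame), that a 16 kHz `audioArray: [Float]` is already available, and that the `try`/`await` surface shown is correct; these are assumptions, not the confirmed signature:

```swift
// Sketch (assumed API shape): async voice activity detection.
let vad = try await VoiceActivityDetector.modelVAD()
let voiceActivity = try await vad.voiceActivityAsync(in: audioArray)
for (index, isVoice) in voiceActivity.enumerated() where isVoice {
    // Convert the frame index to seconds for logging.
    print("Voice detected at \(vad.voiceActivityIndexToSeconds(index)) seconds")
}
```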
1.3.3
Changed
- Updated swift-transformers dependency to use the `.upToNextMinor` versioning scheme
- This allows importing higher versions of swift-transformers when other libraries depend on them, while remaining on 0.1.8 by default via `Package.resolved` (see the sketch below)
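For context, this is how an `.upToNextMinor` requirement is expressed in a SwiftPM manifest; the sketch below is a generic consumer-side illustration with a hypothetical package name, not the SDK's actual Package.swift:

```swift
// swift-tools-version:5.9
import PackageDescription

// Sketch: .upToNextMinor(from: "0.1.8") resolves any 0.1.x >= 0.1.8 but never 0.2.0,
// so other packages can pull in newer 0.1.x releases while Package.resolved
// keeps 0.1.8 as the default.
let package = Package(
    name: "ExampleApp",  // hypothetical package
    dependencies: [
        .package(
            url: "https://github.com/huggingface/swift-transformers",
            .upToNextMinor(from: "0.1.8")
        )
    ]
)
```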
1.3.2
Changed
- SpeakerKitPro is now much faster, especially for large audio files. On average, diarization for 1h+ audio files is 8.9x faster.
1.3.1
Added
- Added `VoiceActivityDetector.modelVAD()` for high-quality voice activity detection using CoreML
  - Can be used standalone for VAD:
```swift
let vad = try await VoiceActivityDetector.modelVAD()
let voiceActivity = vad.voiceActivity(in: audioArray)
for (index, isVoice) in voiceActivity.enumerated() {
    let seconds = vad.voiceActivityIndexToSeconds(index)
    print("\(seconds) seconds: \(isVoice ? "Voice" : "Silence")")
}
```
  - Or integrated with WhisperKitPro for transcription:
```swift
let vad = try await VoiceActivityDetector.modelVAD()
let config = WhisperKitProConfig(
    // ... other config options ...
    voiceActivityDetector: vad
)
let whisperKitPro = try await WhisperKitPro(config)

// Use VAD for chunking
let options = DecodingOptions(
    // ... other options ...
    chunkingStrategy: .vad
)
let result = try await whisperKitPro.transcribe(audioArray: audioArray, decodeOptions: options)
```
1.2.0
Changed
- Updated to import the latest version of WhisperKit, v0.12.0
- SpeakerKitPro is now faster and more accurate, especially for large audio files
Fixed
- Fixed a rare race condition when adding events to the telemetry queue
1.1.0
Added
SpeakerKitPro now supports:
- Lower diarization error rate with more audio context, but overall improved performance with new `pyannote-v3-pro` models
- Deprecated `SpeakerKitPro.clusterSpeakers(targetClusters:)` in favor of `SpeakerKitPro.diarize(options:)` (see the sketch below)
- New `DiarizationOptions` struct provides additional configuration options for number of speakers, minimum active offset, and more in the future
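A brief sketch of the migration, assuming a configured `speakerKit: SpeakerKitPro` instance and that `DiarizationOptions` exposes a speaker-count field; the property name `numberOfSpeakers` is an assumption:

```swift
// Sketch (assumed option names): moving from the deprecated API to diarize(options:).
// Before (deprecated):
// let clusters = try await speakerKit.clusterSpeakers(targetClusters: 2)

// After:
let options = DiarizationOptions(
    numberOfSpeakers: 2  // assumed name for the speaker-count option
)
let diarization = try await speakerKit.diarize(options: options)
```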
Changed
- Default models for SpeakerKitPro changed to higher performing `pyannote-v3-pro`
20250219.1.1
Added
- Added this CHANGELOG and associated README
- New `ArgmaxSDK.licenseInfo()` helper method that returns a `LicenseInfo` struct for better license information handling (see the sketch below)
  - Provides typed access to license details
  - Includes license id, status, expiration dates, and enabled features
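A short sketch of the new typed accessor alongside the deprecated one; the `LicenseInfo` property names and the async/throws surface shown here are assumptions based on the fields listed above:

```swift
// Sketch (assumed property names): typed license info via the new helper.
let license = try await ArgmaxSDK.licenseInfo()
print("License status: \(license.status), expires: \(license.expirationDate)")

// Deprecated path, kept for backward compatibility:
// let info: [String: String] = await ArgmaxSDK.getDiagnosticInfo()
```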
Changed
- License refresh request now includes additional body parameters to help track updates to the device and app between license creation requests:
  - `app_bundle_id`
  - `device_sku`
  - `os_version`
  - `fwk_version`
- Deprecated `ArgmaxSDK.getDiagnosticInfo()` in favor of `ArgmaxSDK.licenseInfo()`
  - Returns a `[String: String]` dictionary with the license info for backward compatibility
  - Will be removed in a future release
- The new `TranscriptionResultPro` struct for realtime transcription was missing the `mergeTranscriptionResults` helper method, which has now been restored. However, it is marked as deprecated in favor of the static class method `WhisperKitProUtils.mergeTranscriptionResults(_:confirmedWords:)`
Fixed
- Fast load was not occurring in some cases due to aggressive cache clearing; this has been fixed
20250219.0.4
Changed
- Default to using `wordTimestamps` for `DecodingOptionsPro` (required for realtime transcription)
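For context, a sketch of what this default corresponds to; the `DecodingOptionsPro` initializer is assumed to mirror WhisperKit's `DecodingOptions`, where all other parameters have default values:

```swift
// Sketch: word timestamps are now enabled by default; equivalent to setting
// the flag explicitly (initializer shape assumed to mirror DecodingOptions).
let options = DecodingOptionsPro(
    wordTimestamps: true  // required for realtime transcription
)
```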
Fixed
- Fixed diarization in SpeakerKit requiring a full reset between runs
20250219.0.3
Added
- Added hypothesis text in `TranscribeRecordingTask` result callback, e.g.:
```swift
let recordingTask = whisperKitPro.transcribeWhileRecording(
    options: options,
    audioCallback: {
        // Return the latest audio samples to be appended to the running buffer
        // Note: these should only include new audio samples
        // and exclude audio since the last call to this callback
        return AudioSamples(samples: /* Your audio samples here */, offset: .append)
        // or, use a buffer:
        // try AudioSamples(buffer: /* Your AVAudioPCMBuffer here */, offset: .append)
        // Optionally, you can specify the offset: .at(/* Your time offset here */) or .append
    },
    resultCallback: { result in
        // Handle each transcription result
        transcription += result.text
        print("Transcribed: \(result.text)")
        print("Hypothesis for next result: \(result.hypothesisText)")
        return true // Continue transcribing
    }
)
```