Changelog

Notable changes in the Argmax Pro SDK

1.5.0

Added

  • Added support for Parakeet v2 models

Fixed

  • Fixed a crash caused by telemetry failing to acquire a lock when writing to storage

1.4.0

Changed

  • Adopt WhisperKit Open Source changes from v0.13.0

Added

  • Added sortable discovered segments (via SegmentDiscoveryCallback) for VAD-based transcription
  • Added the voiceActivityAsync(in:) method to VoiceActivityDetector (see the sketch below)
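    • For illustration, a minimal sketch of the async variant; the input type and error behavior are assumptions based on the synchronous voiceActivity(in:):
      let vad = try await VoiceActivityDetector.modelVAD()
      // Assumption: same input as the synchronous voiceActivity(in:)
      let voiceActivity = await vad.voiceActivityAsync(in: audioArray)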

1.3.3

Changed

  • Updated swift-transformers dependency to use .upToNextMinor versioning scheme
    • This allows importing higher versions of swift-transformers when other libraries depend on it, while remaining on 0.1.8 by default via Package.resolved
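    • For reference, an .upToNextMinor requirement in a Package.swift manifest looks like this (repository URL shown for illustration):
      .package(
          url: "https://github.com/huggingface/swift-transformers",
          .upToNextMinor(from: "0.1.8")
      )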

1.3.2

Changed

  • SpeakerKitPro is now much faster, especially for large audio files. On average, diarization for 1h+ audio files is 8.9x faster.

1.3.1

Added

  • Added VoiceActivityDetector.modelVAD() for high-quality voice activity detection using Core ML
    • Can be used standalone for VAD:
      let vad = try await VoiceActivityDetector.modelVAD()
      let voiceActivity = vad.voiceActivity(in: audioArray)
      for (index, isVoice) in voiceActivity.enumerated() {
          let seconds = vad.voiceActivityIndexToSeconds(index)
          print("\(seconds) seconds: \(isVoice ? "Voice" : "Silence")")
      }
    • Or integrated with WhisperKitPro for transcription:
      let vad = try await VoiceActivityDetector.modelVAD()
      let config = WhisperKitProConfig(
          // ... other config options ...
          voiceActivityDetector: vad
      )
      let whisperKitPro = try await WhisperKitPro(config)
       
      // Use VAD for chunking
      let options = DecodingOptions(
          // ... other options ...
          chunkingStrategy: .vad
      )
      let result = try await whisperKitPro.transcribe(audioArray: audioArray, decodeOptions: options)

1.2.0

Changed

  • Updated to the latest version of WhisperKit, v0.12.0
  • SpeakerKitPro is now faster and more accurate, especially for large audio files

Fixed

  • Fixed a rare race condition when adding events to the telemetry queue

1.1.0

Added

  • SpeakerKitPro now achieves a lower diarization error rate by using more audio context, with overall improved performance from the new pyannote-v3-pro models
  • Deprecated SpeakerKitPro.clusterSpeakers(targetClusters:) in favor of SpeakerKitPro.diarize(options:) (see the migration sketch below)
    • The new DiarizationOptions struct provides additional configuration options for the number of speakers, minimum active offset, and more in the future
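    • A migration sketch for illustration; the DiarizationOptions parameter name is an assumption, not the exact API:
      // Before (deprecated):
      let speakers = try await speakerKitPro.clusterSpeakers(targetClusters: 2)
      // After (numberOfSpeakers is an illustrative name for the speaker-count option):
      let options = DiarizationOptions(numberOfSpeakers: 2)
      let result = try await speakerKitPro.diarize(options: options)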

Changed

  • Default models for SpeakerKitPro changed to higher performing pyannote-v3-pro

20250219.1.1

Added

  • Added this CHANGELOG and associated README
  • New ArgmaxSDK.licenseInfo() helper method that returns a LicenseInfo struct for better license information handling
    • Provides typed access to license details
    • Includes the license ID, status, expiration dates, and enabled features
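    • A minimal usage sketch; the call style and property names are assumptions based on the fields listed above:
      let info = try await ArgmaxSDK.licenseInfo()  // async/throwing call shown as an assumption
      print("License \(info.id): \(info.status)")   // assumed property names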

Changed

  • The license refresh request now includes additional body parameters to help track updates to the device and app between license creation requests:
    • app_bundle_id
    • device_sku
    • os_version
    • fwk_version
  • Deprecated ArgmaxSDK.getDiagnosticInfo() in favor of ArgmaxSDK.licenseInfo()
    • The deprecated method continues to return a [String: String] dictionary with the license info for backward compatibility
    • Will be removed in a future release
  • Restored the mergeTranscriptionResults helper method that was missing from the new TranscriptionResultPro struct for realtime transcription. It is now deprecated in favor of the static method WhisperKitProUtils.mergeTranscriptionResults(_:confirmedWords:) (see the sketch below)
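    • For illustration, a call to the static replacement; variable names are placeholders and argument types are assumptions:
      let merged = WhisperKitProUtils.mergeTranscriptionResults(results, confirmedWords: confirmedWords)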

Fixed

  • Fast load was not occurring in some cases due to aggressive cache clearing; this has been fixed

20250219.0.4

Changed

  • Default to using wordTimestamps in DecodingOptionsPro (required for realtime transcription)
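    • For illustration (the initializer parameter is an assumption; explicit opt-in is no longer required now that it is the default):
      let options = DecodingOptionsPro(wordTimestamps: true)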

Fixed

  • Fixed diarization in SpeakerKit requiring a full reset between runs

20250219.0.3

Added

  • Added hypothesis text to the TranscribeRecordingTask result callback, e.g.:

    let recordingTask = whisperKitPro.transcribeWhileRecording(
        options: options,
        audioCallback: {
            // Return the latest audio samples to be appended to the running buffer.
            // Note: include only audio captured since the last call to this callback
            return AudioSamples(samples: /* Your audio samples here */, offset: .append)

            // Alternatively, initialize from an AVAudioPCMBuffer:
            // try AudioSamples(buffer: /* Your AVAudioPCMBuffer here */, offset: .append)

            // Optionally, pass an explicit time offset instead of appending:
            // AudioSamples(samples: /* Your audio samples here */, offset: .at( /* Your time offset here */ ))
        },
        resultCallback: { result in
            // Handle each transcription result
            transcription += result.text
            print("Transcribed: \(result.text)")
            print("Hypothesis for next result: \(result.hypothesisText)")
            return true  // Continue transcribing
        }
    )