| name | speech |
| description | Use when implementing speech-to-text, live transcription, or audio transcription. Covers SpeechAnalyzer (iOS 26+), SpeechTranscriber, volatile/finalized results, AssetInventory model management, audio format handling. |
| version | 1.0.0 |
Speech-to-Text with SpeechAnalyzer
Overview
SpeechAnalyzer is Apple's new speech-to-text API introduced in iOS 26. It powers Notes, Voice Memos, Journal, and Call Summarization. The on-device model is faster, more accurate, and better for long-form/distant audio than SFSpeechRecognizer.
Key principle: SpeechAnalyzer is modular—add transcription modules to an analysis session. Results stream asynchronously using Swift's AsyncSequence.
Decision Tree - SpeechAnalyzer vs SFSpeechRecognizer
Need speech-to-text?
├─ iOS 26+ only?
│ └─ Yes → SpeechAnalyzer (preferred)
├─ Need iOS 10-25 support?
│ └─ Yes → SFSpeechRecognizer (or DictationTranscriber)
├─ Long-form audio (meetings, lectures)?
│ └─ Yes → SpeechAnalyzer
├─ Distant audio (across room)?
│ └─ Yes → SpeechAnalyzer
└─ Short dictation commands?
└─ Either works
SpeechAnalyzer advantages:
- Better for long-form and conversational audio
- Works well with distant speakers (meetings)
- On-device, private
- Model managed by system (no app size increase)
- Powers Notes, Voice Memos, Journal
DictationTranscriber (iOS 26+): Supports the same languages as SFSpeechRecognizer, but doesn't require the user to enable Siri or dictation in Settings.
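A minimal runtime check for routing between the two APIs, assuming you keep an SFSpeechRecognizer fallback for older OS versions (the helper name is ours):

```swift
import Speech

// Sketch: decide at runtime whether the SpeechAnalyzer path is usable for a
// locale; otherwise fall back to SFSpeechRecognizer. Helper name is ours.
func canUseSpeechAnalyzer(for locale: Locale = .current) async -> Bool {
    guard #available(iOS 26.0, macOS 26.0, *) else { return false }
    let supported = await SpeechTranscriber.supportedLocales
    return supported.contains { $0.identifier(.bcp47) == locale.identifier(.bcp47) }
}
```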
Red Flags
Use this skill when you see:
- "Live transcription"
- "Transcribe audio"
- "Speech-to-text"
- "SpeechAnalyzer" or "SpeechTranscriber"
- "Volatile results"
- Building Notes-like or Voice Memos-like features
Pattern 1 - File Transcription (Simplest)
Transcribe an audio file to text in one function.
import Speech
import AVFoundation
func transcribe(file: URL, locale: Locale) async throws -> AttributedString {
// Set up transcriber
let transcriber = SpeechTranscriber(
locale: locale,
preset: .offlineTranscription
)
// Collect results asynchronously
async let transcriptionFuture = try transcriber.results
.reduce(AttributedString()) { str, result in
str + result.text
}
// Set up analyzer with transcriber module
let analyzer = SpeechAnalyzer(modules: [transcriber])
// Analyze the file (analyzeSequence(from:) takes an AVAudioFile)
let audioFile = try AVAudioFile(forReading: file)
if let lastSample = try await analyzer.analyzeSequence(from: audioFile) {
try await analyzer.finalizeAndFinish(through: lastSample)
} else {
await analyzer.cancelAndFinishNow()
}
return try await transcriptionFuture
}
Key points:
- `analyzeSequence(from:)` reads the file and feeds its audio to the analyzer
- `finalizeAndFinish(through:)` ensures all results are finalized
- Results are `AttributedString` values with timing metadata
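A call site might look like this (the URL and locale are placeholders):

```swift
// Hypothetical usage — file URL and locale are placeholders
let url = URL(fileURLWithPath: "/path/to/recording.m4a")
let transcript = try await transcribe(file: url, locale: Locale(identifier: "en_US"))
print(String(transcript.characters))
```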
Pattern 2 - Live Transcription Setup
For real-time transcription from microphone.
Step 1 - Configure SpeechTranscriber
import Speech
import AVFoundation
class TranscriptionManager: ObservableObject {
private var transcriber: SpeechTranscriber?
private var analyzer: SpeechAnalyzer?
private var analyzerFormat: AVAudioFormat?
private var converter: AVAudioConverter? // used by the format-conversion extension below
private var inputBuilder: AsyncStream<AnalyzerInput>.Continuation?
@Published var finalizedTranscript = AttributedString()
@Published var volatileTranscript = AttributedString()
func setUp() async throws {
// Create transcriber with options
transcriber = SpeechTranscriber(
locale: Locale.current,
transcriptionOptions: [],
reportingOptions: [.volatileResults], // Enable real-time updates
attributeOptions: [.audioTimeRange] // Include timing
)
guard let transcriber else { throw TranscriptionError.setupFailed }
// Create analyzer with transcriber module
analyzer = SpeechAnalyzer(modules: [transcriber])
// Get required audio format
analyzerFormat = await SpeechAnalyzer.bestAvailableAudioFormat(
compatibleWith: [transcriber]
)
// Ensure model is available
try await ensureModel(for: transcriber)
// Create input stream
let (stream, builder) = AsyncStream<AnalyzerInput>.makeStream()
inputBuilder = builder
// Start analyzer
try await analyzer?.start(inputSequence: stream)
}
}
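The snippets in this skill reference a `TranscriptionError` type; it's an app-defined enum, not part of the Speech framework. A minimal version:

```swift
// App-defined error cases used by the snippets in this skill (not a framework type)
enum TranscriptionError: Error {
    case setupFailed
    case localeNotSupported
    case notSetUp
    case conversionFailed
}
```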
Step 2 - Ensure Model Availability
func ensureModel(for transcriber: SpeechTranscriber) async throws {
let locale = Locale.current
// Check if language is supported
let supported = await SpeechTranscriber.supportedLocales
guard supported.contains(where: {
$0.identifier(.bcp47) == locale.identifier(.bcp47)
}) else {
throw TranscriptionError.localeNotSupported
}
// Check if model is installed
let installed = await SpeechTranscriber.installedLocales
if installed.contains(where: {
$0.identifier(.bcp47) == locale.identifier(.bcp47)
}) {
return // Already installed
}
// Download model
if let downloader = try await AssetInventory.assetInstallationRequest(
supporting: [transcriber]
) {
// Track progress if needed (see the download-progress sketch below)
_ = downloader.progress
try await downloader.downloadAndInstall()
}
}
Note: Models are stored in system storage, not app storage, so they don't add to your app's download size. Only a limited number of languages can be allocated at once.
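If you surface download progress in the UI, the installation request exposes a Foundation `Progress`. A sketch that polls it while the download runs (the polling approach and function name are ours):

```swift
import Speech

// Sketch: download the transcription model while reporting coarse progress.
// The print is a stand-in for updating your UI.
func downloadModelReportingProgress(for transcriber: SpeechTranscriber) async throws {
    guard let request = try await AssetInventory.assetInstallationRequest(
        supporting: [transcriber]
    ) else {
        return // no installation request returned — nothing to download
    }
    let progress = request.progress // Foundation Progress
    let poller = Task {
        while !Task.isCancelled {
            print("Model download: \(Int(progress.fractionCompleted * 100))%")
            try await Task.sleep(for: .seconds(1))
        }
    }
    defer { poller.cancel() }
    try await request.downloadAndInstall()
}
```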
Step 3 - Handle Results
func startResultHandling() {
Task {
guard let transcriber else { return }
do {
for try await result in transcriber.results {
let text = result.text
if result.isFinal {
// Finalized result - won't change
finalizedTranscript += text
volatileTranscript = AttributedString()
// Access timing info
for run in text.runs {
if let timeRange = run.audioTimeRange {
print("Time: \(timeRange)")
}
}
} else {
// Volatile result - will be replaced
volatileTranscript = text
}
}
} catch {
print("Transcription failed: \(error)")
}
}
}
Pattern 3 - Audio Recording and Streaming
Connect AVAudioEngine to SpeechAnalyzer.
import AVFoundation
class AudioRecorder {
private let audioEngine = AVAudioEngine()
private var outputContinuation: AsyncStream<AVAudioPCMBuffer>.Continuation?
private let transcriptionManager: TranscriptionManager
init(transcriptionManager: TranscriptionManager) {
self.transcriptionManager = transcriptionManager
}
func startRecording() async throws {
// Request permission
guard await AVAudioApplication.requestRecordPermission() else {
throw RecordingError.permissionDenied
}
// Configure audio session (iOS)
#if os(iOS)
let session = AVAudioSession.sharedInstance()
try session.setCategory(.playAndRecord, mode: .spokenAudio)
try session.setActive(true, options: .notifyOthersOnDeactivation)
#endif
// Set up transcriber
try await transcriptionManager.setUp()
transcriptionManager.startResultHandling()
// Stream audio to transcriber
for await buffer in try audioStream() {
try await transcriptionManager.streamAudio(buffer)
}
}
private func audioStream() throws -> AsyncStream<AVAudioPCMBuffer> {
let inputNode = audioEngine.inputNode
let format = inputNode.outputFormat(forBus: 0)
// Create the stream before starting the engine so early buffers aren't dropped
let (stream, continuation) = AsyncStream<AVAudioPCMBuffer>.makeStream()
outputContinuation = continuation
inputNode.installTap(
onBus: 0,
bufferSize: 4096,
format: format
) { [weak self] buffer, _ in
self?.outputContinuation?.yield(buffer)
}
audioEngine.prepare()
try audioEngine.start()
return stream
}
}
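`RecordingError` above is likewise an app-defined type; a minimal version:

```swift
// App-defined error for the recorder sketch (not a framework type)
enum RecordingError: Error {
    case permissionDenied
}
```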
Stream Audio with Format Conversion
extension TranscriptionManager {
// Note: `converter` is a stored property declared on the class in Step 1,
// because extensions can't add stored properties.
func streamAudio(_ buffer: AVAudioPCMBuffer) async throws {
guard let inputBuilder, let analyzerFormat else {
throw TranscriptionError.notSetUp
}
// Convert to analyzer's required format
let converted = try convertBuffer(buffer, to: analyzerFormat)
// Send to analyzer
let input = AnalyzerInput(buffer: converted)
inputBuilder.yield(input)
}
private func convertBuffer(
_ buffer: AVAudioPCMBuffer,
to format: AVAudioFormat
) throws -> AVAudioPCMBuffer {
// Lazily initialize the converter
if converter == nil {
converter = AVAudioConverter(from: buffer.format, to: format)
}
guard let converter else {
throw TranscriptionError.conversionFailed
}
// Scale capacity in case the sample rates differ
let ratio = format.sampleRate / buffer.format.sampleRate
let capacity = AVAudioFrameCount((Double(buffer.frameLength) * ratio).rounded(.up))
guard let outputBuffer = AVAudioPCMBuffer(
pcmFormat: format,
frameCapacity: max(capacity, 1)
) else {
throw TranscriptionError.conversionFailed
}
// Note: convert(to:from:) doesn't resample; if the sample rates differ,
// use the block-based convert(to:error:withInputFrom:) instead
try converter.convert(to: outputBuffer, from: buffer)
return outputBuffer
}
}
Pattern 4 - Stopping Transcription
Properly finalize to get remaining volatile results as finalized.
func stopRecording() async {
// Stop audio
audioEngine.stop()
audioEngine.inputNode.removeTap(onBus: 0)
outputContinuation?.finish()
// Finalize transcription (converts remaining volatile to final)
try? await analyzer?.finalizeAndFinishThroughEndOfInput()
// Cancel the results-handling task, if you kept a reference to it
recognizerTask?.cancel()
}
Critical: Always call finalizeAndFinishThroughEndOfInput() to ensure volatile results are finalized.
Pattern 5 - Model Asset Management
Check Supported Languages
// Languages the API supports
let supported = await SpeechTranscriber.supportedLocales
// Languages currently installed on device
let installed = await SpeechTranscriber.installedLocales
Deallocate Languages
Only a limited number of languages can be allocated at once; deallocate ones you no longer need.
func deallocateLanguages() async {
let allocated = await AssetInventory.allocatedLocales
for locale in allocated {
await AssetInventory.deallocate(locale: locale)
}
}
Pattern 6 - Displaying Results with Timing
Highlight text during audio playback using timing metadata.
import SwiftUI
import Speech
import CoreMedia

struct TranscriptView: View {
let transcript: AttributedString
@Binding var playbackTime: CMTime
var body: some View {
Text(highlightedTranscript)
}
var highlightedTranscript: AttributedString {
var result = transcript
for run in transcript.runs {
guard let timeRange = run.audioTimeRange else { continue }
let isActive = timeRange.containsTime(playbackTime)
if isActive {
result[run.range].backgroundColor = .yellow
}
}
return result
}
}
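The view needs a `playbackTime` source. One way to drive it is AVPlayer's periodic time observer; a sketch (the `PlayerModel` type is ours):

```swift
import AVFoundation
import Combine

// Sketch: publish the current playback time so TranscriptView can highlight it
final class PlayerModel: ObservableObject {
    @Published var playbackTime: CMTime = .zero
    private var timeObserver: Any?

    func observe(_ player: AVPlayer) {
        // Update roughly every 100 ms on the main queue
        let interval = CMTime(value: 1, timescale: 10)
        timeObserver = player.addPeriodicTimeObserver(
            forInterval: interval,
            queue: .main
        ) { [weak self] time in
            self?.playbackTime = time
        }
    }
}
```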
Anti-Patterns
Don't - Forget to finalize
// BAD - volatile results lost
func stopRecording() {
audioEngine.stop()
// Missing finalize!
}
// GOOD - volatile results become finalized
func stopRecording() async {
audioEngine.stop()
try? await analyzer?.finalizeAndFinishThroughEndOfInput()
}
Don't - Ignore format conversion
// BAD - format mismatch may fail silently
inputBuilder.yield(AnalyzerInput(buffer: rawBuffer))
// GOOD - convert to analyzer's format
let format = await SpeechAnalyzer.bestAvailableAudioFormat(compatibleWith: [transcriber])
let converted = try convertBuffer(rawBuffer, to: format)
inputBuilder.yield(AnalyzerInput(buffer: converted))
Don't - Skip model availability check
// BAD - transcription fails if the model isn't installed
let transcriber = SpeechTranscriber(locale: locale, ...)
// Start using immediately
// GOOD - ensure model is ready
let transcriber = SpeechTranscriber(locale: locale, ...)
try await ensureModel(for: transcriber)
// Now safe to use
Presets Reference
| Preset | Use Case |
|---|---|
| `.offlineTranscription` | File transcription, no real-time feedback needed |
| `.progressiveLiveTranscription` | Live transcription with volatile updates |
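A preset can stand in for individual options; a sketch of the live case, assuming the preset names above:

```swift
// Roughly equivalent to the options-based setup in Pattern 2
let liveTranscriber = SpeechTranscriber(
    locale: Locale.current,
    preset: .progressiveLiveTranscription
)
```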
Options Reference
TranscriptionOptions
- Default: none (standard transcription)

ReportingOptions
- `.volatileResults`: enable real-time approximate results

AttributeOptions
- `.audioTimeRange`: include a CMTimeRange for each text segment
Platform Availability
| Platform | SpeechTranscriber | DictationTranscriber |
|---|---|---|
| iOS 26+ | Yes | Yes |
| macOS Tahoe+ | Yes | Yes |
| watchOS 26+ | No | Yes |
| tvOS 26+ | TBD | TBD |
Hardware requirements: Varies by device. Use supportedLocales to check.
Integration with Apple Intelligence
Combine with Foundation Models for summarization:
import FoundationModels
func generateTitle(for transcript: String) async throws -> String {
let session = LanguageModelSession()
let prompt = "Generate a short, clever title for this story: \(transcript)"
let response = try await session.respond(to: prompt)
return response.content
}
See axiom-ios-ai skill for Foundation Models details.
Checklist
Before shipping speech-to-text:
- Check locale support with `supportedLocales`
- Ensure the model with `AssetInventory.assetInstallationRequest`
- Handle download progress for user feedback
- Convert audio to `bestAvailableAudioFormat`
- Enable `.volatileResults` for live transcription
- Call `finalizeAndFinishThroughEndOfInput()` on stop
- Handle timing with `.audioTimeRange` if needed
- Clear volatile results when a finalized result arrives
- Request microphone permission before recording
Resources
WWDC: 2025-277
Docs: /speech, /speech/speechanalyzer, /speech/speechtranscriber
Skills: coreml (on-device ML), axiom-ios-ai (Foundation Models)