name	swift-performance
description	Use when optimizing Swift code performance, reducing memory usage, improving runtime efficiency, dealing with COW, ARC overhead, generics specialization, or collection optimization
skill_type	discipline
version	1.2.0

Swift Performance Optimization

Purpose

Core Principle: Optimize Swift code by understanding language-level performance characteristics—value semantics, ARC behavior, generic specialization, and memory layout—to write fast, efficient code without premature micro-optimization.

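Swift Version: Swift 6.2+ (for InlineArray, Span, @concurrent)
Xcode: 26+ for the Swift 6.2 features (Xcode 16 ships Swift 6.0 and 6.1)
Platforms: iOS 18+, macOS 15+ (InlineArray needs the iOS 26 / macOS 26 standard library)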

Related Skills:

performance-profiling — Use Instruments to measure (do this first!)
swiftui-performance — SwiftUI-specific optimizations
build-performance — Compilation speed
swift-concurrency — Correctness-focused concurrency patterns

When to Use This Skill

✅ Use this skill when

App profiling shows Swift code as the bottleneck (Time Profiler hotspots)
Excessive memory allocations or retain/release traffic
Implementing performance-critical algorithms or data structures
Writing framework or library code with performance requirements
Optimizing tight loops or frequently called methods
Dealing with large data structures or collections
Code review identifying performance anti-patterns

❌ Do NOT use this skill for

First step optimization — Use performance-profiling first to measure
SwiftUI performance — Use swiftui-performance skill instead
Build time optimization — Use build-performance skill instead
Premature optimization — Profile first, optimize later
Readability trade-offs — Don't sacrifice clarity for micro-optimizations

Quick Decision Tree

Performance issue identified?
│
├─ Profiler shows excessive copying?
│  ├─ → Part 1: Noncopyable Types
│  └─ → Part 2: Copy-on-Write
│
├─ Retain/release overhead in Time Profiler?
│  └─ → Part 4: ARC Optimization
│
├─ Generic code in hot path?
│  └─ → Part 5: Generics & Specialization
│
├─ Collection operations slow?
│  └─ → Part 7: Collection Performance
│
├─ Async/await overhead visible?
│  └─ → Part 8: Concurrency Performance
│
├─ Struct vs class decision?
│  └─ → Part 3: Value vs Reference
│
└─ Memory layout concerns?
   └─ → Part 9: Memory Layout

Part 1: Noncopyable Types (~Copyable)

Noncopyable types (introduced in Swift 5.9, extended to generics in Swift 6.0) suit performance-critical scenarios where you want to avoid implicit copies.

When to Use

Large types that should never be copied (file handles, GPU buffers)
Types with ownership semantics (must be explicitly consumed)
Performance-critical code where copies are expensive

Basic Pattern

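// Noncopyable type
import Darwin  // open(2), close(2), O_RDONLY

enum FileError: Error { case openFailed }

struct FileHandle: ~Copyable {
    private let fd: Int32

    init(path: String) throws {
        // Validate before assigning so a failed open never reaches deinit
        let descriptor = open(path, O_RDONLY)
        guard descriptor != -1 else { throw FileError.openFailed }
        self.fd = descriptor
    }

    deinit {
        // Module-qualified: a bare close(fd) would resolve to the method below
        Darwin.close(fd)
    }

    // Explicitly ends the handle's lifetime
    consuming func close() {
        // self is destroyed when this method returns, which runs deinit
    }
}

// Usage
func processFile() throws {
    let handle = try FileHandle(path: "/data.txt")
    // deinit closes the descriptor when handle goes out of scope
    // Cannot accidentally copy handle
}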

Ownership Annotations

// consuming - takes ownership (for noncopyable values the caller cannot use the argument afterwards)
func process(_ data: consuming [UInt8]) {
    // data is owned by this function
}

// borrowing - temporary read-only access without ownership
func validate(_ data: borrowing [UInt8]) -> Bool {
    // data can still be used by caller
    return data.count > 0
}

// inout - mutable access (the modifier goes before the type, not before the name)
func modify(_ data: inout [UInt8]) {
    data.append(0)
}

Performance Impact

Eliminates implicit copies: Compiler error instead of runtime copy
Zero-cost abstraction: Same performance as manual memory management
Use when: Type is expensive to copy (>64 bytes) and copies are rare
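
The ownership rules above can be sketched with a tiny noncopyable type. Token, inspect, redeem, and ownershipDemo are hypothetical names chosen for illustration; the point is only that borrowing leaves the value usable while consuming ends the caller's binding at compile time.

```swift
// Hypothetical noncopyable type used only for illustration.
struct Token: ~Copyable {
    let id: Int
}

// borrowing: read-only access; the caller keeps ownership.
func inspect(_ token: borrowing Token) -> Int {
    token.id
}

// consuming: ownership moves into the callee; the caller's binding ends at the call.
func redeem(_ token: consuming Token) -> Int {
    token.id * 2
}

func ownershipDemo() -> (seen: Int, redeemed: Int) {
    let token = Token(id: 21)
    let seen = inspect(token)       // token is still usable afterwards
    let redeemed = redeem(token)    // token is consumed by this call
    // inspect(token)               // would not compile: token is used after being consumed
    return (seen, redeemed)
}
```

Uncommenting the last inspect call turns a would-be runtime copy into a compile-time error, which is the property the list above describes.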

Part 2: Copy-on-Write (COW)

Swift collections use COW for efficient memory sharing. Understanding when copies happen is critical for performance.

How COW Works

var array1 = [1, 2, 3]  // Single allocation
var array2 = array1     // Share storage (no copy)
array2.append(4)        // Now copies: array2 gets its own storage, array1 is unchanged
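
One way to observe this sharing is to compare buffer addresses before and after the first mutation. This is a minimal sketch: bufferAddress and cowDemo are hypothetical helpers, and the address is used for comparison only, never dereferenced.

```swift
// Returns the address of an array's element buffer (for comparison only;
// the pointer must not be used after the closure returns).
func bufferAddress(_ array: [Int]) -> UnsafeRawPointer? {
    array.withUnsafeBufferPointer { UnsafeRawPointer($0.baseAddress) }
}

func cowDemo() -> (sharedBefore: Bool, sharedAfter: Bool, original: [Int], copy: [Int]) {
    let original = [1, 2, 3]
    var copy = original                     // shares storage, no element copy yet
    let sharedBefore = bufferAddress(original) == bufferAddress(copy)
    copy.append(4)                          // first mutation while shared: copy gets its own buffer
    let sharedAfter = bufferAddress(original) == bufferAddress(copy)
    return (sharedBefore, sharedAfter, original, copy)
}
```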

Custom COW Implementation

final class Storage<T> {
    var data: [T]
    init(_ data: [T]) { self.data = data }
}

struct COWArray<T> {
    private var storage: Storage<T>

    init(_ data: [T]) {
        self.storage = Storage(data)
    }

    // COW check before mutation
    private mutating func ensureUnique() {
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.data)
        }
    }

    mutating func append(_ element: T) {
        ensureUnique()  // Copy if shared
        storage.data.append(element)
    }

    subscript(index: Int) -> T {
        get { storage.data[index] }
        set {
            ensureUnique()  // Copy before mutation
            storage.data[index] = newValue
        }
    }
}
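
To check that a wrapper like this really has value semantics, it helps to expose whether two values share storage. The sketch below uses a simplified, non-generic variant (IntStorage, COWInts, and sharesStorage are hypothetical names, not part of the type above) so it is self-contained.

```swift
final class IntStorage {
    var values: [Int]
    init(_ values: [Int]) { self.values = values }
}

struct COWInts {
    private var storage: IntStorage

    init(_ values: [Int]) { storage = IntStorage(values) }

    var values: [Int] { storage.values }

    // Test helper: true while both values point at the same storage object.
    func sharesStorage(with other: COWInts) -> Bool {
        storage === other.storage
    }

    mutating func append(_ value: Int) {
        // Copy the storage only if another value still references it.
        if !isKnownUniquelyReferenced(&storage) {
            storage = IntStorage(storage.values)
        }
        storage.values.append(value)
    }
}

func cowWrapperDemo() -> (sharedBefore: Bool, sharedAfter: Bool, a: [Int], b: [Int]) {
    let a = COWInts([1, 2])
    var b = a
    let sharedBefore = a.sharesStorage(with: b)
    b.append(3)                                  // triggers the copy because a still holds the storage
    let sharedAfter = a.sharesStorage(with: b)
    return (sharedBefore, sharedAfter, a.values, b.values)
}
```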

Performance Tips

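// ⚠️ If array shares storage with another value, the first write copies the
// whole buffer once; later iterations only pay the uniqueness check
for i in array.indices {
    array[i] = transform(array[i])
}

// ✅ Release other holders first so the buffer is uniquely referenced
// (reserveCapacity does not avoid the copy; it only performs it up front)
snapshot = []  // hypothetical second variable that shared array's storage
for i in array.indices {
    array[i] = transform(array[i])  // Mutates in place
}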

// ❌ Multiple mutations trigger multiple uniqueness checks
array.append(1)
array.append(2)
array.append(3)

// ✅ Single reservation
array.reserveCapacity(array.count + 3)
array.append(contentsOf: [1, 2, 3])

Part 3: Value vs Reference Semantics

Choosing between struct and class has significant performance implications.

Decision Matrix

Factor	Use Struct	Use Class
Size	≤ 64 bytes	> 64 bytes or contains large data
Identity	No identity needed	Needs identity (===)
Inheritance	Not needed	Inheritance required
Mutation	Infrequent	Frequent in-place updates
Sharing	No sharing needed	Must be shared across scope

Small Structs (Fast)

// ✅ Fast - fits in registers, no heap allocation
struct Point {
    var x: Double  // 8 bytes
    var y: Double  // 8 bytes
}  // Total: 16 bytes - excellent for struct

struct Color {
    var r, g, b, a: UInt8  // 4 bytes total - perfect for struct
}

Large Structs (Slow)

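// ❌ Costly to copy - many inline bytes plus one retain per reference-counted field
struct HugeData {
    var buffer: [UInt8]  // 1MB of elements live in a shared COW heap buffer, not inline
    var metadata: String
    // ...imagine dozens more stored properties
}

func process(_ data: HugeData) {  // Does NOT copy the 1MB buffer
    var local = data              // Copies the inline fields, retains buffer and string storage
    local.buffer[0] = 0           // The 1MB copy happens here, because storage is shared
}

// ✅ Use reference semantics when many owners share one large, mutable value
final class HugeData {
    var buffer: [UInt8]
    var metadata: String
    init(buffer: [UInt8], metadata: String) {
        self.buffer = buffer
        self.metadata = metadata
    }
}

func process(_ data: HugeData) {  // Passes a single reference (8 bytes)
    // ...
}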

Indirect Storage for Flexibility

// Best of both worlds
struct LargeDataWrapper {
    private final class Storage {
        var largeBuffer: [UInt8]
        init(_ buffer: [UInt8]) { self.largeBuffer = buffer }
    }

    private var storage: Storage

    init(buffer: [UInt8]) {
        self.storage = Storage(buffer)
    }

    // Value semantics externally, reference internally
    var buffer: [UInt8] {
        get { storage.largeBuffer }
        set {
            if !isKnownUniquelyReferenced(&storage) {
                storage = Storage(newValue)
            } else {
                storage.largeBuffer = newValue
            }
        }
    }
}

Part 4: ARC Optimization

Automatic Reference Counting adds overhead. Minimize it where possible.

Weak vs Unowned Performance

class Parent {
    var child: Child?
}

class Child {
    // ❌ Weak adds overhead (optional, thread-safe zeroing)
    weak var parent: Parent?
}

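// ✅ Unowned when you know lifetime guarantees
class Child {
    unowned let parent: Parent  // No optional, no zeroing; traps if parent was deallocated
    init(parent: Parent) { self.parent = parent }
}

Performance: unowned is cheaper than weak because it skips optional unwrapping and zeroing, but it still performs a liveness check on access. Measure before relying on a specific speedup.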

Use when: Child lifetime < Parent lifetime (guaranteed).
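
The cost of weak buys automatic zeroing, which can be seen directly. In this sketch (Owner, Observer, and weakDemo are hypothetical names) the weak reference becomes nil as soon as the last strong reference goes away; an unowned reference in the same situation would trap on access instead.

```swift
final class Owner {}

final class Observer {
    weak var owner: Owner?     // zeroed automatically when the owner is deallocated
}

func weakDemo() -> (beforeRelease: Bool, afterRelease: Bool) {
    let observer = Observer()
    var owner: Owner? = Owner()
    observer.owner = owner
    let before = observer.owner != nil   // owner is still strongly held
    owner = nil                          // last strong reference goes away
    let after = observer.owner != nil    // weak reference has been zeroed
    return (before, after)
}
```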

Closure Capture Optimization

class DataProcessor {
    var data: [Int]

    // ❌ Captures self weakly, then upgrades to a strong reference - weak overhead although only data is needed
    func process(completion: @escaping () -> Void) {
        DispatchQueue.global().async { [weak self] in
            guard let self else { return }
            self.data.forEach { print($0) }
            completion()
        }
    }

    // ✅ Capture only what you need
    func process(completion: @escaping () -> Void) {
        let data = self.data  // Copy value type
        DispatchQueue.global().async {
            data.forEach { print($0) }  // No self captured
            completion()
        }
    }
}

Reducing Retain/Release Traffic

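// ❌ Passing each element to an opaque call can cost a retain/release pair per iteration
for object in objects {
    process(object)  // retain, release
}

// ✅ Keep the loop and its callee visible to the optimizer
func processAll(_ objects: [MyClass]) {
    // In optimized builds the compiler can often remove redundant retain/release pairs; confirm in Instruments
    for object in objects {
        process(object)
    }
}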

Observable Object Lifetimes

From WWDC 2021-10216: Object lifetimes are only guaranteed until the last use, not until the closing brace.

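// ❌ Relying on observed lifetime is fragile
class Traveler {
    deinit {
        print("Deinitialized")  // May run EARLIER than expected with ARC optimizations!
    }
}

class Account {
    weak var traveler: Traveler?  // Weak back-reference
    init(traveler: Traveler) { self.traveler = traveler }
    func printSummary() { print(traveler == nil ? "no traveler" : "has traveler") }
}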

func test() {
    let traveler = Traveler()
    let account = Account(traveler: traveler)
    // traveler's last use is above - may deallocate here!
    account.printSummary()  // weak reference may be nil!
}

// ✅ Explicitly extend lifetime when needed
func test() {
    let traveler = Traveler()
    let account = Account(traveler: traveler)

    withExtendedLifetime(traveler) {
        account.printSummary()  // traveler guaranteed to live
    }
}

// Alternative: defer at end of scope
func test() {
    let traveler = Traveler()
    defer { withExtendedLifetime(traveler) {} }

    let account = Account(traveler: traveler)
    account.printSummary()
}

Why This Matters: Observed object lifetimes are an emergent property of compiler optimizations and can change between:

Xcode versions
Build configurations (Debug vs Release)
Unrelated code changes that enable new optimizations

Build Setting: Enable "Optimize Object Lifetimes" (Xcode 13+) during development to expose hidden lifetime bugs early.

Part 5: Generics & Specialization

Generic code can be fast or slow depending on specialization.

Specialization Basics

// Generic function
func process<T>(_ value: T) {
    print(value)
}

// Calling with concrete type
process(42)  // Compiler specializes: process_Int(42)
process("hello")  // Compiler specializes: process_String("hello")

Existential Overhead

protocol Drawable {
    func draw()
}

// ❌ Existential container - indirection on every call, heap allocation for values over 24 bytes
func drawAll(shapes: [any Drawable]) {
    for shape in shapes {
        shape.draw()  // Dynamic dispatch through witness table
    }
}

// ✅ Generic with constraint - can specialize
func drawAll<T: Drawable>(shapes: [T]) {
    for shape in shapes {
        shape.draw()  // Static dispatch after specialization
    }
}

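Performance: Generic version ~10x faster in tight loops when the optimizer specializes it (eliminates witness table overhead). The generic form only accepts a homogeneous array; mixed element types still need [any Drawable].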

Existential Container Internals

From WWDC 2016-416: any Protocol uses an existential container with specific performance characteristics.

// Existential Container Memory Layout (64-bit systems)
//
// Small Type (≤24 bytes):
// ┌──────────────────┬──────────────┬────────────────┐
// │ Value (inline)   │ Type         │ Protocol       │
// │ 3 words max      │ Metadata     │ Witness Table  │
// │ (24 bytes)       │ (8 bytes)    │ (8 bytes)      │
// └──────────────────┴──────────────┴────────────────┘
//   ↑ No heap allocation - value stored directly
//
// Large Type (>24 bytes):
// ┌──────────────────┬──────────────┬────────────────┐
// │ Heap Pointer →   │ Type         │ Protocol       │
// │ (8 bytes)        │ Metadata     │ Witness Table  │
// │                  │ (8 bytes)    │ (8 bytes)      │
// └──────────────────┴──────────────┴────────────────┘
//   ↑ Heap allocation required - pointer to actual value
//
// Total container size: 40 bytes (5 words on 64-bit)
// Threshold: 3 words (24 bytes) determines inline vs heap

// Small type example - stored inline (FAST)
struct Point: Drawable {
    var x, y, z: Double  // 24 bytes - fits inline!
    func draw() {}
}

let smallDrawable: any Drawable = Point(x: 1, y: 2, z: 3)
// ✅ Point stored directly in container (no heap allocation)

// Large type example - heap allocated (SLOWER)
struct Rectangle: Drawable {
    var x, y, width, height: Double  // 32 bytes - exceeds inline buffer
    func draw() {}
}

let largeDrawable: any Drawable = Rectangle(x: 0, y: 0, width: 10, height: 20)
// ❌ Rectangle allocated on heap, container stores pointer

// Performance comparison:
// - Small existential (≤24 bytes): ~5ns access time
// - Large existential (>24 bytes): ~15ns access time (heap indirection)
// - Generic `some Drawable`: ~2ns access time (no container)

Design Tip: Keep protocol-conforming types ≤24 bytes when used as any Protocol for best performance. Use some Protocol instead of any Protocol when possible to eliminate all container overhead.
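
The sizes behind this tip can be checked with MemoryLayout. This is a sketch assuming a 64-bit platform; Shape, SmallShape, LargeShape, and existentialLayout are hypothetical names introduced here.

```swift
protocol Shape {
    func area() -> Double
}

struct SmallShape: Shape {          // three Doubles: fits the 3-word inline buffer
    var a, b, c: Double
    func area() -> Double { a * b * c }
}

struct LargeShape: Shape {          // four Doubles: too big for the inline buffer
    var a, b, c, d: Double
    func area() -> Double { a * b * c * d }
}

// Sizes in bytes of the two concrete types and of the existential container itself.
func existentialLayout() -> (small: Int, large: Int, container: Int) {
    (MemoryLayout<SmallShape>.size,
     MemoryLayout<LargeShape>.size,
     MemoryLayout<any Shape>.size)
}
```

The container size stays the same whichever concrete type is stored; only values over three words force the extra heap box.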

`@_specialize` Attribute

// Force specialization for common types
@_specialize(where T == Int)
@_specialize(where T == String)
func process<T: Comparable>(_ value: T) -> T {
    // Expensive generic operation
    return value
}

// Compiler generates:
// - func process_Int(_ value: Int) -> Int
// - func process_String(_ value: String) -> String
// - Generic fallback for other types

`any` vs `some`

// ❌ any - existential, runtime overhead
func makeDrawable() -> any Drawable {
    return Circle()  // Boxed in an existential container (heap allocation if Circle exceeds 24 bytes)
}

// ✅ some - opaque type, compile-time type
func makeDrawable() -> some Drawable {
    return Circle()  // No overhead, type known at compile time
}

Part 6: Inlining

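Inlining eliminates function call overhead but increases code size. @inlinable does not force inlining; it makes a public function's body available to client modules so the optimizer can inline and specialize it there.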

When to Inline

// ✅ Small, frequently called functions
@inlinable
public func fastAdd(_ a: Int, _ b: Int) -> Int {
    return a + b
}

// ❌ Large functions - code bloat
@inlinable  // Don't do this!
public func complexAlgorithm() {
    // 100 lines of code...
}

Cross-Module Optimization

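// Framework code
import Foundation  // sqrt

public struct Point {
    public var x: Double
    public var y: Double

    // The synthesized memberwise initializer is internal, so clients need a public one
    public init(x: Double, y: Double) {
        self.x = x
        self.y = y
    }

    // ✅ Inlinable for cross-module optimization
    @inlinable
    public func distance(to other: Point) -> Double {
        let dx = x - other.x
        let dy = y - other.y
        return sqrt(dx*dx + dy*dy)
    }
}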

// Client code
let p1 = Point(x: 0, y: 0)
let p2 = Point(x: 3, y: 4)
let d = p1.distance(to: p2)  // Inlined across module boundary

`@usableFromInline`

// Internal helper that can be inlined
@usableFromInline
internal func helperFunction() { }

// Public API that uses it
@inlinable
public func publicAPI() {
    helperFunction()  // Can inline internal function
}

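Trade-off: @inlinable makes the function body part of the module's public interface. Clients may compile the old body into their own binaries, so later changes to it are not guaranteed to reach them until they rebuild.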

Part 7: Collection Performance

Choosing the right collection and using it correctly matters.

Array vs ContiguousArray

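// Array<T> - may bridge to NSArray, but only when Element is a class or @objc protocol type
let objects: Array<NSObject> = [NSObject(), NSObject()]

// ✅ ContiguousArray<T> - always native contiguous storage (no bridging checks)
let nativeObjects: ContiguousArray<NSObject> = [NSObject(), NSObject()]

// For struct and enum elements (Int, String, ...) Array is already contiguous
let numbers: [Int] = [1, 2, 3]

Use ContiguousArray when: Element is a class or @objc protocol type and the array never crosses into Objective-C. For struct or enum elements, Array and ContiguousArray perform about the same.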

Reserve Capacity

// ❌ Multiple reallocations
var array: [Int] = []
for i in 0..<10000 {
    array.append(i)  // Reallocates ~14 times
}

// ✅ Single allocation
var array: [Int] = []
array.reserveCapacity(10000)
for i in 0..<10000 {
    array.append(i)  // No reallocations
}
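
The effect of reserving can be observed by counting capacity changes during the appends. This is a rough sketch (capacityChanges is a hypothetical helper); the exact number of reallocations without a reservation depends on the standard library's growth policy, so only "zero versus more than zero" is asserted.

```swift
// Counts how often the array's capacity changes while appending `count` elements.
func capacityChanges(appending count: Int, reserve: Bool) -> Int {
    var array: [Int] = []
    if reserve {
        array.reserveCapacity(count)    // one allocation up front
    }
    var changes = 0
    var lastCapacity = array.capacity
    for i in 0..<count {
        array.append(i)
        if array.capacity != lastCapacity {   // the buffer was reallocated
            changes += 1
            lastCapacity = array.capacity
        }
    }
    return changes
}
```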

Dictionary Hashing

struct BadKey: Hashable {
    var data: [Int]

    // ❌ Expensive hash (iterates entire array)
    func hash(into hasher: inout Hasher) {
        for element in data {
            hasher.combine(element)
        }
    }
}

struct GoodKey: Hashable {
    var id: UUID  // Fast hash
    var data: [Int]  // Not hashed

    // ✅ Hash only the unique identifier
    func hash(into hasher: inout Hasher) {
        hasher.combine(id)
    }
}
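
Hashing a subset of the fields is valid as long as values that compare equal also hash equal. The sketch below goes one step further and makes identity depend on id alone; CacheKey and lookupDemo are hypothetical names, and treating two keys with the same id as equal is an assumption that only fits types with a real unique identifier.

```swift
// Hypothetical key type: identity lives in `id`, the payload is ignored.
struct CacheKey: Hashable {
    let id: Int
    var payload: [Int]

    // Equality and hashing agree: both look only at `id`.
    static func == (lhs: CacheKey, rhs: CacheKey) -> Bool { lhs.id == rhs.id }
    func hash(into hasher: inout Hasher) { hasher.combine(id) }
}

func lookupDemo() -> String? {
    var table: [CacheKey: String] = [:]
    table[CacheKey(id: 7, payload: Array(0..<10_000))] = "cached"
    // Same id, different payload: the entry is found without hashing 10,000 elements.
    return table[CacheKey(id: 7, payload: [])]
}
```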

InlineArray (Swift 6.2)

Fixed-size arrays stored inline in their container (on the stack for local variables): no separate heap allocation, no COW overhead.

// Traditional Array - heap allocated, COW overhead
var sprites: [Sprite] = Array(repeating: .default, count: 40)

// InlineArray - elements stored inline, no COW
var inlineSprites = InlineArray<40, Sprite>(repeating: .default)
// Sugared spelling of the same type (if available in your toolchain)
var sugaredSprites: [40 of Sprite] = .init(repeating: .default)

When to Use InlineArray:

Fixed size known at compile time
Performance-critical paths (tight loops, hot paths)
Want to avoid heap allocation entirely
Small to medium sizes (practical limit ~1KB stack usage)

Key Characteristics:

let inline = InlineArray<10, Int>(repeating: 0)

// ✅ Stack allocated - no heap
print(MemoryLayout.size(ofValue: inline))  // 80 bytes (10 × 8)

// ✅ Value semantics - but eagerly copied (not COW!)
var copy = inline  // Copies all 10 elements immediately
copy[0] = 100     // No COW check needed

// ✅ Provides Span access for zero-copy operations
let span = inline.span  // Read-only view
var mutableSpan = copy.mutableSpan  // Mutable view; the array itself must be a var

Performance Comparison:

// Array: Heap allocation + COW overhead
var array = Array(repeating: 0, count: 100)
// - Allocation: ~1μs (heap)
// - Copy: ~50ns (COW reference bump)
// - Mutation: ~50ns (uniqueness check)

// InlineArray: Stack allocation, no COW
var inline = InlineArray<100, Int>(repeating: 0)
// - Allocation: 0ns (stack frame)
// - Copy: ~400ns (eager copy all 100 elements)
// - Mutation: 0ns (no uniqueness check)

24-Byte Threshold Connection:

InlineArray relates to the existential container threshold from Part 5:

// Existential containers store ≤24 bytes inline
struct Small: Drawable {
    var a, b, c: Int64  // 24 bytes - fits inline
    func draw() {}
}

// InlineArray of 3 Int64s also ≤24 bytes
let inline = InlineArray<3, Int64>(repeating: 0)
// Size: 24 bytes - same threshold, different purpose

// Both avoid heap allocation at this size
let existential: any Drawable = Small(a: 1, b: 2, c: 3)  // Inline storage
let array = inline  // Inline storage, copied eagerly

Copy Semantics Warning:

// ❌ Unexpected: InlineArray has no COW, so every copy moves all elements
func processLarge(_ data: InlineArray<1000, UInt8>) {
    var scratch = data  // Copies all 1000 bytes
}

// ✅ Use Span to avoid copy
func processLarge(_ data: Span<UInt8>) {
    // Zero-copy view, no matter the size
}

// Best practice: Store InlineArray, pass Span
struct Buffer {
    var storage = InlineArray<1000, UInt8>(repeating: 0)

    func process() {
        helper(storage.span)  // Pass view, not copy
    }
}

When NOT to Use InlineArray:

Dynamic sizes (use Array)
Large data (>1KB stack usage risky)
Frequently passed by value (use Span instead)
Need COW semantics (use Array)

Lazy Sequences

// ❌ Eager evaluation - processes entire array
let result = array
    .map { expensive($0) }
    .filter { $0 > 0 }
    .first  // Only need first element!

// ✅ Lazy evaluation - stops at first match
let result = array
    .lazy
    .map { expensive($0) }
    .filter { $0 > 0 }
    .first  // Only evaluates until first match
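
A call counter makes the difference measurable. This is a sketch with hypothetical names (countTransformCalls); the lazy pipeline may evaluate the transform a few extra times around the first match, so the test only asserts that it does far less work than the eager version.

```swift
// Runs the same map/filter/first pipeline eagerly or lazily and reports
// how many times the transform closure ran.
func countTransformCalls(lazy useLazy: Bool) -> (calls: Int, first: Int?) {
    let input = Array(1...1_000)
    var calls = 0
    let transform: (Int) -> Int = { value in
        calls += 1
        return value * value
    }
    let first: Int?
    if useLazy {
        // Elements are transformed on demand until one passes the filter.
        first = input.lazy.map(transform).filter { $0 > 10 }.first
    } else {
        // Every element is transformed before filtering starts.
        first = input.map(transform).filter { $0 > 10 }.first
    }
    return (calls, first)
}
```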

Part 8: Concurrency Performance

Async/await and actors add overhead. Use appropriately.

Actor Isolation Overhead

actor Counter {
    private var value = 0

    // ❌ Actor call overhead for simple operation
    func increment() {
        value += 1
    }
}

// Calling from different isolation domain
for _ in 0..<10000 {
    await counter.increment()  // 10,000 actor hops!
}

// ✅ Batch operations to reduce actor overhead
actor Counter {
    private var value = 0

    func incrementBatch(_ count: Int) {
        value += count
    }
}

await counter.incrementBatch(10000)  // Single actor hop

async/await vs Completion Handlers

// async/await overhead: ~20-30μs per suspension point

// ❌ Unnecessary async for fast synchronous operation
func compute() async -> Int {
    return 42  // Instant, but pays async overhead
}

// ✅ Keep synchronous operations synchronous
func compute() -> Int {
    return 42  // No overhead
}

Task Creation Cost

// ❌ Creating task per item (~100μs overhead each)
for item in items {
    Task {
        await process(item)
    }
}

// ✅ Single task for batch
Task {
    for item in items {
        await process(item)
    }
}

// ✅ Or use TaskGroup for parallelism
await withTaskGroup(of: Void.self) { group in
    for item in items {
        group.addTask {
            await process(item)
        }
    }
}

`@concurrent` Attribute (Swift 6.2)

// Run off the caller's actor
@concurrent
func expensiveComputation() async -> Int {
    // @concurrent applies only to async functions; the body always runs on the
    // concurrent executor, even if called from MainActor
    return complexCalculation()
}

// Safe to call from main actor without blocking
@MainActor
func updateUI() async {
    let result = await expensiveComputation()  // Guaranteed off main thread
    label.text = "\(result)"
}

nonisolated Performance

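actor DataStore {
    let identifier: String  // Immutable and Sendable
    private var data: [Int] = []

    init(identifier: String) {
        self.identifier = identifier
    }

    // Isolated - callers outside the actor must await, even for a read
    func getCount() -> Int {
        data.count
    }

    // ✅ nonisolated for immutable state
    nonisolated var label: String {
        "store-\(identifier)"  // OK: reads only a let property
    }

    // ❌ nonisolated cannot touch mutable isolated state
    // nonisolated var storedCount: Int { data.count }  // Error: cannot access isolated property
}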

Part 9: Memory Layout

Understanding memory layout helps optimize cache performance and reduce allocations.

Struct Padding

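// ❌ Poor layout (24-byte stride due to padding)
struct BadLayout {
    var a: Bool    // 1 byte + 7 padding
    var b: Int64   // 8 bytes
    var c: Bool    // 1 byte (+ 7 trailing padding per array element)
}
print(MemoryLayout<BadLayout>.size)    // 17 (size excludes trailing padding)
print(MemoryLayout<BadLayout>.stride)  // 24 (bytes per element in an array)

// ✅ Optimized layout (16-byte stride)
struct GoodLayout {
    var b: Int64   // 8 bytes
    var a: Bool    // 1 byte
    var c: Bool    // 1 byte (+ 6 trailing padding per array element)
}
print(MemoryLayout<GoodLayout>.size)    // 10
print(MemoryLayout<GoodLayout>.stride)  // 16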
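
Field offsets show where the padding actually goes. The following sketch assumes a 64-bit platform and the current behavior of laying out struct fields in declaration order; Interleaved, Grouped, and layoutSummary are hypothetical names.

```swift
struct Interleaved {    // Bool, Int64, Bool: padding before b and after c
    var a: Bool
    var b: Int64
    var c: Bool
}

struct Grouped {        // Int64 first, the two Bools packed behind it
    var b: Int64
    var a: Bool
    var c: Bool
}

// Collects stride (bytes per array element) and the offset of `b` in the interleaved layout.
func layoutSummary() -> (interleavedStride: Int, groupedStride: Int, offsetOfB: Int?) {
    (MemoryLayout<Interleaved>.stride,
     MemoryLayout<Grouped>.stride,
     MemoryLayout<Interleaved>.offset(of: \.b))
}
```

Stride, not size, is the number that matters for arrays, because it includes the trailing padding needed to keep the next element aligned.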

Alignment

// Query alignment
print(MemoryLayout<Double>.alignment)  // 8
print(MemoryLayout<Int32>.alignment)   // 4

// Structs align to largest member
struct Mixed {
    var int32: Int32   // 4 bytes, 4-byte aligned
    var double: Double // 8 bytes, 8-byte aligned
}
print(MemoryLayout<Mixed>.alignment)  // 8 (largest member)

Cache-Friendly Data Structures

// ❌ Poor cache locality
struct PointerBased {
    var next: UnsafeMutablePointer<Node>?  // Pointer chasing
}

// ✅ Array-based for cache locality
struct ArrayBased {
    var data: ContiguousArray<Int>  // Contiguous memory
}

// Array iteration ~10x faster due to cache prefetching

Part 10: Typed Throws (Swift 6)

Typed throws can be faster than untyped by avoiding existential overhead.

Untyped vs Typed

// Untyped - existential container for error
func fetchData() throws -> Data {
    // Can throw any Error
    throw NetworkError.timeout
}

// Typed - concrete error type
func fetchData() throws(NetworkError) -> Data {
    // Can only throw NetworkError
    throw NetworkError.timeout
}

Performance Impact

// Measure with tight loop
func untypedThrows() throws -> Int {
    throw GenericError.failed
}

func typedThrows() throws(GenericError) -> Int {
    throw GenericError.failed
}

// Benchmark: typed ~5-10% faster (no existential overhead)

When to Use

Typed: Library code with well-defined error types, hot paths
Untyped: Application code, error types unknown at compile time
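
Beyond any speed difference, typed throws lets the caller handle errors exhaustively without casting. A minimal sketch, assuming a Swift 6 toolchain; ParseError, parseNumber, and describe are hypothetical names.

```swift
enum ParseError: Error {
    case empty
    case notANumber(String)
}

// The signature promises that only ParseError can escape.
func parseNumber(_ text: String) throws(ParseError) -> Int {
    guard !text.isEmpty else { throw .empty }
    guard let value = Int(text) else { throw .notANumber(text) }
    return value
}

func describe(_ text: String) -> String {
    do {
        let value = try parseNumber(text)
        return "value \(value)"
    } catch {
        // `error` is statically ParseError here, so the switch needs no cast and no default.
        switch error {
        case .empty:
            return "empty input"
        case .notANumber(let raw):
            return "not a number: \(raw)"
        }
    }
}
```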

Part 11: Span Types

Swift 6.2+ introduces Span—a non-escapable, non-owning view into memory that provides safe, efficient access to contiguous data.

What is Span?

Span is a modern replacement for UnsafeBufferPointer that provides:

Spatial safety: Bounds-checked operations prevent out-of-bounds access
Temporal safety: Lifetime inherited from source, preventing use-after-free
Zero overhead: No heap allocation, no reference counting
Non-escapable: Cannot outlive the data it references

// Traditional unsafe approach
func processUnsafe(_ data: UnsafeMutableBufferPointer<UInt8>) {
    data[100] = 0  // No bounds check in release builds: memory corruption if out of bounds!
}

// Safe Span approach (MutableSpan is noncopyable, so mutate it through inout)
func processSafe(_ data: inout MutableSpan<UInt8>) {
    data[100] = 0  // Traps with clear error if out of bounds
}

When to Use Span vs Array vs UnsafeBufferPointer

Use Case	Recommendation
Own the data	Array (full ownership, COW)
Temporary view for reading	Span (safe, fast)
Temporary view for writing	MutableSpan (safe, fast)
C interop, performance-critical	RawSpan (untyped bytes)
Unsafe performance	UnsafeBufferPointer (legacy, avoid)

Basic Span Usage

let array = [1, 2, 3, 4, 5]

// Get read-only span
let span = array.span
print(span[0])  // 1
print(span.count)  // 5

// Iterate safely (Span is not a Sequence, so loop over its indices)
for i in span.indices {
    print(span[i])
}

// Slicing (creates new span, no copy)
let slice = span.extracting(1..<3)  // Span<Int> viewing [2, 3]

MutableSpan for Modifications

var array = [10, 20, 30, 40, 50]

// Get mutable span (must be a var: writing through it is a mutation)
var mutableSpan = array.mutableSpan

// Modify through span
mutableSpan[0] = 100
mutableSpan[1] = 200

print(array)  // [100, 200, 30, 40, 50]

// Safe bounds checking
// mutableSpan[10] = 0  // Fatal error: Index out of range

RawSpan for Untyped Bytes

struct PacketHeader {
    var version: UInt8
    var flags: UInt8
    var length: UInt16
}

func parsePacket(_ data: RawSpan) -> PacketHeader? {
    guard data.byteCount >= MemoryLayout<PacketHeader>.size else {
        return nil
    }

    // Bounds-checked byte-level loads (RawSpan has no subscript)
    let version = data.unsafeLoad(fromByteOffset: 0, as: UInt8.self)
    let flags = data.unsafeLoad(fromByteOffset: 1, as: UInt8.self)
    let lengthLow = data.unsafeLoad(fromByteOffset: 2, as: UInt8.self)
    let lengthHigh = data.unsafeLoad(fromByteOffset: 3, as: UInt8.self)

    return PacketHeader(
        version: version,
        flags: flags,
        length: UInt16(lengthHigh) << 8 | UInt16(lengthLow)
    )
}

// Usage
let bytes: [UInt8] = [1, 0x80, 0x10, 0x00]  // Version 1, flags 0x80, length 16 (little-endian)
let rawSpan = bytes.span.bytes  // RawSpan view of the array's elements
if let header = parsePacket(rawSpan) {
    print("Packet version: \(header.version)")
}

Span-Providing Properties

Swift 6.2 collections automatically provide span properties:

// Array provides .span and .mutableSpan
let array = [1, 2, 3]
let span: Span<Int> = array.span

// ContiguousArray provides spans
let contiguous = ContiguousArray([1, 2, 3])
let span2 = contiguous.span

// UnsafeBufferPointer provides .span (migration path)
let buffer: UnsafeBufferPointer<Int> = ...
let span3 = buffer.span  // Modern safe wrapper

Performance Characteristics

// ⚠️ Array parameter - no element copy (only a retain), but callers must supply an Array
func process(_ array: [Int]) {
    // Elements are copied only if a copy is mutated while the storage is shared
}

// ❌ UnsafeBufferPointer - no bounds checking
func process(_ buffer: UnsafeBufferPointer<Int>) {
    buffer[100]  // Crash or memory corruption!
}

// ✅ Span - no copy, bounds-checked, temporal safety
func process(_ span: Span<Int>) {
    span[100]  // Safe trap if out of bounds
}

// Performance: Span is as fast as UnsafeBufferPointer (~2ns access)
// but with safety guarantees (bounds checks are optimized away when safe)

Non-Escapable Lifetime Safety

// ✅ Safe - span lifetime bound to array
func useSpan() {
    let array = [1, 2, 3, 4, 5]
    let span = array.span
    process(span)  // Safe - array still alive
}

// ❌ Compiler prevents this
func dangerousSpan() -> Span<Int> {
    let array = [1, 2, 3]
    return array.span  // Error: Cannot return non-escapable value
}

// This is what temporal safety prevents
// (Compare to UnsafeBufferPointer which ALLOWS this bug!)

Integration with InlineArray

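// InlineArray provides span access (take one view at a time; a read-only
// span cannot stay in use while a mutable span exists)
var inline = InlineArray<10, UInt8>(repeating: 0)
let span: Span<UInt8> = inline.span
var mutableSpan: MutableSpan<UInt8> = inline.mutableSpan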

// Efficient zero-copy parsing
func parseHeader(_ span: Span<UInt8>) -> Header {
    // Direct access to inline storage via span
    Header(
        magic: span[0],
        version: span[1],
        flags: span[2]
    )
}

let header = parseHeader(inline.span)  // No heap allocation!

Migration from UnsafeBufferPointer

// Old pattern (unsafe)
func processLegacy(_ buffer: UnsafeBufferPointer<Int>) {
    for i in 0..<buffer.count {
        print(buffer[i])
    }
}

// New pattern (safe)
func processModern(_ span: Span<Int>) {
    for i in span.indices {  // Bounds-checked access
        print(span[i])
    }
}

// Migration bridge
let buffer: UnsafeBufferPointer<Int> = ...
let span = buffer.span  // Wrap unsafe pointer in safe span
processModern(span)

Common Patterns

// Pattern 1: Binary parsing with RawSpan
func parse<T: BitwiseCopyable>(_ span: RawSpan) -> T? {
    guard span.byteCount >= MemoryLayout<T>.size else {
        return nil
    }
    // Bounds-checked load; the caller vouches that the bytes form a valid T
    return span.unsafeLoadUnaligned(as: T.self)
}

// Pattern 2: Chunked processing
func processChunks(_ data: Span<UInt8>, chunkSize: Int) {
    var offset = 0
    while offset < data.count {
        let end = min(offset + chunkSize, data.count)
        let chunk = data.extracting(offset..<end)  // Span slice, no copy
        processChunk(chunk)
        offset += chunkSize
    }
}

// Pattern 3: Safe C interop
func sendToC(_ span: Span<UInt8>) {
    span.withUnsafeBufferPointer { buffer in
        // Only escape to unsafe inside controlled scope
        c_function(buffer.baseAddress, buffer.count)
    }
}

When NOT to Use Span

// ❌ Don't use Span for ownership
struct Document {
    var data: Span<UInt8>  // Error: Span can't be stored
}

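// ✅ Use Array for owned data
struct Document {
    var data: [UInt8]

    // Take a span at the point of use; vending a Span from your own API
    // needs lifetime annotations that are still experimental
    func firstByte() -> UInt8? {
        let span = data.span
        return span.isEmpty ? nil : span[0]
    }
}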

// ❌ Don't try to escape Span from scope
func getSpan() -> Span<Int> {  // Error: Non-escapable
    let array = [1, 2, 3]
    return array.span
}

// ✅ Process in scope, return owned data
func processAndReturn() -> [Int] {
    let array = [1, 2, 3]
    process(array.span)  // Process with span
    return array  // Return owned data
}

Copy-Paste Patterns

Pattern 1: COW Wrapper

final class Storage<T> {
    var value: T
    init(_ value: T) { self.value = value }
}

struct COWWrapper<T> {
    private var storage: Storage<T>

    init(_ value: T) {
        storage = Storage(value)
    }

    var value: T {
        get { storage.value }
        set {
            if !isKnownUniquelyReferenced(&storage) {
                storage = Storage(newValue)
            } else {
                storage.value = newValue
            }
        }
    }
}

Pattern 2: Performance-Critical Loop

func processLargeArray(_ input: [Int]) -> [Int] {
    // [Int] is already contiguous; a detour through ContiguousArray adds
    // nothing for struct elements
    var result: [Int] = []
    result.reserveCapacity(input.count)

    for element in input {
        result.append(transform(element))
    }

    return result
}

Pattern 3: Inline Cache Lookup

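// Inlinable code can only reference public or @usableFromInline declarations
@usableFromInline
internal var cache: [Key: Value] = [:]

@inlinable
public func getCached(_ key: Key) -> Value? {
    return cache[key]  // Inlined across modules
}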

Anti-Patterns

❌ Anti-Pattern 1: Premature Optimization

// Don't optimize without measuring first!

// ❌ Complex optimization with no measurement
struct OverEngineered {
    @usableFromInline var data: ContiguousArray<UInt8>
    // 100 lines of COW logic...
}

// ✅ Start simple, measure, then optimize
struct Simple {
    var data: [UInt8]
}
// Profile → Optimize if needed

❌ Anti-Pattern 2: Weak Everywhere

class Manager {
    // ❌ Unnecessary weak reference overhead
    weak var delegate: Delegate?
    weak var dataSource: DataSource?
    weak var observer: Observer?
}

// ✅ Use unowned when lifetime is guaranteed
class Manager {
    unowned let delegate: Delegate  // Delegate outlives Manager
    weak var dataSource: DataSource?  // Optional, may be nil
}

❌ Anti-Pattern 3: Actor for Everything

// ❌ Actor overhead for simple synchronous data
actor SimpleCounter {
    private var count = 0

    func increment() {
        count += 1
    }
}

// ✅ Use lock-free atomics for simple shared counters
import Atomics  // swift-atomics package (ManagedAtomic)
struct AtomicCounter: @unchecked Sendable {
    private let count = ManagedAtomic<Int>(0)

    func increment() {
        count.wrappingIncrement(ordering: .relaxed)
    }
}

Code Review Checklist

Memory Management

Large structs (>64 bytes) use indirect storage or are classes
COW types use isKnownUniquelyReferenced before mutation
Collections use reserveCapacity when size is known
Weak references only where needed (prefer unowned when safe)

Generics

Protocol types use some instead of any where possible
Hot paths use concrete types or @_specialize
Generic constraints are as specific as possible

Collections

Arrays of class or @objc elements that never bridge to Objective-C use ContiguousArray
Dictionary keys have efficient hash(into:) implementations
Lazy evaluation used for short-circuit operations

Concurrency

Synchronous operations don't use async
Actor calls are batched when possible
Task creation is minimized (use TaskGroup)
CPU-intensive work uses @concurrent (Swift 6.2)

Optimization

Profiling data exists before optimization
Inlining only for small, frequently called functions
Memory layout optimized for cache locality (large structs)

Pressure Scenarios

Scenario 1: "Just make it faster, we ship tomorrow"

The Pressure: Manager sees "slow" in profiler, demands immediate action.

Red Flags:

No baseline measurements
No Time Profiler data showing hotspots
"Make everything faster" without targets

Time Cost Comparison:

Premature optimization: 2 days of work, no measurable improvement
Profile-guided optimization: 2 hours profiling + 4 hours fixing actual bottleneck = 40% faster

How to Push Back Professionally:

"I want to optimize effectively. Let me spend 30 minutes with Instruments
to find the actual bottleneck. This prevents wasting time on code that's
not the problem. I've seen this save days of work."
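When Instruments is not at hand, a quick in-code baseline can still replace guesswork. The sketch below assumes `ContinuousClock` from the standard library is available (Swift 5.7 or later); `fastestRun` is a hypothetical helper, and it complements a Time Profiler session rather than replacing it.

```swift
// Hypothetical helper: runs `work` several times and returns the fastest run,
// which is less sensitive to scheduling noise than a single measurement.
func fastestRun(of runs: Int = 5, _ work: () -> Void) -> Duration {
    let clock = ContinuousClock()
    var best: Duration? = nil
    for _ in 0..<runs {
        let elapsed = clock.measure { work() }
        if let current = best {
            best = min(current, elapsed)
        } else {
            best = elapsed
        }
    }
    return best ?? .zero
}

// Example workload: a stand-in for the code path under suspicion.
let baselineDuration = fastestRun {
    _ = (0..<10_000).reduce(0, +)
}
```

Recording this number before and after a change gives the team a target to discuss instead of "make everything faster".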

Scenario 2: "Use actors everywhere for thread safety"

The Pressure: Team adopts Swift 6, decides "everything should be an actor."

Red Flags:

Actor for simple value types
Actor for synchronous-only operations
Async overhead in tight loops

Time Cost Comparison:

Actor everywhere: 100μs overhead per operation, janky UI
Appropriate isolation: 10μs overhead, smooth 60fps

How to Push Back Professionally:

"Actors are great for isolation, but they add overhead. For this simple
counter, lock-free atomics are 10x faster. Let's use actors where we need
them—shared mutable state—and avoid them for pure value types."

Scenario 3: "Inline everything for speed"

The Pressure: Someone reads that inlining is faster, marks everything @inlinable.

Red Flags:

Large functions marked @inlinable
Internal implementation details exposed
Binary size increases 50%

Time Cost Comparison:

Inline everything: Code bloat, slower app launch (3s → 5s)
Selective inlining: Fast launch, actual hotspots optimized

How to Push Back Professionally:

"Inlining trades code size for speed. The compiler already inlines when
beneficial. Manual @inlinable should be for small, frequently called
functions. Let's profile and inline the 3 actual hotspots, not everything."

Real-World Examples

Example 1: Image Processing Pipeline

Problem: Processing 1000 images takes 30 seconds.

Investigation:

// Original code
func processImages(_ images: [UIImage]) -> [ProcessedImage] {
    var results: [ProcessedImage] = []
    for image in images {
        results.append(expensiveProcess(image))  // Reallocations!
    }
    return results
}

Solution:

func processImages(_ images: [UIImage]) -> [ProcessedImage] {
    var results = ContiguousArray<ProcessedImage>()
    results.reserveCapacity(images.count)  // Single allocation

    for image in images {
        results.append(expensiveProcess(image))
    }

    return Array(results)
}

Result: 30s → 8s (73% less time, about 3.75x faster) by eliminating reallocations.
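One way to confirm that reallocations occur at all is to watch the array's `capacity` while appending. The sketch below is a hedged illustration: `countCapacityChanges` is a hypothetical helper, and `Int` elements stand in for `ProcessedImage`.

```swift
// Hypothetical helper: counts how often the array's capacity changes
// while appending n elements, with or without reserving up front.
func countCapacityChanges(appending n: Int, reserve: Bool) -> Int {
    var values: [Int] = []
    if reserve {
        values.reserveCapacity(n)
    }
    var changes = 0
    var lastCapacity = values.capacity
    for i in 0..<n {
        values.append(i)
        if values.capacity != lastCapacity {
            changes += 1
            lastCapacity = values.capacity
        }
    }
    return changes
}

let changesWithoutReserve = countCapacityChanges(appending: 1_000, reserve: false)
let changesWithReserve = countCapacityChanges(appending: 1_000, reserve: true)
```

With the reservation in place the capacity never changes during the loop, while the unreserved array grows several times. How much wall-clock time that saves depends on the element type and should be measured for the real workload.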

Example 2: Actor Batching for Counter

Problem: Actor counter in tight loop causes UI jank.

Investigation:

// Original - 10,000 actor hops
for _ in 0..<10000 {
    await counter.increment()  // ~100μs each = 1 second total!
}

Solution:

// Batch operations
actor Counter {
    private var value = 0

    func incrementBatch(_ count: Int) {
        value += count
    }
}

await counter.incrementBatch(10000)  // Single actor hop

Result: 1000ms → 0.1ms (10,000x faster) by batching.

Example 3: Generic Specialization

Problem: Protocol-based rendering is slow.

Investigation:

// Original - existential overhead
func render(shapes: [any Shape]) {
    for shape in shapes {
        shape.draw()  // Dynamic dispatch
    }
}

Solution:

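// Specialized generic (every element must share one concrete type S)
func render<S: Shape>(shapes: [S]) {
    for shape in shapes {
        shape.draw()  // Static dispatch after specialization
    }
}

// Or request pre-specialized copies with @_specialize
// (an underscored attribute, not an official language feature)
@_specialize(where S == Circle)
@_specialize(where S == Rectangle)
func render<S: Shape>(shapes: [S]) { /* same body as above */ }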

Result: 100ms → 10ms (10x faster) by eliminating witness table overhead.
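The generic form only accepts arrays whose elements all share one concrete type, so a mixed collection still needs the existential form. The sketch below illustrates that trade-off with a hypothetical `Shape` protocol that exposes `area` instead of `draw()` so the result can be checked; all names are assumptions for illustration.

```swift
// Hypothetical protocol and types, for illustration only.
protocol Shape {
    var area: Double { get }
}

struct Circle: Shape {
    var radius: Double
    var area: Double { Double.pi * radius * radius }
}

struct Rectangle: Shape {
    var width: Double
    var height: Double
    var area: Double { width * height }
}

// Generic version: the compiler can specialize it per concrete S,
// but all elements must have the same type.
func totalArea<S: Shape>(of shapes: [S]) -> Double {
    shapes.reduce(0) { $0 + $1.area }
}

// Existential version: accepts mixed element types and dispatches
// dynamically through each element's existential container.
func totalArea(ofMixed shapes: [any Shape]) -> Double {
    shapes.reduce(0) { $0 + $1.area }
}

let rectangles = [Rectangle(width: 2, height: 3), Rectangle(width: 1, height: 4)]
let mixedShapes: [any Shape] = [Rectangle(width: 2, height: 3), Circle(radius: 1)]

let homogeneousTotal = totalArea(of: rectangles)
let mixedTotal = totalArea(ofMixed: mixedShapes)
```

A common compromise is to keep the existential API at the boundary and call generic helpers for the homogeneous batches that dominate the hot path.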

Example 4: Apple Password Monitoring Migration

Problem: Apple's Password Monitoring service needed to scale while reducing costs.

Original Implementation: Java-based service

High memory usage (gigabytes)
50% Kubernetes cluster utilization
Moderate throughput

Swift Rewrite Benefits:

// Key performance wins from Swift's features:

// 1. Deterministic memory management (no GC pauses)
//    - No stop-the-world garbage collection
//    - Predictable latency for real-time processing

// 2. Value semantics + COW
//    - Efficient data sharing without defensive copying
//    - Reduced memory churn

// 3. Zero-cost abstractions
//    - Generic specialization eliminates runtime overhead
//    - Protocol conformances optimized away

Results (Apple's published metrics):

40% throughput increase vs Java implementation
100x memory reduction: Gigabytes → Megabytes
50% Kubernetes capacity freed: Same workload, half the resources

Why This Matters: This real-world production service demonstrates that the performance patterns in this skill (COW, value semantics, generic specialization, ARC) deliver measurable business impact at scale.

Source: Swift.org - Password Monitoring Case Study

Resources

WWDC Sessions

Session	Title	Key Topics
WWDC 2025-312	Improve memory usage and performance with Swift	InlineArray, Span, value generics, non-escapable types
WWDC 2024-10217	Explore Swift performance	Function calls, memory allocation, layout, copying
WWDC 2016-416	Understanding Swift Performance	Value vs reference, protocol witness tables, COW
WWDC 2021-10216	ARC in Swift: Basics and beyond	Object lifetimes, weak/unowned, withExtendedLifetime
WWDC 2024-10170	Consume noncopyable types in Swift	~Copyable, ownership, consuming/borrowing

Related Axiom Skills

performance-profiling — Instruments workflows
swift-concurrency — Correctness-focused concurrency
swiftui-performance — SwiftUI-specific optimizations

Last Updated: 2025-12-18 Swift Version: 6.2+ (for InlineArray, Span, @concurrent) Status: Production-ready

Remember: Profile first, optimize later. Readability > micro-optimizations.
