| name | algorithm-designer |
| description | Design and document statistical algorithms with pseudocode and complexity analysis |
Algorithm Designer
You are an expert in designing and documenting statistical algorithms.
Algorithm Documentation Standards
Required Components
- Purpose: What problem does this solve?
- Input/Output: Precise specifications
- Pseudocode: Language-agnostic description
- Complexity: Time and space analysis
- Convergence: Conditions and guarantees
- Implementation notes: Practical considerations
Input/Output Specification
Formal Specification Template
Every algorithm must have precise input/output documentation:
INPUT SPECIFICATION:
- Data: D = {(Y_i, A_i, M_i, X_i)}_{i=1}^n where:
  - Y_i ∈ ℝ (continuous outcome)
  - A_i ∈ {0,1} (binary treatment)
  - M_i ∈ ℝ^d (d-dimensional mediator)
  - X_i ∈ ℝ^p (p covariates)
- Parameters: θ ∈ Θ ⊆ ℝ^k (parameter space)
- Tolerance: ε > 0 (convergence criterion)
- Max iterations: T_max ∈ ℕ
OUTPUT SPECIFICATION:
- Estimate: θ̂ ∈ ℝ^k (point estimate)
- Variance: V̂ ∈ ℝ^{k×k} (covariance matrix)
- Convergence: boolean (did algorithm converge?)
- Iterations: t ∈ ℕ (iterations used)
# R implementation of formal I/O specification
define_algorithm_io <- function() {
  list(
    input = list(
      data = "data.frame with columns Y, A, M, X",
      params = "list(tol = 1e-6, max_iter = 1000)",
      models = "list(outcome_formula, mediator_formula, propensity_formula)"
    ),
    output = list(
      estimate = "numeric vector of parameter estimates",
      se = "numeric vector of standard errors",
      vcov = "variance-covariance matrix",
      converged = "logical indicating convergence",
      iterations = "integer count of iterations"
    ),
    complexity = list(
      time = "O(n * p^2) per iteration",
      space = "O(n * p)",
      iterations = "O(log(log(1/epsilon))) for Newton-type"
    )
  )
}
Convergence Criteria
Standard Convergence Conditions
| Criterion | Formula | Use Case |
|---|---|---|
| Absolute | $\lVert \theta^{(t+1)} - \theta^{(t)} \rVert < \varepsilon$ | Parameter convergence |
| Relative | $\lVert \theta^{(t+1)} - \theta^{(t)} \rVert / \lVert \theta^{(t)} \rVert < \varepsilon$ | Scale-invariant |
| Gradient | $\lVert \nabla L(\theta^{(t)}) \rVert < \varepsilon$ | Optimization |
| Function | $\lvert L(\theta^{(t+1)}) - L(\theta^{(t)}) \rvert < \varepsilon$ | Objective convergence |
| Cauchy | $\max_i \lvert \theta_i^{(t+1)} - \theta_i^{(t)} \rvert < \varepsilon$ | Componentwise convergence |
Mathematical Formulation
Convergence tolerance: $\varepsilon = 10^{-6}$ (typical default)
Standard tolerances by application:
- Numerical optimization: $\varepsilon = 10^{-8}$
- Statistical estimation: $\varepsilon = 10^{-6}$
- Approximate methods: $\varepsilon = 10^{-4}$
Complexity Formulas
Linear complexity $O(n)$: Operations grow proportionally to input size $$T(n) = c \cdot n + O(1)$$
Quadratic complexity $O(n^2)$: Nested iterations over input $$T(n) = c \cdot n^2 + O(n)$$
Linearithmic complexity $O(n \log n)$: Divide-and-conquer with linear work per level $$T(n) = c \cdot n \log_2 n + O(n)$$
Space-time tradeoff: memory and running time can often be exchanged, for example by caching (memoizing) intermediate results to avoid recomputation at the cost of extra storage. State explicitly which resource an algorithm trades for the other.
Convergence rate analysis:
- Linear convergence: $|\theta^{(t)} - \theta^*| \leq C \cdot \rho^t$ where $0 < \rho < 1$
- Quadratic convergence: $|\theta^{(t+1)} - \theta^*| \leq C \cdot |\theta^{(t)} - \theta^*|^2$
- Superlinear: $\lim_{t \to \infty} \frac{|\theta^{(t+1)} - \theta^*|}{|\theta^{(t)} - \theta^*|} = 0$
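A quick numerical illustration of how fast these rates shrink the error. Both sequences below start from the same error of 0.5; the linear contraction factor 0.5 is arbitrary and chosen only for the example.
# Error after each iteration: linear (rho = 0.5) vs quadratic convergence
err_linear <- 0.5^(1:7)                     # e_{t+1} = 0.5 * e_t
err_quad <- numeric(7); err_quad[1] <- 0.5  # e_{t+1} = e_t^2
for (k in 2:7) err_quad[k] <- err_quad[k - 1]^2
rbind(linear = err_linear, quadratic = err_quad)
# quadratic reaches ~5e-20 in six steps; linear is still at ~0.008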
# Comprehensive convergence checking
check_convergence <- function(theta_new, theta_old, gradient = NULL,
                              objective_new = NULL, objective_old = NULL,
                              tol = 1e-6, method = "relative") {
  switch(method,
    "absolute" = {
      # |θ^(t+1) - θ^t| < ε
      converged <- max(abs(theta_new - theta_old)) < tol
      criterion <- max(abs(theta_new - theta_old))
    },
    "relative" = {
      # |θ^(t+1) - θ^t| / |θ^t| < ε
      denom <- pmax(abs(theta_old), 1)  # Avoid division by zero
      converged <- max(abs(theta_new - theta_old) / denom) < tol
      criterion <- max(abs(theta_new - theta_old) / denom)
    },
    "gradient" = {
      # |∇L(θ)| < ε
      stopifnot(!is.null(gradient))
      converged <- sqrt(sum(gradient^2)) < tol
      criterion <- sqrt(sum(gradient^2))
    },
    "objective" = {
      # |L(θ^(t+1)) - L(θ^t)| < ε
      stopifnot(!is.null(objective_new), !is.null(objective_old))
      converged <- abs(objective_new - objective_old) < tol
      criterion <- abs(objective_new - objective_old)
    }
  )
  list(converged = converged, criterion = criterion, method = method)
}
# Newton-Raphson with convergence monitoring
newton_raphson <- function(f, grad, hess, theta0, tol = 1e-6, max_iter = 100) {
  theta <- theta0
  history <- list()
  for (t in 1:max_iter) {
    g <- grad(theta)
    H <- hess(theta)
    # Newton step: θ^(t+1) = θ^t - H^(-1) * g
    # Time complexity: O(p^3) for matrix inversion
    delta <- solve(H, g)
    theta_new <- theta - delta
    # Check convergence
    conv <- check_convergence(theta_new, theta, gradient = g, tol = tol)
    history[[t]] <- list(theta = theta, gradient_norm = sqrt(sum(g^2)))
    if (conv$converged) {
      return(list(
        estimate = theta_new,
        iterations = t,
        converged = TRUE,
        history = history
      ))
    }
    theta <- theta_new
  }
  list(estimate = theta, iterations = max_iter, converged = FALSE, history = history)
}
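As a quick usage sketch (the quadratic objective and starting point below are purely illustrative), newton_raphson() can be checked against a problem whose minimizer is known:
# Toy check: f(θ) = (θ1 - 1)^2 + 2(θ2 + 3)^2 has its minimum at (1, -3)
f       <- function(theta) (theta[1] - 1)^2 + 2 * (theta[2] + 3)^2
grad_fn <- function(theta) c(2 * (theta[1] - 1), 4 * (theta[2] + 3))
hess_fn <- function(theta) diag(c(2, 4))
fit <- newton_raphson(f, grad_fn, hess_fn, theta0 = c(0, 0))
fit$estimate    # approximately c(1, -3)
fit$iterations  # 2: the first step lands on the minimizer, the second confirms convergence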
Complexity and Convergence Relationship
| Algorithm | Convergence Rate | Iterations to $\varepsilon$ |
|---|---|---|
| Gradient Descent | $O(1/t)$ | $O(1/\varepsilon)$ |
| Accelerated GD | $O(1/t^2)$ | $O(1/\sqrt{\varepsilon})$ |
| Newton-Raphson | Quadratic | $O(\log\log(1/\varepsilon))$ |
| EM Algorithm | Linear | $O(\log(1/\varepsilon))$ |
| Coordinate Descent | Linear | $O(p \cdot \log(1/\varepsilon))$ |
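One way to read the iteration column: for a linearly convergent method with contraction factor ρ, reaching tolerance ε from an initial error C takes on the order of log(C/ε) / log(1/ρ) iterations. A minimal sketch (the rates shown are illustrative):
# Iterations for a linearly convergent method: C * rho^T <= epsilon
iterations_needed <- function(epsilon, rho, C = 1) {
  ceiling(log(C / epsilon) / log(1 / rho))
}
iterations_needed(1e-6, rho = 0.5)  # 20 iterations
iterations_needed(1e-6, rho = 0.9)  # 132 iterations: weak contraction is expensive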
Pseudocode Conventions
Standard Format
ALGORITHM: [Name]
INPUT: [List inputs with types]
OUTPUT: [List outputs with types]
1. [Initialize]
2. [Main loop or procedure]
2.1 [Sub-step]
2.2 [Sub-step]
3. [Return]
Example: AIPW Estimator
ALGORITHM: Augmented IPW for Mediation
INPUT:
- Data (Y, A, M, X) of size n
- Propensity model specification
- Outcome model specification
- Mediator model specification
OUTPUT:
- Point estimate ψ̂
- Standard error SE(ψ̂)
- 95% confidence interval
1. ESTIMATE NUISANCE FUNCTIONS
1.1 Fit propensity score: π̂(x) = P̂(A=1|X=x)
1.2 Fit mediator density: f̂(m|a,x)
1.3 Fit outcome regression: μ̂(a,m,x) = Ê[Y|A=a,M=m,X=x]
2. COMPUTE PSEUDO-OUTCOMES
For i = 1 to n:
2.1 Compute IPW weight: w_i = A_i/π̂(X_i) + (1-A_i)/(1-π̂(X_i))
2.2 Compute outcome prediction: μ̂_i = μ̂(A_i, M_i, X_i)
2.3 Compute augmentation term
2.4 φ_i = w_i(Y_i - μ̂_i) + [integration term]
3. ESTIMATE AND INFERENCE
3.1 ψ̂ = n⁻¹ Σᵢ φ_i
3.2 SE = √(n⁻¹ Σᵢ (φ_i - ψ̂)²) / √n   (sample SD of the φ_i divided by √n)
3.3 CI = [ψ̂ - 1.96·SE, ψ̂ + 1.96·SE]
4. RETURN (ψ̂, SE, CI)
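The mediation-specific augmentation ([integration term]) is left abstract above, so as a hedged illustration of steps 2 and 3 only, the sketch below implements the analogous AIPW estimator for a total treatment effect with no mediator. The column names Y, A, X1, X2 and the parametric nuisance models are assumptions made for this example, not part of the pseudocode.
# Sketch: AIPW for the average treatment effect (no mediator), illustrating
# the pseudo-outcome / plug-in-average / Wald-CI pattern of steps 2-3
aipw_ate <- function(data) {
  n <- nrow(data)
  # Step 1: nuisance fits (simple parametric models, assumed columns Y, A, X1, X2)
  ps_fit  <- glm(A ~ X1 + X2, family = binomial, data = data)
  out_fit <- lm(Y ~ A + X1 + X2, data = data)
  pi_hat  <- predict(ps_fit, type = "response")
  mu1 <- predict(out_fit, newdata = transform(data, A = 1))
  mu0 <- predict(out_fit, newdata = transform(data, A = 0))
  # Step 2: influence-function-style pseudo-outcomes
  phi <- (data$A / pi_hat) * (data$Y - mu1) -
    ((1 - data$A) / (1 - pi_hat)) * (data$Y - mu0) +
    (mu1 - mu0)
  # Step 3: estimate, SE of the mean pseudo-outcome, and 95% Wald CI
  psi_hat <- mean(phi)
  se <- sqrt(var(phi) / n)
  list(estimate = psi_hat, se = se,
       ci = psi_hat + c(-1.96, 1.96) * se)
}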
Complexity Analysis
Big-O Notation Guide
Formal Definition: $f(n) = O(g(n))$ if $\exists c, n_0$ such that $f(n) \leq c \cdot g(n)$ for all $n \geq n_0$
| Complexity | Name | Example | Operations at n=1000 |
|---|---|---|---|
| $O(1)$ | Constant | Array access | 1 |
| $O(\log n)$ | Logarithmic | Binary search | ~10 |
| $O(n)$ | Linear | Single loop | 1,000 |
| $O(n \log n)$ | Linearithmic | Merge sort, FFT | ~10,000 |
| $O(n^2)$ | Quadratic | Nested loops | 1,000,000 |
| $O(n^3)$ | Cubic | Matrix multiplication | 1,000,000,000 |
| $O(2^n)$ | Exponential | Subset enumeration | ~10^301 |
Key Formulas
Master Theorem for recurrences $T(n) = aT(n/b) + f(n)$:
- If $f(n) = O(n^{\log_b a - \epsilon})$ then $T(n) = \Theta(n^{\log_b a})$
- If $f(n) = \Theta(n^{\log_b a})$ then $T(n) = \Theta(n^{\log_b a} \log n)$
- If $f(n) = \Omega(n^{\log_b a + \epsilon})$ then $T(n) = \Theta(f(n))$
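For example, merge sort's recurrence $T(n) = 2T(n/2) + \Theta(n)$ has $a = b = 2$, so $n^{\log_b a} = n$ and $f(n) = \Theta(n)$ matches the second case, giving $T(n) = \Theta(n \log n)$.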
Sorting lower bound: Any comparison-based sort requires $\Omega(n \log n)$ comparisons
Matrix operations:
- Naive multiplication: $O(n^3)$
- Strassen: $O(n^{2.807})$
- Matrix inversion: $O(n^3)$ (same as multiplication)
# Complexity analysis helper: estimate empirical order of growth from timings
analyze_complexity <- function(f, n_values = c(100, 500, 1000, 5000)) {
  times <- sapply(n_values, function(n) {
    system.time(f(n))[["elapsed"]]
  })
  # Guard against zero elapsed times (very fast f) before taking logs
  times <- pmax(times, .Machine$double.eps)
  # Fit log-log regression: the slope estimates the polynomial order
  fit <- lm(log(times) ~ log(n_values))
  estimated_power <- coef(fit)[2]
  list(
    times = data.frame(n = n_values, time = times),
    estimated_complexity = paste0("O(n^", round(estimated_power, 2), ")"),
    power = estimated_power
  )
}
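A hypothetical call, timing dense matrix multiplication so that the fitted slope should come out near 3; actual timings depend heavily on the BLAS in use and on timer resolution.
# Illustrative use: n x n matrix multiplication is O(n^3) in the matrix dimension
matmul_timing <- analyze_complexity(function(n) {
  m <- matrix(rnorm(n * n), nrow = n)
  m %*% m
}, n_values = c(400, 800, 1200, 1600))
matmul_timing$estimated_complexity  # roughly "O(n^3)", BLAS and timing noise permitting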
Statistical Algorithm Complexities
| Algorithm | Time | Space |
|---|---|---|
| OLS | O(np² + p³) | O(np) |
| Logistic (Newton) | O(np² + p³) per iter | O(np) |
| Bootstrap (B reps) | O(B × base) | O(n) |
| MCMC (T iters) | O(T × per_iter) | O(n + T) |
| Cross-validation (K) | O(K × base) | O(n) |
| Random forest | O(n log n × B × p) | O(n × B) |
Template for Analysis
TIME COMPLEXITY:
- Initialization: O(...)
- Per iteration: O(...)
- Total (T iterations): O(...)
- Convergence typically in T = O(...) iterations
SPACE COMPLEXITY:
- Data storage: O(n × p)
- Working memory: O(...)
- Output: O(...)
Convergence Analysis
Types of Convergence
- Finite termination: Exact solution in finite steps
- Linear: $|x_{k+1} - x^*| \leq c|x_k - x^*|$, $c < 1$
- Superlinear: $|x_{k+1} - x^*| / |x_k - x^*| \to 0$
- Quadratic: $|x_{k+1} - x^*| \leq c|x_k - x^*|^2$
Convergence Documentation Template
CONVERGENCE:
- Type: [Linear/Superlinear/Quadratic]
- Rate: [Expression]
- Conditions: [What must hold]
- Stopping criterion: [When to stop]
- Typical iterations: [Order of magnitude]
Optimization Algorithms
Gradient-Based Methods
ALGORITHM: Gradient Descent
INPUT: f (objective), ∇f (gradient), x₀ (initial), η (step size), ε (tolerance)
OUTPUT: x* (minimizer)
1. k ← 0
2. WHILE ‖∇f(xₖ)‖ > ε:
2.1 xₖ₊₁ ← xₖ - η∇f(xₖ)
2.2 k ← k + 1
3. RETURN xₖ
COMPLEXITY: O(iterations × gradient_cost)
CONVERGENCE: Linear with rate (1 - η·μ) for μ-strongly convex, L-smooth f with step size η ≤ 1/L
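A minimal R sketch of this loop with a fixed step size; in practice η is often chosen by backtracking line search rather than fixed up front. The example objective and step size are illustrative.
# Fixed-step gradient descent, mirroring the pseudocode above
gradient_descent <- function(grad, x0, eta = 0.1, tol = 1e-6, max_iter = 10000) {
  x <- x0
  for (k in seq_len(max_iter)) {
    g <- grad(x)
    if (sqrt(sum(g^2)) < tol) {
      return(list(minimizer = x, iterations = k - 1, converged = TRUE))
    }
    x <- x - eta * g  # x_{k+1} = x_k - eta * grad f(x_k)
  }
  list(minimizer = x, iterations = max_iter, converged = FALSE)
}
# Example: f(x) = x1^2 + 10 x2^2; eta = 0.05 satisfies eta <= 1/L with L = 20
gradient_descent(function(x) c(2 * x[1], 20 * x[2]), x0 = c(5, 5), eta = 0.05)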
Newton's Method
ALGORITHM: Newton-Raphson
INPUT: f, ∇f, ∇²f, x₀, ε
OUTPUT: x*
1. k ← 0
2. WHILE ‖∇f(xₖ)‖ > ε:
2.1 Solve ∇²f(xₖ)·d = -∇f(xₖ) for direction d
2.2 xₖ₊₁ ← xₖ + d
2.3 k ← k + 1
3. RETURN xₖ
COMPLEXITY: O(iterations × p³) for p-dimensional
CONVERGENCE: Quadratic near solution
EM Algorithm Template
ALGORITHM: Expectation-Maximization
INPUT: Data Y, model parameters θ₀, tolerance ε
OUTPUT: MLE θ̂
1. θ ← θ₀
2. REPEAT:
2.1 E-STEP: Compute Q(θ'|θ) = E[log L(θ'|Y,Z) | Y, θ]
2.2 M-STEP: θ_new ← argmax_θ' Q(θ'|θ)
2.3 Δ ← |θ_new - θ|
2.4 θ ← θ_new
3. UNTIL Δ < ε
4. RETURN θ
CONVERGENCE: Monotonic increase in the observed-data likelihood;
linear rate near the optimum (governed by the fraction of missing information)
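As one concrete, illustrative instance of the template, EM for a two-component univariate Gaussian mixture, where the latent Z is each observation's component label:
# EM for a two-component Gaussian mixture (illustrative instance of the template)
em_gmm2 <- function(y, tol = 1e-6, max_iter = 500) {
  # Initialize by splitting at the median
  mu <- c(mean(y[y <= median(y)]), mean(y[y > median(y)]))
  sigma <- rep(sd(y), 2)
  pi1 <- 0.5
  loglik_old <- -Inf
  for (t in seq_len(max_iter)) {
    # E-STEP: posterior probability that observation i came from component 1
    d1 <- pi1 * dnorm(y, mu[1], sigma[1])
    d2 <- (1 - pi1) * dnorm(y, mu[2], sigma[2])
    gamma <- d1 / (d1 + d2)
    # Observed-data log-likelihood (monotonically non-decreasing under EM)
    loglik <- sum(log(d1 + d2))
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
    # M-STEP: weighted maximum likelihood updates
    pi1 <- mean(gamma)
    mu[1] <- sum(gamma * y) / sum(gamma)
    mu[2] <- sum((1 - gamma) * y) / sum(1 - gamma)
    sigma[1] <- sqrt(sum(gamma * (y - mu[1])^2) / sum(gamma))
    sigma[2] <- sqrt(sum((1 - gamma) * (y - mu[2])^2) / sum(1 - gamma))
  }
  list(pi1 = pi1, mu = mu, sigma = sigma, loglik = loglik, iterations = t)
}
# e.g. em_gmm2(c(rnorm(200, 0), rnorm(200, 4)))$mu  # means near 0 and 4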
Bootstrap Algorithms
Nonparametric Bootstrap
ALGORITHM: Nonparametric Bootstrap
INPUT: Data X of size n, statistic T, B (number of replicates)
OUTPUT: SE estimate, CI
1. FOR b = 1 to B:
1.1 Draw X*_b by sampling n observations with replacement from X
1.2 Compute T*_b = T(X*_b)
2. SE_boot ← SD({T*_1, ..., T*_B})
3. CI_percentile ← [quantile(T*, 0.025), quantile(T*, 0.975)]
4. RETURN (SE_boot, CI_percentile)
COMPLEXITY: O(B × cost(T))
NOTES: B ≥ 1000 for SE, B ≥ 10000 for percentile CI
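A direct R translation of the steps above; the statistic shown (the median) is only an example, and the default B follows the note on percentile CIs.
# Nonparametric bootstrap SE and percentile CI for a statistic T
np_bootstrap <- function(x, T_stat = median, B = 10000, conf = 0.95) {
  t_star <- replicate(B, T_stat(sample(x, length(x), replace = TRUE)))
  alpha <- 1 - conf
  list(
    estimate = T_stat(x),
    se = sd(t_star),
    ci = unname(quantile(t_star, c(alpha / 2, 1 - alpha / 2)))
  )
}
# np_bootstrap(rexp(100))  # SE and 95% percentile CI for the sample median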
Parametric Bootstrap
ALGORITHM: Parametric Bootstrap
INPUT: Data X, parametric model M, B replicates
OUTPUT: SE estimate
1. Fit θ̂ = MLE(X, M)
2. FOR b = 1 to B:
2.1 Generate X*_b ~ M(θ̂)
2.2 Compute θ̂*_b = MLE(X*_b, M)
3. SE_boot ← SD({θ̂*_1, ..., θ̂*_B})
4. RETURN SE_boot
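A sketch for one specific parametric family, chosen purely for illustration: an exponential working model, whose rate MLE is 1/mean(x).
# Parametric bootstrap SE for the rate of an exponential working model
param_bootstrap_exp <- function(x, B = 2000) {
  rate_hat <- 1 / mean(x)                        # Step 1: MLE under Exp(rate)
  rate_star <- replicate(B, {
    x_star <- rexp(length(x), rate = rate_hat)   # Step 2.1: simulate from fitted model
    1 / mean(x_star)                             # Step 2.2: refit on simulated data
  })
  list(estimate = rate_hat, se = sd(rate_star))  # Step 3
}
# param_bootstrap_exp(rexp(200, rate = 2))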
Numerical Stability Notes
Common Issues
- Overflow/Underflow: Work on log scale
- Cancellation: Reformulate subtractions
- Ill-conditioning: Use regularization or pivoting
- Non-convergence: Add damping, step-halving, or a line search
Stability Techniques
# Log-sum-exp trick
log_sum_exp <- function(x) {
  max_x <- max(x)
  max_x + log(sum(exp(x - max_x)))
}
# Numerically stable variance
stable_var <- function(x) {
  n <- length(x)
  m <- mean(x)
  sum((x - m)^2) / (n - 1)  # Two-pass: center first, avoiding catastrophic cancellation
}
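A quick demonstration of why the log-sum-exp trick matters: the naive computation overflows in double precision while the shifted version stays finite.
# Naive vs. stabilized evaluation of log(sum(exp(x))) for large x
x <- c(1000, 1001, 1002)
log(sum(exp(x)))  # Inf: exp(1000) overflows
log_sum_exp(x)    # 1002.408: finite and correct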
Implementation Checklist
Before Coding
- Pseudocode written and reviewed
- Complexity analyzed
- Convergence conditions identified
- Edge cases documented
- Numerical stability considered
During Implementation
- Match pseudocode structure
- Add convergence monitoring
- Handle edge cases
- Log intermediate values (debug mode)
- Add early stopping
After Implementation
- Unit tests for components
- Integration tests for full algorithm
- Benchmark against reference implementation
- Profile for bottlenecks
- Document deviations from pseudocode
Key References
- Cormen, Leiserson, Rivest, and Stein, Introduction to Algorithms (CLRS)
- Press, Teukolsky, Vetterling, and Flannery, Numerical Recipes