name	adaptive-rejection-sampler
description	Guidance for implementing adaptive rejection sampling (ARS) algorithms for generating random samples from log-concave probability distributions. This skill should be used when tasks involve implementing ARS, rejection sampling, or Monte Carlo methods that require sampling from custom probability distributions, particularly in R or other statistical computing languages.

Adaptive Rejection Sampler

Overview

This skill provides guidance for implementing adaptive rejection sampling (ARS) as described by Gilks et al. (1992). ARS is a method for generating random samples from log-concave probability distributions by constructing piecewise linear upper and lower bounds (envelopes) that adapt as samples are drawn.

Algorithm Foundation

Before writing any code, document the mathematical foundations:

Core Components to Understand

Log-concavity requirement: The target density f(x) must be log-concave, meaning log(f(x)) is concave
Upper hull construction: Piecewise linear envelope above log(f(x)) using tangent lines at abscissae
Lower hull (squeeze function): Piecewise linear envelope below log(f(x)) connecting abscissae
Rejection sampling: Sample from exponential of upper hull, accept based on squeeze/density comparison
Adaptive updating: Add rejected points as new abscissae to tighten the envelope

Algorithm Pseudocode

Document these steps before implementing:

1. Initialize with k ≥ 2 abscissae where log-density is finite
2. Construct upper hull from tangent lines at abscissae
3. Construct lower hull connecting abscissae points
4. Loop:
   a. Sample x* from exponential of upper hull
   b. Sample u ~ Uniform(0,1)
   c. If u ≤ exp(lower_hull(x*) - upper_hull(x*)):
      Accept x* (squeeze test)
   d. Else if u ≤ exp(log_f(x*) - upper_hull(x*)):
      Accept x* (rejection test)
      Add x* to abscissae
   e. Else:
      Reject x*
      Add x* to abscissae
5. Return accepted samples

Implementation Approach

Step 1: Verify File Writes

After every file write operation, read back the file to confirm content is complete. Truncated writes are a common failure mode that can leave syntax errors or incomplete functions.

# After writing, always verify:
source("ars.R")  # Will error on syntax issues
# Or read and check line count matches expected

Step 2: Implement Log-Concavity Check

The log-concavity check is critical and error-prone:

# Common mistake: Using strict inequality
# WRONG: if (diff2 < 0) { not_concave }
# CORRECT: if (diff2 <= tol) { is_concave }  # with small tolerance

# For numerical stability, use tolerance:
tol <- 1e-8  # Document why this tolerance was chosen
is_log_concave <- all(second_differences <= tol)

Rationale for tolerance: Floating-point arithmetic can produce small positive values (e.g., 1e-15) for theoretically zero second differences. A tolerance of 1e-8 accommodates numerical noise while still detecting true non-concavity.

Step 3: Handle Domain Boundaries

Implement explicit handling for:

Unbounded domains (-∞, +∞): Initialize with points spanning the mode
Lower-bounded domains [a, +∞): Include point near lower bound
Upper-bounded domains (-∞, b]: Include point near upper bound
Bounded domains [a, b]: Include points near both bounds

# Example initialization strategy for different domains:
if (is.infinite(lower) && is.infinite(upper)) {
  # Unbounded: span around mode
  x_init <- c(mode - 2*sd, mode, mode + 2*sd)
} else if (!is.infinite(lower) && is.infinite(upper)) {
  # Lower-bounded: start from boundary
  x_init <- c(lower + 0.01, lower + 1, lower + 5)
}

Step 4: Initialization Point Selection

Poor initialization is a major source of failures:

Find the mode first: Use optimization to locate the density mode
Ensure points span the mode: Initial abscissae should be on both sides of the mode
Verify log-density is finite: All initial points must have finite log(f(x))
Use domain-appropriate spacing: Wider for heavy-tailed distributions

# Verify initialization before sampling
for (x in x_init) {
  log_val <- log_f(x)
  if (!is.finite(log_val)) {
    stop(paste("Initial point", x, "has non-finite log-density"))
  }
}

Step 5: Numerical Stability

Guard against common numerical issues:

# Prevent underflow in exponentials
log_envelope <- pmin(log_envelope, 700)  # exp(700) ≈ 1e304

# Prevent division by zero in hull construction
if (abs(x[i+1] - x[i]) < 1e-10) {
  # Handle degenerate case
}

# Use log-sum-exp trick for probability calculations
log_p <- log_envelope - log_sum_exp(log_envelope)

Step 6: Convergence Safeguards

Implement maximum iteration limits and diagnostics:

max_iter <- n_samples * 100  # Reasonable limit
iter <- 0
while (length(samples) < n_samples && iter < max_iter) {
  iter <- iter + 1
  # ... sampling logic
}
if (iter >= max_iter) {
  warning("Maximum iterations reached; returning partial sample")
}

Verification Strategies

Statistical Testing

Use proper statistical tests, not just moment comparisons:

# Kolmogorov-Smirnov test (preferred)
ks_test <- ks.test(samples, reference_cdf)
if (ks_test$p.value < 0.01) {
  warning("KS test suggests samples don't match target distribution")
}

# Chi-square goodness of fit for discrete comparisons
# Quantile-quantile plots for visual verification

Moment-Based Checks (with proper tolerances)

If using moment comparisons, justify tolerances:

# Tolerance should scale with sample size
# Standard error of mean ≈ sd / sqrt(n)
n <- length(samples)
se_mean <- sd(samples) / sqrt(n)
tolerance_mean <- 4 * se_mean  # ~99.99% CI

if (abs(mean(samples) - expected_mean) > tolerance_mean) {
  warning("Sample mean outside expected range")
}

Test Cases to Include

Standard normal: Mean=0, SD=1 (baseline)
Shifted normal: Mean≠0 to test mode-finding
Exponential: Tests handling of semi-bounded domain
Truncated distributions: Tests boundary handling
Non-log-concave (should reject): Mixture distributions to verify detection

Modular Testing

Test components in isolation before integration:

# Test hull construction separately
test_upper_hull <- function() {
  # Verify hull is above log-density at test points
}

# Test sampling from envelope separately
test_envelope_sampling <- function() {
  # Verify samples follow envelope distribution
}

Common Pitfalls

1. Incomplete File Writes

Always verify file content after writing. Truncated writes produce syntax errors that may not surface until runtime.

2. Strict vs. Non-Strict Inequality in Log-Concavity

Using < instead of <= (with tolerance) causes rejection of valid distributions like the exponential where second derivatives are exactly zero.

3. Initialization Outside Mode Region

When initial points don't span the mode, the envelope may be poor and sampling becomes extremely inefficient or biased.

4. Missing Boundary Handling

Failing to handle bounded domains causes samples to violate support constraints.

5. Insufficient Error Messages

Generic errors make debugging difficult. Include context in all error messages:

# BAD
stop("Invalid input")

# GOOD
stop(paste("Log-density is -Inf at x =", x,
           "; ensure initial points are in distribution support"))

6. No Convergence Limits

Infinite loops can occur with poorly-initialized samplers. Always include maximum iteration counts.

7. Masking Root Causes

When tests fail, understand why before fixing. Expanding initialization points may work but could mask deeper algorithmic issues.

Performance Considerations

Envelope updates have O(k) complexity where k is number of abscissae
Consider limiting maximum number of abscissae to prevent O(n²) behavior
Pre-allocate output vectors for large sample sizes
Profile hot paths if performance is critical

References

When implementing, verify against the original paper:

Gilks, W.R. and Wild, P. (1992). "Adaptive Rejection Sampling for Gibbs Sampling." Applied Statistics, 41(2), 337-348.

adaptive-rejection-sampler

Install Skill

SKILL.md