name	stan-development
description	Expert guidance for Stan probabilistic programming language development, including modern syntax, cmdstanr/cmdstanpy integration, and testing patterns

Stan Development

Use this skill when working with Stan models for Bayesian inference and probabilistic programming, particularly when integrating with R (cmdstanr) or Python (cmdstanpy).

Modern Stan Syntax

Array Declarations (IMPORTANT)

Use new array syntax introduced in Stan 2.26+:

// Good - Modern syntax (Stan 2.26+)
array[10] int x;              // Array of 10 integers
array[N] real y;              // Array of N reals
array[N, M] real z;           // 2D array
array[N] vector[K] v;         // Array of vectors

// Avoid - Old syntax (deprecated)
int x[10];                    // Old style
real y[N];                    // Old style
real z[N, M];                 // Old style
vector[K] v[N];               // Old style

Why the change?

More consistent and readable
Clearer distinction between arrays and matrix/vector types
Aligns with modern Stan style

Other Key Syntax

Data types:

// Scalar types
int n;
real x;

// Vector/Matrix types
vector[N] v;           // Column vector
row_vector[N] rv;      // Row vector
matrix[N, M] A;        // Matrix

// Arrays of vectors/matrices
array[K] vector[N] vectors;
array[K] matrix[N, M] matrices;

// Constrained types
real<lower=0> sigma;                    // Lower bound
real<lower=0, upper=1> theta;          // Bounded
simplex[K] pi;                         // Simplex constraint
ordered[N] sorted_values;              // Ordered constraint

Stan File Organization

Standard Structure

project/
├── inst/stan/                 # Stan models (R packages)
│   ├── model.stan            # Main model file
│   ├── functions/            # Reusable Stan functions
│   │   ├── likelihood.stan
│   │   ├── priors.stan
│   │   └── utils.stan
│   └── chunks/               # Reusable code chunks (optional)
└── stan/                      # Alternative location (Python projects)

Modular Stan Code

Main model includes functions:

functions {
  #include functions/likelihood.stan
  #include functions/priors.stan
  #include functions/utils.stan
}

data {
  int<lower=0> N;
  array[N] real y;
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  // Use functions from included files
  y ~ custom_likelihood(mu, sigma);
  mu ~ custom_prior();
}

Benefits of modular approach:

Reusable functions across models
Easier testing of individual components
Better organization for complex models
Cleaner main model files

Stan Integration with R (cmdstanr)

Basic Workflow

library(cmdstanr)

# Compile Stan model
model <- cmdstan_model("path/to/model.stan")

# Prepare data (list with names matching Stan data block)
stan_data <- list(
  N = 100,
  y = rnorm(100)
)

# Fit model
fit <- model$sample(
  data = stan_data,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000,
  iter_sampling = 1000
)

# Extract samples
draws <- fit$draws()
summary <- fit$summary()

Model Compilation and Caching

# Get model path from package
model_path <- system.file("stan", "model.stan", package = "mypackage")

# Compile with cmdstanr
model <- cmdstan_model(model_path)

# Package-specific model functions
# Many packages provide wrapper functions:
my_package_model <- get_model()  # Returns compiled model

Inference Algorithms

# NUTS sampling (default, most robust)
fit <- model$sample(data = stan_data)

# Variational inference (faster, approximate)
fit <- model$variational(data = stan_data)

# Pathfinder (fast, approximate)
fit <- model$pathfinder(data = stan_data)

# Laplace approximation (very fast, approximate)
fit <- model$laplace(data = stan_data)

# Optimization (MAP estimate)
fit <- model$optimize(data = stan_data)

Testing Stan Functions

Exposing Stan Functions to R

In R packages with cmdstanr:

# Expose Stan functions for testing
# Typically in a package function
expose_stan_functions <- function() {
  model_path <- system.file("stan", "functions.stan", package = "mypackage")
  # Note: This typically works on Linux only
  cmdstanr::cmdstan_model(model_path, compile_standalone = TRUE)
}

# Then in tests
test_that("Stan function works correctly", {
  # Exposed functions are now available in R
  result <- stan_function_name(args)
  expect_equal(result, expected)
})

Common pattern in test setup:

# tests/testthat/setup.R
if (on_linux()) {
  expose_stan_functions()
}

# tests/testthat/test-stan-functions.R
test_that("likelihood function works", {
  skip_if_not(on_linux())  # Stan function exposure often Linux-only

  result <- stan_likelihood(data, params)
  expect_gt(result, -Inf)
})

Stan-R Interface Patterns

Data Preparation

# Convert R data structures to Stan format
stan_data_list <- list(
  N = nrow(data),
  K = ncol(X),
  y = data$outcome,
  X = as.matrix(X),
  # Arrays use R vectors/matrices directly
  group = as.integer(data$group)
)

# For complex models, packages often provide conversion functions
stan_data <- prepare_stan_data(preprocessed_data, model_spec)

Prior Specification as Data

Empirical Bayes approach - priors as data:

data {
  // Data
  int<lower=0> N;
  array[N] real y;

  // Priors as data (empirical Bayes)
  real prior_mu_mean;
  real<lower=0> prior_mu_sd;
  real<lower=0> prior_sigma_alpha;
  real<lower=0> prior_sigma_beta;
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  // Use data-specified priors
  mu ~ normal(prior_mu_mean, prior_mu_sd);
  sigma ~ gamma(prior_sigma_alpha, prior_sigma_beta);
  y ~ normal(mu, sigma);
}

Benefits:

Priors can be updated without recompiling
Easy to specify different priors for different models
Enables automated prior selection

Distribution Lookup Systems

For packages with multiple distribution options:

# R side - get distribution ID
dist_id <- get_distribution_id("lognormal")

# Stan side - use distribution
stan_data <- list(
  distribution = dist_id,  # Integer ID
  # ... other data
)

// Stan functions for distribution dispatch
real compute_lpdf(real y, int distribution, vector params) {
  if (distribution == 1) {  // Normal
    return normal_lpdf(y | params[1], params[2]);
  } else if (distribution == 2) {  // Lognormal
    return lognormal_lpdf(y | params[1], params[2]);
  }
  // ... other distributions
}

Common Stan Patterns

Vectorization

// Good - vectorized (much faster)
y ~ normal(mu, sigma);

// Avoid - loop (slower)
for (n in 1:N) {
  y[n] ~ normal(mu, sigma);
}

Efficient Matrix Operations

// Use built-in matrix operations
vector[N] mu = X * beta;  // Matrix-vector multiplication

// Avoid loops when vectorization possible

Handling Missing Data

data {
  int<lower=0> N;
  int<lower=0> N_obs;
  array[N_obs] int<lower=1, upper=N> obs_idx;
  array[N_obs] real y_obs;
}

parameters {
  array[N] real y_latent;
  real mu;
  real<lower=0> sigma;
}

model {
  // Likelihood for observed data
  y_obs ~ normal(mu, sigma);

  // Prior/model for all data
  y_latent ~ normal(mu, sigma);

  // Constrain observed values
  y_latent[obs_idx] ~ normal(y_obs, 0.001);  // Small noise
}

Debugging Stan Models

Common Issues

Divergences:

# Increase adapt_delta
fit <- model$sample(
  data = stan_data,
  adapt_delta = 0.95  # Default 0.8
)

Slow mixing:

# Use non-centered parameterization
# Reparameterize hierarchical models

Model diagnostics:

# Check convergence
fit$diagnostic_summary()

# Check R-hat, ESS
fit$summary(c("Rhat", "ess_bulk", "ess_tail"))

# Pairs plot for problematic parameters
bayesplot::mcmc_pairs(fit$draws(), pars = c("mu", "sigma"))

When to Use This Skill

Activate this skill when:

Writing Stan models
Integrating Stan with R packages (cmdstanr)
Testing Stan functions
Debugging Stan models
Working with Bayesian hierarchical models
Implementing custom likelihoods or priors in Stan

This skill provides Stan-specific development patterns. Project-specific model architecture and domain knowledge should remain in project CLAUDE.md files.

stan-development

Install Skill

SKILL.md