| name | stan-development |
| description | Expert guidance for Stan probabilistic programming language development, including modern syntax, cmdstanr/cmdstanpy integration, and testing patterns |
Stan Development
Use this skill when working with Stan models for Bayesian inference and probabilistic programming, particularly when integrating with R (cmdstanr) or Python (cmdstanpy).
Modern Stan Syntax
Array Declarations (IMPORTANT)
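Use the array syntax introduced in Stan 2.26. The old syntax was deprecated in 2.26 and removed in Stan 2.33, so models that use it no longer compile on current releases: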
// Good - Modern syntax (Stan 2.26+)
array[10] int x; // Array of 10 integers
array[N] real y; // Array of N reals
array[N, M] real z; // 2D array
array[N] vector[K] v; // Array of vectors
// Avoid - Old syntax (removed in Stan 2.33)
int x[10]; // Old style
real y[N]; // Old style
real z[N, M]; // Old style
vector[K] v[N]; // Old style
Why the change?
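- The full type, including every array dimension, comes before the variable name instead of being split around it (compare vector[K] v[N] with array[N] vector[K] v)
- Arrays are clearly distinguished from vector and matrix types, whose sizes also use brackets
- It is the only array syntax that current Stan releases accept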
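Simple one-line declarations in older models can be upgraded mechanically. The helper below is a hypothetical Python sketch (upgrade_array_decl is not part of any Stan tool); it uses a regular expression rather than a Stan parser, so it leaves declare-define statements, multi-line declarations and anything it does not recognise untouched.

```python
import re

# Stan type keywords that can start a declaration (not an exhaustive list).
_TYPES = (
    r"(?:int|real|complex|vector|row_vector|matrix|simplex|ordered|"
    r"positive_ordered|unit_vector|corr_matrix|cov_matrix|"
    r"cholesky_factor_corr|cholesky_factor_cov)"
)

# indent, base type (+ optional <constraints> and [size]), name, [dims], ";" and rest
_OLD_DECL = re.compile(
    rf"^(\s*)({_TYPES}(?:<[^>]*>)?(?:\[[^\]]*\])?)\s+(\w+)\s*\[([^\]]+)\]\s*;(.*)$"
)


def upgrade_array_decl(line: str) -> str:
    """Rewrite one old-style array declaration to array[...] syntax.

    Lines that do not look like a simple old-style declaration are
    returned unchanged.
    """
    m = _OLD_DECL.match(line)
    if m is None:
        return line
    indent, base_type, name, dims, rest = m.groups()
    return f"{indent}array[{dims}] {base_type} {name};{rest}"
```

For example, upgrade_array_decl("vector[K] v[N];") returns "array[N] vector[K] v;". Compile the result afterwards; the regex is a convenience, not a guarantee.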
Other Key Syntax
Data types:
// Scalar types
int n;
real x;
// Vector/Matrix types
vector[N] v; // Column vector
row_vector[N] rv; // Row vector
matrix[N, M] A; // Matrix
// Arrays of vectors/matrices
array[K] vector[N] vectors;
array[K] matrix[N, M] matrices;
// Constrained types
real<lower=0> sigma; // Lower bound
real<lower=0, upper=1> theta; // Lower and upper bound
simplex[K] p; // Non-negative elements that sum to 1
ordered[N] sorted_values; // Strictly increasing elements
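Constrained types work because Stan samples on an unconstrained scale and maps each draw back into the declared support. The Python sketch below follows three of the transforms described in the Stan Reference Manual (lower bound, lower and upper bound, ordered). The function names are illustrative, and during sampling Stan also adds the log absolute Jacobian of each transform to the target, which is omitted here.

```python
import math


def inv_logit(y: float) -> float:
    """Numerically stable logistic function."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


def lower_bound(y: float, lb: float) -> float:
    """real<lower=lb>: x = lb + exp(y), so x > lb for any real y."""
    return lb + math.exp(y)


def lower_upper_bound(y: float, lb: float, ub: float) -> float:
    """real<lower=lb, upper=ub>: x = lb + (ub - lb) * inv_logit(y)."""
    return lb + (ub - lb) * inv_logit(y)


def ordered(y: list) -> list:
    """ordered[N]: x[0] = y[0], then each step adds exp(y[k]) > 0."""
    x = [y[0]]
    for yk in y[1:]:
        x.append(x[-1] + math.exp(yk))
    return x
```

This is also why bounds should match a parameter's true support: a bound tighter than the posterior truncates it instead of merely guiding the sampler.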
Stan File Organization
Standard Structure
project/
├── inst/stan/ # Stan models (R packages)
│ ├── model.stan # Main model file
│ ├── functions/ # Reusable Stan functions
│ │ ├── likelihood.stan
│ │ ├── priors.stan
│ │ └── utils.stan
│ └── chunks/ # Reusable code chunks (optional)
└── stan/ # Alternative location (Python projects)
Modular Stan Code
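The main model pulls in function files with #include. Included files contain bare function definitions (no functions { } wrapper of their own), and a function used with ~ must carry the _lpdf suffix (_lpmf for integer outcomes): y ~ custom_likelihood(mu, sigma) calls custom_likelihood_lpdf(y | mu, sigma).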
functions {
#include functions/likelihood.stan
#include functions/priors.stan
#include functions/utils.stan
}
data {
int<lower=0> N;
array[N] real y;
}
parameters {
real mu;
real<lower=0> sigma;
}
model {
// Calls custom_likelihood_lpdf() and custom_prior_lpdf() from the included files
y ~ custom_likelihood(mu, sigma);
mu ~ custom_prior();
}
Benefits of modular approach:
- Reusable functions across models
- Easier testing of individual components
- Better organization for complex models
- Cleaner main model files
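#include is handled textually by the Stan compiler: each directive is replaced by the contents of the named file, searched for in the include paths. That is why included files hold bare function definitions and must not repeat the functions { } wrapper. The Python sketch below mimics the substitution over an in-memory dictionary of files so it can be tested without a filesystem; expand_includes is a hypothetical helper, and the real compiler's search order and error messages may differ.

```python
import re

# Matches lines such as "#include functions/utils.stan" (optionally quoted).
_INCLUDE = re.compile(r'^\s*#include\s+["<]?([^">\s]+)[">]?\s*$')


def expand_includes(source: str, files: dict, include_paths: list, depth: int = 0) -> str:
    """Replace each #include line with the contents of the first matching file.

    `files` maps a path string to file contents and stands in for the filesystem.
    """
    if depth > 20:
        raise RecursionError("#include nesting too deep (cycle?)")
    out = []
    for line in source.splitlines():
        m = _INCLUDE.match(line)
        if m is None:
            out.append(line)
            continue
        target = m.group(1)
        for base in include_paths:
            path = f"{base}/{target}"
            if path in files:
                # Included files may themselves contain #include lines.
                out.append(expand_includes(files[path], files, include_paths, depth + 1))
                break
        else:
            raise FileNotFoundError(f"{target} not found in {include_paths}")
    return "\n".join(out)
```

With cmdstanr, the search directories are passed to cmdstan_model() through its include_paths argument.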
Stan Integration with R (cmdstanr)
Basic Workflow
library(cmdstanr)
# Compile Stan model
model <- cmdstan_model("path/to/model.stan")
# Prepare data (list with names matching Stan data block)
stan_data <- list(
N = 100,
y = rnorm(100)
)
# Fit model
fit <- model$sample(
data = stan_data,
chains = 4,
parallel_chains = 4,
iter_warmup = 1000,
iter_sampling = 1000
)
# Extract posterior draws and a summary table
draws <- fit$draws()
fit_summary <- fit$summary() # named to avoid masking base::summary()
Model Compilation and Caching
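# Get model path from the installed package
model_path <- system.file("stan", "model.stan", package = "mypackage")
# Compile with cmdstanr. The executable is stored next to the .stan file (or in
# `dir`, if supplied) and reused on later calls while it is up to date;
# force_recompile = TRUE overrides this. include_paths locates #include files.
model <- cmdstan_model(model_path, include_paths = dirname(model_path))
# Package-specific wrappers are common; get_model() is a hypothetical example:
my_package_model <- get_model() # Returns compiled model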
Inference Algorithms
# NUTS-HMC sampling (default; use for final inference)
fit <- model$sample(data = stan_data)
# ADVI variational inference (fast, approximate; treat as experimental)
fit <- model$variational(data = stan_data)
# Pathfinder (fast, approximate; needs CmdStan 2.33+)
fit <- model$pathfinder(data = stan_data)
# Laplace approximation around the mode (very fast, approximate; CmdStan 2.32+)
fit <- model$laplace(data = stan_data)
# Optimization: posterior mode. The default (jacobian = FALSE) gives a
# penalized MLE; jacobian = TRUE gives the MAP estimate
fit <- model$optimize(data = stan_data)
Testing Stan Functions
Exposing Stan Functions to R
In R packages with cmdstanr:
# Compile Stan functions for use from R (typically a package helper)
expose_stan_functions <- function() {
  model_path <- system.file("stan", "functions.stan", package = "mypackage")
  # Note: in some setups this only works reliably on Linux
  model <- cmdstanr::cmdstan_model(model_path, compile_standalone = TRUE)
  # Functions are now in model$functions; also copy them to the global env
  model$expose_functions(global = TRUE)
  invisible(model)
}
# Then in tests
test_that("Stan function works correctly", {
  # Exposed functions can now be called like R functions
  result <- stan_function_name(args)
  expect_equal(result, expected)
})
Common pattern in test setup:
# tests/testthat/setup.R
if (Sys.info()[["sysname"]] == "Linux") {
  expose_stan_functions()
}
# tests/testthat/test-stan-functions.R
test_that("likelihood function works", {
  skip_on_os(c("windows", "mac")) # Stan function exposure often Linux-only
  result <- stan_likelihood(data, params)
  expect_true(is.finite(result))
})
Stan-R Interface Patterns
Data Preparation
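# Convert R data structures to the types declared in the Stan data block
stan_data_list <- list(
  N = nrow(data),
  K = ncol(X),
  y = data$outcome,
  X = as.matrix(X),
  # R vectors and matrices map directly onto Stan arrays, vectors and matrices;
  # int data must be integer-valued, and factor codes start at 1 like Stan indices
  group = as.integer(data$group)
)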
# For complex models, packages often provide conversion functions
stan_data <- prepare_stan_data(preprocessed_data, model_spec)
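With cmdstanpy the same data is passed as a Python dict, e.g. model.sample(data=stan_data). The sketch below is a hypothetical Python counterpart of the R list above, assuming plain Python lists as input. The detail worth copying is the recoding of group labels to integers starting at 1, because Stan indexes from 1 (R's as.integer() on a factor already does this; Python does not).

```python
def prepare_stan_data(outcome: list, predictors: list, group_labels: list) -> dict:
    """Build a cmdstanpy-style data dict for a grouped regression.

    outcome: floats; predictors: rows of floats; group_labels: sortable labels.
    """
    if not (len(outcome) == len(predictors) == len(group_labels)):
        raise ValueError("outcome, predictors and group_labels must have equal length")
    # Stan indexes from 1, so group codes are 1..G (sorted label order, like R factors).
    levels = sorted(set(group_labels))
    code = {level: i + 1 for i, level in enumerate(levels)}
    return {
        "N": len(outcome),
        "K": len(predictors[0]) if predictors else 0,
        "y": [float(v) for v in outcome],
        "X": [[float(v) for v in row] for row in predictors],
        "group": [code[g] for g in group_labels],
    }
```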
Prior Specification as Data
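Pass prior hyperparameters as data. This is also the basis for empirical Bayes, where the hyperparameters are estimated from the data before fitting:
data {
// Data
int<lower=0> N;
array[N] real y;
// Prior hyperparameters as data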
real prior_mu_mean;
real<lower=0> prior_mu_sd;
real<lower=0> prior_sigma_alpha;
real<lower=0> prior_sigma_beta;
}
parameters {
real mu;
real<lower=0> sigma;
}
model {
// Use data-specified priors
mu ~ normal(prior_mu_mean, prior_mu_sd);
sigma ~ gamma(prior_sigma_alpha, prior_sigma_beta); // shape, rate
y ~ normal(mu, sigma);
}
Benefits:
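- Priors can be changed without recompiling the model
- One compiled model can be fit with different priors for different datasets or analyses
- Supports empirical Bayes and prior sensitivity analysis, by computing or looping over hyperparameters in R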
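Because the hyperparameters are data, they can be computed in the host language. A gamma prior for sigma is often easier to reason about as a mean and standard deviation; the Python sketch below converts those to the shape/rate pair used by Stan's gamma(alpha, beta) and assembles the data dict for the model above. The helper names and default values are illustrative assumptions, not recommendations.

```python
def gamma_shape_rate(mean: float, sd: float) -> tuple:
    """Shape alpha and rate beta for Stan's gamma(alpha, beta).

    Stan's gamma has mean alpha / beta and variance alpha / beta**2,
    so matching moments gives alpha = (mean / sd)**2 and beta = mean / sd**2.
    """
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    return (mean / sd) ** 2, mean / sd ** 2


def prior_data(y, mu_mean=0.0, mu_sd=10.0, sigma_mean=1.0, sigma_sd=1.0) -> dict:
    """Data dict for the priors-as-data model; default values are placeholders."""
    alpha, beta = gamma_shape_rate(sigma_mean, sigma_sd)
    return {
        "N": len(y),
        "y": [float(v) for v in y],
        "prior_mu_mean": mu_mean,
        "prior_mu_sd": mu_sd,
        "prior_sigma_alpha": alpha,
        "prior_sigma_beta": beta,
    }
```

Fitting the same compiled model over several (sigma_mean, sigma_sd) pairs is a cheap prior sensitivity check.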
Distribution Lookup Systems
For packages with multiple distribution options:
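# R side - get the distribution ID (get_distribution_id() is package-specific)
dist_id <- get_distribution_id("lognormal")
# Pass the ID to Stan as integer data
stan_data <- list(
  distribution = dist_id # Integer ID; other data entries follow, comma-separated
)
// Stan functions for distribution dispatch (inside the functions block)
real compute_lpdf(real y, int distribution, vector params) {
  if (distribution == 1) { // Normal
    return normal_lpdf(y | params[1], params[2]);
  } else if (distribution == 2) { // Lognormal
    return lognormal_lpdf(y | params[1], params[2]);
  }
  // ... other distributions
  // Every code path must return or reject, so fail loudly on unknown IDs
  reject("Unknown distribution id: ", distribution);
}
// Usage (the _lpdf suffix requires the | notation):
// target += compute_lpdf(y_n | distribution, params);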
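The integer codes are a contract between the host language and Stan, so it helps to define them in one place and to test the Stan function against independent reference values. A Python sketch, assuming the same two codes as the Stan function above (1 = normal, 2 = lognormal); the names are hypothetical. Explicit _lpdf calls in Stan return the full log density including constants, so these values are directly comparable.

```python
import math

# Must stay in sync with the if/else branches of the Stan dispatch function.
DISTRIBUTION_IDS = {"normal": 1, "lognormal": 2}


def get_distribution_id(name: str) -> int:
    try:
        return DISTRIBUTION_IDS[name]
    except KeyError:
        raise ValueError(
            f"unknown distribution {name!r}; expected one of {sorted(DISTRIBUTION_IDS)}"
        ) from None


def reference_lpdf(y: float, distribution: int, params: list) -> float:
    """Reference log densities for testing the Stan dispatch function."""
    mu, sigma = params[0], params[1]
    if distribution == 1:  # normal
        z = (y - mu) / sigma
        return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z
    if distribution == 2:  # lognormal: normal density of log(y) plus Jacobian -log(y)
        return reference_lpdf(math.log(y), 1, params) - math.log(y)
    raise ValueError(f"unknown distribution id {distribution}")
```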
Common Stan Patterns
Vectorization
// Good - vectorized: one call, shared work, smaller autodiff graph
y ~ normal(mu, sigma);
// Avoid - loop (same log density, slower)
for (n in 1:N) {
y[n] ~ normal(mu, sigma);
}
Efficient Matrix Operations
// Use built-in matrix operations
vector[N] mu = X * beta; // Matrix-vector multiplication
// Avoid loops when vectorization possible
Handling Missing Data
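Treat missing values as parameters so that every data point gets exactly one likelihood term; observed values stay fixed data:
data {
  int<lower=0> N;
  int<lower=0, upper=N> N_obs;
  array[N_obs] int<lower=1, upper=N> obs_idx;     // positions of observed values
  array[N - N_obs] int<lower=1, upper=N> mis_idx; // positions of missing values
  array[N_obs] real y_obs;
}
parameters {
  real mu;
  real<lower=0> sigma;
  array[N - N_obs] real y_mis; // missing values are parameters
}
transformed parameters {
  // Reassemble the full series; observed entries are fixed data
  array[N] real y;
  y[obs_idx] = y_obs;
  y[mis_idx] = y_mis;
}
model {
  // Priors on mu and sigma omitted for brevity
  // One likelihood term per data point, observed or missing
  y ~ normal(mu, sigma);
}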
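The index arrays are easy to get off by one in the host language. A hypothetical Python helper, assuming missing entries are marked with None, that returns the 1-based positions of the observed values (and of the missing ones, for models that declare them) together with y_obs:

```python
def split_missing(values: list) -> dict:
    """Split a list with None for missing entries into Stan data fields.

    Positions are 1-based because Stan indexes from 1.
    """
    obs_idx = [i + 1 for i, v in enumerate(values) if v is not None]
    mis_idx = [i + 1 for i, v in enumerate(values) if v is None]
    return {
        "N": len(values),
        "N_obs": len(obs_idx),
        "obs_idx": obs_idx,
        "mis_idx": mis_idx,
        "y_obs": [float(values[i - 1]) for i in obs_idx],
    }
```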
Debugging Stan Models
Common Issues
Divergences:
# Raise adapt_delta (smaller steps); if divergences persist, reparameterize
fit <- model$sample(
data = stan_data,
adapt_delta = 0.95 # Default 0.8
)
Slow mixing:
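For hierarchical models, use a non-centered parameterization: declare a standardized parameter z with z ~ std_normal() and compute theta = mu + tau * z, instead of sampling theta ~ normal(mu, tau) directly. This usually helps when groups have little data; with strongly informative data per group the centered form can mix better.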
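A non-centered parameterization changes how a hierarchical parameter is expressed, not its distribution. The Python sketch below (illustrative names, plain random draws rather than MCMC) checks that theta ~ normal(mu, tau) and theta = mu + tau * z with z ~ std_normal() give the same distribution; the benefit in Stan comes from the geometry the sampler sees, because z is independent of tau a priori.

```python
import random
import statistics


def centered_draws(mu: float, tau: float, n: int, rng: random.Random) -> list:
    """theta ~ normal(mu, tau), drawn directly."""
    return [rng.gauss(mu, tau) for _ in range(n)]


def non_centered_draws(mu: float, tau: float, n: int, rng: random.Random) -> list:
    """z ~ std_normal(); theta = mu + tau * z."""
    return [mu + tau * rng.gauss(0.0, 1.0) for _ in range(n)]
```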
Model diagnostics:
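# Check divergences, max-treedepth hits and E-BFMI
fit$diagnostic_summary()
# Check R-hat and bulk/tail ESS (the first argument of summary() selects variables)
fit$summary(variables = NULL, "rhat", "ess_bulk", "ess_tail")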
# Pairs plot for problematic parameters
bayesplot::mcmc_pairs(fit$draws(), pars = c("mu", "sigma"))
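R-hat compares the variance between chains with the variance within them; values close to 1 mean the chains agree (a common threshold is R-hat < 1.01). For intuition only, here is a Python sketch of the classic split-R-hat. It is not what cmdstanr reports: the posterior package behind fit$summary() computes a rank-normalized version that is more robust to heavy tails, so the numbers will differ somewhat.

```python
import math
import statistics


def split_rhat(chains: list) -> float:
    """Classic (not rank-normalized) split-R-hat for one scalar parameter.

    chains: list of lists, one list of draws per chain, each with >= 4 draws.
    """
    halves = []
    for chain in chains:
        if len(chain) < 4:
            raise ValueError("each chain needs at least 4 draws")
        n = len(chain) // 2
        # Split each chain in two; the middle draw is dropped for odd lengths.
        halves.append(chain[:n])
        halves.append(chain[-n:])
    n = min(len(h) for h in halves)
    halves = [h[:n] for h in halves]
    m = len(halves)
    means = [statistics.fmean(h) for h in halves]
    grand_mean = statistics.fmean(means)
    # Between-half variance of the means, scaled by n
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    # Mean of the within-half sample variances
    w = statistics.fmean([statistics.variance(h) for h in halves])
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)
```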
When to Use This Skill
Activate this skill when:
- Writing Stan models
- Integrating Stan with R packages (cmdstanr)
- Testing Stan functions
- Debugging Stan models
- Working with Bayesian hierarchical models
- Implementing custom likelihoods or priors in Stan
This skill provides Stan-specific development patterns. Project-specific model architecture and domain knowledge should remain in project CLAUDE.md files.