| name | methods-paper-writer |
| description | JASA/Biometrika manuscript structure with VanderWeele notation standards |
Methods Paper Writer
Comprehensive guide for writing statistical methodology manuscripts
Use this skill when working on: methodology manuscripts, journal submissions, methods sections, simulation study write-ups, theoretical results presentation, or adapting papers for specific journals (JASA, Biometrika, Biostatistics).
JASA Format
Journal of the American Statistical Association Requirements
| Element | JASA Requirement |
|---|---|
| Page limit | ~25 pages main text + unlimited supplement |
| Abstract | 150-200 words, no math symbols |
| Keywords | 3-6 keywords after abstract |
| Sections | Standard: Intro, Methods, Theory, Simulation, Application, Discussion |
| References | Author-year format (natbib) |
| Figures | High resolution, grayscale-compatible |
| Code | Reproducibility materials required |
# JASA-compliant simulation results table
library(dplyr)
library(knitr)
library(kableExtra)

create_jasa_table <- function(results_df) {
  # Format for JASA: clean, no vertical lines, proper decimal alignment
  # Assumes 7 columns: method, then 3 metrics for each of 2 sample sizes
  results_df %>%
    mutate(across(where(is.numeric), ~ sprintf("%.3f", .x))) %>%
    kable(format = "latex",
          booktabs = TRUE,
          align = c("l", rep("r", ncol(.) - 1)),
          caption = "Simulation results: Bias, SE, and Coverage") %>%
    kable_styling(latex_options = "hold_position") %>%
    add_header_above(c(" " = 1, "n = 200" = 3, "n = 500" = 3))
}
JASA LaTeX Template
\documentclass[12pt]{article}
\usepackage{natbib}
\usepackage{amsmath,amssymb}
\usepackage{graphicx}
\usepackage{booktabs}
\title{Your Title Here}
\author{Author One\thanks{Department, University, email} \and
Author Two\thanks{Department, University, email}}
\date{}
\begin{document}
\maketitle
\begin{abstract}
Your abstract here (150-200 words, no math symbols).
\end{abstract}
\noindent\textbf{Keywords:} keyword1; keyword2; keyword3
% Main text sections go here
\end{document}
Introduction Structure
The 6-Paragraph Introduction Formula
| Paragraph | Purpose | Word Count |
|---|---|---|
| 1 | Hook + Scientific Problem | 100-150 |
| 2 | Existing Methods | 150-200 |
| 3 | Gap/Limitation | 100-150 |
| 4 | Our Contribution | 150-200 |
| 5 | Results Preview | 100-150 |
| 6 | Paper Organization | 50-100 |
# Template for tracking introduction components
intro_checklist <- function() {
data.frame(
paragraph = 1:6,
element = c("Hook + Problem", "Literature", "Gap",
"Contribution", "Results", "Organization"),
key_phrases = c(
"is fundamental to..., has important implications for...",
"Existing methods include..., Prior work has...",
"However, current approaches cannot..., A key limitation is...",
"We propose..., Our method..., We develop...",
"We show that..., Simulations demonstrate..., Application reveals...",
"The remainder of this paper is organized as follows..."
),
status = rep("pending", 6)
)
}
Simulation Section
Standard Simulation Study Structure
1. Simulation Design
- Data generating process (DGP)
- Sample sizes
- Number of replications
- Scenarios/conditions
2. Methods Compared
- Proposed method
- Competing methods (2-4)
- Oracle/benchmark
3. Performance Metrics
- Bias
- Standard error / RMSE
- Coverage probability
- Efficiency (relative to oracle)
4. Results
- Tables by scenario
- Figures for key patterns
- Sensitivity analyses
# Complete simulation template for mediation methods paper
# generate_dgp(), true_effect_for(), the four method functions, and
# summarize_simulation() are user-supplied placeholders.
library(purrr)

run_simulation_study <- function(n_sims = 1000, n_vec = c(200, 500, 1000)) {
  scenarios <- expand.grid(
    n = n_vec,
    misspecification = c("none", "outcome", "mediator", "both"),
    effect_size = c("small", "medium", "large"),
    stringsAsFactors = FALSE
  )
  results <- map_dfr(seq_len(nrow(scenarios)), function(i) {
    scenario <- scenarios[i, ]
    # True effect implied by the DGP for this scenario (placeholder helper)
    true_effect <- true_effect_for(scenario$effect_size)
    replicate_results <- replicate(n_sims, {
      # Generate data under scenario
      data <- generate_dgp(
        n = scenario$n,
        misspec = scenario$misspecification,
        effect = scenario$effect_size
      )
      # Apply all methods
      list(
        proposed = proposed_method(data),
        baron_kenny = baron_kenny(data),
        product = product_method(data),
        bootstrap = bootstrap_method(data)
      )
    }, simplify = FALSE)
    # Summarize across replications
    summarize_simulation(replicate_results, true_effect)
  })
  results
}
# Standard metrics calculation
calculate_metrics <- function(estimates, true_value, ses) {
list(
bias = mean(estimates) - true_value,
empirical_se = sd(estimates),
mean_se = mean(ses),
rmse = sqrt(mean((estimates - true_value)^2)),
coverage = mean(abs(estimates - true_value) < 1.96 * ses)
)
}
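The simulation template calls `generate_dgp()` without defining it. As a minimal sketch, assuming a linear mediation model with a randomized binary treatment and no covariates, one data-generating process could look like the following. The names `generate_linear_dgp` and `product_estimate`, the path coefficients, and the seed are illustrative assumptions, not part of any package.

```r
# Hypothetical linear DGP: A -> M -> Y with a direct A -> Y path.
# Under this model (no A-by-M interaction) the true indirect effect
# is alpha * beta.
generate_linear_dgp <- function(n, alpha = 0.5, beta = 0.4, gamma = 0.2) {
  A <- rbinom(n, size = 1, prob = 0.5)   # randomized binary treatment
  M <- alpha * A + rnorm(n)              # mediator model
  Y <- gamma * A + beta * M + rnorm(n)   # outcome model
  data.frame(A = A, M = M, Y = Y)
}

# Product-of-coefficients estimate of the indirect effect
product_estimate <- function(data) {
  a_hat <- coef(lm(M ~ A, data = data))[["A"]]
  b_hat <- coef(lm(Y ~ A + M, data = data))[["M"]]
  a_hat * b_hat
}

set.seed(1)                               # arbitrary illustrative seed
dat <- generate_linear_dgp(n = 5000)
true_nie <- 0.5 * 0.4                     # alpha * beta under the defaults
est_nie <- product_estimate(dat)
```

Writing down the true effect alongside the generator, as `true_nie` does here, keeps the bias and coverage calculations tied to the same parameter values that produced the data.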
Notation Conventions
Standard Statistical Notation
| Symbol | Meaning | Usage |
|---|---|---|
| $Y$ | Outcome | Capital for random variable |
| $y$ | Observed value | Lowercase for realization |
| $A$ | Treatment | Binary: $A \in \{0,1\}$ |
| $M$ | Mediator | Can be vector $\mathbf{M}$ |
| $X$ | Covariates | Often $\mathbf{X}$ for vector |
| $\theta$ | Parameter | Target of estimation |
| $\hat{\theta}$ | Estimator | Hat for estimate |
| $P, \mathbb{P}$ | Probability | Distribution |
| $E, \mathbb{E}$ | Expectation | Expected value |
VanderWeele Mediation Notation
% Standard potential outcomes notation
Y(a) % Outcome under treatment a
M(a) % Mediator under treatment a
Y(a,m) % Outcome under treatment a and mediator m
% Mediation effects
NDE(a) = E[Y(1,M(a)) - Y(0,M(a))] % Natural direct effect
NIE(a) = E[Y(a,M(1)) - Y(a,M(0))] % Natural indirect effect
TE = NDE(0) + NIE(1) = NDE(1) + NIE(0) % Total effect decomposition
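Because $NDE(a)$ and $NIE(a)$ are each indexed by a reference level, the total effect decomposes only when the two indices are complementary. A short derivation from the definitions above, using the composition $Y(a) = Y(a, M(a))$, makes this explicit:

```latex
\begin{align*}
TE &= E[Y(1, M(1)) - Y(0, M(0))] \\
   &= E[Y(1, M(1)) - Y(1, M(0))] + E[Y(1, M(0)) - Y(0, M(0))] \\
   &= NIE(1) + NDE(0).
\end{align*}
```

Adding and subtracting $E[Y(0, M(1))]$ instead gives the alternative decomposition $TE = NDE(1) + NIE(0)$.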
Figure Guidelines
JASA Figure Requirements
| Aspect | Requirement |
|---|---|
| Resolution | 300+ DPI for print |
| Format | PDF or EPS preferred |
| Colors | Must work in grayscale |
| Font size | Legible at print size (8pt minimum) |
| Legends | Inside figure, not separate |
| Captions | Below figure, complete description |
# JASA-compliant ggplot theme
library(ggplot2)

theme_jasa <- function() {
  theme_bw(base_size = 11) +
    theme(
      panel.grid.minor = element_blank(),
      panel.grid.major = element_line(color = "gray90"),
      strip.background = element_rect(fill = "gray95"),
      legend.position = "bottom",
      legend.box = "horizontal",
      axis.text = element_text(size = 9),
      axis.title = element_text(size = 10),
      plot.title = element_text(size = 11, face = "bold")
    )
}

# Create publication-ready figure
# Expects columns n, bias, method, and scenario, with at most four methods
create_simulation_figure <- function(results, file = "figure1.pdf") {
  p <- ggplot(results, aes(x = n, y = bias, shape = method, linetype = method)) +
    geom_point(size = 2) +
    geom_line() +
    geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
    facet_wrap(~scenario, scales = "free_y") +
    scale_shape_manual(values = c(16, 17, 15, 18)) +
    scale_linetype_manual(values = c("solid", "dashed", "dotted", "dotdash")) +
    labs(
      x = "Sample Size",
      y = "Bias",
      shape = "Method",
      linetype = "Method"
    ) +
    theme_jasa()
  # Pass the plot explicitly rather than relying on last_plot()
  ggsave(file, plot = p, width = 7, height = 5, dpi = 300)
  invisible(p)
}
Manuscript Structure
Standard Methods Paper Sections
1. Title
2. Abstract (structured or unstructured)
3. Introduction
4. Methods / Methodology
- Notation and Setup
- Identification
- Estimation
- Inference
5. Simulation Study
6. Application / Data Analysis
7. Discussion
8. Acknowledgments
9. References
10. Appendix / Supplementary Materials
- Proofs
- Additional simulations
- Implementation details
Section-by-Section Guidelines
1. Title
Formula: [Method/Approach] for [Problem/Setting]
Examples:
- "Efficient Estimation of Natural Direct and Indirect Effects"
- "Double Robust Inference for Mediation Analysis with Unmeasured Confounding"
- "A Semiparametric Approach to Sequential Mediation Analysis"
Tips:
- Lead with the contribution (method name or key concept)
- Include the setting/problem
- Avoid jargon unless widely known
- Keep under 15 words
2. Abstract
Structure (150-250 words, depending on the journal's limit):
[1-2 sentences: Problem/motivation]
[1-2 sentences: Gap in existing methods]
[2-3 sentences: Our contribution/approach]
[1-2 sentences: Key results - theory + empirical]
[1 sentence: Implications/availability]
Example:
Mediation analysis is fundamental for understanding causal mechanisms in health research. Existing methods for sequential mediation assume correctly specified parametric models and cannot accommodate high-dimensional confounders. We develop a doubly robust estimator for sequential mediation effects that remains consistent when either the outcome model or the mediator model is correctly specified. We derive the efficient influence function and show that our estimator achieves the semiparametric efficiency bound. Simulations demonstrate substantial efficiency gains over existing approaches, particularly under model misspecification. We apply our method to study the pathway from childhood adversity through inflammation to adult depression using MIDUS data. Software is available in the R package medrobust.
3. Introduction
Structure (4-6 paragraphs):
Paragraph 1: Problem and Motivation
- State the scientific problem
- Why does it matter?
- Concrete example/application
Paragraph 2: Existing Approaches
- What methods exist?
- What do they accomplish?
- (Be fair and accurate)
Paragraph 3: Gap/Limitation
- What can't current methods do?
- Why is this a problem?
- Make the need compelling
Paragraph 4: Our Contribution
- What do we propose?
- How does it address the gap?
- Key properties (robust, efficient, etc.)
Paragraph 5: Results Preview
- What do we show theoretically?
- What do simulations demonstrate?
- What does the application reveal?
Paragraph 6: Paper Organization
- "The remainder of this paper is organized as follows..."
- Brief section-by-section overview
Tips:
- Start broad, narrow to specific contribution
- Cite 3-5 key papers per existing approach
- Don't oversell or bash competitors
- Be specific about contributions
4. Notation and Setup
Template:
\section{Notation and Setup}
\label{sec:setup}
Let $O = (Y, A, M, X)$ denote the observed data, where:
\begin{itemize}
\item $Y \in \mathcal{Y}$ is the outcome of interest
\item $A \in \{0,1\}$ is the binary treatment
\item $M \in \mathcal{M}$ is the mediator
\item $X \in \mathcal{X}$ is a vector of pre-treatment confounders
\end{itemize}
We assume $n$ i.i.d. copies $O_1, \ldots, O_n$ from distribution $P$.
\subsection{Causal Framework}
We adopt the potential outcomes framework \citep{Rubin1974}. Let $Y(a)$
denote the potential outcome under treatment $A=a$, and $Y(a,m)$ the
potential outcome when treatment is set to $a$ and mediator to $m$.
Tips:
- Define ALL notation before use
- Use consistent notation throughout
- Follow field conventions (VanderWeele for mediation)
- Keep notation minimal but precise
5. Identification
Structure:
\section{Identification}
\label{sec:identification}
\subsection{Target Estimand}
Our target estimand is [precise definition with formula].
\subsection{Identification Assumptions}
We require the following assumptions:
\begin{assumption}[Consistency]
\label{A:consistency}
$Y = Y(A, M)$ and $M = M(A)$.
\end{assumption}
[... additional assumptions ...]
\subsection{Identification Result}
\begin{theorem}[Identification]
\label{thm:identification}
Under Assumptions \ref{A:consistency}--\ref{A:positivity},
the estimand $\psi$ is identified by [formula].
\end{theorem}
Tips:
- Number assumptions (A1, A2, ... or Assumption 1, 2, ...)
- State assumptions precisely
- Discuss plausibility of each assumption
- Proof in main text if simple, appendix if long
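As an illustration of what the `[formula]` placeholder in the identification theorem might contain, the following sketch states the familiar mediation formula for the nested counterfactual mean $E[Y(a, M(a^*))]$. It assumes consistency, positivity, and sequential ignorability given $X$ (no unmeasured treatment-outcome, treatment-mediator, or mediator-outcome confounding, and no mediator-outcome confounder affected by treatment). The label `eq:mediation-formula` is hypothetical.

```latex
\begin{equation}
\label{eq:mediation-formula}
E[Y(a, M(a^*))]
  = \int_{\mathcal{X}} \int_{\mathcal{M}}
    E[Y \mid A = a, M = m, X = x] \,
    dP(m \mid A = a^*, X = x) \, dP(x).
\end{equation}
```

Natural direct and indirect effects then follow as contrasts of this quantity across $(a, a^*)$, so a single identification result covers both estimands.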
6. Estimation
Structure:
\section{Estimation}
\label{sec:estimation}
\subsection{Proposed Estimator}
Based on the identification result, we propose the estimator:
\begin{equation}
\hat{\psi}_n = [estimator formula]
\end{equation}
\subsection{Nuisance Estimation}
The estimator depends on nuisance functions $\eta = (\mu, \pi, \ldots)$.
We estimate these using [approach].
\subsection{Algorithm}
[Pseudocode or step-by-step procedure]
Tips:
- Motivate why this estimator (efficiency, robustness)
- Be explicit about nuisance estimation
- Provide algorithm/pseudocode for implementation
- Discuss computational considerations
7. Asymptotic Properties
Structure:
\section{Asymptotic Properties}
\label{sec:theory}
\subsection{Regularity Conditions}
We impose the following regularity conditions:
\begin{condition}
\label{C1}
[Condition statement]
\end{condition}
\subsection{Main Result}
\begin{theorem}[Asymptotic Normality]
\label{thm:asymptotics}
Under Conditions \ref{C1}--\ref{Cn}, as $n \to \infty$:
\[
\sqrt{n}(\hat{\psi}_n - \psi_0) \xrightarrow{d} N(0, V)
\]
where $V = E[\phi(O)^2]$ and $\phi$ is the influence function given by [formula].
\end{theorem}
\subsection{Variance Estimation}
Consistent variance estimation via [approach].
\subsection{Efficiency} % optional
\begin{theorem}[Semiparametric Efficiency]
The estimator $\hat{\psi}_n$ achieves the semiparametric efficiency bound.
\end{theorem}
Tips:
- State conditions clearly (not buried in proof)
- Main results in theorems, not prose
- Provide intuition for influence function
- Proofs typically in appendix
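When the theorem gives $V = E[\phi(O)^2]$, one common choice for the variance estimation subsection is the plug-in estimator $\hat{V} = n^{-1} \sum_i \hat{\phi}(O_i)^2$. The sketch below illustrates this with the sample mean, whose influence function is $\phi(O) = O - \theta_0$. The function name `if_wald_ci` and the simulated data are assumptions made for illustration only.

```r
# Wald-type inference from an estimated influence function.
# `phi_hat` holds the influence function evaluated at each observation.
if_wald_ci <- function(estimate, phi_hat, level = 0.95) {
  n <- length(phi_hat)
  se <- sqrt(mean(phi_hat^2) / n)   # V_hat = P_n[phi^2], SE = sqrt(V_hat / n)
  z <- qnorm(1 - (1 - level) / 2)
  c(estimate = estimate, se = se,
    lower = estimate - z * se, upper = estimate + z * se)
}

set.seed(2)                         # arbitrary illustrative seed
y <- rnorm(400, mean = 1, sd = 2)   # stand-in data
theta_hat <- mean(y)
phi_hat <- y - theta_hat            # estimated influence function of the mean
ci <- if_wald_ci(theta_hat, phi_hat)
```

For a real estimator, `phi_hat` would come from the influence function stated in the main theorem, evaluated at the fitted nuisance functions.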
8. Simulation Study
Structure:
\section{Simulation Study}
\label{sec:simulation}
\subsection{Design}
We assess finite-sample performance through Monte Carlo simulation.
\paragraph{Data Generation.}
[Describe DGP with formulas]
\paragraph{Parameter Grid.}
\begin{itemize}
\item Sample size: $n \in \{200, 500, 1000, 2000\}$
\item Effect size: $\psi \in \{0, 0.1, 0.3\}$
\item [Other factors]
\end{itemize}
\paragraph{Estimators.}
We compare:
\begin{enumerate}
\item Proposed estimator
\item [Competitor 1] \citep{...}
\item [Competitor 2] \citep{...}
\item Oracle (if applicable)
\end{enumerate}
\paragraph{Performance Metrics.}
\begin{itemize}
\item Bias: $\text{Bias} = \bar{\hat{\psi}} - \psi_0$
\item Empirical SE: $\text{ESE} = \text{SD}(\hat{\psi})$
\item Average SE: $\text{ASE} = \bar{\widehat{SE}}$
\item Coverage: $\text{Cov} = \text{proportion of CIs containing } \psi_0$
\item MSE: $\text{MSE} = \text{Bias}^2 + \text{ESE}^2$
\end{itemize}
Each scenario uses 1000 replications.
\subsection{Results}
[Tables and interpretation]
Tips:
- Follow Morris et al. (2019) guidelines
- Include enough scenarios to stress-test
- Show both when method works AND when it doesn't
- Include oracle/optimal for context
- Report MCSE (Monte Carlo standard error)
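Monte Carlo standard errors quantify how much a reported metric would vary across repeated runs of the simulation itself. A minimal sketch for bias and coverage, using the formulas given in Morris et al. (2019), is shown below. The function name `mcse_metrics` and the simulated inputs are illustrative stand-ins, not output from a real study.

```r
# Monte Carlo standard errors for bias and coverage
mcse_metrics <- function(estimates, covered, true_value) {
  n_sim <- length(estimates)
  coverage <- mean(covered)
  list(
    bias = mean(estimates) - true_value,
    bias_mcse = sd(estimates) / sqrt(n_sim),
    coverage = coverage,
    coverage_mcse = sqrt(coverage * (1 - coverage) / n_sim)
  )
}

set.seed(3)                                  # arbitrary illustrative seed
n_sim <- 1000
est <- rnorm(n_sim, mean = 0.3, sd = 0.1)    # stand-in for simulated estimates
ses <- rep(0.1, n_sim)                       # stand-in for model-based SEs
covered <- abs(est - 0.3) <= qnorm(0.975) * ses
out <- mcse_metrics(est, covered, true_value = 0.3)
```

With 1000 replications and true coverage of 0.95, the coverage MCSE is roughly 0.007, which helps in judging whether an observed 94% is distinguishable from nominal.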
9. Application
Structure:
\section{Application}
\label{sec:application}
\subsection{Data Description}
We apply our method to [dataset] to study [scientific question].
[Describe sample, variables, missingness]
\subsection{Analysis}
[Model specification, covariate selection, etc.]
\subsection{Results}
[Point estimates, CIs, interpretation]
\subsection{Sensitivity Analysis}
[Robustness to assumptions]
Tips:
- Use a compelling, relevant application
- Describe the data clearly enough that the analysis can be reproduced
- Report all model specifications
- Include sensitivity analyses
- Interpret substantively (not just "significant")
10. Discussion
Structure (4-5 paragraphs):
Paragraph 1: Summary
- Brief recap of contribution
- Key findings (theory + empirical)
Paragraph 2: Implications
- What does this mean for practice?
- When should researchers use this?
Paragraph 3: Limitations
- What can't the method do?
- When might it fail?
- (Being honest builds credibility)
Paragraph 4: Future Directions
- Natural extensions
- Open problems
- Ongoing work (brief)
Paragraph 5: Conclusion
- Final statement of contribution
- Availability of software
Journal-Specific Requirements
JASA (Journal of the American Statistical Association)
Format:
- Double-spaced, 12pt font
- Separate title page with abstract
- Figures/tables at end
- Supplementary materials allowed
Abstract: 150-200 words, unstructured
Sections: Standard methods paper structure
Key reviewer expectations:
- Novel methodology (not just application)
- Rigorous theory
- Comprehensive simulation
- Compelling application
- Reproducibility (code/data)
Page limit: ~25-30 pages (main text), unlimited supplement
Biometrika
Format:
- Double-spaced
- Abstract on title page
- References: author-year
Abstract: ~100-150 words
Emphasis:
- Mathematical rigor
- Elegant theory
- Concise writing
- Deep results > breadth
Page limit: ~20-25 pages
Biostatistics
Format:
- Double-spaced
- Structured abstract (Background, Methods, Results, Conclusions)
Abstract: 250 words max
Emphasis:
- Biomedical motivation
- Practical impact
- Software availability
- Real data analysis essential
Page limit: ~30 pages
Statistics in Medicine
Format:
- Double-spaced
- Structured abstract
Emphasis:
- Medical statistics focus
- Tutorial aspect welcomed
- Practical guidance
- Reproducibility
Notation Standards
VanderWeele Notation (Mediation/Causal)
| Symbol | Meaning |
|---|---|
| $Y(a)$ | Potential outcome under $A=a$ |
| $Y(a,m)$ | Potential outcome under $A=a$, $M=m$ |
| $M(a)$ | Potential mediator under $A=a$ |
| $NDE$ | Natural Direct Effect |
| $NIE$ | Natural Indirect Effect |
| $CDE(m)$ | Controlled Direct Effect at $M=m$ |
| $TE$ | Total Effect |
| $P_M$ | Proportion Mediated |
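The table lists $CDE(m)$ and $P_M$ without formulas. On the difference scale, and assuming the complementary pairing of direct and indirect effects so that they sum to $TE$, these are commonly written as:

```latex
\begin{align*}
CDE(m) &= E[Y(1, m) - Y(0, m)], \\
P_M &= \frac{NIE}{TE} = \frac{NIE}{NDE + NIE}.
\end{align*}
```

The proportion mediated is usually reported only when $NDE$ and $NIE$ have the same sign, since otherwise the ratio can fall outside $[0, 1]$.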
Statistical Notation
| Symbol | Meaning |
|---|---|
| $\theta_0$ | True parameter value |
| $\hat{\theta}_n$ | Estimator based on $n$ observations |
| $\phi(O)$ | Influence function |
| $\mathbb{P}_n$ | Empirical measure: $n^{-1}\sum_i \delta_{O_i}$ |
| $\mathbb{G}_n$ | Empirical process: $\sqrt{n}(\mathbb{P}_n - P)$ |
| $\xrightarrow{p}$ | Convergence in probability |
| $\xrightarrow{d}$ | Convergence in distribution |
| $O_p(\cdot)$, $o_p(\cdot)$ | Stochastic order |
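These symbols typically appear together in an asymptotic linearity statement. As a sketch, assuming $\hat{\theta}_n$ is asymptotically linear with influence function $\phi$ satisfying $E[\phi(O)] = 0$ and $E[\phi(O)^2] < \infty$:

```latex
\begin{align*}
\hat{\theta}_n - \theta_0
  &= \mathbb{P}_n \phi + o_p(n^{-1/2})
   = \frac{1}{n} \sum_{i=1}^n \phi(O_i) + o_p(n^{-1/2}), \\
\sqrt{n}(\hat{\theta}_n - \theta_0)
  &= \mathbb{G}_n \phi + o_p(1)
   \xrightarrow{d} N\bigl(0, E[\phi(O)^2]\bigr).
\end{align*}
```

The second line uses $P\phi = 0$, so that $\sqrt{n}\,\mathbb{P}_n \phi = \mathbb{G}_n \phi$, followed by the central limit theorem.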
Consistency in Notation
- Define ALL symbols before first use
- Use same symbol for same concept throughout
- Avoid notation conflicts within paper
- Follow journal/field conventions
Common Writing Patterns
Introducing Assumptions
We require the following assumptions for identification:
\begin{assumption}[Name]
\label{A:name}
[Mathematical statement]
\end{assumption}
Assumption \ref{A:name} requires that [plain language explanation]. This is plausible when [conditions]. It would be violated if [counter-examples].
Presenting Theorems
Our main theoretical result establishes the asymptotic properties of $\hat{\psi}_n$.
\begin{theorem}[Title]
\label{thm:main}
Under Conditions \ref{C1}--\ref{Cn}, [statement].
\end{theorem}
Theorem \ref{thm:main} shows that [interpretation]. The key insight is [intuition]. Compared to [existing result], our result [improvement].
Comparing to Existing Methods
Our approach differs from \citet{Author2020} in several ways. First, [difference 1]. Second, [difference 2]. Whereas their method requires [strong assumption], our estimator only needs [weaker assumption]. In the simulation study, we demonstrate [empirical comparison].
Discussing Limitations
Several limitations deserve mention. First, our method assumes [assumption], which may not hold in settings where [violation scenario]. Second, the asymptotic approximation requires [sample size consideration]. Future work could address these by [potential solutions].
LaTeX Best Practices
Document Structure
\documentclass[12pt]{article}
\usepackage{amsmath,amsthm,amssymb}
\usepackage{natbib}
\usepackage{graphicx}
\usepackage{booktabs}
% Theorem environments
\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{assumption}{Assumption}
\newtheorem{condition}{Condition}
% Custom commands
\newcommand{\E}{\mathbb{E}}
\newcommand{\Var}{\operatorname{Var}}
\newcommand{\Cov}{\operatorname{Cov}}
\newcommand{\indep}{\perp\!\!\!\perp}
\begin{document}
...
\end{document}
Tables
\begin{table}[ht]
\centering
\caption{Simulation results: Bias ($\times 100$), SE, and Coverage (\%)}
\label{tab:sim}
\begin{tabular}{lcccccc}
\toprule
& \multicolumn{3}{c}{$n=500$} & \multicolumn{3}{c}{$n=1000$} \\
\cmidrule(lr){2-4} \cmidrule(lr){5-7}
Method & Bias & SE & Cov & Bias & SE & Cov \\
\midrule
Proposed & 0.2 & 0.15 & 94.8 & 0.1 & 0.11 & 95.2 \\
Naive & 5.3 & 0.12 & 82.1 & 5.1 & 0.09 & 71.3 \\
\bottomrule
\end{tabular}
\end{table}
Figures
\begin{figure}[ht]
\centering
\includegraphics[width=0.8\textwidth]{figures/sim_results.pdf}
\caption{Simulation results across sample sizes. Left: Bias. Right: Coverage.
Dashed line indicates nominal 95\% level.}
\label{fig:sim}
\end{figure}
Quality Checklist
Before Submission
Content:
- All claims supported by theory or evidence
- All notation defined before use
- Assumptions clearly stated and discussed
- Proofs complete and correct
- Simulations comprehensive
- Application compelling and well-analyzed
Writing:
- Clear, concise prose
- Logical flow between sections
- Active voice where appropriate
- No undefined acronyms
- Consistent terminology
Formatting:
- Follows journal guidelines
- Figures high resolution
- Tables properly formatted
- References complete and consistent
- Supplementary materials organized
Reproducibility:
- Code available (GitHub, Zenodo)
- Data available or simulated data provided
- Random seeds documented
- Software versions noted
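The last two items can be handled in a few lines at the top of a simulation script. The sketch below uses only base R; the seed value and the `run_info` object name are arbitrary choices for illustration.

```r
# Record the seed, RNG configuration, and software versions with the results
seed <- 20251208                    # arbitrary illustrative seed
set.seed(seed)
draws_first <- rnorm(5)

set.seed(seed)
draws_second <- rnorm(5)            # identical to draws_first

run_info <- list(
  seed = seed,
  rng_kind = RNGkind(),             # generator, normal, and sample methods
  r_version = R.version.string,
  session = utils::sessionInfo()    # attached packages and their versions
)
```

Saving `run_info` next to the simulation output (for example with `saveRDS()`) gives reviewers what they need to rerun the study under the same configuration.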
Integration with Other Skills
This skill works with:
- proof-architect - For presenting theoretical results
- identification-theory - For identification sections
- asymptotic-theory - For inference sections
- simulation-architect - For simulation study design
- manuscript-writing-guide - For project-specific standards
Key References
VanderWeele notation
JASA style guide
APA citations
Morris, T.P. et al. (2019). Using simulation studies to evaluate statistical methods. Statistics in Medicine.
VanderWeele, T.J. (2015). Explanation in Causal Inference. Oxford.
van der Laan, M.J. & Rose, S. (2018). Targeted Learning in Data Science. Springer.
Version: 1.0 Created: 2025-12-08 Domain: Statistical Methods, Scientific Writing