---
name: literature-gap-finder
description: Method × Setting matrices and systematic gap identification
---
Literature Gap Finder
Systematic framework for identifying research opportunities in statistical methodology
Use this skill when: positioning research contributions, finding gaps in methodology literature, identifying unexplored combinations of methods and settings, building literature reviews, or deciding on research directions.
The Gap-Finding Framework
What Makes a Good Research Gap?
A publishable gap must be:
- Real - Not already addressed (check thoroughly!)
- Important - Solves a problem researchers face
- Tractable - Can be solved with available tools
- Novel - Provides new insight, not just a combination of existing results
- Timely - Relevant to current research trends
Types of Gaps
| Gap Type | Description | Example |
|---|---|---|
| Method Gap | No method exists for setting | No mediation analysis for network data |
| Theory Gap | Method exists but lacks theory | Bootstrap for mediation lacks consistency proof |
| Efficiency Gap | Methods exist but are inefficient | A more efficient doubly robust mediation estimator |
| Robustness Gap | Methods fail under violations | Mediation under measurement error |
| Computational Gap | Existing methods don't scale | Mediation with high-dimensional confounders |
| Extension Gap | Existing method needs generalization | Binary → continuous mediator |
Method-Setting Matrix
Systematic Gap Identification Framework
The method-setting matrix is the core tool for finding research gaps systematically:
```r
# Build a method-setting matrix programmatically
create_gap_matrix <- function() {
  methods <- c("Regression", "Weighting/IPW", "DR/AIPW", "TMLE", "ML-based")
  settings <- c("Binary treatment", "Continuous treatment",
                "Time-varying", "Clustered", "High-dimensional",
                "Measurement error", "Missing data", "Network")
  matrix_data <- expand.grid(method = methods, setting = settings)
  matrix_data$status <- "unknown"  # to be filled: "developed", "partial", "gap"
  matrix_data$priority <- NA
  matrix_data$references <- ""
  matrix_data
}
```
```r
# Visualize the gap matrix as a colored tile plot
visualize_gaps <- function(gap_matrix) {
  library(ggplot2)
  ggplot(gap_matrix, aes(x = method, y = setting, fill = status)) +
    geom_tile(color = "white") +
    scale_fill_manual(values = c(
      "developed" = "#2ecc71",
      "partial"   = "#f39c12",
      "gap"       = "#e74c3c",
      "unknown"   = "#95a5a6"
    )) +
    theme_minimal() +
    labs(title = "Method × Setting Gap Matrix",
         x = "Method", y = "Setting") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
```
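A minimal usage sketch tying the two helpers together; the two status assignments here are illustrative placeholders, not literature findings:

```r
# Build the matrix, mark a few cells by hand, then plot
gap_matrix <- create_gap_matrix()

# Illustrative status assignments -- replace with your own search results
gap_matrix$status[gap_matrix$method == "Regression" &
                  gap_matrix$setting == "Binary treatment"] <- "developed"
gap_matrix$status[gap_matrix$method == "TMLE" &
                  gap_matrix$setting == "Network"] <- "gap"

visualize_gaps(gap_matrix)
```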
Verification Process
Confirming a Gap is Real
Before claiming a gap, verify systematically:
| Step | Action | Tools |
|---|---|---|
| 1 | Search major databases | Google Scholar, Web of Science, Scopus |
| 2 | Search preprint servers | arXiv, bioRxiv, SSRN |
| 3 | Search R packages | CRAN, GitHub, R-universe |
| 4 | Check conference proceedings | ICML, NeurIPS, JSM, ENAR |
| 5 | Search dissertations | ProQuest, university repositories |
| 6 | Email domain experts | 2-3 experts for confirmation |
```r
# Systematic verification checklist
verify_gap <- function(topic, keywords) {
  checklist <- list(
    topic = topic,
    databases_searched = c("google_scholar", "web_of_science", "pubmed", "scopus"),
    search_terms = keywords,
    date_range = paste(Sys.Date() - 365 * 5, "to", Sys.Date()),
    results = list(
      papers_found = 0,
      closest_related = character(0),
      why_not_the_same = ""
    ),
    expert_consultation = list(
      experts_contacted = character(0),
      responses = character(0)
    ),
    verification_status = "pending"  # pending, confirmed, rejected
  )
  checklist
}
```
```r
# Render the verification checklist as a markdown report
document_verification <- function(gap_description, search_log) {
  cat("## Gap Verification Report\n\n")
  cat("**Gap:**", gap_description, "\n\n")
  cat("**Search Date:**", as.character(Sys.Date()), "\n\n")
  cat("**Databases Searched:**\n")
  for (db in search_log$databases_searched) {
    cat("- ", db, "\n")
  }
  cat("\n**Search Terms:**", paste(search_log$search_terms, collapse = ", "), "\n")
  cat("\n**Conclusion:**", search_log$verification_status, "\n")
}
```
Priority Ranking
Scoring Research Gaps
| Criterion | Weight | Score 1-5 |
|---|---|---|
| Impact (how many benefit?) | 0.25 | ___ |
| Novelty (how new?) | 0.20 | ___ |
| Tractability (can we solve it?) | 0.20 | ___ |
| Timeliness (is it hot now?) | 0.15 | ___ |
| Fit (matches our expertise?) | 0.10 | ___ |
| Publication potential | 0.10 | ___ |
Priority Score = Σ(weight × score)
```r
# Priority scoring function (base R only)
score_research_gap <- function(
    impact,        # 1-5: how many researchers would benefit
    novelty,       # 1-5: how new/original this is
    tractability,  # 1-5: how likely we can solve it
    timeliness,    # 1-5: whether this is currently active
    fit,           # 1-5: match with our expertise
    publication    # 1-5: publication potential
) {
  weights <- c(0.25, 0.20, 0.20, 0.15, 0.10, 0.10)
  scores <- c(impact, novelty, tractability, timeliness, fit, publication)
  priority <- sum(weights * scores)
  interpretation <- if (priority >= 4.0) {
    "High priority - pursue immediately"
  } else if (priority >= 3.0) {
    "Medium priority - develop further"
  } else if (priority >= 2.0) {
    "Low priority - back burner"
  } else {
    "Skip - not worth pursuing"
  }
  list(
    priority_score = priority,
    interpretation = interpretation,
    breakdown = data.frame(
      criterion = c("Impact", "Novelty", "Tractability",
                    "Timeliness", "Fit", "Publication"),
      weight = weights,
      score = scores,
      weighted = weights * scores
    )
  )
}
```
```r
# Compare multiple gaps: returns indices from highest to lowest priority
rank_gaps <- function(gaps_list) {
  scores <- sapply(gaps_list, function(g) g$priority_score)
  order(scores, decreasing = TRUE)
}
```
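A quick usage sketch with made-up scores for two candidate gaps:

```r
# Hypothetical scores for two candidate gaps
gap_a <- score_research_gap(impact = 4, novelty = 3, tractability = 4,
                            timeliness = 3, fit = 5, publication = 4)
gap_b <- score_research_gap(impact = 5, novelty = 4, tractability = 2,
                            timeliness = 4, fit = 3, publication = 3)

gap_a$priority_score           # 3.75
rank_gaps(list(gap_a, gap_b))  # c(1, 2): gap_a outranks gap_b
```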
Method × Setting Matrix
The Core Framework
Systematically map methods against settings to find gaps:
| Setting | Regression | Weighting | DR/TMLE | ML-based |
|---|---|---|---|---|
| Binary treatment | ✓ | ✓ | ✓ | ✓ |
| Continuous treatment | ✓ | ? | ✓ | ? |
| Time-varying | ? | ✓ | ✓ | ✗ |
| Clustered | ✓ | ? | ? | ✗ |
| High-dimensional | ✗ | ✗ | ? | ✓ |

✓ = Well-developed, ? = Partial/emerging, ✗ = Gap
Building Your Matrix
Step 1: Identify Dimensions
For mediation analysis:
| Dimension | Variations |
|---|---|
| Treatment | Binary, continuous, multi-level, time-varying |
| Mediator | Single, multiple, high-dimensional, latent |
| Outcome | Continuous, binary, count, survival, longitudinal |
| Confounding | Measured, unmeasured, time-varying |
| Structure | Single mediator, parallel, sequential, moderated |
| Data | Cross-sectional, longitudinal, clustered, network |
| Assumptions | Standard, relaxed positivity, measurement error |
Step 2: List Methods
| Method Family | Specific Methods |
|---|---|
| Regression | Baron-Kenny, product of coefficients, difference |
| Weighting | IPW, MSM, sequential g-estimation |
| Doubly Robust | AIPW, TMLE, cross-fitted |
| Semiparametric | Influence function-based |
| Bayesian | MCMC, variational |
| Machine Learning | Causal forests, DML, neural |
| Bounds | Partial identification, sensitivity |
Step 3: Fill and Analyze
Mark each cell:
- ✓ (green): Well-established with theory + software
- ? (yellow): Emerging or partial coverage
- ✗ (red): Clear gap
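Once cells are marked, a small helper can tabulate coverage and list the open cells. A sketch using the `status` codes from `create_gap_matrix()` above, with "gap" standing in for ✗ (`analyze_gap_matrix` is a name introduced here):

```r
# Tabulate cell status and list the cells marked as gaps
analyze_gap_matrix <- function(gap_matrix) {
  list(
    counts = table(gap_matrix$status),
    gaps = subset(gap_matrix, status == "gap", select = c(method, setting))
  )
}
```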
Example: Sequential Mediation Matrix
| Sequential mediation setting | Product | Weighting | DR | Bounds |
|---|---|---|---|---|
| 2 mediators, linear | ✓ | ✓ | ✓ | ? |
| 2 mediators, nonlinear | ? | ✓ | ? | ✗ |
| 3+ mediators, linear | ? | ? | ✗ | ✗ |
| 3+ mediators, nonlinear | ✗ | ? | ✗ | ✗ |
| With measurement error | ✗ | ✗ | ✗ | ✗ |
| With unmeasured confounding | ✗ | ✗ | ✗ | ? |
Gaps identified:
- DR methods for 3+ mediators
- Any method with measurement error
- Bounds approach underdeveloped
Assumption Relaxation Trees
The Framework
Map how assumptions have been relaxed over time:
```
Standard mediation (Baron & Kenny 1986)
├── Assumption: no unmeasured confounding
│   ├── Sensitivity analysis (Imai et al. 2010)
│   │   └── E-value (VanderWeele & Ding 2017) → [YOUR GAP?]
│   └── Bounds (partial identification)
│       └── Sharp bounds? → [YOUR GAP?]
├── Assumption: linearity
│   └── Nonparametric identification (Imai et al. 2010)
│       └── Multiple mediators? Longitudinal? → [YOUR GAP?]
└── Assumption: no treatment-mediator interaction
    └── 4-way decomposition (VanderWeele 2014)
        └── Measurement error? → [YOUR GAP?]
```
Building the Tree
Step 1: Identify Original Assumptions
For a classic method, list ALL assumptions:
- Explicit assumptions (stated in paper)
- Implicit assumptions (unstated but required)
- Computational assumptions (required for implementation)
Step 2: Trace Relaxation History
For each assumption, find papers that:
- Relax it partially
- Relax it completely
- Replace it with a different assumption
- Show consequences of violation
Step 3: Find Unexplored Branches
Look for:
- Combinations of relaxations not yet explored
- Relaxations in one method not applied to another
- Partial relaxations that could be completed
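One way to make the branch search concrete is to cross every assumption with every relaxation type that appears anywhere in the tree and flag the pairs with no reference. A minimal sketch; the table contents are illustrative entries taken from the example tree above:

```r
# Published (assumption, relaxation) pairs -- illustrative entries
relaxations <- data.frame(
  assumption = c("no unmeasured confounding", "no unmeasured confounding",
                 "linearity", "no treatment-mediator interaction"),
  relaxation = c("sensitivity analysis", "bounds",
                 "nonparametric identification", "effect decomposition"),
  reference  = c("Imai et al. 2010", "partial identification literature",
                 "Imai et al. 2010", "VanderWeele 2014"),
  stringsAsFactors = FALSE
)

# Cross all assumptions with all relaxation types; unmatched pairs are
# candidate unexplored branches
all_pairs <- expand.grid(
  assumption = unique(relaxations$assumption),
  relaxation = unique(relaxations$relaxation),
  stringsAsFactors = FALSE
)
explored <- paste(relaxations$assumption, relaxations$relaxation)
unexplored <- all_pairs[!paste(all_pairs$assumption, all_pairs$relaxation)
                        %in% explored, ]
```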
Example: Positivity Assumption
```
Positivity: P(A = a | X = x) ≥ ε > 0 for all a and x
├── Near-violations
│   └── Trimming weights → Truncation?
├── Practical violations
│   └── Overlap assessment → Diagnostics?
└── Structural violations
    └── Extrapolation methods → Bounds under violations?
```
Citation Network Analysis
Forward and Backward Searching
Backward: From a recent key paper, trace the works it cites:
- What foundational papers are cited?
- What parallel developments exist?
- What's the intellectual lineage?
Forward: Using Google Scholar "Cited by":
- Who has built on this work?
- What extensions were made?
- What gaps remain unaddressed?
Key Paper Identification
For any topic, identify:
| Category | Description | How to Find |
|---|---|---|
| Foundational | Original method papers | Most-cited, oldest |
| Textbook | Comprehensive treatments | Citations across subfields |
| Recent reviews | State-of-the-art summaries | "Review" in title, last 5 years |
| Frontier | Latest developments | Top journals, last 2 years |
| Your competition | Groups working on same gap | Recent similar titles |
Building a Citation Map
```
1986: Baron & Kenny [foundations]
│
├──→ 1990s: SEM extensions
│
├──→ 1992: Robins & Greenland [causal foundations]
│      │
│      ├──→ 2010: Imai et al. [sensitivity]
│      │
│      ├──→ 2014: VanderWeele [4-way decomposition]
│      │      │
│      │      └──→ 2015: VanderWeele book [comprehensive]
│      │
│      └──→ 2012: Tchetgen Tchetgen & Shpitser [semiparametric]
│
└──→ 2020s: ML integration [frontier]
```
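The same map can be kept as an edge list and queried programmatically. A minimal sketch using the igraph package (an assumption; any graph library works), with edges transcribed from the map above:

```r
library(igraph)

# Citing relationships transcribed from the map above
edges <- data.frame(
  from = c("Baron & Kenny 1986", "Baron & Kenny 1986", "Baron & Kenny 1986",
           "Robins & Greenland 1992", "Robins & Greenland 1992",
           "Robins & Greenland 1992", "VanderWeele 2014"),
  to   = c("SEM extensions", "Robins & Greenland 1992", "ML integration",
           "Imai et al. 2010", "VanderWeele 2014",
           "Tchetgen Tchetgen & Shpitser 2012", "VanderWeele 2015 book")
)

g <- graph_from_data_frame(edges, directed = TRUE)

# Nodes with no outgoing edges are frontiers: search forward from these
frontier <- V(g)[degree(g, mode = "out") == 0]$name
frontier

plot(g, vertex.label.cex = 0.8)
```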
Gap Verification Checklist
Before claiming a gap, verify:
1. Literature Search
- Searched Google Scholar with multiple keyword combinations
- Searched arXiv stat.ME and stat.TH
- Searched JSTOR for older statistics journals
- Searched bioRxiv/medRxiv for preprints
- Checked reference lists of review papers
- Checked "cited by" for key papers
2. Terminology Check
- Same concept might have different names in different fields
- Checked econometrics terminology
- Checked biostatistics terminology
- Checked machine learning terminology
- Checked psychology/SEM terminology
3. Adjacent Literature
- Checked related but not identical settings
- Method might exist for similar problem
- Checked if general framework applies
4. Working Papers
- Checked key authors' websites
- Checked conference proceedings (JSM, ENAR)
- Asked collaborators/experts
5. Final Verification
- Gap is not addressed in supplementary materials
- Gap is not "obvious" extension reviewers will dismiss
- Gap is important enough to publish
Gap Characterization Template
When you identify a gap:
## Gap: [Brief Title]
### Setting
[Precise description of the setting where the gap exists]
### Current State
- **What exists**: [Methods that partially address this]
- **What works**: [Aspects of the problem already solved]
- **What fails**: [Where current methods break down]
### The Gap
- **Precise statement**: [What is missing]
- **Why it matters**: [Who needs this, for what applications]
- **Why it's hard**: [Technical challenges]
### Evidence of Gap
- [ ] Literature search documented
- [ ] No existing solution found
- [ ] Experts consulted (optional)
### Potential Approaches
1. [Approach 1]: [Brief description]
- Pros: [Advantages]
- Cons: [Challenges]
2. [Approach 2]: [Brief description]
- Pros: [Advantages]
- Cons: [Challenges]
### Related Work
- [Paper 1]: [How it relates, why it doesn't solve gap]
- [Paper 2]: [How it relates, why it doesn't solve gap]
### Contribution Positioning
"While [existing work] addresses [related problem], no method currently
handles [specific gap]. We propose [approach] which provides [properties]."
Common Gap Patterns in Mediation
Pattern 1: Data Structure Mismatch
Gap template: "[Method] assumes [simple structure], but in [application] data has [complex structure]"
Examples:
- Methods assume iid, but data is clustered
- Methods assume cross-sectional, but data is longitudinal
- Methods assume complete data, but missingness exists
Pattern 2: Assumption Violation
Gap template: "[Method] requires [assumption], which is violated when [situation]"
Examples:
- Unmeasured mediator-outcome confounding
- Measurement error in mediator
- Treatment-mediator interaction
Pattern 3: Estimand Ambiguity
Gap template: "When [complication], standard estimands (natural direct/indirect effects, NDE/NIE) are not well-defined or interpretable"
Examples:
- Post-treatment confounding
- Time-varying treatments/mediators
- Multiple versions of treatment
Pattern 4: Efficiency vs Robustness
Gap template: "Efficient methods require [strong assumptions], while robust methods are inefficient"
Examples:
- Doubly robust methods for mediation
- Semiparametric efficiency in complex settings
- Adaptive methods
Pattern 5: Computational Barrier
Gap template: "Theoretically valid approach exists but [computational limitation]"
Examples:
- High-dimensional settings
- Continuous mediators requiring integration
- Bootstrap in complex models
Research Positioning Strategies
The Contribution Statement
Strong positioning formula:
"Although [Author Year] developed [method] for [setting], their approach [limitation]. In contrast, our method [advantage] while maintaining [property]. Specifically, we contribute: (1) [theoretical contribution], (2) [methodological contribution], (3) [practical contribution]."
Positioning Types
| Position | When to Use | Example Language |
|---|---|---|
| Extension | Build on existing | "We extend [method] to [new setting]" |
| Synthesis | Combine approaches | "We unify [method A] and [method B]" |
| Alternative | Different approach | "We propose an alternative that [advantage]" |
| Correction | Fix limitation | "We address the limitation of [method]" |
| Generalization | Broader framework | "We develop a general framework that includes [special cases]" |
Differentiation Matrix
| Dimension | Competitor 1 | Competitor 2 | Our Method |
|---|---|---|---|
| Setting | Binary A only | Any A | Any A |
| Theory | Consistency | + Normality | + Efficiency |
| Assumptions | Strong | Medium | Weaker |
| Computation | Fast | Slow | Medium |
| Software | R package | None | R + Python |
Integration with Other Skills
This skill works with:
- cross-disciplinary-ideation - Find solutions from other fields for identified gaps
- method-transfer-engine - Transfer methods to fill gaps
- identification-theory - Understand what assumptions are needed
- methods-paper-writer - Write up the gap and solution
Key References
On Finding Research Gaps
- Alvesson, M. & Sandberg, J. (2011). Generating research questions through problematization. Academy of Management Review.
- Sandberg, J. & Alvesson, M. (2011). Ways of constructing research questions: gap-spotting or problematization? Organization.
Mediation Reviews (for gap identification)
- VanderWeele, T.J. (2016). Mediation analysis: a practitioner's guide. Annual Review of Public Health.
- Nguyen, T.Q. et al. (2021). Clarifying causal mediation analysis for the applied researcher. Psychological Methods.
Causal Inference Reviews
- Hernán, M.A. (2018). The C-word: scientific euphemisms do not improve causal inference from observational data. American Journal of Public Health.
- Imbens, G.W. (2020). Potential outcome and directed acyclic graph approaches to causality. Journal of Economic Literature.
Version: 1.0 Created: 2025-12-08 Domain: Research Strategy, Literature Review