name	archr-local
description	ArchR docs served from downloaded_docs/archr_scrape - comprehensive scATAC-seq analysis toolkit with all HTML files explicitly listed

Archr-Local Skill

Comprehensive assistance with ArchR (Analysis of Regulatory Chromatin in R), a toolkit for single-cell ATAC-seq data analysis. Content is generated from the official documentation.

When to Use This Skill

This skill should be triggered when:

Core Analysis Tasks

Working with scATAC-seq data: Processing fragment files, creating Arrow files, quality control
Dimensionality reduction and clustering: LSI analysis, UMAP visualization, cell clustering
Peak analysis: Differential accessibility, co-accessibility, peak-to-gene links
Motif enrichment: Finding regulatory motifs, transcription factor analysis
Integration: Multiome analysis (scATAC + scRNA), batch correction
Visualization: Browser tracks, embedding plots, TSS enrichment plots
Project management: Creating projects, subsetting, doublet detection

Specific Use Cases

Setting up new ArchR projects from fragment files or BAM files
Creating UMAP embeddings and clustering scATAC-seq data
Identifying cell clusters and markers
Performing differential accessibility analysis
Generating browser tracks for genomic regions
Adding motif annotations and performing motif enrichment
Integrating scATAC-seq with scRNA-seq data
Exporting results and creating publication-ready plots

Questions That Need This Skill

"How do I create an ArchR project from my scATAC-seq data?"
"How do I identify cell types in my ATAC-seq data?"
"How do I find differentially accessible peaks between clusters?"
"How do I add motif information to my peaks?"
"How do I visualize my data with UMAP plots?"

Quick Reference

Common Patterns

Pattern 1: Setting up ArchR and creating a project

library(ArchR)
addArchRGenome("hg38")
addArchRThreads(8)
addArchRLocking(locking = TRUE)
set.seed(1)

# Create Arrow files from fragment data
# atacFiles: named character vector of fragment (or BAM) file paths; names are the sample names
ArrowFiles <- createArrowFiles(
  inputFiles = atacFiles,
  sampleNames = names(atacFiles),
  minTSS = 4,
  minFrags = 1000,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)

# Create ArchR project (the constructor is ArchRProject())
proj <- ArchRProject(
  ArrowFiles = ArrowFiles,
  outputDirectory = "ArchR-Output",
  copyArrows = TRUE
)

Pattern 2: Quality control and filtering

# Plot TSS enrichment for QC
p <- plotTSSEnrichment(proj, groupBy = "Sample")
plotPDF(p, name = "TSS-Enrich", ArchRProj = proj)

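# Compute doublet scores, then filter predicted doublets
proj <- addDoubletScores(input = proj)
proj <- filterDoublets(ArchRProj = proj)

# Subset to high-quality cells (thresholds here are illustrative)
keep <- which(proj$TSSEnrichment >= 8 & proj$nFrags >= 1000)
proj <- subsetArchRProject(
  ArchRProj = proj,
  cells = getCellNames(proj)[keep],
  outputDirectory = "ArchR-Subset"
)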

Pattern 3: Dimensionality reduction and clustering

# Add Iterative LSI
proj <- addIterativeLSI(
  ArchRProj = proj,
  useMatrix = "TileMatrix",
  name = "IterativeLSI",
  force = TRUE
)

# Add UMAP embedding
proj <- addUMAP(
  ArchRProj = proj,
  reducedDims = "IterativeLSI",
  name = "UMAP",
  nNeighbors = 30,
  minDist = 0.5,
  metric = "cosine"
)

# Add clusters
proj <- addClusters(
  input = proj,
  reducedDims = "IterativeLSI",
  method = "Seurat",
  name = "Clusters",
  resolution = 0.8
)
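
The idea behind LSI can be illustrated in base R without ArchR. The sketch below runs a single round of TF-IDF followed by SVD on a small made-up binary matrix; the matrix values, the particular TF-IDF formula, and the choice of two dimensions are assumptions for illustration. ArchR's addIterativeLSI differs in its details (for example, iterative feature selection and its own normalization).

```r
# Toy binary accessibility matrix: 6 tiles (rows) x 5 cells (columns).
# Values are invented for illustration only.
mat <- matrix(
  c(1, 1, 0, 0, 1,
    1, 0, 0, 1, 1,
    0, 1, 1, 0, 0,
    0, 0, 1, 1, 0,
    1, 1, 0, 0, 0,
    0, 0, 1, 1, 1),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("tile", 1:6), paste0("cell", 1:5))
)

# Term frequency: scale each cell (column) by its total accessibility.
tf <- sweep(mat, 2, colSums(mat), "/")

# Inverse document frequency: down-weight tiles open in many cells.
idf <- log(1 + ncol(mat) / rowSums(mat))

# TF-IDF: each row i is multiplied by idf[i].
tfidf <- tf * idf

# SVD; keep the first two right singular vectors, scaled by singular values,
# as a low-dimensional embedding of the cells.
sv <- svd(tfidf, nu = 0, nv = 2)
lsi <- sv$v %*% diag(sv$d[1:2])
rownames(lsi) <- colnames(mat)
lsi
```

Each row of lsi holds one cell's coordinates in the reduced space. Embedding with UMAP and clustering then operate on coordinates of this kind rather than on the full matrix.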

Pattern 4: Visualization and plotting

# Plot UMAP colored by sample
p1 <- plotEmbedding(ArchRProj = proj, colorBy = "cellColData", name = "Sample", embedding = "UMAP")

# Plot UMAP colored by clusters
p2 <- plotEmbedding(ArchRProj = proj, colorBy = "cellColData", name = "Clusters", embedding = "UMAP")

# Side-by-side comparison
ggAlignPlots(p1, p2, type = "h")
plotPDF(p1, p2, name = "Plot-UMAP-Sample-Clusters.pdf", ArchRProj = proj, addDOC = FALSE)

Pattern 5: Gene accessibility and marker analysis

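# Add gene scores (accessibility-based estimates of gene activity)
# force = TRUE because createArrowFiles(addGeneScoreMat = TRUE) already added this matrix
proj <- addGeneScoreMatrix(
  input = proj,
  matrixName = "GeneScoreMatrix",
  force = TRUE
)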

# Plot gene scores on UMAP
p <- plotEmbedding(
  ArchRProj = proj,
  embedding = "UMAP",
  colorBy = "GeneScoreMatrix",
  name = "CD34",
  size = 1
)

Pattern 6: Differential accessibility analysis

# Identify marker peaks
markersPeaks <- getMarkerFeatures(
  ArchRProj = proj,
  useMatrix = "PeakMatrix",
  groupBy = "Clusters",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  testMethod = "wilcoxon"
)

# Extract significant marker peaks per cluster
markerList <- getMarkers(markersPeaks, cutOff = "FDR <= 0.1 & Log2FC >= 1")

# Plot a heatmap of marker peaks
heatmapPeaks <- plotMarkerHeatmap(
  seMarker = markersPeaks,
  cutOff = "FDR <= 0.1 & Log2FC >= 1",
  transpose = TRUE
)
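
A cutoff string such as "FDR <= 0.1 & Log2FC >= 1" is a logical expression over the columns of the marker statistics. The base-R sketch below shows what such a string selects on a made-up table; the table and its values are hypothetical, and this is an illustration of the semantics rather than ArchR's internal code.

```r
# Hypothetical marker statistics for four peaks (values invented).
markers <- data.frame(
  peak   = c("peak1", "peak2", "peak3", "peak4"),
  FDR    = c(0.01, 0.20, 0.05, 0.09),
  Log2FC = c(2.5, 3.0, 0.4, 1.0),
  stringsAsFactors = FALSE
)

cutOff <- "FDR <= 0.1 & Log2FC >= 1"

# Evaluate the cutoff string using the table's columns as variables.
keep <- eval(parse(text = cutOff), envir = markers)

# Peaks that satisfy both conditions.
markers[keep, ]
```

Only peak1 and peak4 pass: peak2 fails the FDR condition and peak3 fails the Log2FC condition. Both comparisons are inclusive, so a peak with Log2FC exactly 1 is retained.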

Pattern 7: Co-accessibility analysis

# Add co-accessibility
proj <- addCoAccessibility(
  ArchRProj = proj,
  reducedDims = "IterativeLSI"
)

# Get co-accessibility loops
cA <- getCoAccessibility(
  ArchRProj = proj,
  corCutOff = 0.5,
  resolution = 1000,
  returnLoops = TRUE
)

# Plot browser tracks with co-accessibility loops
p <- plotBrowserTrack(
  ArchRProj = proj,
  groupBy = "Clusters",
  geneSymbol = c("CD14", "CD3D"),
  upstream = 50000,
  downstream = 50000,
  loops = cA
)
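
Conceptually, corCutOff thresholds pairwise correlations between peaks. The base-R sketch below illustrates that idea on a toy matrix of aggregated accessibility values; the peak names and numbers are invented, and ArchR's own procedure (cell aggregation, distance limits between peaks) is more involved than this.

```r
# Toy accessibility values: rows are peaks, columns are cell aggregates.
agg <- rbind(
  peakA = c(1, 2, 3, 4, 5),
  peakB = c(2, 4, 6, 8, 11),
  peakC = c(5, 3, 4, 1, 2)
)

corCutOff <- 0.5

# Pearson correlation between every pair of peaks.
corMat <- cor(t(agg))

# Keep each pair once (upper triangle) if its correlation passes the cutoff.
idx <- which(corMat >= corCutOff & upper.tri(corMat), arr.ind = TRUE)

loops <- data.frame(
  peak1 = rownames(corMat)[idx[, "row"]],
  peak2 = colnames(corMat)[idx[, "col"]],
  correlation = corMat[idx],
  stringsAsFactors = FALSE
)
loops
```

In this toy example only the peakA/peakB pair is retained, because peakC is negatively correlated with both. Lowering corCutOff keeps more (and weaker) links; raising it keeps fewer.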

Pattern 8: Motif enrichment analysis

# Add motif annotations
proj <- addMotifAnnotations(ArchRProj = proj, motifSet = "cisbp", name = "Motif")

# Perform motif enrichment on marker peaks
enrichMotifs <- peakAnnoEnrichment(
  seMarker = markersPeaks,
  ArchRProj = proj,
  peakAnnotation = "Motif",
  cutOff = "FDR <= 0.1 & Log2FC >= 1"
)

# Plot motif enrichment (top 7 motifs per group)
plotEnrichHeatmap(enrichMotifs, n = 7, transpose = TRUE)

Pattern 9: Multiome data integration

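# Import scRNA-seq data
# rnaFiles: named vector of 10x feature matrix .h5 paths; names must match the ATAC sample names
seRNA <- import10xFeatureMatrix(input = rnaFiles, names = names(rnaFiles))

# Add gene expression matrix to ArchR project
proj <- addGeneExpressionMatrix(
  input = proj,
  seRNA = seRNA,
  strictMatch = TRUE
)

# LSI on the RNA modality (the ATAC LSI is "IterativeLSI" from Pattern 3)
proj <- addIterativeLSI(
  ArchRProj = proj,
  useMatrix = "GeneExpressionMatrix",
  depthCol = "Gex_nUMI",
  varFeatures = 2500,
  firstSelection = "variable",
  binarize = FALSE,
  name = "LSI_RNA"
)

# Combine both modalities into one reduced-dimension object
proj <- addCombinedDims(
  ArchRProj = proj,
  reducedDims = c("IterativeLSI", "LSI_RNA"),
  name = "CombinedDims"
)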

Pattern 10: Exporting results and data

# Export group BigWig files
bw <- getGroupBW(
  ArchRProj = proj,
  groupBy = "Clusters",
  normMethod = "ReadsInTSS",
  tileSize = 100
)

# Export group fragment files
frags <- getGroupFragments(
  ArchRProj = proj,
  groupBy = "Clusters"
)

# Save project (with load = FALSE, keep using the in-session object; do not reassign it)
saveArchRProject(ArchRProj = proj, outputDirectory = "ArchR-Project-Output", load = FALSE)

Key Concepts

Core ArchR Objects

ArchRProject: Main container for single-cell ATAC-seq data and analyses
ArrowFiles: Efficient storage format for fragment data
PeakMatrix: Peak-by-cell accessibility matrix
TileMatrix: Fixed-size tile-by-cell accessibility matrix
GeneScoreMatrix: Gene-by-cell accessibility score matrix
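
To make the TileMatrix idea concrete, the base-R sketch below bins one cell's insertion positions into fixed-size tiles and binarizes the result. The positions are invented, and the 500-bp tile size and the binarization step are assumptions made for illustration.

```r
# Assumed tile width in base pairs.
tileSize <- 500L

# Hypothetical 1-based insertion positions for one cell on one chromosome.
insertions <- c(120L, 480L, 501L, 999L, 1000L, 1743L)

# Tile 1 covers positions 1-500, tile 2 covers 501-1000, and so on.
tileIndex <- (insertions - 1L) %/% tileSize + 1L

# Count insertions per tile, then binarize (accessible or not).
counts <- tabulate(tileIndex, nbins = 4)
binarized <- as.integer(counts > 0)

counts     # insertions per tile
binarized  # 1 if the tile has at least one insertion
```

Stacking one such vector per cell, across all chromosomes, gives a tile-by-cell matrix. Because tiles are defined by position alone, the matrix can be built before any peaks have been called.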

Analysis Workflow

Data Input: Import fragment files/BAM files → Create Arrow files
Quality Control: TSS enrichment → Doublet filtering → Cell subsetting
Dimensionality Reduction: Iterative LSI → UMAP
Clustering: Identify cell populations
Downstream Analysis: Differential accessibility, motif analysis, co-accessibility

Important Parameters

minTSS: Minimum TSS enrichment score for cell retention (usually 4-10)
minFrags: Minimum fragments per cell (usually 1000-5000)
resolution: Clustering resolution (higher = more clusters)
corCutOff: Correlation cutoff for co-accessibility (usually 0.3-0.5)
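
The effect of minTSS and minFrags can be illustrated on a per-cell QC table. The base-R sketch below uses invented values and assumes inclusive thresholds; it mimics the filtering logic only and does not call ArchR.

```r
# Hypothetical per-cell QC metrics (values invented).
qc <- data.frame(
  cell          = paste0("cell", 1:5),
  TSSEnrichment = c(12.1, 3.2, 8.4, 4.0, 15.7),
  nFrags        = c(5200, 8000, 900, 1000, 25000),
  stringsAsFactors = FALSE
)

minTSS   <- 4
minFrags <- 1000

# A cell is retained only if it passes both thresholds.
pass <- qc$TSSEnrichment >= minTSS & qc$nFrags >= minFrags
qc$cell[pass]
```

Here cell2 is dropped for low TSS enrichment despite having many fragments, and cell3 is dropped for low fragment count despite good TSS enrichment. Raising either threshold trades cell number for per-cell data quality.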

Reference Files

This skill includes comprehensive documentation in references/:

Core Documentation

getting_started.md - Installation, basic setup, and introduction to ArchR
data_preparation.md - Input file formats, project creation, and data import
dimensionality_reduction.md - LSI, UMAP, and other dimensionality reduction methods
clustering.md - Cell clustering, group creation, and population identification
visualization.md - Plotting functions and data visualization

Analysis Functions

analysis_functions.md - Core analysis functions for modalities and integrative analysis
peak_analysis.md - Peak calling, differential accessibility, and peak-to-gene linking
gene_analysis.md - Gene scoring, expression integration, and gene-based analyses
enrichment_analysis.md - GO analysis, motif enrichment, and pathway analysis
trajectory_analysis.md - Pseudotime analysis, lineage trajectories, and differentiation

Advanced Topics

advanced.md - Advanced techniques and specialized analyses
integration.md - Multi-omics integration and batch correction
export.md - Data export, result saving, and sharing

Utilities

project_management.md - Project organization, subsetting, and management
utility_functions.md - Helper functions and utilities
visualization_functions.md - Additional plotting and visualization tools
other.md - Miscellaneous topics and supplementary information

Reference Content Structure

Each reference file contains:

Detailed explanations of functions and workflows
Code examples with syntax highlighting
Parameter descriptions and usage notes
Best practices and troubleshooting tips

Use file names to navigate specific topics (e.g., view clustering.md for clustering guidance).

Working with This Skill

For Beginners

Start with getting_started.md - Learn ArchR installation and basic concepts
Read data_preparation.md - Understand how to format and import your data
Follow dimensionality_reduction.md and clustering.md - Create your first analyses
Use visualization.md - Learn to plot and interpret your results

For Intermediate Users

Review peak_analysis.md and gene_analysis.md - Perform differential analyses
Explore enrichment_analysis.md - Add regulatory insights to your work
Check integration.md - Combine multiple modalities or datasets
Use export.md - Generate publication-ready outputs

For Advanced Users

Study advanced.md - Implement sophisticated analytical techniques
Review trajectory_analysis.md - Study cellular differentiation
Consult utility_functions.md - Find helper functions for extending ArchR workflows
Optimize workflows using project_management.md

Navigation Tips

Use specific file names in your queries (e.g., "show me clustering.md")
Ask for specific functions by name (e.g., "how does addIterativeLSI work?")
Request examples from particular documentation sections
Use the Quick Reference patterns for common workflows

Resources

Documentation Structure

references/ - Complete extracted documentation organized by topic
Quick Reference - Frequently used code patterns and workflows
Key Concepts - Essential terminology and best practices

Getting Help

Reference specific files for detailed function descriptions
Use Quick Reference patterns for common tasks
Ask about specific parameters or troubleshooting scenarios

Notes

ArchR is specifically designed for single-cell ATAC-seq data analysis
The Arrow file format enables memory-efficient processing of large datasets
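ArchR builds on Bioconductor data structures (e.g. GRanges, SummarizedExperiment) and interoperates with Bioconductor packages
Input can be supplied as fragment files or BAM files, which covers output from common scATAC-seq platforms (10x, sci-ATAC, etc.)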
The toolkit includes extensive QC metrics and validation steps

Updating

To refresh this skill with updated documentation:

Re-run the scraper with the same configuration
This skill will be rebuilt with the latest information from the ArchR documentation
