| name | archr-local |
| description | ArchR docs served from downloaded_docs/archr_scrape - comprehensive scATAC-seq analysis toolkit with all HTML files explicitly listed |
ArchR-Local Skill
Assistance with ArchR (Analysis of Regulatory Chromatin in R), an R package for single-cell ATAC-seq (scATAC-seq) data analysis. The content is generated from the official ArchR documentation.
When to Use This Skill
This skill should be triggered when:
Core Analysis Tasks
- Working with scATAC-seq data: Processing fragment files, creating Arrow files, quality control
- Dimensionality reduction and clustering: LSI analysis, UMAP visualization, cell clustering
- Peak analysis: Differential accessibility, co-accessibility, peak-to-gene links
- Motif enrichment: Finding regulatory motifs, transcription factor analysis
- Integration: Multiome analysis (scATAC + scRNA), batch correction
- Visualization: Browser tracks, embedding plots, TSS enrichment plots
- Project management: Creating projects, subsetting, doublet detection
Specific Use Cases
- Setting up new ArchR projects from fragment files or BAM files
- Creating UMAP embeddings and clustering scATAC-seq data
- Identifying cell clusters and markers
- Performing differential accessibility analysis
- Generating browser tracks for genomic regions
- Adding motif annotations and performing motif enrichment
- Integrating scATAC-seq with scRNA-seq data
- Exporting results and creating publication-ready plots
Questions That Need This Skill
- "How do I create an ArchR project from my scATAC-seq data?"
- "How do I identify cell types in my ATAC-seq data?"
- "How do I find differentially accessible peaks between clusters?"
- "How do I add motif information to my peaks?"
- "How do I visualize my data with UMAP plots?"
Quick Reference
Common Patterns
Pattern 1: Setting up ArchR and creating a project
library(ArchR)
addArchRGenome("hg38")
addArchRThreads(8)
addArchRLocking(locking = TRUE)
set.seed(1)
# Create Arrow files from fragment data
# (atacFiles is a named character vector of fragment or BAM file paths)
ArrowFiles <- createArrowFiles(
  inputFiles = atacFiles,
  sampleNames = names(atacFiles),
  minTSS = 4,
  minFrags = 1000,
  addTileMat = TRUE,
  addGeneScoreMat = TRUE
)
# Create ArchR project
proj <- ArchRProject(
  ArrowFiles = ArrowFiles,
  outputDirectory = "ArchR-Output",
  copyArrows = TRUE
)
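Pattern 1 uses atacFiles without defining it. A minimal sketch of one way to build it in base R follows; the file paths and the ".fragments.tsv.gz" suffix are hypothetical placeholders, not values from the ArchR documentation.

```r
# Hypothetical fragment file locations; replace with your own paths.
fragment_paths <- c(
  "data/sampleA.fragments.tsv.gz",
  "data/sampleB.fragments.tsv.gz"
)

# Derive sample names by stripping the directory and the file suffix.
sample_names <- sub("\\.fragments\\.tsv\\.gz$", "", basename(fragment_paths))

# Named character vector: the values feed inputFiles, the names feed sampleNames.
atacFiles <- setNames(fragment_paths, sample_names)
print(atacFiles)
```

Keeping the sample names on the vector itself means inputFiles and sampleNames cannot drift out of order.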
Pattern 2: Quality control and filtering
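# Plot TSS enrichment profiles per sample for QC
p <- plotTSSEnrichment(ArchRProj = proj, groupBy = "Sample")
plotPDF(p, name = "TSS-Enrich", ArchRProj = proj)
# Compute doublet scores, then remove predicted doublets
proj <- addDoubletScores(input = proj)
proj <- filterDoublets(ArchRProj = proj)
# Subset to high-quality cells (the TSS enrichment threshold of 8 is illustrative)
cellsPass <- proj$cellNames[proj$TSSEnrichment >= 8]
proj <- subsetArchRProject(
  ArchRProj = proj,
  cells = cellsPass,
  outputDirectory = "ArchR-Subset"
)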
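filterDoublets() limits how many cells it removes per sample through its filterRatio argument. The filterDoublets() documentation gives the cap as filterRatio * nCells^2 / 100000. The base R sketch below only reproduces that arithmetic; the helper name max_doublets_removed and the use of floor() for rounding are assumptions made for illustration.

```r
# Upper bound on cells removed per sample, following the formula quoted in
# the filterDoublets() documentation: filterRatio * nCells^2 / 100000.
# The helper name and the floor() rounding are illustrative assumptions.
max_doublets_removed <- function(n_cells, filter_ratio = 1) {
  floor(filter_ratio * n_cells^2 / 100000)
}

max_doublets_removed(5000)      # 250
max_doublets_removed(5000, 2)   # 500
max_doublets_removed(10000)     # 1000
```

Because the cap grows with the square of the cell count, heavily loaded samples lose a larger share of their cells than lightly loaded ones.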
Pattern 3: Dimensionality reduction and clustering
# Add Iterative LSI
proj <- addIterativeLSI(
  ArchRProj = proj,
  useMatrix = "TileMatrix",
  name = "IterativeLSI",
  force = TRUE
)
# Add UMAP embedding
proj <- addUMAP(
  ArchRProj = proj,
  reducedDims = "IterativeLSI",
  name = "UMAP",
  nNeighbors = 30,
  minDist = 0.5,
  metric = "cosine"
)
# Add clusters
proj <- addClusters(
  input = proj,
  reducedDims = "IterativeLSI",
  method = "Seurat",
  name = "Clusters",
  resolution = 0.8
)
Pattern 4: Visualization and plotting
# Plot UMAP colored by sample
p1 <- plotEmbedding(ArchRProj = proj, colorBy = "cellColData", name = "Sample", embedding = "UMAP")
# Plot UMAP colored by clusters
p2 <- plotEmbedding(ArchRProj = proj, colorBy = "cellColData", name = "Clusters", embedding = "UMAP")
# Side-by-side comparison
ggAlignPlots(p1, p2, type = "h")
plotPDF(p1, p2, name = "Plot-UMAP-Sample-Clusters.pdf", ArchRProj = proj, addDOC = FALSE)
Pattern 5: Gene accessibility and marker analysis
# Add gene activity scores
# (already present if createArrowFiles() ran with addGeneScoreMat = TRUE)
proj <- addGeneScoreMatrix(
  input = proj,
  matrixName = "GeneScoreMatrix",
  force = TRUE
)
# Plot gene scores on UMAP
p <- plotEmbedding(
  ArchRProj = proj,
  embedding = "UMAP",
  colorBy = "GeneScoreMatrix",
  name = "CD34",
  size = 1
)
Pattern 6: Differential accessibility analysis
# Identify marker peaks (requires a PeakMatrix, i.e. peaks called and addPeakMatrix() run)
markersPeaks <- getMarkerFeatures(
  ArchRProj = proj,
  useMatrix = "PeakMatrix",
  groupBy = "Clusters",
  bias = c("TSSEnrichment", "log10(nFrags)"),
  testMethod = "wilcoxon"
)
# Extract the significant marker peaks for each cluster
markerList <- getMarkers(markersPeaks, cutOff = "FDR <= 0.1 & Log2FC >= 1")
# Plot a heatmap of marker peaks
heatmapPeaks <- plotMarkerHeatmap(
  seMarker = markersPeaks,
  cutOff = "FDR <= 0.1 & Log2FC >= 1"
)
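The cutOff argument in Pattern 6 is a character string, not a number. The base R sketch below shows what such a string selects when it is evaluated as a logical expression over columns named FDR and Log2FC. That this mirrors ArchR's internal handling is an assumption, and the data frame and peak names are invented for illustration.

```r
# Toy marker statistics; values are invented for illustration only.
marker_stats <- data.frame(
  peak   = c("peak1", "peak2", "peak3", "peak4", "peak5"),
  Log2FC = c(2.1, 0.4, 1.5, -1.8, 1.2),
  FDR    = c(0.001, 0.05, 0.2, 0.01, 0.08)
)

# Same string form as the cutOff used in Pattern 6.
cutOff <- "FDR <= 0.1 & Log2FC >= 1"

# Evaluate the string as an R expression, looking up FDR and Log2FC
# in the columns of marker_stats.
keep <- eval(parse(text = cutOff), envir = marker_stats)

significant <- marker_stats[keep, ]
print(significant$peak)   # "peak1" "peak5"
```

Only peaks that satisfy both conditions survive: peak3 fails on FDR, while peak2 and peak4 fail on Log2FC.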
Pattern 7: Co-accessibility analysis
# Add co-accessibility
proj <- addCoAccessibility(
  ArchRProj = proj,
  reducedDims = "IterativeLSI"
)
# Get co-accessibility loops
cA <- getCoAccessibility(
  ArchRProj = proj,
  corCutOff = 0.5,
  resolution = 1000,
  returnLoops = TRUE
)
# Plot browser tracks with co-accessibility loops
p <- plotBrowserTrack(
  ArchRProj = proj,
  groupBy = "Clusters",
  geneSymbol = c("CD14", "CD3D"),
  upstream = 50000,
  downstream = 50000,
  loops = cA
)
Pattern 8: Motif enrichment analysis
# Add motif annotations
# (some ArchR releases name the last argument annoName instead of name)
proj <- addMotifAnnotations(ArchRProj = proj, motifSet = "cisbp", name = "Motif")
# Test marker peaks for motif enrichment
enrichMotifs <- peakAnnoEnrichment(
  seMarker = markersPeaks,
  ArchRProj = proj,
  peakAnnotation = "Motif",
  cutOff = "FDR <= 0.1 & Log2FC >= 1"
)
# Plot the top enriched motifs per group
heatmapEM <- plotEnrichHeatmap(enrichMotifs, n = 7, transpose = TRUE)
Pattern 9: Multiome data integration
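# Import 10x feature matrices (rnaFiles: named vector of filtered_feature_bc_matrix.h5 paths)
seRNA <- import10xFeatureMatrix(
  input = rnaFiles,
  names = names(rnaFiles)
)
# Add the gene expression matrix to the ArchR project
proj <- addGeneExpressionMatrix(
  input = proj,
  seRNA = seRNA,
  strictMatch = TRUE
)
# Reduce the RNA modality, then combine it with the ATAC LSI from Pattern 3
proj <- addIterativeLSI(
  ArchRProj = proj,
  useMatrix = "GeneExpressionMatrix",
  name = "LSI_RNA",
  depthCol = "Gex_nUMI",
  varFeatures = 2500,
  firstSelection = "variable",
  binarize = FALSE
)
proj <- addCombinedDims(
  ArchRProj = proj,
  reducedDims = c("IterativeLSI", "LSI_RNA"),
  name = "CombinedDims"
)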
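With strictMatch = TRUE, the RNA and ATAC cells have to line up by name. ArchR names cells as the sample name, a "#", and the barcode. The base R sketch below checks the overlap before any matrix is added; the barcodes and sample names are invented, and prefixing the RNA barcodes by hand is an illustrative assumption about how the two sets are brought into the same format.

```r
# ATAC cell names in ArchR's "Sample#Barcode" format (invented examples).
atac_cells <- c("sampleA#AAAC-1", "sampleA#AAAG-1", "sampleB#AAAC-1")

# Raw RNA barcodes for sampleA as they might appear in a 10x matrix.
rna_barcodes <- c("AAAC-1", "AAAG-1", "TTTG-1")

# Put the RNA barcodes into the same "Sample#Barcode" format.
rna_cells <- paste0("sampleA", "#", rna_barcodes)

# Cells present in both modalities.
shared <- intersect(atac_cells, rna_cells)
print(shared)                              # "sampleA#AAAC-1" "sampleA#AAAG-1"
print(length(shared) / length(atac_cells)) # fraction of ATAC cells with RNA data
```

A low shared fraction usually points to a naming mismatch (wrong sample prefix or barcode suffix) rather than missing cells.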
Pattern 10: Exporting results and data
# Export group BigWig files
bw <- getGroupBW(
  ArchRProj = proj,
  groupBy = "Clusters",
  normMethod = "ReadsInTSS",
  tileSize = 100
)
# Export group fragment files
frags <- getGroupFragments(
  ArchRProj = proj,
  groupBy = "Clusters"
)
# Save the project (with load = FALSE the in-session object is left unchanged)
saveArchRProject(ArchRProj = proj, outputDirectory = "ArchR-Project-Output", load = FALSE)
Key Concepts
Core ArchR Objects
- ArchRProject: Main container for single-cell ATAC-seq data and analyses
- ArrowFiles: Efficient storage format for fragment data
- PeakMatrix: Peak-by-cell accessibility matrix
- TileMatrix: Fixed-size tile-by-cell accessibility matrix
- GeneScoreMatrix: Gene-by-cell accessibility score matrix
Analysis Workflow
- Data Input: Import fragment files/BAM files → Create Arrow files
- Quality Control: TSS enrichment → Doublet filtering → Cell subsetting
- Dimensionality Reduction: Iterative LSI → UMAP
- Clustering: Identify cell populations
- Peak Calling: Group coverages → Reproducible peak set → Peak matrix
- Downstream Analysis: Differential accessibility, motif analysis, co-accessibility
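The downstream steps in this workflow read a PeakMatrix, which only exists after peaks have been called on the clustered project. A minimal sketch of that intermediate step is wrapped in a helper below. The helper name add_peak_matrix and the use of Sys.which() to locate MACS2 are assumptions made for illustration; the three ArchR calls are addGroupCoverages(), addReproduciblePeakSet(), and addPeakMatrix().

```r
# Hypothetical wrapper around ArchR's peak-calling steps.
# Defining it does not require ArchR; calling it does.
add_peak_matrix <- function(proj,
                            groupBy = "Clusters",
                            pathToMacs2 = unname(Sys.which("macs2"))) {
  if (!requireNamespace("ArchR", quietly = TRUE)) {
    stop("ArchR is not installed")
  }
  # Pseudo-bulk coverage per group, used as input for peak calling.
  proj <- ArchR::addGroupCoverages(ArchRProj = proj, groupBy = groupBy)
  # Call peaks per group with MACS2 and merge them into one peak set.
  proj <- ArchR::addReproduciblePeakSet(
    ArchRProj = proj,
    groupBy = groupBy,
    pathToMacs2 = pathToMacs2
  )
  # Count fragments in peaks to build the PeakMatrix.
  proj <- ArchR::addPeakMatrix(ArchRProj = proj)
  proj
}
```

Usage would be proj <- add_peak_matrix(proj) after clustering and before any call that sets useMatrix = "PeakMatrix".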
Important Parameters
- minTSS: Minimum TSS enrichment score for cell retention (usually 4-10)
- minFrags: Minimum fragments per cell (usually 1000-5000)
- resolution: Clustering resolution (higher = more clusters)
- corCutOff: Correlation cutoff for co-accessibility (usually 0.3-0.5)
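To make the minTSS and minFrags thresholds concrete, the base R sketch below applies them to a toy table of per-cell QC values. The cell names and numbers are invented, and treating the thresholds as inclusive (>=) is an assumption.

```r
# Toy per-cell QC metrics; values are invented for illustration only.
cell_qc <- data.frame(
  cell          = paste0("sampleA#cell", 1:5),
  TSSEnrichment = c(12.3, 3.1, 8.0, 4.0, 15.2),
  nFrags        = c(25000, 8000, 900, 1000, 4000)
)

minTSS   <- 4
minFrags <- 1000

# A cell is kept only if it passes both thresholds.
pass <- cell_qc$TSSEnrichment >= minTSS & cell_qc$nFrags >= minFrags
print(cell_qc$cell[pass])   # cells 1, 4 and 5
```

Here cell2 is dropped for low TSS enrichment and cell3 for too few fragments, even though each passes the other threshold.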
Reference Files
This skill includes comprehensive documentation in references/:
Core Documentation
- getting_started.md - Installation, basic setup, and introduction to ArchR
- data_preparation.md - Input file formats, project creation, and data import
- dimensionality_reduction.md - LSI, UMAP, and other dimensionality reduction methods
- clustering.md - Cell clustering, group creation, and population identification
- visualization.md - Plotting functions and data visualization
Analysis Functions
- analysis_functions.md - Core analysis functions for modalities and integrative analysis
- peak_analysis.md - Peak calling, differential accessibility, and peak-to-gene linking
- gene_analysis.md - Gene scoring, expression integration, and gene-based analyses
- enrichment_analysis.md - GO analysis, motif enrichment, and pathway analysis
- trajectory_analysis.md - Pseudotime analysis, lineage trajectories, and differentiation
Advanced Topics
- advanced.md - Advanced techniques and specialized analyses
- integration.md - Multi-omics integration and batch correction
- export.md - Data export, result saving, and sharing
Utilities
- project_management.md - Project organization, subsetting, and management
- utility_functions.md - Helper functions and utilities
- visualization_functions.md - Additional plotting and visualization tools
- other.md - Miscellaneous topics and supplementary information
Reference Content Structure
Each reference file contains:
- Detailed explanations of functions and workflows
- Code examples with syntax highlighting
- Parameter descriptions and usage notes
- Best practices and troubleshooting tips
Use file names to navigate specific topics (e.g., view clustering.md for clustering guidance).
Working with This Skill
For Beginners
- Start with getting_started.md - Learn ArchR installation and basic concepts
- Read data_preparation.md - Understand how to format and import your data
- Follow dimensionality_reduction.md and clustering.md - Create your first analyses
- Use visualization.md - Learn to plot and interpret your results
For Intermediate Users
- Review peak_analysis.md and gene_analysis.md - Perform differential analyses
- Explore enrichment_analysis.md - Add regulatory insights to your work
- Check integration.md - Combine multiple modalities or datasets
- Use export.md - Generate publication-ready outputs
For Advanced Users
- Study advanced.md - Implement sophisticated analytical techniques
- Review trajectory_analysis.md - Study cellular differentiation
- Consult utility_functions.md - Use helper functions to extend ArchR workflows
- Optimize workflows using project_management.md
Navigation Tips
- Use specific file names in your queries (e.g., "show me clustering.md")
- Ask for specific functions by name (e.g., "how does addIterativeLSI work?")
- Request examples from particular documentation sections
- Use the Quick Reference patterns for common workflows
Resources
Documentation Structure
- references/ - Complete extracted documentation organized by topic
- Quick Reference - Frequently used code patterns and workflows
- Key Concepts - Essential terminology and best practices
Getting Help
- Reference specific files for detailed function descriptions
- Use Quick Reference patterns for common tasks
- Ask about specific parameters or troubleshooting scenarios
Notes
- ArchR is specifically designed for single-cell ATAC-seq data analysis
- The Arrow file format enables memory-efficient processing of large datasets
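- ArchR builds on Bioconductor data structures (for example GRanges and SummarizedExperiment), so its outputs can be passed to other Bioconductor packages
- Input is accepted as fragment files or BAM files, which covers output from 10x Genomics pipelines and from other scATAC-seq protocols such as sci-ATAC-seq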
- The toolkit includes extensive QC metrics and validation steps
Updating
To refresh this skill with updated documentation:
- Re-run the scraper with the same configuration
- This skill will be rebuilt with the latest information from the ArchR documentation