name	banksy-merged-v4
description	BANKSY spatial transcriptomics analysis tool - complete documentation with precise file name-based categorization

Banksy-Merged-V4 Skill

Comprehensive assistance with BANKSY spatial transcriptomics analysis, including data preprocessing, matrix generation, clustering, and visualization for spatial omics data.

When to Use This Skill

This skill should be triggered when:

Working with spatial transcriptomics data - especially Slide-seq, 10x Visium, or STARmap datasets
Implementing BANKSY algorithms for spatially-aware clustering and analysis
Processing spatial omics data - preprocessing, filtering, and feature selection
Generating BANKSY matrices - creating neighbor-averaged feature matrices with spatial context
Performing spatial clustering - Leiden or Mclust partitioning with spatial information
Analyzing spatial patterns - metagene analysis, cell type annotation, and spatial visualization
Debugging BANKSY workflows - troubleshooting matrix generation, clustering, or visualization issues
Learning spatial transcriptomics best practices - understanding the AGF (azimuthal Gabor filter) and neighbor weighting

Quick Reference

Core Data Processing Patterns

Preprocess spatial data (python):

import scanpy as sc
from filter_utils import preprocess_data, filter_cells, feature_selection

# Basic preprocessing
adata = preprocess_data(adata, log1p=True)
adata = filter_cells(adata, min_count=500, max_count=50000, MT_filter=20, gene_filter=3)
adata = feature_selection(adata, sample="slide_seq", coord_keys=('x', 'y'), hvgs=2000)

Generate BANKSY matrices (python):

from embed_banksy import generate_banksy_matrix

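# Create BANKSY matrices with spatial context.
# banksy_dict must already exist: it is assumed to hold the neighbor graphs
# built in an earlier step (see neighbors.py).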
banksy_dict, banksy_matrix = generate_banksy_matrix(
    adata=adata,
    banksy_dict=banksy_dict,
    lambda_list=[0.2, 0.5, 0.8],
    max_m=2,
    plot_std=True,
    save_matrix=True
)

Clustering and Analysis Patterns

Spatial clustering with Leiden (python):

from cluster_methods import run_leiden_partition

# Run spatial clustering
results_df = run_leiden_partition(
    banksy_dict=banksy_dict,
    resolutions=[0.4, 0.6, 0.8],
    num_nn=50,
    partition_seed=1234,
    match_labels=True
)

Cell type annotation and refinement (python):

from cluster_utils import pad_clusters, refine_cell_types

# Annotate clusters
cluster2annotation = {'0': 'Excitatory', '1': 'Inhibitory', '2': 'Astrocyte'}
pad_clusters(cluster2annotation, original_clusters, pad_name='other')

# Refine cell types
adata_spatial, adata_nonspatial = refine_cell_types(
    adata_spatial, adata_nonspatial, cluster2annotation_refine
)

Metagene Analysis Patterns

Create metagene data for validation (python):

from cluster_utils import create_metagene_df, get_metagene_difference

# Generate metagene dataframe
metagene_df = create_metagene_df(
    adata_allgenes,
    coord_keys=['x', 'y'],
    markergenes_dict=custom_markers
)

# Compare metagene expressions
diff_main, diff_nbr = get_metagene_difference(
    adata, DE_genes1, DE_genes2, m=1
)

Quality Control and Validation Patterns

Calculate clustering metrics (python):

from cluster_utils import calculate_ari, get_DEgenes

# Calculate Adjusted Rand Index
ari_score = calculate_ari(adata, manual='cell_type_manual', predicted='cell_type_predicted')

# Get top differentially expressed genes
top_genes = get_DEgenes(adata, cell_type='Excitatory', top_n=20)
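For readers unfamiliar with the metric, the following is a minimal, standard-library sketch of what an Adjusted Rand Index computes. It is not the implementation behind calculate_ari; the function name adjusted_rand_index and the toy labels are hypothetical and exist only for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two labelings of the same cells."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("both labelings must cover the same cells")
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Count cells per (true, predicted) combination and per label.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of cell pairs that share a label under each view.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_pairs - expected) / (max_index - expected)


manual = ["Excitatory", "Excitatory", "Inhibitory", "Inhibitory"]
same_partition = [1, 1, 0, 0]   # same grouping, different label names
crossed = [0, 1, 0, 1]          # every manual cluster is split in half

ari_same = adjusted_rand_index(manual, same_partition)
ari_crossed = adjusted_rand_index(manual, crossed)
print(ari_same, ari_crossed)
```

The score depends only on which cells are grouped together, so relabeling clusters does not change it. A labeling that agrees less than chance yields a negative value.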

Data normalization and filtering (python):

from filter_utils import normalize_total, filter_hvg

# Normalize total counts
adata = normalize_total(adata)

# Filter highly variable genes
adata_hvg, adata_all = filter_hvg(adata, n_top_genes=2000, flavor='seurat_v3')

Key Concepts

Core BANKSY Components

BANKSY Matrix: Enhanced feature matrix combining original expression with spatially-averaged neighbor information
Lambda Parameter: Controls how much weight the spatial neighborhood features receive relative to a cell's own expression (0.0 = own expression only, 1.0 = neighborhood only)
AGF (Azimuthal Gabor Filter): Captures expression gradients in a cell's neighborhood by summing neighbor expression with a direction-dependent phase and taking the magnitude
Neighbor Weight Decay: How spatial influence decreases with distance (gaussian, scaled_gaussian, etc.)
Max_m: Highest azimuthal order of neighborhood features to compute (m=0 = weighted neighbor mean, m≥1 = AGF)
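The components above can be sketched in NumPy. This is an illustration under stated assumptions, not the code in embed_banksy.py. The function names, the median-distance bandwidth, and the sqrt(1 - lambda) / sqrt(lambda) scaling are choices made for this sketch, and the AGF-style term (the azimuthal Gabor filter of the BANKSY method) is reduced to its simplest form.

```python
import numpy as np


def knn_gaussian_weights(coords, k):
    """Row-normalised Gaussian weights over each cell's k nearest neighbours."""
    n = coords.shape[0]
    # Brute-force pairwise distances; fine for a toy example.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)            # a cell is not its own neighbour
    nbr_idx = np.argsort(dist, axis=1)[:, :k]
    nbr_dist = np.take_along_axis(dist, nbr_idx, axis=1)
    sigma = np.median(nbr_dist)               # assumed bandwidth choice
    w = np.exp(-(nbr_dist ** 2) / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)         # each row sums to 1
    weights = np.zeros((n, n))
    np.put_along_axis(weights, nbr_idx, w, axis=1)
    return weights


def azimuthal_magnitude(expr, coords, weights, m=1):
    """Magnitude of the neighbour sum weighted by the phase exp(i * m * theta)."""
    delta = coords[None, :, :] - coords[:, None, :]   # delta[i, j] = coords[j] - coords[i]
    theta = np.arctan2(delta[..., 1], delta[..., 0])  # direction from cell i to cell j
    phase = np.exp(1j * m * theta)
    return np.abs((weights * phase) @ expr)


def neighbour_augmented_matrix(expr, weights, lam):
    """Concatenate own expression with the neighbour mean, balanced by lam."""
    nbr_mean = weights @ expr
    return np.hstack([np.sqrt(1 - lam) * expr, np.sqrt(lam) * nbr_mean])


rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(30, 2))            # toy x/y positions
expr = rng.poisson(2.0, size=(30, 5)).astype(float)   # toy cells x genes matrix

weights = knn_gaussian_weights(coords, k=6)
banksy_like = neighbour_augmented_matrix(expr, weights, lam=0.2)
gradient_like = azimuthal_magnitude(expr, coords, weights, m=1)
print(banksy_like.shape, gradient_like.shape)
```

With lam=0 the neighbor half of the matrix is all zeros, so downstream clustering sees only each cell's own expression. With m=0 the phase is 1 everywhere and the azimuthal term reduces to the weighted neighbor mean.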

Spatial Analysis Workflow

Data Preprocessing: QC filtering, normalization, and feature selection
Spatial Graph Construction: Build neighbor relationships with spatial coordinates
BANKSY Matrix Generation: Combine expression with spatial context
Dimensionality Reduction: PCA on BANKSY matrices
Spatial Clustering: Leiden/Mclust with spatial awareness
Cell Type Annotation: Manual or automated labeling
Validation: Metagene analysis and spatial pattern validation
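Steps 1 and 4 of the workflow above (preprocessing and dimensionality reduction) can be sketched with NumPy alone. This is a sketch under stated assumptions and does not reproduce filter_utils.py or pca_utils.py. The helper names normalize_log1p and pca_scores, the target sum of 10,000, and the random toy counts are all hypothetical.

```python
import numpy as np


def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell to the same total count, then apply log(1 + x)."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0                 # avoid dividing by zero for empty cells
    return np.log1p(counts / totals * target_sum)


def pca_scores(features, n_components):
    """Project centred features onto their leading principal components via SVD."""
    centered = features - features.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(1)
counts = rng.poisson(3.0, size=(40, 12)).astype(float)   # toy cells x genes counts

logged = normalize_log1p(counts)
scores = pca_scores(logged, n_components=5)
print(scores.shape)
```

In a real run the matrix passed to the PCA step would be the BANKSY matrix from step 3 rather than the plain expression matrix, and the resulting scores would feed the neighbor graph used for Leiden or Mclust clustering.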

Reference Files

This skill includes comprehensive documentation in references/:

Core Analysis Documentation

core_analysis.md - Essential BANKSY matrix generation and embedding functions
- embed_banksy.py: Core matrix generation with AGF implementation
- main.py: BANKSY main functions and utilities
- neighbors.py: Spatial neighbor graph construction
- pca_utils.py: Dimensionality reduction for spatial data

Clustering Methods Documentation

clustering_methods.md - Spatial clustering algorithms and utilities
- cluster_methods.py: Leiden and Mclust partitioning implementations
- cluster_utils.py: Cell type annotation and metagene analysis

Data Processing Documentation

data_loading.md - Data preprocessing and filtering utilities
- preprocessing.py: Basic data preprocessing and QC metrics
- filter_utils.py: Cell/gene filtering and feature selection

Specialized Analysis Documentation

dlpfc_analysis.md - DLPFC (human brain) dataset specific workflows
slideseq_analysis.md - Slide-seq platform specific implementations
starmap_analysis.md - STARmap platform analysis workflows
visualization.md - Spatial visualization and plotting utilities

Getting Started Documentation

getting_started.md - Installation, setup, and basic workflow tutorials
data_types.md - Data format specifications and AnnData structures
utilities.md - Helper functions and utility tools

Working with This Skill

For Beginners

Start with getting_started.md - Learn installation and basic workflow
Review data_loading.md - Understand data preprocessing requirements
Study core_analysis.md - Master BANKSY matrix generation
Practice with simple examples - Use the Quick Reference patterns above

For Intermediate Users

Explore clustering_methods.md - Implement spatial clustering algorithms
Study platform-specific docs - slideseq_analysis.md or starmap_analysis.md based on your data
Learn visualization.md - Create effective spatial visualizations
Use the cluster_utils.py section of clustering_methods.md - Advanced cell type annotation and validation

For Advanced Users

Modify core algorithms - Customize embed_banksy.py for novel applications
Implement new clustering methods - Extend cluster_methods.py
Develop platform-specific workflows - Create new analysis modules
Optimize performance - Tune neighbor graph construction and matrix operations

Navigation Tips

Use the search function to find specific functions or parameters
Cross-reference between files - many functions work together across modules
Check function dependencies - some functions require specific data preprocessing steps
Study the code examples - each reference file contains practical implementation examples

Resources

references/

Organized documentation extracted from official sources. These files contain:

Complete function implementations with full code and documentation
Parameter explanations and usage recommendations
Code examples with language annotations for different platforms
Spatial analysis best practices and workflow recommendations
Platform-specific guidance for Slide-seq, STARmap, Visium, and more

scripts/

Add helper scripts here for:

Custom data preprocessing pipelines
Batch processing automation
Quality control reporting
Result visualization workflows

assets/

Store templates and examples:

Configuration files for different spatial platforms
Example datasets for testing workflows
Marker gene dictionaries for different tissue types
Visualization templates for spatial plots

Notes

This skill was generated from complete BANKSY source code and documentation
All code examples are extracted from actual working implementations
Functions maintain their original signatures and dependencies
Spatial coordinates should be in consistent coordinate systems
Memory usage scales with dataset size and neighborhood complexity
GPU acceleration is available for certain operations (check individual function docs)

Updating

To refresh this skill with updated documentation:

Re-run the documentation scraper with the same configuration
The skill will be rebuilt with the latest code examples and functions
All enhanced examples and quick references will be updated automatically
