| name | banksy-merged-v3 |
| description | BANKSY spatial transcriptomics analysis tool - complete documentation with notebooks and source code |
Banksy-Merged-V3 Skill
Comprehensive assistance with BANKSY spatial transcriptomics analysis, including spatially-aware clustering, multi-sample integration, and advanced visualization techniques.
When to Use This Skill
This skill should be triggered when:
- Working with spatial transcriptomics data - 10x Visium, Slide-seq, MERFISH, or other spatial platforms
- Running BANKSY analysis - Setting up spatial clustering with neighborhood information
- Multi-sample integration - Combining multiple spatial datasets with Harmony or other methods
- Spatial coordinate processing - Staggering coordinates, handling sample-specific treatments
- Clustering and visualization - Running Leiden clustering, UMAP embedding, and spatial plotting
- Performance evaluation - Computing ARI scores, comparing clustering results
- Data preprocessing - HVG selection, normalization, quality control for spatial data
Quick Reference
Common Patterns
Basic BANKSY Setup
```python
from banksy.initialize_banksy import initialize_banksy
from banksy.embed_banksy import generate_banksy_matrix

# BANKSY parameters for spatial clustering
coord_keys = ('x_pixel', 'y_pixel', 'coord_xy')  # obs columns and obsm key for coordinates
nbr_weight_decay = 'scaled_gaussian'             # weight decay function for spatial neighbors
k_geom = 18                                      # number of spatial neighbors
lambda_list = [0.2]                              # spatial weighting parameter
m = 1                                            # maximum neighborhood order
```
Multi-Sample Data Loading
```python
import os
import pandas as pd
import anndata as ad
from scanpy import read_10x_h5

def load_multisamples_as_one(sample):
    data_path = os.path.join("data", "DLPFC", sample)
    expr_path = os.path.join(data_path, f"{sample}_raw_feature_bc_matrix.h5")
    spatial_path = os.path.join(data_path, "tissue_positions_list.txt")
    # Load expression data
    adata = read_10x_h5(expr_path)
    # Load spatial coordinates (tissue positions indexed by barcode)
    spatial = pd.read_csv(spatial_path, sep=",", header=None, index_col=0)
    adata.obs["x_pixel"] = spatial[4]
    adata.obs["y_pixel"] = spatial[5]
    return adata
```
Coordinate Staggering for Multi-Sample
```python
# Stagger coordinates to prevent overlap between samples
coords_df = pd.DataFrame(adata.obs[['x_pixel', 'y_pixel', 'sample']])
# Zero each sample's x-coordinates at its own minimum, then compute a global offset
coords_df['x_pixel'] = coords_df.groupby('sample')['x_pixel'].transform(lambda x: x - x.min())
global_max_x = coords_df['x_pixel'].max() * 1.5
# Add sample-specific offsets along x
coords_df['sample_no'] = pd.Categorical(coords_df['sample']).codes
coords_df['x_pixel'] = coords_df['x_pixel'] + coords_df['sample_no'] * global_max_x
# Write the staggered coordinates back for downstream BANKSY steps
adata.obs['x_pixel'] = coords_df['x_pixel']
```
Data Preprocessing
```python
import numpy as np
import pandas as pd
from banksy_utils import filter_utils

# Normalization to the median total count per spot
tar_sum = np.median(adata.X.sum(axis=1).A1)
adata = filter_utils.normalize_total(adata, method='RC', target_sum=tar_sum)

# HVG selection (using pre-computed HVGs for consistency across samples)
r_hvg = pd.read_csv("path_to_hvgs.csv")
adata = adata[:, r_hvg['hvgs'].str.upper()]
```
Running BANKSY Matrix Generation
```python
import numpy as np

# Stack x/y pixel coordinates into an (n_obs, 2) array under the 'coord_xy' key
adata.obsm['coord_xy'] = np.vstack((adata.obs['x_pixel'].values,
                                    adata.obs['y_pixel'].values)).T

# Initialize BANKSY with spatial neighbor information
banksy_dict = initialize_banksy(adata, coord_keys, k_geom,
                                nbr_weight_decay=nbr_weight_decay, max_m=m)

# Generate the BANKSY matrix with spatial weighting (one entry per lambda)
banksy_dict, banksy_matrix = generate_banksy_matrix(adata, banksy_dict,
                                                    lambda_list, max_m=m)
```
Dimensionality Reduction and Harmony Integration
```python
from harmony import harmonize
import umap

# Run Harmony batch correction on each BANKSY PCA embedding
# (pca_dims is assumed to be defined earlier, e.g. in a parameter cell)
for pca_dim in pca_dims:
    Z = harmonize(banksy_dict[nbr_weight_decay][0.2]["adata"].obsm[f'reduced_pc_{pca_dim}'],
                  banksy_dict[nbr_weight_decay][0.2]["adata"].obs,
                  batch_key='sample')

    # Generate UMAP embeddings from the Harmony-corrected PCs
    reducer = umap.UMAP(transform_seed=42)
    umap_embedding = reducer.fit_transform(Z)
    banksy_dict[nbr_weight_decay][0.2]["adata"].obsm[f"reduced_pc_{pca_dim}_umap"] = umap_embedding
```
Clustering and Evaluation
```python
from banksy.cluster_methods import run_Leiden_partition
from sklearn.metrics.cluster import adjusted_rand_score

# Run Leiden clustering
results_df, max_num_labels = run_Leiden_partition(
    banksy_dict, resolutions=[0.4], num_nn=50,
    num_iterations=-1, partition_seed=1234, match_labels=True
)

# Calculate ARI for evaluation
def calc_ari(adata, manual: str, predicted: str):
    return adjusted_rand_score(adata.obs[manual].cat.codes,
                               adata.obs[predicted].cat.codes)
```
Spatial Visualization
```python
import matplotlib.pyplot as plt

# Create one spatial panel per sample
fig = plt.figure(figsize=(12, 6))
grid = fig.add_gridspec(ncols=3, nrows=2)

for counter, sample in enumerate(samples):
    # Subset to the current sample (assumes a 'sample' column in adata.obs)
    adata_plt_temp = adata[adata.obs['sample'] == sample]
    ax = fig.add_subplot(grid[0, counter])
    scatter = ax.scatter(adata_plt_temp.obs['x_pixel'],
                         adata_plt_temp.obs['y_pixel'],
                         c=adata_plt_temp.obs['labels'],
                         cmap='tab20', s=3, alpha=1.0)
    ax.set_aspect('equal')
    ax.set_title(f'BANKSY {sample} Labels')
```
Key Concepts
BANKSY Core Principles
- Spatially-Aware Clustering: Incorporates neighborhood information into dimensionality reduction and clustering
- AGF (Azimuthal Gabor Filter): Captures local gradients of gene expression within each spatial neighborhood (used when max_m >= 1)
- Lambda Parameter: Controls the weighting of spatial vs. transcriptional information (0.0 = non-spatial; roughly 0.2 is common for cell typing and 0.8 for domain segmentation); a short parameter sketch follows this list
- k_geom: Number of spatial neighbors to consider (typically 15-25)
- Maximum Order (m): Highest neighborhood order used when building spatial features
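A minimal parameter sketch, reusing the calls from the Quick Reference; the two lambda values reflect common BANKSY usage (around 0.2 for cell typing, around 0.8 for domain segmentation) and should be tuned for your data.

```python
from banksy.initialize_banksy import initialize_banksy
from banksy.embed_banksy import generate_banksy_matrix

# Illustrative parameter choices, not prescriptions
coord_keys = ('x_pixel', 'y_pixel', 'coord_xy')
k_geom = 18                 # number of spatial neighbors per spot
max_m = 1                   # include first-order (AGF) neighborhood features
lambda_list = [0.2, 0.8]    # ~0.2 for cell typing, ~0.8 for domain segmentation

banksy_dict = initialize_banksy(adata, coord_keys, k_geom,
                                nbr_weight_decay='scaled_gaussian', max_m=max_m)
banksy_dict, banksy_matrix = generate_banksy_matrix(adata, banksy_dict,
                                                    lambda_list, max_m=max_m)
```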
Multi-Sample Integration
- Coordinate Staggering: Prevents spatial overlap between samples by adding offsets
- Harmony Integration: Batch correction method for integrating multiple samples
- Sample-Specific Treatment: Maintains sample identity while enabling joint analysis (see the concatenation sketch after this list)
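A minimal sketch of one way to keep sample identity when combining per-sample AnnData objects, assuming the load_multisamples_as_one helper and a samples list as in the Quick Reference; anndata.concat arguments may need adjusting for your datasets.

```python
import anndata as ad

# Load each sample with the helper from Quick Reference (samples is a list of IDs)
adatas = {sample: load_multisamples_as_one(sample) for sample in samples}

# Concatenate and record sample identity in adata.obs['sample'];
# join='inner' keeps only genes shared by all samples
adata = ad.concat(adatas, label='sample', join='inner')
adata.obs_names_make_unique()
```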
Performance Metrics
- ARI (Adjusted Rand Index): Measures clustering agreement with manual annotations
- Resolution Parameter: Controls cluster granularity in Leiden clustering (a resolution-sweep sketch follows this list)
- Number of Neighbors: Parameter for k-NN graph construction in clustering
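A hedged sketch of a resolution sweep scored with ARI; the obs column names below are placeholders for a manual annotation and Leiden labels written back after clustering at each resolution.

```python
from sklearn.metrics.cluster import adjusted_rand_score

manual_key = 'layer_manual'                                           # placeholder manual annotation column
predicted_keys = ['leiden_res0.4', 'leiden_res0.8', 'leiden_res1.2']  # placeholder label columns

for key in predicted_keys:
    ari = adjusted_rand_score(adata.obs[manual_key].cat.codes,
                              adata.obs[key].cat.codes)
    print(f"{key}: ARI = {ari:.3f}")
```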
Reference Files
This skill includes comprehensive documentation in references/:
core_library.md (28 pages)
Core BANKSY library documentation including:
- slideseq_ref_data.py - Reference dictionaries for Slide-seq dataset annotations
- Cell type markers and cluster definitions for cerebellar tissue
- Utility objects for spatial transcriptomics analysis
- Marker gene dictionaries for major brain cell types
notebooks.md (7 pages)
Complete Jupyter notebook workflows including:
- DLPFC_harmony_multisample - End-to-end multi-sample analysis workflow
- Data preprocessing and HVG selection
- Multi-sample coordinate staggering
- BANKSY matrix generation and clustering
- Harmony integration for batch correction
- Spatial visualization and performance evaluation
Working with This Skill
For Beginners
Start with the DLPFC_harmony_multisample notebook in references/notebooks.md for:
- Complete workflow from data loading to results
- Step-by-step coordinate handling for multiple samples
- Standard BANKSY parameter configurations
- Visualization and evaluation methods
For Specific Analysis Tasks
- New Datasets: Adapt the multi-sample loading functions in notebooks.md
- Parameter Tuning: Modify lambda_list, k_geom, and resolution parameters
- Different Platforms: Update coordinate keys and spatial loading functions (see the sketch after this list)
- Custom Integration: Replace Harmony with other batch correction methods
For Advanced Users
- Custom Weight Functions: Implement alternative nbr_weight_decay functions (a hedged comparison sketch follows this list)
- Performance Optimization: Adjust num_iterations and num_nn for clustering
- Large-Scale Analysis: Use the reference dictionaries in core_library.md for cell type annotation
- Method Development: Extend the BANKSY matrix generation for novel applications
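Before implementing a custom decay function, it can help to compare the decay settings the package already ships. A hedged sketch: 'scaled_gaussian' is the value used throughout this skill, while the other option names are assumptions; check initialize_banksy's documentation for the values your BANKSY version actually supports.

```python
# Compare neighbor weight decay settings by rebuilding the BANKSY matrix for each
for decay in ('scaled_gaussian', 'uniform', 'reciprocal'):   # names other than the first are assumptions
    bd = initialize_banksy(adata, coord_keys, k_geom,
                           nbr_weight_decay=decay, max_m=m)
    bd, mat = generate_banksy_matrix(adata, bd, lambda_list, max_m=m)
    # ...downstream PCA / clustering per setting would follow here
```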
Code Examples by Complexity
Beginner Level (Setup & Loading)
```python
# Basic imports and data loading
import scanpy as sc
import anndata as ad
from banksy_utils import filter_utils

# Load spatial data
adata = sc.read_10x_h5("sample_data.h5")

# Add coordinates (spatial_coords: per-spot x/y arrays loaded separately)
adata.obs["x_pixel"] = spatial_coords[0]
adata.obs["y_pixel"] = spatial_coords[1]
```
Intermediate Level (BANKSY Analysis)
```python
# Complete BANKSY workflow (coord_keys and adata.obsm['coord_xy'] set up as in Quick Reference)
banksy_dict = initialize_banksy(adata, coord_keys, k_geom=18)
banksy_dict, banksy_matrix = generate_banksy_matrix(adata, banksy_dict,
                                                    lambda_list=[0.2])
```
Advanced Level (Multi-Sample Integration)
```python
from harmony import harmonize

# Advanced multi-sample integration with Harmony (pca_dims, nbr_weight_decay,
# and lambda_val assumed defined as in the sections above)
for pca_dim in pca_dims:
    Z = harmonize(banksy_dict[nbr_weight_decay][lambda_val]["adata"]
                  .obsm[f'reduced_pc_{pca_dim}'],
                  banksy_dict[nbr_weight_decay][lambda_val]["adata"].obs,
                  batch_key='sample')
    # UMAP and clustering on Z would follow here
```
Resources
references/
Organized documentation extracted from official sources:
- core_library.md - Core library functions and reference data
- notebooks.md - Complete analysis workflows with code examples
- Preserves original structure and examples from source documentation
- Code examples include proper language detection for syntax highlighting
scripts/
Add helper scripts here for:
- Custom data loading functions
- Parameter optimization routines
- Batch processing automation
- Quality control metrics (a minimal QC sketch follows this list)
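A minimal QC helper sketch for scripts/, built on standard scanpy calls; the thresholds and the mitochondrial-gene naming convention are placeholders to adapt per dataset.

```python
import scanpy as sc

def basic_spatial_qc(adata, min_counts=500, min_genes=200):
    """Flag mitochondrial genes, compute QC metrics, and drop low-quality spots.
    Thresholds here are placeholders, not recommendations."""
    adata.var['mt'] = adata.var_names.str.upper().str.startswith('MT-')
    sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], inplace=True)
    sc.pp.filter_cells(adata, min_counts=min_counts)
    sc.pp.filter_cells(adata, min_genes=min_genes)
    return adata
```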
assets/
Add templates and examples for:
- Configuration files for different platforms (a template sketch follows this list)
- Standard analysis workflows
- Visualization templates
- Reference datasets
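A hypothetical configuration template for assets/; parameter names mirror the Quick Reference, and the values (including pca_dims) are example placeholders rather than recommended settings.

```python
# Hypothetical per-platform configuration template
BANKSY_CONFIG = {
    "visium_dlpfc": {
        "coord_keys": ("x_pixel", "y_pixel", "coord_xy"),
        "k_geom": 18,
        "max_m": 1,
        "lambda_list": [0.2],
        "nbr_weight_decay": "scaled_gaussian",
        "pca_dims": [20],          # example value
        "resolutions": [0.4],
    },
}
```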
Notes
- This skill was generated from comprehensive BANKSY documentation and notebooks
- Reference files maintain the structure and examples from original sources
- All code examples are extracted from real analysis workflows
- Parameters are based on published BANKSY applications and best practices
- Multi-sample integration follows established spatial transcriptomics standards
Common Pitfalls and Solutions
Coordinate Handling
- Issue: Overlapping spatial coordinates between samples
- Solution: Use coordinate staggering with sample-specific offsets
- Code: See multi-sample coordinate transformation in Quick Reference
Parameter Selection
- Issue: Poor clustering results
- Solution: Adjust lambda parameter (0.1-0.5 typical) and k_geom (15-25)
- Guideline: A higher lambda gives more weight to the spatial neighborhood signal (a small grid-search sketch follows)
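A hedged sketch of a small parameter grid; PCA, clustering, and ARI scoring (via calc_ari from the Quick Reference) would follow for each configuration.

```python
# Rebuild the BANKSY matrix over a small grid of k_geom values; each call
# scores several lambda values at once via the lambda list argument.
for k_geom in (15, 18, 25):
    bd = initialize_banksy(adata, coord_keys, k_geom,
                           nbr_weight_decay='scaled_gaussian', max_m=1)
    bd, _ = generate_banksy_matrix(adata, bd, [0.1, 0.2, 0.5], max_m=1)
    # ...then PCA, Leiden clustering, and calc_ari(adata, manual_key, predicted_key)
```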
Memory Management
- Issue: Large datasets causing memory issues
- Solution: Use sparse matrices and limit HVGs to 2000-3000 genes
- Practice: Monitor memory usage during BANKSY matrix generation (see the sketch below)
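A minimal sketch of the sparse-matrix and HVG practices above, using standard scipy/scanpy calls; the gene count and flavor are reasonable defaults rather than BANKSY requirements.

```python
import scanpy as sc
from scipy import sparse

# Keep the expression matrix sparse before building BANKSY neighbor matrices
if not sparse.issparse(adata.X):
    adata.X = sparse.csr_matrix(adata.X)

# Restrict to ~2000 highly variable genes to bound memory use
sc.pp.highly_variable_genes(adata, n_top_genes=2000, flavor='seurat_v3')
adata = adata[:, adata.var['highly_variable']].copy()
```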
Updating
To refresh this skill with updated documentation:
- Re-run the scraper with the same configuration
- The skill will be rebuilt with the latest information
- Existing custom scripts and assets in scripts/ and assets/ will be preserved