name	envi-pkg-local-improved
description	ENVI spatial transcriptomics analysis toolkit - comprehensive documentation with tutorials and Python source code

Envi-Pkg-Local-Improved Skill

Comprehensive assistance with ENVI (Environmental Niche-aware Variational Integration) spatial transcriptomics analysis, generated from official documentation.

When to Use This Skill

This skill should be triggered when:

Core ENVI Analysis Tasks

Spatial transcriptomics integration - When working with paired scRNA-seq and spatial data
Niche covariance analysis - When calculating COVET matrices for cellular niches
Gene imputation - When imputing missing genes in spatial transcriptomics data
Latent embedding analysis - When generating and analyzing ENVI latent representations
Cell type niche composition - When analyzing cellular neighborhood composition

Data Processing & Visualization

MERFISH data analysis - When working with Multiplexed Error-Robust Fluorescence In Situ Hybridization data
Motor cortex spatial analysis - When analyzing cortical layer organization and depth
UMAP visualization - When creating dimensionality reduction plots of integrated data
Force-directed layouts - When computing FDL layouts for covariance matrices

Technical Implementation

Model training and configuration - When setting up and training ENVI models
CUDA/GPU configuration - When configuring computational resources for analysis
Batch processing - When handling multiple spatial datasets or batches
Utility function implementation - When implementing diffusion maps, FDL, or other analytical tools

Specific Use Cases

Predicting spatial context for dissociated scRNA-seq data
Quantifying cellular niches based on gene-gene covariance
Analyzing cortical depth and cellular organization
Integrating multi-modal spatial and single-cell datasets
Debugging ENVI model convergence or performance issues

Quick Reference

Essential Setup

Environment Configuration

import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"  # Change to -1 for CPU
import warnings
warnings.filterwarnings('ignore')

Core Imports

from scenvi.ENVI import ENVI
from scenvi.utils import compute_covet

Data Loading

import scanpy as sc
st_data = sc.read_h5ad('st_data.h5ad')  # Spatial data
sc_data = sc.read_h5ad('sc_data.h5ad')  # scRNA-seq data

Model Initialization & Training

ENVI Model Setup

envi_model = scenvi.ENVI(spatial_data=st_data, sc_data=sc_data)
# Output: Computing Niche Covariance Matrices, Initializing VAE

Model Training

envi_model.train()
envi_model.impute_genes()
envi_model.infer_niche_covet()
envi_model.infer_niche_celltype()

Data Processing Utilities

Force-Directed Layout for COVET Analysis

def FDL(data, k=30):
    nbrs = sklearn.neighbors.NearestNeighbors(n_neighbors=int(k), metric='euclidean', n_jobs=5).fit(data)
    kNN = nbrs.kneighbors_graph(data, mode='distance')
    # Adaptive kernel computation and layout calculation
    layout = force_directed_layout(kernel)
    return layout

Diffusion Maps Implementation

def run_diffusion_maps(data_df, n_components=10, knn=30, alpha=0):
    # Adaptive anisotropic kernel computation
    # Markov normalization and eigen decomposition
    return {"T": T, "EigenVectors": V, "EigenValues": D, "kernel": kernel}

Visualization & Analysis

UMAP for Integrated Data

fit = umap.UMAP(n_neighbors=100, min_dist=0.3, n_components=2)
latent_umap = fit.fit_transform(np.concatenate([st_data.obsm['envi_latent'], sc_data.obsm['envi_latent']], axis=0))

Spatial Visualization with Cell Type Colors

cell_type_palette = {
    'Astro': (0.843137, 0.0, 0.0, 1.0),
    'L23_IT': (0.007843, 0.533333, 0.0, 1.0),
    'Pvalb': (0.47451, 0.0, 0.0, 1.0),
    # ... complete palette for all cell types
}

Specialized Analysis

COVET Matrix Computation

# Use the utility function for niche covariance
covet_matrices = compute_covet(
    spatial_data=st_data,
    k=8,
    g=64,
    spatial_key="spatial"
)

Cortical Depth Analysis

# For specific cell types (e.g., Sst neurons)
st_data_sst = st_data[st_data.obs['cell_type'] == 'Sst']
sc_data_sst = sc_data[sc_data.obs['cell_type'] == 'Sst']
# Run FDL on COVET_SQRT for pseudo-depth prediction

Key Concepts

COVET (Cellular Niche Covariance)

Purpose: Represents and quantifies cellular niches based on gene-gene covariance patterns
Input: Spatial transcriptomics data with physical coordinates
Output: Niche gene-gene covariance matrix for each cell
Key Insight: Distance between COVET matrices is calculated as L2 between their square roots

ENVI (Environmental Niche-aware Variational Integration)

Purpose: Integrates paired scRNA-seq and spatial data using COVET representations
Architecture: Conditional Variational Autoencoder (CVAE) with spatial/scRNA-seq modes
Capabilities:
- Predicts spatial context for dissociated scRNA-seq data
- Imputes missing genes for spatial data
- Generates joint latent embeddings for integrated analysis
- Produces predicted COVET matrices for scRNA-seq data

Analysis Workflow

Preprocessing: Configure environment, load spatial and scRNA-seq data
COVET Calculation: Compute niche covariance matrices for spatial data
Model Training: Train ENVI CVAE on integrated datasets
Inference: Generate predictions, imputations, and latent embeddings
Analysis: Visualize UMAPs, analyze cortical depth, examine niche composition

Computational Considerations

GPU Support: Configure CUDA for accelerated training (use -1 for CPU)
Memory Management: Use batch processing for large datasets
Distance Metrics: COVET analysis uses sqrt-transformed matrices for distance calculations
Batch Effects: Support for batch-aware nearest neighbor computation

Reference Files

This skill includes comprehensive documentation in references/:

core_documentation.md

Utilities documentation - Helper functions and computational tools
Search functionality - Documentation search and navigation
Core infrastructure - Essential framework components

getting_started.md

Complete ENVI tutorial - Step-by-step MERFISH analysis workflow
Installation guide - Package setup and dependency management
Data loading examples - Motor cortex scRNA-seq and MERFISH datasets
Visualization techniques - Spatial plots, UMAPs, and analysis figures
Advanced analysis - COVET analysis, cortical depth prediction, niche composition

python_api.md

Complete API reference - All public functions and classes
Configuration options - Sphinx documentation build configuration
Internal modules - Distribution functions, neural network architectures
Utility functions - Covariance computation, batch processing, niche analysis

Working with This Skill

For Beginners

Start with getting_started.md - Contains the complete tutorial with real MERFISH data
Follow the environment setup - Ensure proper CUDA configuration and package installation
Run the basic workflow - Data loading → model initialization → training → analysis
Study the visualization examples - Learn to create spatial plots and UMAPs

For Intermediate Users

Explore python_api.md - Understand the complete API and customization options
Experiment with utility functions - Implement custom diffusion maps and FDL layouts
Analyze specific cell types - Focus on neuronal subtypes and cortical organization
Optimize model parameters - Adjust k-NN parameters, latent dimensions, and training settings

For Advanced Users

Modify core algorithms - Customize COVET computation, CVAE architecture, or loss functions
Scale to large datasets - Implement batch processing and memory optimization
Integrate new modalities - Adapt ENVI for other spatial transcriptomics platforms
Develop custom analyses - Create specialized niche metrics or visualization techniques

Navigation Tips

Use view command to read specific reference files for detailed information
Search for keywords like "COVET", "latent", "imputation" to find relevant sections
Follow code examples in order - they build from basic setup to advanced analysis
Check function signatures in python_api.md for parameter options and requirements

Resources

references/

Organized documentation extracted from official sources containing:

Detailed conceptual explanations and mathematical foundations
Complete code examples with proper language annotations
Step-by-step tutorials with real biological datasets
Links to original documentation and supplementary materials
Structured table of contents for rapid navigation

scripts/

Add helper scripts for common automation tasks:

Batch processing pipelines for multiple datasets
Custom visualization functions for specific analyses
Data preprocessing utilities for different spatial platforms

assets/

Add templates and example projects:

Complete analysis workflows for MERFISH data
Configuration files for different experimental designs
Example datasets and expected outputs for testing

Notes

Biological focus: Originally developed for motor cortex MERFISH analysis but applicable to any spatial transcriptomics data
Mathematical foundation: Based on rigorous covariance-based niche representation and variational inference
Computational efficiency: Supports GPU acceleration and batch processing for large-scale datasets
Extensibility: Modular design allows customization of individual components
Quality assurance: All code examples extracted from working tutorials and tested implementations

Updating

To refresh this skill with updated documentation:

Re-run the documentation scraper with the same configuration
The skill will be rebuilt with the latest API changes and examples
Existing custom scripts and assets in the skill directory will be preserved
Backup copies are automatically created before updates

envi-pkg-local-improved

Install Skill

SKILL.md