Claude Code Plugins

Community-maintained marketplace

Feedback

multiplex-imaging-analysis

@smith6jt-cop/Skills_Registry
0
0

Optimizing segmentation feature extraction, SOM clustering, and spatial analysis for multiplex immunofluorescence imaging with cell compartment and ECM features

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name multiplex-imaging-analysis
description Optimizing segmentation feature extraction, SOM clustering, and spatial analysis for multiplex immunofluorescence imaging with cell compartment and ECM features
author smith6jt
date Fri Dec 12 2025 00:00:00 GMT+0000 (Coordinated Universal Time)

Multiplex Imaging Analysis - Research Notes

Experiment Overview

Item Details
Date 2025-12-12
Goal Review and optimize Notebook 4 for segmentation feature extraction, SOM clustering, and spatial analysis with ECM integration
Environment KINTSUGI v1.2.0, Python 3.10+, InstanSeg, pyFlowSOM, scanpy 1.x, scimap 2.2.x, napari-simpleitk-image-processing
Status Planning Complete - Improvements Identified

Context

KINTSUGI Notebook 4 performs cell/nuclear/ECM segmentation followed by feature extraction, SOM clustering, and spatial analysis. The current workflow uses:

  • InstanSeg for deep learning segmentation
  • napari-simpleitk-image-processing.label_statistics for feature extraction
  • pyFlowSOM for self-organizing map clustering
  • scanpy + scimap for spatial analysis and phenotyping

Literature review and analysis of forked repos (scimap, KODAMA, cylinter) identified optimization opportunities in all four areas.

Verified Workflow

Current Feature Extraction (Working)

from napari_simpleitk_image_processing import label_statistics

# Basic features work well
marker_stats = label_statistics(
    intensity_image=marker[i],
    label_image=matched_cytoplasm_label,
    size=True,
    intensity=True,
    perimeter=True,
    shape=True,
    position=True,
    moments=True
)

Current SOM Clustering (Working)

import pyFlowSOM
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
train_array = scaler.fit_transform(pixel_data[channels].values.astype(np.float64))
som = pyFlowSOM.som(train_array, grid_size, grid_size, num_passes,
                    alpha_range=(lr_start, lr_end), seed=627)
clusters, dist = pyFlowSOM.map_data_to_nodes(som, train_array)

ECM Mask Creation (Working)

from skimage.segmentation import watershed
from scipy import ndimage

binary_mask = marker_label > 0
distance = ndimage.distance_transform_edt(~binary_mask)
ecm_label = watershed(distance, marker_label)
ecm_label[marker_label > 0] = 0  # Remove cell regions

Matched Compartment Masks (Working)

# Cytoplasm = cell mask minus nuclear regions
matched_cytoplasm_label = marker_label.copy()
matched_cytoplasm_label[nuclear_label > 0] = 0

# Nuclear regions from cell mask
matched_nuclear_label = marker_label.copy()
matched_nuclear_label[matched_cytoplasm_label > 0] = 0

Failed Attempts (Critical)

Attempt Why it Failed Lesson Learned
Using resolve_cell_and_nucleus=True in InstanSeg Can cause label mismatches between cell and nuclear masks Set to False and create matched masks post-hoc for consistent labels
Clustering all markers together without compartment separation Dilutes compartment-specific signals, mixed phenotypes Cluster cell, nuclear, ECM features separately then merge
Fixed SOM grid size (e.g., 10x10) Under/over-clusters depending on data complexity Use elbow method or try multiple grid sizes (5x5 to 15x15)
Single-pass SOM training Unstable cluster assignments Use 10+ passes with decreasing learning rate (0.06 -> 0.001)
Not StandardScaler normalizing before SOM Markers with high intensity dominate clustering Always z-score normalize: StandardScaler().fit_transform()
Using sc.pp.log1p() on uint16 without normalization Incorrect scaling for scanpy downstream Call sc.pp.normalize_total() before sc.pp.log1p()
Merging DataFrames without checking label alignment NaN values in merged features Use inner joins and verify label columns match

Recommended Improvements

1. Enhanced Feature Extraction

from skimage.feature import graycomatrix, graycoprops
import numpy as np

def extract_texture_features(image, labels, distances=[1, 3], angles=[0, np.pi/4, np.pi/2]):
    """Add Haralick GLCM texture features per label."""
    props = ['contrast', 'homogeneity', 'energy', 'correlation']
    features = {}
    for label_id in np.unique(labels)[1:]:  # Skip background
        mask = labels == label_id
        region = image[mask].reshape(-1)
        # Compute GLCM and properties
        # Extract the region as a square patch (bounding box) for GLCM
        coords = np.argwhere(mask)
        if coords.size == 0:
            continue
        minr, minc = coords.min(axis=0)
        maxr, maxc = coords.max(axis=0) + 1
        patch = image[minr:maxr, minc:maxc]
        patch_mask = mask[minr:maxr, minc:maxc]
        # Mask out background in patch
        patch = np.where(patch_mask, patch, 0)
        # Rescale patch to 8-bit if needed
        if patch.max() > 255 or patch.dtype != np.uint8:
            patch = ((patch - patch.min()) / (patch.ptp() + 1e-8) * 255).astype(np.uint8)
        # Compute GLCM
        glcm = graycomatrix(patch, 
                            distances=distances, 
                            angles=angles, 
                            levels=256, 
                            symmetric=True, 
                            normed=True)
        # Extract properties
        features[label_id] = {}
        for prop in props:
            val = graycoprops(glcm, prop)
            # Average over distances and angles
            features[label_id][prop] = val.mean()
    return features

# Nuclear-cytoplasmic ratio (important for cell state)
nc_ratio = nuclear_df['nuclear_number_of_pixels'] / cell_df['cell_number_of_pixels']
dapi_nc_intensity_ratio = nuclear_df['nuclear_DAPI_mean'] / cell_df['cell_DAPI_mean']

2. Improved SOM Clustering (FuseSOM or minisom)

# Alternative: minisom for more control
from minisom import MiniSom

som = MiniSom(xdim, ydim, n_features,
              sigma=1.0,
              learning_rate=0.5,
              neighborhood_function='gaussian',
              topology='hexagonal')
som.train_batch(data, num_iteration=1000)  # Batch training more stable

# Adaptive grid sizing
def optimal_grid_size(n_samples):
    """Heuristic: sqrt(5 * sqrt(n_samples))"""
    return int(np.ceil(np.sqrt(5 * np.sqrt(n_samples))))

3. Enhanced Spatial Analysis (scimap)

import scimap as sm

# Neighborhood enrichment
sm.tl.spatial_count(adata, x_coordinate='X_centroid', y_coordinate='Y_centroid',
                    phenotype='merged_clusters', method='knn', knn=10)

# Spatial LDA for microenvironment identification
sm.tl.spatial_lda(adata, x_coordinate='X_centroid', y_coordinate='Y_centroid',
                  phenotype='merged_clusters', num_motifs=10)

# Cell-cell interaction analysis
sm.tl.spatial_interaction(adata, x_coordinate='X_centroid', y_coordinate='Y_centroid',
                          phenotype='merged_clusters', method='knn', knn=10)

4. ECM-Specific Features

# Cell-ECM distance
from scipy.ndimage import distance_transform_edt

ecm_boundary = (ecm_label > 0).astype(int)
cell_ecm_distance = distance_transform_edt(~ecm_boundary)

# Per-cell ECM distance
cell_to_min_ecm_dist = {}
for cell_id in np.unique(marker_label)[1:]:
    cell_mask = marker_label == cell_id
    min_ecm_dist = cell_ecm_distance[cell_mask].min()
    cell_to_min_ecm_dist[cell_id] = min_ecm_dist
# cell_to_min_ecm_dist now maps each cell_id to its minimum ECM distance

Final Parameters

SOM Clustering (Recommended)

# pyFlowSOM parameters
grid_size: 5-15 (adaptive based on sample size)
num_passes: 10
lr_start: 0.06
lr_end: 0.001
seed: 627 (or any fixed seed for reproducibility)
normalization: StandardScaler (z-score)

Leiden Clustering

resolution: 0.6-1.0 (tune based on desired granularity)
key_added: 'leiden_Res0.6'

Consensus Phenotyping

# Merge SOM + Leiden
n_final_clusters: 20-30 (hierarchical cut)
linkage_method: 'ward'
distance_metric: 'correlation'

scanpy Preprocessing

min_counts: 1
normalize_total: True
log1p: True
scale: zero_center=True

Key Insights

  • Compartment-specific clustering is essential: Cell, nuclear, and ECM features have different biological meanings. Cluster separately then integrate.
  • FuseSOM outperforms standard FlowSOM: For imaging cytometry data specifically (Bioinformatics Advances 2023).
  • scimap has powerful spatial functions: spatial_lda, spatial_count, spatial_interaction not currently used but highly valuable.
  • Label matching is critical: When creating compartment masks, ensure labels propagate correctly to enable merging.
  • ECM watershed expansion: Using distance transform + watershed from cell boundaries is effective for assigning ECM regions to nearest cells.
  • Texture features add value: Haralick/GLCM features capture structural patterns especially in ECM and tissue organization.
  • 3D considerations: Literature shows 2D thin sections fragment ~95% of cells (Nature Methods 2025). Consider thicker sections if phenotype accuracy is critical.

References