Claude Code Plugins

Community-maintained marketplace

Feedback

single-cell-annotation-skills-with-omicverse

@Starlitnightly/omicverse
768
0

Guide Claude through SCSA, MetaTiME, CellVote, CellMatch, GPTAnno, and weighted KNN transfer workflows for annotating single-cell modalities.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name single-cell-annotation-skills-with-omicverse
title Single-cell annotation skills with omicverse
description Guide Claude through SCSA, MetaTiME, CellVote, CellMatch, GPTAnno, and weighted KNN transfer workflows for annotating single-cell modalities.

Single-cell annotation skills with omicverse

Overview

Use this skill to reproduce and adapt the single-cell annotation playbook captured in omicverse tutorials: SCSA t_cellanno.ipynb, MetaTiME t_metatime.ipynb, CellVote t_cellvote.md & t_cellvote_pbmc3k.ipynb, CellMatch t_cellmatch.ipynb, GPTAnno t_gptanno.ipynb, and label transfer t_anno_trans.ipynb. Each section below highlights required inputs, training/inference steps, and how to read the outputs.

Instructions

  1. SCSA automated cluster annotation

    • Data requirements: PBMC3k raw counts from 10x Genomics (pbmc3k_filtered_gene_bc_matrices.tar.gz) or the processed sample/rna.h5ad. Download instructions are embedded in the notebook; unpack to data/filtered_gene_bc_matrices/hg19/. Ensure an SCSA SQLite database is available (e.g. pySCSA_2024_v1_plus.db from the Figshare/Drive links listed in the tutorial) and point model_path to its location.
    • Preprocessing & model fit: Load with sc.read_10x_mtx, run QC (ov.pp.qc), normalization and HVG selection (ov.pp.preprocess), scaling (ov.pp.scale), PCA (ov.pp.pca), neighbors, Leiden clustering, and compute rank markers (sc.tl.rank_genes_groups). Instantiate scsa = ov.single.pySCSA(...) choosing target='cellmarker' or 'panglaodb', tissue scope, and thresholds (foldchange, pvalue).
    • Inference & interpretation: Call scsa.cell_anno(clustertype='leiden', result_key='scsa_celltype_cellmarker') or scsa.cell_auto_anno to append predictions to adata.obs. Compare to manual marker-based labels via ov.utils.embedding or sc.pl.dotplot, inspect marker dictionaries (ov.single.get_celltype_marker), and query supported tissues with scsa.get_model_tissue(). Use the ROI/ROE helpers (ov.utils.roe, ov.utils.plot_cellproportion) to validate abundance trends.
  2. MetaTiME tumour microenvironment states

    • Data requirements: Batched TME AnnData with an scVI latent embedding. The tutorial uses TiME_adata_scvi.h5ad from Figshare (https://figshare.com/ndownloader/files/41440050). If starting from counts, run scVI (scvi.model.SCVI) first to populate adata.obsm['X_scVI'].
    • Preprocessing & model fit: Optionally subset to non-malignant cells via adata.obs['isTME']. Rebuild neighbors on the latent representation (sc.pp.neighbors(adata, use_rep="X_scVI")) and embed with pymde (adata.obsm['X_mde'] = ov.utils.mde(...)). Initialise TiME_object = ov.single.MetaTiME(adata, mode='table') and, if finer granularity is desired, over-cluster with TiME_object.overcluster(resolution=8, clustercol='overcluster').
    • Inference & interpretation: Run TiME_object.predictTiME(save_obs_name='MetaTiME') to assign minor states and Major_MetaTiME. Visualise using TiME_object.plot or sc.pl.embedding. Interpret the outputs by comparing cluster-level distributions and confirming that MetaTiME and Major_MetaTiME columns align with expected niches.
  3. CellVote consensus labelling

    • Data requirements: A clustered AnnData (e.g. PBMC3k stored as CELLVOTE_PBMC3K env var or data/pbmc3k.h5ad) plus at least two precomputed annotation columns (simulated in the tutorial as scsa_annotation, gpt_celltype, gbi_celltype). Prepare per-cluster marker genes via sc.tl.rank_genes_groups.
    • Preprocessing & model fit: After standard preprocessing (normalize, log1p, HVGs, PCA, neighbors, Leiden) build a marker dictionary marker_dict = top_markers_from_rgg(adata, 'leiden', topn=10) or via ov.single.get_celltype_marker. Instantiate cv = ov.single.CellVote(adata).
    • Inference & interpretation: Call cv.vote(clusters_key='leiden', cluster_markers=marker_dict, celltype_keys=[...], species='human', organization='PBMC', provider='openai', model='gpt-4o-mini'). Offline examples monkey-patch arbitration to avoid API calls; online voting requires valid credentials. Final consensus labels live in adata.obs['CellVote_celltype']. Compare each cluster’s majority vote with the input sources (adata.obs[['leiden', 'scsa_annotation', ...]]) to justify decisions.
  4. CellMatch ontology mapping

    • Data requirements: Annotated AnnData such as pertpy.dt.haber_2017_regions() with adata.obs['cell_label']. Download Cell Ontology JSON (cl.json) via ov.single.download_cl(...) or manual links, and optionally Cell Taxonomy resources (Cell_Taxonomy_resource.txt). Ensure access to a SentenceTransformer model (sentence-transformers/all-MiniLM-L6-v2, BAAI/bge-base-en-v1.5, etc.), downloading to local_model_dir if offline.
    • Preprocessing & model fit: Create the mapper with ov.single.CellOntologyMapper(cl_obo_file='new_ontology/cl.json', model_name='sentence-transformers/all-MiniLM-L6-v2', local_model_dir='./my_models'). Run mapper.map_adata(...) to assign ontology-derived labels/IDs, optionally enabling taxonomy matching (use_taxonomy=True after calling load_cell_taxonomy_resource).
    • Inference & interpretation: Explore mapping summaries (mapper.print_mapping_summary_taxonomy) and inspect embeddings coloured by cell_ontology, cell_ontology_cl_id, or enhanced_cell_ontology. Use helper queries such as mapper.find_similar_cells('T helper cell'), mapper.get_cell_info(...), and category browsing to validate ontology coverage.
  5. GPTAnno LLM-powered annotation

    • Data requirements: The same PBMC3k dataset (raw matrix or .h5ad) and cluster assignments. Access to an LLM endpoint—configure AGI_API_KEY for OpenAI-compatible providers (provider='openai', 'qwen', 'kimi', etc.), or supply a local model path for ov.single.gptcelltype_local.
    • Preprocessing & model fit: Follow the QC, normalization, HVG, scaling, PCA, neighbor, Leiden, and marker discovery steps described above (reusing outputs from the SCSA workflow). Build the marker dictionary automatically with ov.single.get_celltype_marker(adata, clustertype='leiden', rank=True, key='rank_genes_groups', foldchange=2, topgenenumber=5).
    • Inference & interpretation: Invoke ov.single.gptcelltype(...) specifying tissue/species context and desired provider/model. Post-process responses to keep clean labels (result[key].split(': ')[-1]...) and write them to adata.obs['gpt_celltype']. Compare embeddings (ov.pl.embedding(..., color=['leiden','gpt_celltype'])) to verify cluster identities. If operating offline, call ov.single.gptcelltype_local with a downloaded instruction-tuned checkpoint.
  6. Weighted KNN annotation transfer

    • Data requirements: Cross-modal GLUE outputs with aligned embeddings, e.g. data/analysis_lymph/rna-emb.h5ad (annotated RNA) and data/analysis_lymph/atac-emb.h5ad (query ATAC) where both contain obsm['X_glue'].
    • Preprocessing & model fit: Load both modalities, optionally concatenate for QC plots, and compute a shared low-dimensional embedding with ov.utils.mde. Train a neighbour model using ov.utils.weighted_knn_trainer(train_adata=rna, train_adata_emb='X_glue', n_neighbors=15).
    • Inference & interpretation: Transfer labels via labels, uncert = ov.utils.weighted_knn_transfer(query_adata=atac, query_adata_emb='X_glue', label_keys='major_celltype', knn_model=knn_transformer, ref_adata_obs=rna.obs). Store predictions in atac.obs['transf_celltype'] and uncertainties in atac.obs['transf_celltype_unc']; copy to major_celltype if you want consistent naming. Visualise (ov.utils.embedding) and inspect uncertainty to flag ambiguous cells.

Examples

  • "Run SCSA with both CellMarker and PanglaoDB references on PBMC3k, then benchmark against manual marker assignments before feeding the results into CellVote."
  • "Annotate tumour microenvironment states in the MetaTiME Figshare dataset, highlight Major_MetaTiME classes, and export the label distribution per patient."
  • "Download Cell Ontology resources, map haber_2017_regions clusters to ontology terms, and enrich ambiguous clusters using Cell Taxonomy hints."
  • "Propagate RNA-derived major_celltype labels onto GLUE-integrated ATAC cells and report clusters with high transfer uncertainty."

References