Claude Code Plugins

Community-maintained marketplace

Feedback
1
0

PostgreSQL development for Python with full-text search (tsvector, tsquery, BM25 via pg_search), vector similarity (pgvector with HNSW/IVFFlat), JSONB and array indexing, and production deployment. Use when creating search features, storing AI embeddings, querying vector similarity, optimizing PostgreSQL indexes, or deploying to AWS RDS/Aurora, GCP Cloud SQL/AlloyDB, or Azure. Covers psycopg2, psycopg3, asyncpg, SQLAlchemy integration, Docker development setup, and index selection strategies. Triggers: "PostgreSQL search", "pgvector", "BM25 postgres", "JSONB index", "psycopg", "asyncpg", "PostgreSQL Docker", "AlloyDB vector". Does NOT cover: DBA administration (backup, replication, users), MySQL/MongoDB/Redis, schema design theory, stored procedures.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md


name: mastering-postgresql description: PostgreSQL development for Python with full-text search (tsvector, tsquery, BM25 via pg_search), vector similarity (pgvector with HNSW/IVFFlat), JSONB and array indexing, and production deployment. Use when creating search features, storing AI embeddings, querying vector similarity, optimizing PostgreSQL indexes, or deploying to AWS RDS/Aurora, GCP Cloud SQL/AlloyDB, or Azure. Covers psycopg2, psycopg3, asyncpg, SQLAlchemy integration, Docker development setup, and index selection strategies. Triggers: "PostgreSQL search", "pgvector", "BM25 postgres", "JSONB index", "psycopg", "asyncpg", "PostgreSQL Docker", "AlloyDB vector". Does NOT cover: DBA administration (backup, replication, users), MySQL/MongoDB/Redis, schema design theory, stored procedures. allowed-tools: - Read - Bash - Write

PostgreSQL Python Development

Build search, vector similarity, and data-intensive applications with PostgreSQL and Python.

Quick Reference

Task Go To
Docker/local setup setup-and-docker.md
Full-text search & BM25 search-fulltext.md
pgvector & JSONB indexing search-vectors-json.md
Python drivers & pools python-drivers.md
Python query patterns python-queries.md
AWS RDS/Aurora cloud-aws.md
GCP Cloud SQL/AlloyDB cloud-gcp.md
Azure Flexible Server cloud-azure.md
Neon & Supabase cloud-serverless.md
Cloud common (pooling, config) cloud-common.md

When NOT to Use This Skill

  • DBA tasks: Backup strategies, replication setup, user management, security hardening
  • Other databases: MySQL, MongoDB, Redis, Elasticsearch-specific queries
  • Schema design: Normalization theory, data modeling patterns
  • Stored procedures: PL/pgSQL function development
  • Application frameworks: Django ORM specifics, FastAPI integration details

Quick Start Checklist

Copy this checklist to track progress:

Setup Progress:
- [ ] Docker environment running (docker-compose up -d)
- [ ] Connected to database (psql or Python)
- [ ] Extensions created (pgvector, pg_trgm)
- [ ] Table created with search_vector and embedding columns
- [ ] GIN index on search_vector created
- [ ] HNSW index on embedding created
- [ ] Test full-text query returns results
- [ ] Test vector query returns results

Quick Start: Search + Vectors in 5 Minutes

1. Start PostgreSQL with pgvector

# docker-compose.yml
services:
  postgres:
    image: pgvector/pgvector:pg17
    environment:
      POSTGRES_PASSWORD: devpass
    ports: ["5432:5432"]
    volumes: [pgdata:/var/lib/postgresql/data]
volumes:
  pgdata:
docker-compose up -d

# Verify container is running:
docker-compose ps
# Expected: postgres service with status "Up"

2. Enable Extensions

CREATE EXTENSION vector;      -- pgvector for embeddings
CREATE EXTENSION pg_trgm;     -- Trigram for fuzzy search

-- Verify extensions installed:
SELECT extname, extversion FROM pg_extension 
WHERE extname IN ('vector', 'pg_trgm');
-- Expected: 2 rows with version numbers

3. Create Searchable Table with Vectors

CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT,
    embedding vector(1536),
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(content, '')), 'B')
    ) STORED
);

-- Create indexes
CREATE INDEX idx_docs_search ON documents USING GIN (search_vector);
CREATE INDEX idx_docs_embedding ON documents USING hnsw (embedding vector_cosine_ops);

-- Verify indexes created:
SELECT indexname FROM pg_indexes WHERE tablename = 'documents';
-- Expected: idx_docs_search, idx_docs_embedding, documents_pkey

4. Query from Python

import asyncpg

async def search(pool, query: str, embedding: list[float], limit: int = 10):
    return await pool.fetch("""
        SELECT id, title,
               ts_rank(search_vector, websearch_to_tsquery('english', $1)) AS text_rank,
               embedding <=> $2::vector AS vector_dist
        FROM documents
        WHERE search_vector @@ websearch_to_tsquery('english', $1)
        ORDER BY vector_dist
        LIMIT $3
    """, query, embedding, limit)

# Verify connection works:
# pool = await asyncpg.create_pool('postgresql://postgres:devpass@localhost/postgres')
# rows = await pool.fetch("SELECT 1 AS test")
# assert rows[0]['test'] == 1

Decision Trees

Which Search Approach?

Need search? ─┬─► Exact keyword match ──────► B-tree index + WHERE =
              │
              ├─► Full-text search (FTS) ───► tsvector + GIN + ts_rank
              │
              ├─► Relevance like Google ────► pg_search BM25 (ParadeDB)
              │
              ├─► Typo tolerance ───────────► pg_trgm + similarity()
              │
              ├─► Semantic/AI search ───────► pgvector + embeddings
              │
              └─► Hybrid (keywords + semantic) ► Combine tsvector + pgvector

Which Vector Index?

Vector index? ─┬─► Dataset < 100K rows ────► No index (exact search OK)
               │
               ├─► Need best recall ────────► HNSW (slower build, fast query)
               │
               ├─► Fast index build ────────► IVFFlat (needs data first)
               │
               ├─► On AlloyDB ──────────────► ScaNN (Google optimized)
               │
               ├─► On Azure ────────────────► pg_diskann (32x less memory)
               │
               ├─► Billions of vectors ─────► VectorChord vchordrq (self-host)
               │
               └─► Dimensions > 2000 ───────► halfvec or binary quantization

Which Python Library?

Python lib? ──┬─► Sync, simple, stable ─────► psycopg2
              │
              ├─► Async + modern features ──► psycopg3
              │
              ├─► Max async performance ────► asyncpg
              │
              └─► ORM needed ───────────────► SQLAlchemy + asyncpg/psycopg

Which Index Type for Column?

Column type? ─┬─► Scalar (int, text, timestamp) ─► B-tree (default)
              │
              ├─► JSONB ────────────────────────┬► GIN (general queries)
              │                                 └► GIN jsonb_path_ops (@> only)
              │
              ├─► Array ────────────────────────► GIN
              │
              ├─► tsvector ─────────────────────► GIN (or GiST for updates)
              │
              ├─► vector ───────────────────────► HNSW or IVFFlat
              │
              └─► Range / Geometric ────────────► GiST

Common Patterns

For implementation details, see the reference files:

Pattern Reference
Full-text search with ranking search-fulltext.md#ranking-functions
BM25 search search-fulltext.md#bm25-with-pg_search
Vector similarity search-vectors-json.md#distance-operators
JSONB containment search-vectors-json.md#jsonb-indexing
Array overlap search-vectors-json.md#array-indexing
Bulk insert python-queries.md#bulk-insert-strategies
Connection pool python-drivers.md#asyncpg-async-only

Index Tuning Quick Reference

HNSW Parameters

Parameter Default Guidance
m 16 Higher = better recall, more memory. 12-48 typical
ef_construction 64 Higher = better index quality, slower build. 64-200
hnsw.ef_search 40 Set at query time. Higher = better recall, slower
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops) WITH (m=16, ef_construction=100);
SET hnsw.ef_search = 100;  -- Before querying

-- Verify setting applied:
SHOW hnsw.ef_search;

IVFFlat Parameters

Parameter Guidance
lists sqrt(rows) for <1M rows; rows/1000 for >1M
ivfflat.probes Start at sqrt(lists), increase for recall
CREATE INDEX ON docs USING ivfflat (embedding vector_l2_ops) WITH (lists=100);
SET ivfflat.probes = 10;

Troubleshooting Quick Reference

Symptom Likely Cause Fix
Seq Scan on indexed column Stats outdated ANALYZE tablename;
Vector search slow No index or low ef_search Create HNSW index, increase ef_search
Poor vector recall IVFFlat probes too low Increase ivfflat.probes
FTS not matching Wrong language config Check to_tsvector('english', ...)
Index not used Query doesn't match ops Verify operator class matches query
Connection timeout Pool exhausted Increase pool size or fix leaks
Extension not found Not installed CREATE EXTENSION name;
HNSW build OOM Insufficient memory Increase maintenance_work_mem
Filtered queries return few results Filtering after index scan Enable hnsw.iterative_scan
Connection drops in production No health checking Use check=ConnectionPool.check_connection
Scaling past 100M vectors pgvector limits Consider VectorChord vchordrq

For detailed troubleshooting, see search-vectors-json.md.

Script Usage

pip install -r scripts/requirements.txt  # Install dependencies first
Script Purpose When to Use
setup_extensions.py Install pgvector, pg_trgm extensions Initial database setup
create_search_tables.py Create tables with search_vector and embedding columns After extensions installed
health_check.py Check index health, bloat, and performance Diagnosing slow queries
vector_search.py --demo Demonstrate vector similarity queries Learning pgvector patterns
bulk_insert.py High-performance data loading Importing large datasets
fts_examples.py Full-text search query examples Learning FTS syntax
connection_pool.py Connection pooling patterns Production deployments

Example:

python scripts/setup_extensions.py --host localhost --dbname mydb
python scripts/create_search_tables.py --host localhost --dbname mydb
python scripts/health_check.py --host localhost --dbname mydb

Cloud Quick Reference

Provider pgvector BM25 Support Connection Pooling
AWS RDS/Aurora 0.8.0 pg_textsearch (preview) RDS Proxy
GCP Cloud SQL 0.8.0 pg_textsearch (preview) Cloud SQL Proxy
GCP AlloyDB 0.8.0 + ScaNN pg_textsearch (preview) Built-in
Azure Flexible 0.8.0 + pg_diskann pg_textsearch (preview) Built-in PgBouncer
Neon pg_search Built-in
Supabase pg_search Built-in

Serverless options: Neon (scale-to-zero, instant branching) and Supabase (BaaS with auth/real-time) are ideal for dev/test and startups. See cloud-serverless.md.

BM25 Options:

  • pg_search (ParadeDB): Production-ready, self-host or ParadeDB managed service
  • pg_textsearch (TigerData): Preview status, available on managed PostgreSQL services

See provider-specific files for setup commands: AWS | GCP | Azure

Reference Files

Load these for detailed implementation guidance:

Reference Load When
setup-and-docker.md Docker setup, extension installation, postgresql.conf tuning
search-fulltext.md Full-text search (FTS), BM25 setup, trigram fuzzy search
search-vectors-json.md pgvector tuning, JSONB/array indexing, maintenance
python-drivers.md psycopg2/psycopg3/asyncpg, connection pools, SQLAlchemy
python-queries.md Bulk inserts, FTS queries, vector queries, JSONB operations
cloud-aws.md AWS RDS/Aurora setup, RDS Proxy
cloud-gcp.md GCP Cloud SQL/AlloyDB, ScaNN indexes
cloud-azure.md Azure Flexible Server, pg_diskann
cloud-serverless.md Neon, Supabase (scale-to-zero, branching)
cloud-common.md Extension matrix, pooling, production config, costs