---
name: vlm-embed
description: Manage VLM (Vision-Language Model) embedding service and jobs. Use this skill to start the Flask embedding service, run embedding jobs, or train the AE3D autoencoder. Invoke with /vlm-embed.
---
# VLM Embedding Pipeline
This skill manages the VLM embedding pipeline, which generates 1536-dimensional embeddings for images using the GME-Qwen2-VL-2B-Instruct model.
## Architecture

The pipeline separates model loading from Spark orchestration:

- **Flask service** (`vlm_embed_service.py`): loads the model once, serves the `/embed` endpoint
- **Spark job** (`vlm_embed.py`): reads from Hive tables, calls the service, writes embeddings back to Hive
## Hive Tables

- Input: `images_without_text_renderings` (default)
- Output: `images_without_text_renderings_vlm_embeddings` (default)
- Sharded output: `{output_table}_shard_NNNN` when using `--num-shards`
## Start Embedding Service

```bash
# Start the Flask service on port 8081
python -m wc_simd.vlm_embed_service --host 0.0.0.0 --port 8081

# Or pin to a specific GPU
CUDA_VISIBLE_DEVICES=0 python -m wc_simd.vlm_embed_service --host 0.0.0.0 --port 8081
```
The service exposes:

- `GET /health` - health check with model info
- `POST /embed` - embed images via `{"urls": [...]}` or `{"images_b64": [...]}`
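For a quick smoke test outside Spark, you can call the embed endpoint directly. A minimal sketch using `requests` (the image URLs are placeholders, and the shape of the JSON response is an assumption; inspect what the service actually returns):

```python
import requests

# Embed two images by URL. {"images_b64": [...]} with base64-encoded
# image bytes is accepted as an alternative payload (see Prefetch Mode).
payload = {"urls": [
    "https://example.com/a.jpg",  # placeholder URLs
    "https://example.com/b.jpg",
]}
resp = requests.post("http://127.0.0.1:8081/embed", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json())  # expect one 1536-dim embedding per input image
```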
## Run Embedding Job

```bash
# Run the Spark job to embed all images
python -m wc_simd.vlm_embed --endpoint http://127.0.0.1:8081/embed

# Custom input/output tables
python -m wc_simd.vlm_embed \
  --input-table my_images \
  --output-table my_images_embedded \
  --endpoint http://127.0.0.1:8081/embed
```
## Sharded Processing (Resumable)

For large-scale processing with checkpoint/resume:

```bash
# Split into 100 shards, process shard 0
python -m wc_simd.vlm_embed --num-shards 100 --shard-id 0

# Process all shards sequentially (omit --shard-id)
python -m wc_simd.vlm_embed --num-shards 100

# Resume: existing shard tables are skipped by default
python -m wc_simd.vlm_embed --num-shards 100 --shard-id 5

# Force recompute (drop and recreate)
python -m wc_simd.vlm_embed --num-shards 100 --shard-id 5 --no-skip-existing
```
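Because each shard lands in its own table, a shard table's existence doubles as a checkpoint. A minimal sketch for checking which shards remain, assuming the default output table, four-digit zero-padded suffixes (inferred from the `_shard_NNNN` convention above), and PySpark >= 3.3 for `tableExists`:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

output_table = "images_without_text_renderings_vlm_embeddings"
num_shards = 100

# Shard tables are assumed to be named {output_table}_shard_NNNN,
# zero-padded to four digits.
missing = [
    i for i in range(num_shards)
    if not spark.catalog.tableExists(f"{output_table}_shard_{i:04d}")
]
print(f"{len(missing)} shards still to process: {missing[:10]}")
```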
## Prefetch Mode

Download images in the Spark workers and send base64-encoded bytes to the service (useful when the service host has limited network access):

```bash
python -m wc_simd.vlm_embed \
  --prefetch-images \
  --prefetch-workers 8 \
  --endpoint http://127.0.0.1:8081/embed
```
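Conceptually, prefetch does the following for each batch inside a Spark worker. This is an illustrative sketch, not the job's actual code (which lives in `vlm_embed.py`); the `{"images_b64": [...]}` payload key comes from the service contract above:

```python
import base64
import requests

ENDPOINT = "http://127.0.0.1:8081/embed"

def embed_prefetched(urls):
    """Download images worker-side, then ship bytes to the service."""
    images_b64 = []
    for url in urls:
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        # Base64-encode so the service never needs outbound network access.
        images_b64.append(base64.b64encode(resp.content).decode("ascii"))
    out = requests.post(ENDPOINT, json={"images_b64": images_b64}, timeout=300)
    out.raise_for_status()
    return out.json()
```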
## Multi-Instance Parallelism

Run multiple Spark jobs on different machines, each processing a disjoint subset of the input:

```bash
# Machine 1
python -m wc_simd.vlm_embed --instances 4 --instance-no 0

# Machine 2
python -m wc_simd.vlm_embed --instances 4 --instance-no 1

# ...and so on, up to --instance-no 3
```
## Key Options

| Option | Default | Description |
|---|---|---|
| `--input-table` | `images_without_text_renderings` | Source Hive table |
| `--output-table` | `images_without_text_renderings_vlm_embeddings` | Destination table |
| `--endpoint` | `http://127.0.0.1:8081/embed` | Flask service URL |
| `--batch-size` | `16` | Images per service call |
| `--num-partitions` | `320` | Spark partitions |
| `--num-shards` | `None` | Split into N shard tables |
| `--shard-id` | `None` | Process a specific shard |
| `--skip-existing` | `True` | Skip existing output tables |
| `--prefetch-images` | `False` | Download in Spark workers, send bytes |
## Train AE3D Autoencoder

Reduce the 1536-dim embeddings to 3D for visualization:

```bash
# Export embeddings from Hive/Parquet to NumPy
python -m wc_simd.vlm_embed_train_data \
  --input-glob "path/to/embeddings/*.parquet" \
  --output-npy data/vlm_embed/embeddings.npy \
  --output-index data/vlm_embed/index.parquet

# Train the autoencoder
python -m wc_simd.vlm_embed_ae train \
  --data data/vlm_embed/embeddings.npy \
  --out runs/ae3d

# Inference (project to 3D)
python -m wc_simd.vlm_embed_ae infer \
  --model runs/ae3d/ae.pt \
  --data embeddings.npy \
  --out projected.npy
```
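A quick sanity check of the result is a 3D scatter plot of `projected.npy`. A minimal sketch, assuming the `infer` step writes an `(N, 3)` float array:

```python
import numpy as np
import matplotlib.pyplot as plt

pts = np.load("projected.npy")
assert pts.ndim == 2 and pts.shape[1] == 3, f"unexpected shape {pts.shape}"

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(projection="3d")
ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], s=1, alpha=0.3)
ax.set_title("AE3D projection of VLM embeddings")
plt.savefig("projected_scatter.png", dpi=150)
```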
## Troubleshooting

### `transformers` Version Conflict

The GME-Qwen2-VL model requires `transformers<4.52.0`. If you see errors like:

```
transformers<4.52.0 is required for normal functioning of this module, but found transformers==4.57.3
```

fix by pinning the version:

```bash
pip install 'transformers>=4.37.0,<4.52.0'
```

Or in `requirements.txt`/your Dockerfile:

```
transformers>=4.37.0,<4.52.0
```
### 401 Auth Errors

Some images require authentication (content advisory). These rows will have `embed_error` set in the output table.
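To find affected rows, filter the output table on a non-null `embed_error`. A minimal sketch, assuming the default output table and that a `url` column sits alongside `embed_error` (the exact schema isn't documented here):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

failed = spark.sql("""
    SELECT url, embed_error
    FROM images_without_text_renderings_vlm_embeddings
    WHERE embed_error IS NOT NULL
""")
failed.show(20, truncate=False)
print(f"{failed.count()} rows failed to embed")
```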
### GPU Memory

The model requires significant GPU memory; use a GPU with at least 16 GB of VRAM for the 2B model.
### Service Connection

If Spark workers can't reach the service:

- Local: use `http://127.0.0.1:8081/embed`
- Docker workers to host: use `http://172.19.0.1:8081/embed` (Docker gateway IP)
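A quick reachability probe from wherever the workers run (a minimal sketch; adjust the candidate addresses to your network):

```python
import requests

# Try each candidate base URL; the one that answers /health is the
# address the Spark job should use for --endpoint.
for base in ("http://127.0.0.1:8081", "http://172.19.0.1:8081"):
    try:
        r = requests.get(f"{base}/health", timeout=5)
        print(base, "->", r.status_code)
    except requests.RequestException as exc:
        print(base, "-> unreachable:", exc)
```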