Claude Code Plugins

Community-maintained marketplace

SKILL.md

---
name: image-processing
description: Extract summaries and descriptions from images using vision-language models. Use when working with image attachments, analyzing photos, screenshots, diagrams, or when asked to describe or understand image content.
---

Image Processing

This skill covers how to process and analyze image attachments in advisory work.

Supported Formats

Extension      Type   Best For
.jpg, .jpeg    JPEG   Photos, screenshots
.png           PNG    Screenshots, diagrams, transparent images
.gif           GIF    Animated images, simple graphics
.bmp           BMP    Windows bitmap images
.webp          WebP   Modern web images
.tiff, .tif    TIFF   High-quality images, scans

Image Storage

Images from email attachments are:

  • Saved to attachments/ directory (same as documents)
  • Filtered by minimum size (5KB) to exclude logos/signatures
  • Checksummed to prevent duplicate storage
  • Automatically indexed in attachments/INDEX.md with basic metadata
  • Not processed by default; you must process them explicitly

INDEX.md Structure

The attachments/INDEX.md file is the authoritative reference for all attachments in the repository. It:

  • Lists every file in the attachments/ directory
  • Shows processing status (✓ for processed, ✗ for not processed)
  • Includes summaries and details for processed images
  • Is automatically updated when:
    • New attachments are downloaded (via update_attachments_index())
    • Images are processed (via update_image_processing_status())

Important: The INDEX.md file should always faithfully reflect all files in the attachments folder. If you manually add files or process images, ensure INDEX.md is updated accordingly.
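To check that INDEX.md still matches the folder, the two can be compared with a short script. This is a minimal sketch, not part of src.image_handler: indexed_files, index_drift and check_attachments are hypothetical names, and it assumes each entry names the stored file on a "- **File**:" line (as in the example INDEX.md entry) and that the .json/.md sidecars written by processing do not get entries of their own.

```python
import re
from pathlib import Path
from typing import List, Set, Tuple

# Image extensions from the "Supported Formats" table.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp", ".tiff", ".tif"}

# Assumption: each entry names the stored file on a line such as
#   - **File**: `2026-01-03-photo.jpg`
# as in the example INDEX.md entry; adjust the pattern if your index differs.
FILE_LINE = re.compile(r"^- \*\*File\*\*: `([^`]+)`", re.MULTILINE)


def indexed_files(index_text: str) -> Set[str]:
    """Filenames listed in the text of INDEX.md."""
    return set(FILE_LINE.findall(index_text))


def index_drift(index_text: str, filenames: List[str]) -> Tuple[Set[str], Set[str]]:
    """Return (missing_from_index, missing_from_disk) for a folder listing.

    Assumption: INDEX.md itself and the .json/.md sidecars written next to a
    processed image (same stem as an image file) need no entries of their own.
    """
    image_stems = {Path(n).stem for n in filenames if Path(n).suffix.lower() in IMAGE_EXTS}
    expected = {
        n
        for n in filenames
        if n != "INDEX.md"
        and not (Path(n).suffix.lower() in {".json", ".md"} and Path(n).stem in image_stems)
    }
    indexed = indexed_files(index_text)
    return expected - indexed, indexed - expected


def check_attachments(attachments_dir: str = "attachments") -> Tuple[Set[str], Set[str]]:
    """Run index_drift on the real folder (call from the repository root)."""
    root = Path(attachments_dir)
    names = [p.name for p in root.iterdir() if p.is_file()]
    return index_drift((root / "INDEX.md").read_text(encoding="utf-8"), names)
```

Anything reported in the first set needs an entry; anything in the second set points at a file that was moved or deleted without updating the index.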

WORKFLOW: Processing Images

When you need to analyze an image:

Step 1: Check Available Images

Look for images in the attachments directory:

ls -la attachments/
cat attachments/INDEX.md

Step 2: Check if Already Processed

Look for a .json or .md file alongside the image:

ls -la attachments/
# If photo.jpg exists, check for photo.json or photo.md
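The same sidecar check can be applied to the whole folder at once. A minimal sketch, assuming a processed image always has a .json or .md file with the same stem (photo.jpg → photo.json or photo.md); unprocessed and unprocessed_in are hypothetical helper names. The ✓/✗ status in INDEX.md is an alternative source for the same information.

```python
from pathlib import Path
from typing import Iterable, List

# Image extensions from the "Supported Formats" table.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp", ".tiff", ".tif"}


def unprocessed(filenames: Iterable[str]) -> List[str]:
    """Return image names that have neither a .json nor a .md sidecar."""
    names = set(filenames)
    pending = []
    for name in sorted(names):
        path = Path(name)
        if path.suffix.lower() not in IMAGE_EXTS:
            continue  # skip documents, INDEX.md and the sidecars themselves
        if path.with_suffix(".json").name in names or path.with_suffix(".md").name in names:
            continue  # cached results already exist
        pending.append(name)
    return pending


def unprocessed_in(attachments_dir: str = "attachments") -> List[str]:
    """Apply unprocessed() to the files currently in attachments_dir."""
    return unprocessed(p.name for p in Path(attachments_dir).iterdir() if p.is_file())
```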

Step 3: Process the Image

If not processed, use the image handler:

# Generate summary only (default)
python -m src.image_handler attachments/photo.jpg

# Generate detailed description
python -m src.image_handler attachments/photo.jpg --details

# Generate both summary and details (single API call)
python -m src.image_handler attachments/photo.jpg --summary --details

# Only detailed description (no summary)
python -m src.image_handler attachments/photo.jpg --details-only

This creates two files next to the image and updates the index:

  • A .json file with structured results (cached for future use)
  • A .md file with formatted output for easy reading
  • An updated attachments/INDEX.md entry that marks the image as processed and includes its summary and/or details

Step 4: Read the Results

# Read the markdown output
cat attachments/photo.md

# Or read the JSON for structured data
cat attachments/photo.json

Processing Methods

Currently supported:

  • low-cost-vlm (default): Uses the Qwen3-VL-8B-Instruct model through the HuggingFace inference API
    • Automatically downsizes images >1024px to reduce API costs
    • Supports summary and detailed descriptions
    • Makes a single API call when both summary and details are requested
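The 1024px limit amounts to a scale factor on the image dimensions. This sketch assumes the limit applies to the longer side, that the aspect ratio is preserved, and that images already within the limit are not enlarged; the real handler may round or resample differently, and downsized is a hypothetical name.

```python
from typing import Tuple

MAX_SIDE = 1024  # limit from this skill; applying it to the longer side is an assumption


def downsized(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side.

    Aspect ratio is preserved and images already within the limit are
    returned unchanged (no upscaling).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow installed, Image.thumbnail((1024, 1024)) performs the same kind of aspect-preserving, shrink-only resize in place.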

Method: low-cost-vlm

This method uses HuggingFace's inference API with the Qwen vision-language model.

Usage:

python -m src.image_handler attachments/photo.jpg --method low-cost-vlm

Output:

  • Summary: One-sentence description
  • Details: Comprehensive description including elements, text, colors, composition, context

INDEX.md Reference

The attachments/INDEX.md file serves as the single source of truth for all attachments:

  1. Every file in attachments/ should have an entry in INDEX.md
  2. Processing status is tracked - entries show ✓ (processed) or ✗ (not processed)
  3. Processed images include summary and detailed descriptions in the index
  4. The index is automatically maintained when:
    • Attachments are downloaded from email
    • Images are processed via python -m src.image_handler
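Because each entry carries a ✓ or ✗, the index itself can list what still needs processing. A sketch, assuming every entry has a "- **File**:" line followed later by a "- **Processed**:" line, as in the example entry in this section; processing_status and pending are hypothetical names.

```python
import re
from typing import Dict, List

# Assumption: entry fields follow the example entry's layout.
FILE_LINE = re.compile(r"^- \*\*File\*\*: `([^`]+)`")
PROCESSED_LINE = re.compile(r"^- \*\*Processed\*\*: ([✓✗])")


def processing_status(index_text: str) -> Dict[str, bool]:
    """Map each indexed file to True (✓ processed) or False (✗ not processed)."""
    status: Dict[str, bool] = {}
    current = None
    for line in index_text.splitlines():
        file_match = FILE_LINE.match(line)
        if file_match:
            current = file_match.group(1)
            continue
        done_match = PROCESSED_LINE.match(line)
        if done_match and current is not None:
            status[current] = done_match.group(1) == "✓"
    return status


def pending(index_text: str) -> List[str]:
    """Files whose entry is marked ✗, i.e. candidates for src.image_handler."""
    return sorted(name for name, done in processing_status(index_text).items() if not done)
```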

Example INDEX.md entry:

### photo.jpg
- **File**: `2026-01-03-photo.jpg`
- **Type**: image/jpeg
- **Size**: 822898 bytes
- **Path**: `attachments/2026-01-03-photo.jpg`
- **Checksum**: `29314559e912ab1ef6889fb19e3563738b4475d4d81db70da4301144f68280cc`
- **Processed**: ✓
- **Summary**: A humorous cat-themed mousepad with the slogan "I WORK HARD SO MY CAT CAN HAVE NICE THINGS"
- **Details**: The image displays a close-up, slightly angled overhead view...

When processing images manually, ensure INDEX.md is updated to reflect the processing status.
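When editing INDEX.md by hand, generating the entry text avoids formatting slips. The hypothetical render_entry function reproduces the field layout of the example entry; it assumes the checksum is the SHA-256 hex digest of the file contents (consistent with the SHA256 checksums described in this skill and the 64-character value shown) and that an entry is marked ✓ once a summary or details exist.

```python
import hashlib
from typing import Optional


def render_entry(display_name: str, stored_name: str, mime_type: str, data: bytes,
                 summary: Optional[str] = None, details: Optional[str] = None) -> str:
    """Render one INDEX.md entry in the layout of the example entry."""
    processed = summary is not None or details is not None
    lines = [
        f"### {display_name}",
        f"- **File**: `{stored_name}`",
        f"- **Type**: {mime_type}",
        f"- **Size**: {len(data)} bytes",
        f"- **Path**: `attachments/{stored_name}`",
        f"- **Checksum**: `{hashlib.sha256(data).hexdigest()}`",
        f"- **Processed**: {'✓' if processed else '✗'}",
    ]
    if summary is not None:
        lines.append(f"- **Summary**: {summary}")
    if details is not None:
        lines.append(f"- **Details**: {details}")
    return "\n".join(lines) + "\n"
```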

Processing Options

# Standard processing (summary only)
python -m src.image_handler attachments/photo.jpg

# Summary only (explicit)
python -m src.image_handler attachments/photo.jpg --summary-only

# Detailed description only
python -m src.image_handler attachments/photo.jpg --details-only

# Both summary and details (efficient: single API call)
python -m src.image_handler attachments/photo.jpg --summary --details

# Force re-processing (ignore cache)
python -m src.image_handler attachments/photo.jpg --force
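When several images need the same treatment, a small wrapper can build these command lines. A sketch that uses only the flags listed above (--summary --details, --details-only, --force) and relies on summary-only being the default; image_handler_cmd and run_all are hypothetical helpers, and run_all must be called from the repository root so that src.image_handler is importable.

```python
import subprocess
import sys
from typing import Iterable, List


def image_handler_cmd(path: str, summary: bool = True, details: bool = False,
                      force: bool = False) -> List[str]:
    """Build an argv list for `python -m src.image_handler` from the documented flags."""
    if not (summary or details):
        raise ValueError("request a summary, details, or both")
    cmd = [sys.executable, "-m", "src.image_handler", path]
    if summary and details:
        cmd += ["--summary", "--details"]  # both outputs from a single API call
    elif details:
        cmd.append("--details-only")
    # summary only is the default, so it needs no flag
    if force:
        cmd.append("--force")  # ignore cached .json/.md results
    return cmd


def run_all(paths: Iterable[str], **flags: bool) -> None:
    """Process each image in turn; stop at the first failure (check=True)."""
    for path in paths:
        subprocess.run(image_handler_cmd(path, **flags), check=True)
```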

Output Format

The generated markdown includes:

---
source: original-filename.jpg
method: low-cost-vlm
processed: low-cost-vlm
---

## Summary

[One-sentence description of the image]

## Detailed Description

[Comprehensive description including all visible elements, text, colors, composition, and context]

## Metadata

- **Dimensions**: 1920x1080
- **Size**: 245678 bytes
- **Format**: .jpg
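Since the .md output has a predictable shape, its parts can be pulled out without opening the JSON. A sketch that assumes only the layout shown above: a frontmatter block between --- lines, followed by "## " headings; parse_output is a hypothetical name.

```python
from typing import Dict, List


def parse_output(md_text: str) -> Dict[str, Dict[str, str]]:
    """Split the generated .md into frontmatter fields and "## " sections."""
    lines = md_text.splitlines()
    meta: Dict[str, str] = {}
    start = 0
    if lines and lines[0].strip() == "---":
        i = 1
        while i < len(lines) and lines[i].strip() != "---":
            key, sep, value = lines[i].partition(":")  # split on the first colon only
            if sep:
                meta[key.strip()] = value.strip()
            i += 1
        start = i + 1  # first line after the closing ---
    sections: Dict[str, List[str]] = {}
    current = None
    for line in lines[start:]:
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {"meta": meta, "sections": {k: "\n".join(v).strip() for k, v in sections.items()}}
```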

Image Filtering

Images are automatically filtered during email processing:

  • Size filter: Images < 5KB are ignored (likely signatures/logos)
  • Pattern filter: Files with names like "image001", "logo", "icon", "signature" are ignored
  • Checksum: Duplicate images (same binary content) are not stored twice
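The size and name filters can be expressed as a single predicate. A sketch under stated assumptions: "5KB" is taken as 5 × 1024 bytes, the name patterns are matched case-insensitively anywhere in the filename, and "image001" is generalized to "image" followed by three digits. Substring matching can misfire (silicon-wafer.png contains "icon"), so the real filter may be stricter; keep_image is a hypothetical name.

```python
import re

MIN_BYTES = 5 * 1024  # "5KB"; 5120 rather than 5000 bytes is an assumption
# Patterns named in this skill; "image001" generalized to image + three digits (assumption).
SKIP_NAMES = re.compile(r"image\d{3}|logo|icon|signature", re.IGNORECASE)


def keep_image(filename: str, size_bytes: int) -> bool:
    """Return True if an email image attachment passes the size and name filters."""
    if size_bytes < MIN_BYTES:
        return False  # probably a signature graphic or logo
    return SKIP_NAMES.search(filename) is None
```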

Duplicate Prevention

The system uses SHA256 checksums to prevent storing duplicate images:

  • If an image with the same checksum already exists, the new copy is not stored again
  • The existing image reference is used instead
  • This prevents repository bloat from forwarded emails with the same images
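Checksum-based deduplication behaves like a dictionary keyed by digest. An in-memory sketch with hypothetical names (sha256_hex, store_if_new); a real implementation would also write new files to attachments/ and load the known checksums from disk or from INDEX.md.

```python
import hashlib
from typing import Dict


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def store_if_new(name: str, data: bytes, known: Dict[str, str]) -> str:
    """Return the filename under which `data` is stored.

    `known` maps checksum -> stored filename. Duplicate content reuses the
    existing name; new content is recorded under `name`. Writing the bytes
    to attachments/ is left out of this in-memory sketch.
    """
    digest = sha256_hex(data)
    if digest in known:
        return known[digest]  # same bytes already stored: reuse that reference
    known[digest] = name
    return name
```

A forwarded email carrying the same photo under a new name therefore resolves to the file that was stored the first time.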

When to Process Images

Process images when:

  • User asks you to describe or analyze an image
  • Image contains text you need to read (screenshots, documents)
  • Image shows diagrams, charts, or visual data
  • User wants a summary of what's in an image
  • Image is part of a larger analysis task

Note: Images are not processed automatically to save API costs. Only process when needed.

Cost Considerations

  • VLM processing uses HuggingFace API
  • Images are automatically downsized to max 1024px to reduce costs
  • When both summary and details are requested, a single API call is made (more efficient)
  • Results are cached - re-processing uses cache unless --force is used
  • Only process images when explicitly needed or requested

Limitations

  • Large images are automatically resized (may lose some detail)
  • Animated GIFs are processed as static frames
  • Very complex images may have incomplete descriptions
  • Text in images may not be perfectly extracted (depends on VLM quality)

Notes

  • Processing is cached: the first run creates the .json and .md files, and later runs reuse them
  • Use --force if you want to re-process with updated models
  • Images are stored in attachments/ alongside documents
  • Checksumming prevents duplicate storage automatically
  • When both summary and details are requested, the tool makes a single efficient API call