name	image-processing
description	Extract summaries and descriptions from images using vision-language models. Use when working with image attachments, analyzing photos, screenshots, diagrams, or when asked to describe or understand image content.

Image Processing

This skill covers how to process and analyze image attachments in advisory work.

Supported Formats

Extension	Type	Best For
`.jpg`, `.jpeg`	JPEG	Photos, screenshots
`.png`	PNG	Screenshots, diagrams, transparent images
`.gif`	GIF	Animated images, simple graphics
`.bmp`	BMP	Windows bitmap images
`.webp`	WebP	Modern web images
`.tiff`, `.tif`	TIFF	High-quality images, scans

Image Storage

Images from email attachments are:

Saved to attachments/ directory (same as documents)
Filtered by minimum size (5KB) to exclude logos/signatures
Checksummed to prevent duplicate storage
Automatically indexed in attachments/INDEX.md with basic metadata
Not processed by default; you must process them explicitly

INDEX.md Structure

The attachments/INDEX.md file is the authoritative reference for all attachments in the repository. It:

Lists every file in the attachments/ directory
Shows processing status (✓ for processed, ✗ for not processed)
Includes summaries and details for processed images
Is automatically updated when:
- New attachments are downloaded (via update_attachments_index())
- Images are processed (via update_image_processing_status())

Important: The INDEX.md file should always faithfully reflect all files in the attachments folder. If you manually add files or process images, ensure INDEX.md is updated accordingly.

WORKFLOW: Processing Images

When you need to analyze an image:

Step 1: Check Available Images

Look for images in the attachments directory:

ls -la attachments/
cat attachments/INDEX.md

Step 2: Check if Already Processed

Look for a .json or .md file alongside the image:

ls -la attachments/
# If photo.jpg exists, check for photo.json or photo.md
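
The same check can be scripted. The following is a minimal sketch, assuming the cached outputs share the image's base name and sit in the same directory, as in the photo.jpg example above. The helper names `cached_outputs` and `is_processed` are hypothetical and not part of src.image_handler.

```python
from pathlib import Path


def cached_outputs(image_path: Path) -> dict[str, Path]:
    """Return the existing .json/.md sidecar files next to the image, keyed by suffix."""
    found = {}
    for suffix in (".json", ".md"):
        # Assumption: photo.jpg -> photo.json / photo.md in the same directory.
        sidecar = image_path.with_suffix(suffix)
        if sidecar.exists():
            found[suffix] = sidecar
    return found


def is_processed(image_path: Path) -> bool:
    """Treat an image as processed if at least one sidecar file exists."""
    return bool(cached_outputs(image_path))


if __name__ == "__main__":
    print(is_processed(Path("attachments/photo.jpg")))
```

attachments/INDEX.md remains the authoritative record of processing status, so a check like this is best used as a quick local test rather than a replacement for the index.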

Step 3: Process the Image

If not processed, use the image handler:

# Generate summary only (default)
python -m src.image_handler attachments/photo.jpg

# Generate detailed description
python -m src.image_handler attachments/photo.jpg --details

# Generate both summary and details (single API call)
python -m src.image_handler attachments/photo.jpg --summary --details

# Only detailed description (no summary)
python -m src.image_handler attachments/photo.jpg --details-only

This creates:

A .json file with structured results (cached for future use)
A .md file with formatted output for easy reading
An updated entry in attachments/INDEX.md that marks the image as processed and includes the summary/details

Step 4: Read the Results

# Read the markdown output
cat attachments/photo.md

# Or read the JSON for structured data
cat attachments/photo.json

Processing Methods

Currently supported:

low-cost-vlm (default): Uses the Qwen3-VL-8B-Instruct model through the Hugging Face inference API
- Automatically downsizes images larger than 1024px to reduce API costs
- Supports summary and detailed descriptions
- Makes a single API call when both summary and details are requested
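
To illustrate the downsizing step, here is a small sketch of how target dimensions could be computed. It assumes the 1024px limit applies to the longer side and that the aspect ratio is preserved; the handler's actual resizing rule is not documented here, and `fit_within` is a hypothetical helper name.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        # Small images are left untouched; this sketch never upscales.
        return width, height
    # Multiply before dividing to keep the arithmetic as exact as possible.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


if __name__ == "__main__":
    print(fit_within(1920, 1080))  # (1024, 576)
    print(fit_within(800, 600))    # (800, 600)
```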

Method: low-cost-vlm

This method calls the Hugging Face inference API with the Qwen3-VL-8B-Instruct vision-language model.

Usage:

python -m src.image_handler attachments/photo.jpg --method low-cost-vlm

Output:

Summary: One-sentence description
Details: Comprehensive description including elements, text, colors, composition, context

INDEX.md Reference

The attachments/INDEX.md file serves as the single source of truth for all attachments:

Every file in attachments/ should have an entry in INDEX.md
Processing status is tracked - entries show ✓ (processed) or ✗ (not processed)
Processed images include summary and detailed descriptions in the index
The index is automatically maintained when:
- Attachments are downloaded from email
- Images are processed via python -m src.image_handler

Example INDEX.md entry:

### photo.jpg
- **File**: `2026-01-03-photo.jpg`
- **Type**: image/jpeg
- **Size**: 822898 bytes
- **Path**: `attachments/2026-01-03-photo.jpg`
- **Checksum**: `29314559e912ab1ef6889fb19e3563738b4475d4d81db70da4301144f68280cc`
- **Processed**: ✓
- **Summary**: A humorous cat-themed mousepad with the slogan "I WORK HARD SO MY CAT CAN HAVE NICE THINGS"
- **Details**: The image displays a close-up, slightly angled overhead view...

When processing images manually, ensure INDEX.md is updated to reflect the processing status.
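
One way to verify that the index is in sync is to compare the directory listing with the **File** lines in INDEX.md. The sketch below assumes every entry follows the example format shown above (a line of the form - **File**: `name`). The function names are hypothetical, and which files count as "should be indexed" is left to the caller.

```python
import re
from pathlib import Path

# Assumed entry format, taken from the example entry: - **File**: `name`
FILE_LINE = re.compile(r"^- \*\*File\*\*: `([^`]+)`$")


def indexed_files(index_text: str) -> set[str]:
    """Return the file names that appear on **File** lines of INDEX.md."""
    names = set()
    for line in index_text.splitlines():
        match = FILE_LINE.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def unindexed_files(attachments_dir: Path, index_name: str = "INDEX.md") -> list[str]:
    """List files in attachments_dir that have no **File** entry in the index."""
    index_path = attachments_dir / index_name
    index_text = index_path.read_text(encoding="utf-8") if index_path.exists() else ""
    listed = indexed_files(index_text)
    return sorted(
        p.name
        for p in attachments_dir.iterdir()
        if p.is_file() and p.name != index_name and p.name not in listed
    )


if __name__ == "__main__":
    attachments = Path("attachments")
    if attachments.is_dir():
        for name in unindexed_files(attachments):
            print(f"missing from INDEX.md: {name}")
```

The sketch only reports discrepancies; it does not modify INDEX.md.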

Processing Options

# Standard processing (summary only)
python -m src.image_handler attachments/photo.jpg

# Summary only (explicit)
python -m src.image_handler attachments/photo.jpg --summary-only

# Detailed description only
python -m src.image_handler attachments/photo.jpg --details-only

# Both summary and details (efficient: single API call)
python -m src.image_handler attachments/photo.jpg --summary --details

# Force re-processing (ignore cache)
python -m src.image_handler attachments/photo.jpg --force
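
When the handler is driven from a script, the options above can be assembled programmatically. This is a sketch only: it uses just the flags listed in this section, assumes `--force` can be combined with the other flags, and assumes the command is run from the repository root so that src.image_handler resolves. The names `build_command` and `process_image` are hypothetical.

```python
import subprocess

# Mode names are illustrative; the flags are the ones listed in this section.
MODE_FLAGS = {
    "default": [],
    "summary": ["--summary-only"],
    "details": ["--details-only"],
    "both": ["--summary", "--details"],
}


def build_command(image_path: str, mode: str = "default", force: bool = False) -> list[str]:
    """Build the argument list for one image_handler invocation."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown mode: {mode!r}")
    command = ["python", "-m", "src.image_handler", image_path, *MODE_FLAGS[mode]]
    if force:
        # Assumption: --force may be combined with the mode flags.
        command.append("--force")
    return command


def process_image(image_path: str, mode: str = "default", force: bool = False) -> None:
    """Run the handler and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(image_path, mode, force), check=True)


if __name__ == "__main__":
    print(" ".join(build_command("attachments/photo.jpg", mode="both")))
```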

Output Format

The generated markdown includes:

---
source: original-filename.jpg
method: low-cost-vlm
processed: low-cost-vlm
---

## Summary

[One-sentence description of the image]

## Detailed Description

[Comprehensive description including all visible elements, text, colors, composition, and context]

## Metadata

- **Dimensions**: 1920x1080
- **Size**: 245678 bytes
- **Format**: .jpg
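
If the generated markdown needs to be consumed by another script, a small parser is usually enough. The sketch below assumes the layout shown above: a front-matter block delimited by --- lines, followed by ## sections. The function name `parse_output` is hypothetical. For structured access, the .json file is likely the better source, but its schema is not described here.

```python
def parse_output(markdown_text: str) -> dict:
    """Split generated markdown into front-matter fields and '## ' sections."""
    lines = markdown_text.splitlines()
    front_matter = {}
    body_start = 0
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                body_start = i + 1
                break
            key, sep, value = line.partition(":")
            if sep:
                front_matter[key.strip()] = value.strip()

    sections = {}
    current = None
    for line in lines[body_start:]:
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)

    return {
        "front_matter": front_matter,
        "sections": {name: "\n".join(body).strip() for name, body in sections.items()},
    }


if __name__ == "__main__":
    sample = "---\nsource: photo.jpg\nmethod: low-cost-vlm\n---\n\n## Summary\n\nA cat.\n"
    print(parse_output(sample)["sections"]["Summary"])
```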

Image Filtering

Images are automatically filtered during email processing:

Size filter: Images smaller than 5 KB are ignored (likely signatures or logos)
Pattern filter: Files with names like "image001", "logo", "icon", "signature" are ignored
Checksum: Duplicate images (same binary content) are not stored twice
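
The size and pattern filters can be pictured as a single predicate. This is a sketch under stated assumptions: "5KB" is read as 5 * 1024 bytes, and name patterns are matched as case-insensitive substrings of the file name without its extension. The real matching rules may differ, and `should_keep` is a hypothetical name.

```python
from pathlib import Path

MIN_SIZE_BYTES = 5 * 1024  # assumption: "5KB" means 5 * 1024 bytes
IGNORED_NAME_PARTS = ("image001", "logo", "icon", "signature")


def should_keep(filename: str, size_bytes: int) -> bool:
    """Return True if an image passes both the size and the name-pattern filter."""
    if size_bytes < MIN_SIZE_BYTES:
        return False
    stem = Path(filename).stem.lower()
    # Assumption: substring matching, so "company-logo.png" is also ignored.
    return not any(part in stem for part in IGNORED_NAME_PARTS)


if __name__ == "__main__":
    print(should_keep("2026-01-03-photo.jpg", 822898))  # True
    print(should_keep("logo.png", 20000))               # False
```

Substring matching is deliberately crude: a name such as "silicon-wafer.jpg" contains "icon" and would be dropped by this sketch, which is one reason the actual filter may use stricter patterns.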

Duplicate Prevention

The system uses SHA256 checksums to prevent storing duplicate images:

If an image with the same checksum already exists, it's not downloaded again
The existing image reference is used instead
This prevents repository bloat from forwarded emails with the same images
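
The idea can be sketched with the standard library's hashlib. The example below is illustrative rather than the project's implementation: it keeps the known checksums in an in-memory dictionary, whereas the real system presumably relies on the checksums recorded in INDEX.md. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(data: bytes) -> str:
    """Return the SHA256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def store_if_new(data: bytes, filename: str, directory: Path, known: dict[str, Path]) -> Path:
    """Write the image unless identical content is already stored.

    `known` maps checksum -> path of the stored copy and is updated in place.
    Returns the path that should be referenced for this content.
    """
    digest = sha256_of(data)
    if digest in known:
        # Same binary content: reuse the existing file instead of writing again.
        return known[digest]
    target = directory / filename
    target.write_bytes(data)
    known[digest] = target
    return target
```

Because the checksum is computed over the binary content, two attachments with different file names but identical bytes resolve to the same stored file.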

When to Process Images

Process images when:

User asks you to describe or analyze an image
Image contains text you need to read (screenshots, documents)
Image shows diagrams, charts, or visual data
User wants a summary of what's in an image
Image is part of a larger analysis task

Note: To save API costs, images are not processed automatically. Process them only when needed.

Cost Considerations

VLM processing uses HuggingFace API
Images are automatically downsized to max 1024px to reduce costs
When both summary and details are requested, a single API call is made (more efficient)
Results are cached: re-running the handler on the same image uses the cached result unless --force is passed
Only process images when explicitly needed or requested

Limitations

Large images are automatically resized (may lose some detail)
Animated GIFs are processed as static frames
Very complex images may have incomplete descriptions
Text in images may not be perfectly extracted (depends on VLM quality)

Notes

Processing is cached: the first run creates the .json and .md files, and subsequent runs read from them
Use --force if you want to re-process with updated models
Images are stored in attachments/ alongside documents
Checksumming prevents duplicate storage automatically
When both summary and details are requested, the tool makes a single efficient API call
