| name | image-processing |
| description | Extract summaries and descriptions from images using vision-language models. Use when working with image attachments, analyzing photos, screenshots, diagrams, or when asked to describe or understand image content. |
Image Processing
This skill covers how to process and analyze image attachments in advisory work.
Supported Formats
| Extension | Type | Best For |
|---|---|---|
| `.jpg`, `.jpeg` | JPEG | Photos, screenshots |
| `.png` | PNG | Screenshots, diagrams, transparent images |
| `.gif` | GIF | Animated images, simple graphics |
| `.bmp` | BMP | Windows bitmap images |
| `.webp` | WebP | Modern web images |
| `.tiff`, `.tif` | TIFF | High-quality images, scans |
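As a minimal sketch, the supported extensions can be expressed as a lookup for deciding whether a file is an image this skill covers. The names `SUPPORTED_EXTENSIONS` and `is_supported_image` are illustrative and are not taken from `src.image_handler`; case-insensitive matching is also an assumption.

```python
from pathlib import Path

# Extensions listed under Supported Formats, mapped to their type.
SUPPORTED_EXTENSIONS = {
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".png": "PNG",
    ".gif": "GIF",
    ".bmp": "BMP",
    ".webp": "WebP",
    ".tiff": "TIFF",
    ".tif": "TIFF",
}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension is a supported one (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_image("attachments/photo.JPG")` is true, while a `.pdf` attachment is rejected.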
Image Storage
Images from email attachments are:
- Saved to the `attachments/` directory (same as documents)
- Filtered by minimum size (5KB) to exclude logos/signatures
- Checksummed to prevent duplicate storage
- Automatically indexed in `attachments/INDEX.md` with basic metadata
- Not processed by default - you must explicitly process them
INDEX.md Structure
The attachments/INDEX.md file is the authoritative reference for all attachments in the repository. It:
- Lists every file in the `attachments/` directory
- Shows processing status (✓ for processed, ✗ for not processed)
- Includes summaries and details for processed images
- Is automatically updated when:
  - New attachments are downloaded (via `update_attachments_index()`)
  - Images are processed (via `update_image_processing_status()`)
Important: The INDEX.md file should always faithfully reflect all files in the attachments folder. If you manually add files or process images, ensure INDEX.md is updated accordingly.
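One way to check that the index is still faithful after manual changes is to compare file names against the index text. This is a sketch only: `missing_from_index` is a hypothetical helper, and it assumes each entry records the file name on a `- **File**: ` line wrapped in backticks.

```python
def missing_from_index(filenames: list[str], index_text: str) -> list[str]:
    """Return the file names that have no File line in the index text."""
    missing = []
    for name in filenames:
        # The index does not need an entry for itself.
        if name == "INDEX.md":
            continue
        # Assumed entry format: - **File**: `<name>`
        if f"**File**: `{name}`" not in index_text:
            missing.append(name)
    return missing
```

To run it against the repository, pass `[p.name for p in Path("attachments").iterdir() if p.is_file()]` (with `from pathlib import Path`) and the contents of `attachments/INDEX.md`. If the index does not list the generated `.json` and `.md` files, they will be reported too.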
WORKFLOW: Processing Images
When you need to analyze an image:
Step 1: Check Available Images
Look for images in the attachments directory:
ls -la attachments/
cat attachments/INDEX.md
Step 2: Check if Already Processed
Look for a .json or .md file alongside the image:
ls -la attachments/
# If photo.jpg exists, check for photo.json or photo.md
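The same check can be scripted. The sketch below assumes the output files share the image's base name and directory, as in the `photo.jpg` example; `sidecar_paths` and `is_processed` are hypothetical names.

```python
from pathlib import Path


def sidecar_paths(image_path: str) -> tuple[Path, Path]:
    """Return the expected .json and .md paths next to the image."""
    image = Path(image_path)
    return image.with_suffix(".json"), image.with_suffix(".md")


def is_processed(image_path: str) -> bool:
    """Return True if either output file already exists on disk."""
    return any(path.exists() for path in sidecar_paths(image_path))
```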
Step 3: Process the Image
If not processed, use the image handler:
# Generate summary only (default)
python -m src.image_handler attachments/photo.jpg
# Generate detailed description
python -m src.image_handler attachments/photo.jpg --details
# Generate both summary and details (single API call)
python -m src.image_handler attachments/photo.jpg --summary --details
# Only detailed description (no summary)
python -m src.image_handler attachments/photo.jpg --details-only
This command:
- Creates a `.json` file with structured results (cached for future use)
- Creates a `.md` file with formatted output for easy reading
- Automatically updates `attachments/INDEX.md` to mark the image as processed and include the summary/details
Step 4: Read the Results
# Read the markdown output
cat attachments/photo.md
# Or read the JSON for structured data
cat attachments/photo.json
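When the structured data is needed in a script, the JSON can be loaded directly. The JSON schema is not documented here, so the `summary` and `details` keys below are assumptions; inspect a real output file before relying on them. `extract_descriptions` is a hypothetical helper.

```python
import json


def extract_descriptions(json_text: str) -> dict:
    """Pull the assumed 'summary' and 'details' keys out of a results file."""
    data = json.loads(json_text)
    # .get() returns None when a key is absent, e.g. for summary-only runs.
    return {
        "summary": data.get("summary"),
        "details": data.get("details"),
    }
```

A typical call would be `extract_descriptions(Path("attachments/photo.json").read_text(encoding="utf-8"))`, with `from pathlib import Path`.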
Processing Methods
Currently supported:
- `low-cost-vlm` (default): Uses HuggingFace Qwen3-VL-8B-Instruct model
  - Automatically downsizes images >1024px to reduce API costs
  - Supports summary and detailed descriptions
  - Makes a single API call when both summary and details are requested
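The downsizing rule can be illustrated with a small calculation. This sketch assumes the 1024px limit applies to the longer side and that the aspect ratio is preserved; neither detail is confirmed here, and `downsized_dimensions` is a hypothetical name rather than a function of the handler.

```python
MAX_DIMENSION = 1024  # pixel limit stated for low-cost-vlm


def downsized_dimensions(
    width: int, height: int, max_dim: int = MAX_DIMENSION
) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_dim."""
    longest = max(width, height)
    if longest <= max_dim:
        # Already within the limit: leave the image untouched.
        return width, height
    # Scale both sides by the same factor to keep the aspect ratio.
    new_width = max(1, round(width * max_dim / longest))
    new_height = max(1, round(height * max_dim / longest))
    return new_width, new_height
```

Under these assumptions a 1920x1080 screenshot would be sent as 1024x576, which is why small text in large images can lose detail.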
Method: low-cost-vlm
This method uses HuggingFace's inference API with the Qwen vision-language model.
Usage:
python -m src.image_handler attachments/photo.jpg --method low-cost-vlm
Output:
- Summary: One-sentence description
- Details: Comprehensive description including elements, text, colors, composition, context
INDEX.md Reference
The attachments/INDEX.md file serves as the single source of truth for all attachments:
- Every file in `attachments/` should have an entry in INDEX.md
- Processing status is tracked - entries show ✓ (processed) or ✗ (not processed)
- Processed images include summary and detailed descriptions in the index
- The index is automatically maintained when:
  - Attachments are downloaded from email
  - Images are processed via `python -m src.image_handler`
Example INDEX.md entry:
### photo.jpg
- **File**: `2026-01-03-photo.jpg`
- **Type**: image/jpeg
- **Size**: 822898 bytes
- **Path**: `attachments/2026-01-03-photo.jpg`
- **Checksum**: `29314559e912ab1ef6889fb19e3563738b4475d4d81db70da4301144f68280cc`
- **Processed**: ✓
- **Summary**: A humorous cat-themed mousepad with the slogan "I WORK HARD SO MY CAT CAN HAVE NICE THINGS"
- **Details**: The image displays a close-up, slightly angled overhead view...
When processing images manually, ensure INDEX.md is updated to reflect the processing status.
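To verify an entry after a manual change, its fields can be read back with a small parser. This is a sketch that assumes every field follows the `- **Key**: value` layout of the example entry; `parse_index_entry` and `entry_is_processed` are hypothetical helpers.

```python
import re

# Matches lines such as: - **Type**: image/jpeg
FIELD_PATTERN = re.compile(r"^- \*\*(?P<key>[^*]+)\*\*: (?P<value>.*)$")


def parse_index_entry(entry_text: str) -> dict[str, str]:
    """Collect the key/value fields of one INDEX.md entry."""
    fields = {}
    for line in entry_text.splitlines():
        match = FIELD_PATTERN.match(line.strip())
        if match:
            # Drop the backticks used around file names, paths, and checksums.
            fields[match["key"]] = match["value"].strip().strip("`")
    return fields


def entry_is_processed(fields: dict[str, str]) -> bool:
    """Return True when the entry carries the processed marker."""
    return fields.get("Processed") == "✓"
```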
Processing Options
# Standard processing (summary only)
python -m src.image_handler attachments/photo.jpg
# Summary only (explicit)
python -m src.image_handler attachments/photo.jpg --summary-only
# Detailed description only
python -m src.image_handler attachments/photo.jpg --details-only
# Both summary and details (efficient: single API call)
python -m src.image_handler attachments/photo.jpg --summary --details
# Force re-processing (ignore cache)
python -m src.image_handler attachments/photo.jpg --force
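When the handler is driven from another script, the command line can be assembled from options. The sketch below uses only flags listed in this section; `build_handler_command` and `process_image` are hypothetical wrappers, and combining `--force` with the other flags is an assumption about the CLI.

```python
import subprocess
import sys


def build_handler_command(
    image_path: str, summary: bool = False, details: bool = False, force: bool = False
) -> list[str]:
    """Build the argument list for `python -m src.image_handler`."""
    # sys.executable keeps the call inside the current Python environment.
    cmd = [sys.executable, "-m", "src.image_handler", image_path]
    if summary:
        cmd.append("--summary")
    if details:
        cmd.append("--details")
    if force:
        cmd.append("--force")
    return cmd


def process_image(image_path: str, **options: bool) -> None:
    """Run the handler and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_handler_command(image_path, **options), check=True)
```

Calling `process_image("attachments/photo.jpg", summary=True, details=True)` from the repository root corresponds to the single-call "both" option.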
Output Format
The generated markdown includes:
---
source: original-filename.jpg
method: low-cost-vlm
processed: low-cost-vlm
---
## Summary
[One-sentence description of the image]
## Detailed Description
[Comprehensive description including all visible elements, text, colors, composition, and context]
## Metadata
- **Dimensions**: 1920x1080
- **Size**: 245678 bytes
- **Format**: .jpg
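The header block between the `---` lines can be read without a YAML library, as long as it stays as simple as in the template. A minimal sketch, assuming flat `key: value` pairs; `parse_front_matter` is a hypothetical helper.

```python
def parse_front_matter(markdown_text: str) -> dict[str, str]:
    """Return the key/value pairs found between the leading '---' lines."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter reached: the header is complete.
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter was found, so treat the header as absent.
    return {}
```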
Image Filtering
Images are automatically filtered during email processing:
- Size filter: Images < 5KB are ignored (likely signatures/logos)
- Pattern filter: Files with names like "image001", "logo", "icon", "signature" are ignored
- Checksum: Duplicate images (same binary content) are not stored twice
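The size and pattern filters can be sketched as a single predicate. Several details are assumptions: 5KB is read as 5 * 1024 bytes, names are matched case-insensitively as substrings of the file stem, and `should_store_image` is a hypothetical name.

```python
from pathlib import Path

MIN_SIZE_BYTES = 5 * 1024  # assumption: "5KB" means 5 * 1024 bytes
IGNORED_NAME_PATTERNS = ("image001", "logo", "icon", "signature")


def should_store_image(filename: str, size_bytes: int) -> bool:
    """Apply the size filter, then the name-pattern filter."""
    if size_bytes < MIN_SIZE_BYTES:
        return False
    stem = Path(filename).stem.lower()
    # Substring matching is an assumption about what "names like" means.
    return not any(pattern in stem for pattern in IGNORED_NAME_PATTERNS)
```

Note that substring matching is broad: a file named `silicon-wafer.png` would be dropped because it contains "icon". If an expected image is missing from `attachments/`, the filters are worth checking first.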
Duplicate Prevention
The system uses SHA256 checksums to prevent storing duplicate images:
- If an image with the same checksum already exists, it's not downloaded again
- The existing image reference is used instead
- This prevents repository bloat from forwarded emails with the same images
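The checksum comparison can be illustrated with `hashlib`. In this sketch the `known` dictionary stands in for whatever record of stored checksums the system keeps, and `store_if_new` is a hypothetical helper, not the actual download code.

```python
import hashlib


def sha256_checksum(data: bytes) -> str:
    """Return the hex SHA256 digest of the image bytes."""
    return hashlib.sha256(data).hexdigest()


def store_if_new(data: bytes, filename: str, known: dict[str, str]) -> str:
    """Return the file name to reference, registering it only if the content is new."""
    checksum = sha256_checksum(data)
    if checksum in known:
        # Same binary content already stored: reuse the existing reference.
        return known[checksum]
    known[checksum] = filename
    return filename
```

Because the digest depends only on the bytes, a forwarded copy saved under a different name still resolves to the first stored file.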
When to Process Images
Process images when:
- User asks you to describe or analyze an image
- Image contains text you need to read (screenshots, documents)
- Image shows diagrams, charts, or visual data
- User wants a summary of what's in an image
- Image is part of a larger analysis task
Note: Images are not processed automatically to save API costs. Only process when needed.
Cost Considerations
- VLM processing uses HuggingFace API
- Images are automatically downsized to max 1024px to reduce costs
- When both summary and details are requested, a single API call is made (more efficient)
- Results are cached - re-processing uses cache unless `--force` is used
- Only process images when explicitly needed or requested
Limitations
- Large images are automatically resized (may lose some detail)
- Animated GIFs are processed as static frames
- Very complex images may have incomplete descriptions
- Text in images may not be perfectly extracted (depends on VLM quality)
Notes
- Processing is cached: first run creates `.json` and `.md`, subsequent reads use cache
- Use `--force` if you want to re-process with updated models
- Images are stored in `attachments/` alongside documents
- Checksumming prevents duplicate storage automatically
- When both summary and details are requested, the tool makes a single efficient API call