name	vision
description	Analyzes and processes images using Claude's vision capabilities. Supports OCR, image classification, diagram comparison, chart analysis, visual Q&A, and more. Use when users need to understand, extract, or analyze visual content.
license	Apache-2.0
allowed-tools	Read, Write, Bash, WebFetch, Task

Vision Image Processing Skill

Overview

This skill leverages Claude's multimodal vision capabilities to analyze, process, and extract insights from images. It supports a wide range of visual understanding tasks including optical character recognition (OCR), image classification, diagram analysis, chart interpretation, and visual comparison.

When to Use This Skill

Activate this skill when users need to:

Extract text from images, screenshots, or scanned documents (OCR)
Classify or categorize images based on visual content
Analyze charts, graphs, or data visualizations to extract insights
Compare multiple images (diagrams, screenshots, designs)
Describe or caption images in detail
Answer questions about visual content
Detect objects, people, or elements within images
Analyze UI/UX from screenshots or mockups
Read handwritten text or notes
Process receipts, invoices, or forms for data extraction

Core Capabilities

1. Optical Character Recognition (OCR)

Extract text from images with high accuracy:

Instructions:

Use the Read tool to load the image file
Analyze the image and extract all visible text
Preserve formatting, layout, and structure when possible
Handle multiple languages and fonts
Identify and extract text from challenging contexts (handwriting, artistic fonts, rotated text)

Output Format:

Provide extracted text in markdown format
Include confidence notes for challenging sections
Maintain document structure (headings, paragraphs, lists)

Example Use Cases:

Screenshot text extraction
Scanned document digitization
Receipt and invoice processing
Handwritten note transcription
Sign and label reading

2. Image Classification and Categorization

Identify and classify image content:

Instructions:

Analyze the overall subject and context
Identify primary objects, scenes, or themes
Provide classification labels with confidence levels
Detect style, mood, and artistic elements
Categorize by industry-relevant taxonomies when applicable

Output Format:

## Primary Classification
- Category: [main category]
- Confidence: [High/Medium/Low]

## Detected Elements
- Object 1: [description]
- Object 2: [description]
...

## Additional Attributes
- Style: [style description]
- Setting: [environment/context]
- Colors: [dominant colors]

3. Chart and Graph Analysis

Extract insights from data visualizations:

Instructions:

Identify chart type (bar, line, pie, scatter, etc.)
Extract data points, values, and trends
Read axes labels, legends, and annotations
Summarize key insights and patterns
Flag anomalies or notable data points

Output Format:

## Chart Analysis

**Type:** [Chart Type]

**Data Summary:**
[Extracted data in table or structured format]

**Key Insights:**
1. [Insight 1]
2. [Insight 2]
3. [Insight 3]

**Trends:**
- [Trend description]

**Notable Points:**
- [Anomalies or important observations]

4. Diagram and Visual Comparison

Compare multiple images or diagrams:

Instructions:

Load all images to be compared
Identify similarities and differences
Highlight structural, content, and style variations
Create side-by-side comparison tables
Note additions, deletions, or modifications

Output Format:

## Visual Comparison

**Image 1:** [description]
**Image 2:** [description]

### Similarities
- [Similarity 1]
- [Similarity 2]

### Differences
| Aspect | Image 1 | Image 2 |
|--------|---------|---------|
| [Aspect] | [Description] | [Description] |

### Overall Assessment
[Summary of comparison]
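
Once attributes have been read from each image, the Differences table above can be assembled mechanically. The following is a minimal sketch, assuming each image's observations have already been collected into a dictionary keyed by aspect; the function name `diff_table` and the `(absent)` placeholder are illustrative choices, not part of the skill.

```python
def diff_table(first: dict[str, str], second: dict[str, str]) -> str:
    """Build a markdown table listing only the aspects that differ."""
    rows = [
        "| Aspect | Image 1 | Image 2 |",
        "|--------|---------|---------|",
    ]
    # Walk the union of keys so additions and deletions are reported too.
    for aspect in sorted(first.keys() | second.keys()):
        a = first.get(aspect, "(absent)")
        b = second.get(aspect, "(absent)")
        if a != b:
            rows.append(f"| {aspect} | {a} | {b} |")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical observations, for illustration only.
    image_1 = {"Primary Button": "Blue", "Card Spacing": "16px"}
    image_2 = {"Primary Button": "Green", "Card Spacing": "16px", "Search bar": "present"}
    print(diff_table(image_1, image_2))
```

Aspects with identical values are omitted, so they belong in the Similarities list rather than the table.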

5. Detailed Image Description

Generate comprehensive image descriptions:

Instructions:

Describe the overall scene or subject
Identify and describe all visible elements
Note spatial relationships and composition
Describe colors, lighting, and atmosphere
Mention text, logos, or symbols if present
Consider accessibility (generate alt-text compatible descriptions)

Output Format:

Natural language description (paragraph form)
Structured element list (bulleted)
Technical details (dimensions, format, quality notes)

6. Visual Question Answering

Answer specific questions about image content:

Instructions:

Carefully read the user's question
Examine the relevant areas of the image
Provide accurate, specific answers
Reference visual evidence when answering
Acknowledge uncertainty if details are unclear

Best Practices:

Be precise and factual
Avoid assumptions beyond what's visible
Describe what you see, not what you infer (unless asked)
Use spatial language (top-left, center, background, etc.)

7. UI/UX and Design Analysis

Analyze user interfaces and design elements:

Instructions:

Identify UI components (buttons, forms, navigation)
Assess layout and visual hierarchy
Note design patterns and conventions
Evaluate accessibility considerations
Compare against design best practices
Extract color schemes and typography

Output Format:

## UI/UX Analysis

**Component Inventory:**
- [List of UI elements]

**Layout Assessment:**
- [Layout description and grid analysis]

**Design Patterns:**
- [Identified patterns]

**Accessibility Notes:**
- [Contrast, readability, touch targets]

**Recommendations:**
- [Improvement suggestions]
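
For the contrast entry under Accessibility Notes, one way to back a judgment with a number is the WCAG 2.x contrast ratio. The sketch below assumes foreground and background colors have already been estimated from the screenshot as six-digit hex strings; the function names are illustrative, and colors sampled from an image are approximate, so the result should be read as an estimate.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#RRGGBB' or 'RRGGBB'."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    print(round(contrast_ratio("#000000", "#FFFFFF"), 2))
```

WCAG 2.x level AA asks for at least 4.5:1 for normal-size text, which gives the Accessibility Notes a concrete threshold to compare against.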

8. Document and Form Processing

Extract structured data from forms, receipts, and documents:

Instructions:

Identify document type and structure
Extract field names and values
Organize data into structured format (JSON, CSV, tables)
Handle multi-column layouts
Preserve data relationships and hierarchies

Output Format:

{
  "document_type": "invoice",
  "fields": {
    "invoice_number": "value",
    "date": "value",
    "total": "value"
  },
  "line_items": [...]
}
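
Extracted data in this shape can be checked before it is saved or passed on. The following is a minimal sketch, assuming the extraction result arrives as a JSON string with the three top-level keys shown above; the function names and the sample values are hypothetical.

```python
import csv
import io
import json

REQUIRED_KEYS = ("document_type", "fields", "line_items")


def parse_extraction(raw: str) -> dict:
    """Parse an extraction result and check its top-level structure."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {', '.join(missing)}")
    if not isinstance(data["fields"], dict):
        raise ValueError("'fields' must be an object")
    if not isinstance(data["line_items"], list):
        raise ValueError("'line_items' must be an array")
    return data


def fields_to_csv(data: dict) -> str:
    """Render the 'fields' object as two-column CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["field", "value"])
    for name, value in data["fields"].items():
        writer.writerow([name, value])
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up values, for illustration only.
    sample = '{"document_type": "invoice", "fields": {"total": "10.00"}, "line_items": []}'
    print(fields_to_csv(parse_extraction(sample)))
```

Validating the structure first keeps a malformed extraction from being written out silently as an incomplete CSV or table.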

Workflow and Best Practices

Standard Vision Processing Workflow

1. Load the Image(s)
   - Use the Read tool to access image files
   - Supported formats: PNG, JPG, JPEG, GIF, WebP, PDF (single page)
2. Understand the Request
   - Identify the specific task (OCR, classification, analysis, etc.)
   - Note any special requirements or focus areas
3. Analyze the Visual Content
   - Apply Claude's vision capabilities to examine the image
   - Extract relevant information based on the task
4. Structure the Output
   - Format results according to the task type
   - Use markdown for readability
   - Include confidence indicators where appropriate
5. Validate and Refine
   - Check for completeness
   - Verify accuracy of extracted data
   - Provide follow-up options if needed
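
The format check in the first workflow step can be done before an image is loaded. This is a minimal sketch, assuming that matching on the file extension is sufficient; it does not inspect file contents or confirm that a PDF has a single page, and the names `SUPPORTED_EXTENSIONS` and `is_supported` are illustrative.

```python
from pathlib import Path

# Extensions corresponding to the formats listed in the workflow.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".pdf"}


def is_supported(path: str) -> bool:
    """Return True if the file extension matches a listed format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for candidate in ("scan.PNG", "diagram.svg", "report.pdf"):
        print(candidate, is_supported(candidate))
```

A vector file such as an SVG fails this check, which matches the note under Limitations that vector graphics may need conversion first.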

Quality Guidelines

Accuracy First: Prioritize correct information over comprehensive coverage
Structured Output: Use consistent formatting for similar tasks
Confidence Indicators: Note when details are unclear or ambiguous
Context Awareness: Consider the user's domain and use case
Accessibility: Generate descriptions suitable for screen readers when appropriate

Limitations and Considerations

Image Quality: Low resolution or blurry images may reduce accuracy
Supported Formats: Primarily raster images; vector graphics may need conversion
Privacy: Be cautious with sensitive information (PII, credentials, etc.)
Complex Diagrams: Highly technical diagrams may require domain expertise clarification
Real-Time Data: Cannot access live data or external resources not in the image

Advanced Features

Batch Processing

For multiple images:

Processing images in batch:
1. [Image1.png] - [Task result]
2. [Image2.png] - [Task result]
3. [Image3.png] - [Task result]

Summary: [Overall findings]
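
The batch template above can be filled in programmatically when results are collected per file. A minimal sketch, assuming results are held in a dictionary mapping file name to a one-line result; `batch_report` is a hypothetical helper, not part of the skill.

```python
def batch_report(results: dict[str, str], summary: str) -> str:
    """Format per-image results using the batch template."""
    lines = ["Processing images in batch:"]
    # Dictionaries preserve insertion order, so numbering follows input order.
    for number, (name, result) in enumerate(results.items(), start=1):
        lines.append(f"{number}. {name} - {result}")
    lines.append("")
    lines.append(f"Summary: {summary}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder results, for illustration only.
    print(batch_report({"Image1.png": "text extracted", "Image2.png": "no text found"},
                       "1 of 2 images contained text"))
```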

Multi-Modal Context

Combine visual analysis with code, documents, or data:

Cross-reference image content with codebase files
Validate design implementations against mockups
Extract data and populate code templates
Generate code from UI screenshots

Helper Script Integration

For advanced processing, the skill includes a Python helper script at scripts/image_processor.py:

Image format conversion
Metadata extraction
Batch file operations
Image preprocessing (resize, enhance, crop)

Usage:

python .claude/skills/vision/scripts/image_processor.py --operation [convert|metadata|batch] --input [path]
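
When the helper is called from another Python program rather than typed by hand, the command line can be assembled and validated first. The sketch below uses only the flags and operation names that appear in the usage line above; the script's other options, if any, are not shown in this document, and `build_command` is a hypothetical wrapper.

```python
import sys

SCRIPT = ".claude/skills/vision/scripts/image_processor.py"
OPERATIONS = ("convert", "metadata", "batch")


def build_command(operation: str, input_path: str) -> list[str]:
    """Build the argument list for the helper script, rejecting unknown operations."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    # sys.executable runs the script with the current Python interpreter.
    return [sys.executable, SCRIPT, "--operation", operation, "--input", input_path]


if __name__ == "__main__":
    print(build_command("metadata", "screenshots/login.png"))
```

The returned list can be passed to `subprocess.run(..., check=True)`; passing a list rather than a single string avoids shell quoting problems with paths that contain spaces.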

Examples

Example 1: OCR from Screenshot

User Request: "Extract the text from this screenshot"

Process:

Load image with Read tool
Identify all text regions
Extract and format text
Preserve structure

Output:

## Extracted Text

### Heading
Main paragraph text goes here...

### List Items
- Item 1
- Item 2
- Item 3

---
Note: Text extracted with high confidence. Minor formatting adjustments made for readability.

Example 2: Chart Analysis

User Request: "Analyze this sales chart and tell me the key trends"

Process:

Load chart image
Identify chart type and data
Extract values and labels
Analyze trends

Output:

## Sales Chart Analysis

**Chart Type:** Line graph with 3 series (2023-2025)

**Data Points:**
| Month | 2023 | 2024 | 2025 |
|-------|------|------|------|
| Jan   | 120K | 145K | 180K |
| Feb   | 135K | 150K | 195K |
...

**Key Insights:**
1. Consistent year-over-year growth of ~20%
2. Seasonal peak in Q4 across all years
3. 2025 showing accelerated growth trend

**Recommendation:** Prepare for continued high demand in Q4 2025
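
A growth figure such as the "~20%" above is worth checking against the extracted values before it is reported. The sketch below does this for the two rows visible in the example table only (the remaining rows are elided); `yoy_growth` is an illustrative helper.

```python
from statistics import mean

# Values in thousands, copied from the two visible rows (2023, 2024, 2025).
rows = {
    "Jan": (120, 145, 180),
    "Feb": (135, 150, 195),
}


def yoy_growth(values: tuple[float, ...]) -> list[float]:
    """Fractional growth between each consecutive pair of yearly values."""
    return [(later - earlier) / earlier for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    all_rates = []
    for month, values in rows.items():
        rates = yoy_growth(values)
        all_rates.extend(rates)
        print(month, [f"{rate:.1%}" for rate in rates])
    print("mean", f"{mean(all_rates):.1%}")
```

On these two rows the individual rates run from about 11% to 30% and average about 21.5%, so a "~20%" figure is better read as an average than as a uniform rate.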

Example 3: UI Component Comparison

User Request: "Compare these two design mockups and highlight the differences"

Process:

Load both images
Identify UI components in each
Compare layouts and elements
Document changes

Output:

## Design Mockup Comparison

### Added Elements (Mockup 2)
- Search bar in header
- User avatar in top-right corner
- Breadcrumb navigation

### Removed Elements
- Sidebar menu (replaced with top navigation)

### Modified Elements
| Component | Mockup 1 | Mockup 2 |
|-----------|----------|----------|
| Primary Button | Blue (#007AFF) | Green (#34C759) |
| Typography | Roboto 16px | SF Pro 15px |
| Card Spacing | 16px | 24px |

### Layout Changes
- Shifted from sidebar to top navigation
- Increased content area width by ~30%
- Changed from 3-column to 4-column grid

Integration with Claude Code

This skill relies on the following Claude Code tools:

Read Tool: Load images from the filesystem
Write Tool: Save processed results or extracted data
Bash Tool: Run helper scripts for preprocessing
Task Tool: Coordinate complex multi-image workflows

Quick Reference

| Task | Command Pattern | Output Type |
|------|-----------------|-------------|
| OCR | "Extract text from [image]" | Markdown text |
| Classification | "Classify this image" | Category labels |
| Chart Analysis | "Analyze this chart" | Data + insights |
| Comparison | "Compare [img1] and [img2]" | Diff table |
| Description | "Describe this image" | Paragraph |
| Q&A | "What [question] in this image?" | Answer |
| UI Analysis | "Analyze this UI screenshot" | Component breakdown |

Tips for Best Results

Provide Context: Mention the domain or purpose (e.g., "medical diagram," "e-commerce UI")
Be Specific: Request specific information rather than general analysis
Multiple Angles: For complex images, ask follow-up questions
File Paths: Use absolute or relative paths correctly
Batch Operations: Process multiple similar images together for consistency

Support and Troubleshooting

Common Issues:

"Cannot read image" → Verify file path and format
"Low confidence extraction" → Image may be too low resolution
"Unable to detect chart data" → Chart may be too complex or stylized

Getting Better Results:

Use high-resolution images (300+ DPI for documents)
Ensure good contrast and lighting
Crop images to focus on relevant areas
Provide context about the image content
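
The 300 DPI guideline translates into concrete pixel dimensions once the physical page size is known. A minimal arithmetic sketch, assuming the page size is given in inches; `pixels_at_dpi` is an illustrative helper.

```python
def pixels_at_dpi(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Pixel dimensions needed to scan a page of the given size at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    # US Letter is 8.5 x 11 inches.
    print(pixels_at_dpi(8.5, 11))
```

A US Letter page at 300 DPI is therefore 2550 x 3300 pixels; a scan well below that size is a likely cause of low-confidence extraction.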

License

This skill is licensed under Apache-2.0.

Version

Version: 1.0.0
Last Updated: 2025-11-18
Compatible with: Claude Code (all versions with vision support)
