| name | smart-screenshot |
| description | Intelligent screenshot and screen capture with OCR, markdown conversion, and annotation. Triggered by PrtSc key, captures screen regions, extracts text with OCR, converts to markdown using MarkItDown, saves with auto-formatting. Similar to Windows Snipping Tool with AI enhancements for text extraction and document processing. |
Smart Screenshot
Intelligent screen capture with OCR, markdown conversion, and smart formatting. Capture screen regions, extract text, convert images/PDFs to markdown, and save with automated formatting.
Quick Start
Trigger methods:
- Keyboard shortcut: Press PrtSc (customizable)
- Command line: python scripts/capture.py
- Claude Code: Ask Claude to "take a screenshot"
Workflow:
- Press PrtSc → Capture mode activates
- Choose: Image or Text
- Select region/window
- If Image: Save with annotation options
- If Text: OCR → MarkItDown → Save markdown
Prerequisites
System Requirements
- Windows 10/11, macOS 10.14+, or Linux
- Python 3.8+
- Screen with display access
Install Dependencies
Core (required):
# Screenshot and OCR
pip install pillow pyautogui mss pytesseract pyscreenshot --break-system-packages
# MarkItDown (Microsoft's converter)
pip install markitdown --break-system-packages
# Keyboard hooks
pip install keyboard pynput --break-system-packages
# GUI for dialogs
# tkinter ships with most Python installers and cannot be installed with pip.
# On Debian/Ubuntu, install it with: sudo apt-get install python3-tk
OCR engine (Tesseract):
Windows:
# Download installer from:
# https://github.com/UB-Mannheim/tesseract/wiki
# Install to: C:\Program Files\Tesseract-OCR\
# Add to PATH
macOS:
brew install tesseract
Linux:
sudo apt-get install tesseract-ocr
# or
sudo dnf install tesseract
Optional enhancements:
# Better OCR (EasyOCR - slower but more accurate)
pip install easyocr --break-system-packages
# PDF handling
pip install pdf2image pypdf2 --break-system-packages
# Image enhancement
pip install opencv-python --break-system-packages
# Clipboard integration
pip install pyperclip --break-system-packages
See reference/setup-guide.md for detailed installation.
Features
Capture Modes
1. Region Selection
- Click and drag to select area
- Real-time preview
- Pixel-perfect selection
2. Window Capture
- Automatically detect windows
- Capture specific application
- Includes/excludes borders
3. Full Screen
- Entire display
- Multi-monitor support
- All screens at once
4. Scrolling Capture
- Capture long web pages
- Auto-scroll and stitch
- Perfect for documentation
Text Extraction
OCR Engines:
- Tesseract - Fast, free, 100+ languages
- EasyOCR - Slower, more accurate
- Cloud OCR - Azure/Google (highest accuracy)
Smart text processing:
- Automatic language detection
- Text cleanup and formatting
- Table recognition
- Layout preservation
Markdown Conversion
Using MarkItDown (Microsoft):
- Images → Markdown with alt text
- PDFs → Clean markdown
- Screenshots → Formatted text
- Tables → Markdown tables
- Code blocks → Syntax highlighting
Conversion features:
- Smart heading detection
- List preservation
- Link extraction
- Code formatting
- Table structure recognition
Core Operations
Quick Capture
Keyboard shortcut:
# Run as background service
python scripts/screenshot_service.py
# Now press PrtSc anytime:
# 1. Screen freezes
# 2. Choose "Image" or "Text"
# 3. Select region
# 4. Auto-process and save
Command line:
# Capture with UI
python scripts/capture.py
# Capture full screen immediately
python scripts/capture.py --fullscreen --output screenshot.png
# Capture region with coordinates
python scripts/capture.py --region 100,100,800,600 --output region.png
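The `--region` value above could be turned into a capture box roughly as follows. This is a minimal sketch, not the actual `capture.py`: it assumes the four numbers mean `x,y,width,height` (the source does not say which convention is used), and `parse_region` and `grab_region` are hypothetical helper names. The capture step uses `mss` from the dependency list.

```python
def parse_region(spec: str) -> dict:
    """Parse 'x,y,width,height' into the dict format that mss's grab() accepts."""
    parts = [p.strip() for p in spec.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated integers, got {spec!r}")
    left, top, width, height = (int(p) for p in parts)
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return {"left": left, "top": top, "width": width, "height": height}


def grab_region(spec: str, output: str) -> None:
    """Capture the region and write it as a PNG (needs a display and mss)."""
    import mss  # third-party; imported lazily so parsing works without it
    import mss.tools

    with mss.mss() as sct:
        shot = sct.grab(parse_region(spec))
        mss.tools.to_png(shot.rgb, shot.size, output=output)
```

Calling `grab_region("100,100,800,600", "region.png")` would mirror the command above under the stated assumption about the coordinate order.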
Text Mode (OCR → Markdown)
Interactive:
# Start capture
python scripts/capture.py --mode text
# Process:
# 1. Select region
# 2. OCR extracts text
# 3. MarkItDown formats
# 4. Save dialog opens
# 5. Save as .md file
Automatic:
# Capture and OCR
python scripts/capture_text.py --output extracted.md
# With specific language
python scripts/capture_text.py --lang eng+fra --output text.md
# With enhancement
python scripts/capture_text.py --enhance --output clean.md
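For orientation, the `--lang eng+fra` option maps naturally onto Tesseract's own convention of joining language codes with `+`. The sketch below shows how that might look with `pytesseract`; `build_lang` and `ocr_image` are hypothetical helpers, not functions taken from `capture_text.py`.

```python
def build_lang(codes):
    """Join Tesseract language codes, e.g. ['eng', 'fra'] -> 'eng+fra'."""
    cleaned = [c.strip() for c in codes if c.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_image(path, codes=("eng",)):
    """Run Tesseract on an image file and return the recognized text."""
    from PIL import Image   # third-party: pillow
    import pytesseract      # third-party; needs the tesseract binary installed

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=build_lang(codes))
```

Each language code needs its trained data file installed alongside Tesseract, otherwise the OCR call fails.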
Image Mode
Interactive:
# Start capture
python scripts/capture.py --mode image
# Process:
# 1. Select region
# 2. Annotation tools appear
# 3. Add arrows, boxes, text
# 4. Save dialog opens
With annotations:
# Capture and annotate
python scripts/capture_annotate.py --output annotated.png
# Annotation tools:
# - Arrow
# - Rectangle
# - Circle
# - Text
# - Highlight
# - Blur (redact sensitive info)
PDF to Markdown
Convert PDF to markdown:
# Using MarkItDown
python scripts/pdf_to_markdown.py --input document.pdf --output document.md
# With OCR for scanned PDFs
python scripts/pdf_to_markdown.py --input scanned.pdf --ocr --output text.md
# Batch convert folder
python scripts/batch_pdf_convert.py --input ./pdfs/ --output ./markdown/
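A batch conversion like the one above can be sketched with MarkItDown's Python API (`MarkItDown().convert(...)` and the result's `text_content`). The helper names `plan_outputs` and `convert_folder` are illustrative assumptions, and this is not the code of `batch_pdf_convert.py`.

```python
from pathlib import Path


def plan_outputs(input_dir, output_dir):
    """Map every PDF in input_dir to a .md path with the same stem in output_dir."""
    src, dst = Path(input_dir), Path(output_dir)
    return [(pdf, dst / (pdf.stem + ".md")) for pdf in sorted(src.glob("*.pdf"))]


def convert_folder(input_dir, output_dir):
    """Convert each planned PDF with MarkItDown and write the markdown file."""
    from markitdown import MarkItDown  # third-party; imported lazily

    converter = MarkItDown()
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for pdf, target in plan_outputs(input_dir, output_dir):
        result = converter.convert(str(pdf))
        target.write_text(result.text_content, encoding="utf-8")
        print(f"{pdf.name} -> {target.name}")
```

This path only reads a PDF's existing text layer. Scanned PDFs would first need to be rasterized (for example with `pdf2image`) and passed through OCR, which is presumably what the `--ocr` flag enables.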
Screenshot from Image
Process existing image:
# Extract text to markdown
python scripts/image_to_markdown.py --input screenshot.png --output text.md
# Clean up image first
python scripts/enhance_and_extract.py --input noisy.png --output clean.md
Configuration
Settings file: config.yaml
# Keyboard shortcut
hotkey: "Print" # or "ctrl+shift+s", "cmd+shift+5", etc.
# Default capture mode
default_mode: "prompt" # "image", "text", or "prompt"
# OCR settings
ocr:
  engine: "tesseract" # "tesseract", "easyocr", or "cloud"
  language: "eng"
  enhance: true # Pre-process image for better OCR
# Output settings
output:
  directory: "~/Screenshots"
  filename_pattern: "Screenshot-{date}-{time}"
  auto_save: false # true = skip save dialog
  clipboard: true # Copy to clipboard
# Markdown settings
markdown:
  format_code_blocks: true
  detect_tables: true
  preserve_formatting: true
# Annotation defaults
annotation:
  arrow_color: "#FF0000"
  box_color: "#0000FF"
  text_color: "#000000"
  text_size: 12
  line_width: 2
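One plausible way to consume this file is to merge it over built-in defaults, so that a partial `config.yaml` still yields a complete configuration. The sketch below assumes PyYAML for parsing (it is not in the dependency list above), and `DEFAULTS`, `deep_merge`, and `load_config` are hypothetical names.

```python
import copy

# A subset of the settings shown above, used as fallback values.
DEFAULTS = {
    "hotkey": "Print",
    "default_mode": "prompt",
    "ocr": {"engine": "tesseract", "language": "eng", "enhance": True},
    "output": {"directory": "~/Screenshots", "auto_save": False, "clipboard": True},
}


def deep_merge(base, override):
    """Return a copy of base with override applied, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(path="config.yaml"):
    """Read the YAML file and fill in any missing keys from DEFAULTS."""
    import yaml  # third-party: PyYAML (assumed, not listed in the dependencies)

    with open(path, encoding="utf-8") as fh:
        user = yaml.safe_load(fh) or {}
    return deep_merge(DEFAULTS, user)
```

Merging recursively means a file that sets only `ocr.engine` keeps the default `ocr.language` and `ocr.enhance` values instead of dropping them.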
Common Workflows
Workflow 1: Code Documentation
Scenario: Capture code from screen → Markdown documentation
# 1. Run screenshot service
python scripts/screenshot_service.py &
# 2. Press PrtSc on your keyboard
# 3. Select "Text" mode
# 4. Select code region on screen
# 5. OCR extracts code
# 6. MarkItDown formats as code block:
```python
def example_function():
    return "formatted code"
```
# 7. Save dialog opens → Save as code-snippet.md
Workflow 2: Meeting Notes from Slides
Scenario: Capture presentation slides → Formatted notes
# Capture multiple slides
python scripts/capture_sequence.py \
--count 5 \
--delay 3 \
--mode text \
--output slides.md
# Result: All slides as markdown in one file
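The `--count` and `--delay` behaviour can be pictured as a simple loop that captures, waits, and appends. The sketch below is an assumption about how such a loop might be structured, not the code of `capture_sequence.py`; the `capture` callable stands in for the capture-plus-OCR step, and the `## Slide N` heading format is invented for illustration.

```python
import time


def capture_sequence(capture, count, delay, sleep=time.sleep):
    """Call capture() count times, waiting delay seconds between calls,
    and join the results into one markdown document."""
    sections = []
    for index in range(1, count + 1):
        text = capture().strip()
        sections.append(f"## Slide {index}\n\n{text}")
        if index < count:
            sleep(delay)  # no wait after the final capture
    return "\n\n".join(sections) + "\n"
```

Passing `sleep` as a parameter keeps the loop testable without real delays.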
Workflow 3: Email/Document Processing
Scenario: Screenshot email → Extract and format text
# Capture email
python scripts/capture.py --mode text --enhance
# Text extracted, formatted, and saved
# Perfect for archiving or processing
Workflow 4: Research Paper Annotation
Scenario: Screenshot paper → Annotate → Save
# Capture and annotate
python scripts/capture_annotate.py --output paper-notes.png
# Add arrows, highlights, notes
# Save annotated version
Workflow 5: Batch PDF Conversion
Scenario: Convert all PDFs to markdown
# Convert folder of PDFs
python scripts/batch_pdf_convert.py \
--input ~/Documents/PDFs/ \
--output ~/Documents/Markdown/ \
--ocr # Enable OCR for scanned docs
# Progress shown for each file
# All PDFs → Clean markdown
MarkItDown Features
Microsoft's MarkItDown converts:
Images:
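- Screenshots → Extracted text (from the OCR step; MarkItDown formats the result)
- Diagrams → Alt text descriptions (requires an LLM client configured in MarkItDown)
- Charts → Data tables (only where OCR recovers the table structure)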
PDFs:
- Native PDFs → Clean markdown
- Scanned PDFs → OCR + markdown
- Preserve structure and formatting
Documents:
- Word docs → Markdown
- PowerPoint → Slide content
- Excel → Markdown tables
Code:
- Syntax highlighted code blocks
- Language detection
- Proper indentation
Tables:
- Visual tables → Markdown tables
- Preserved alignment
- Header detection
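To make the table output concrete, here is a minimal sketch of rendering already-recognized cells as a markdown table, treating the first row as the header. Detecting the cells in an image is the hard part and is not shown; `rows_to_markdown` is a hypothetical helper, not part of MarkItDown.

```python
def rows_to_markdown(rows):
    """Render rows (first row = header) as a pipe-delimited markdown table."""
    if not rows:
        return ""
    width = max(len(r) for r in rows)

    def fmt(row):
        # Escape pipes so cell text cannot break the column layout.
        cells = [str(c).replace("|", "\\|").strip() for c in row]
        cells += [""] * (width - len(cells))  # pad short rows
        return "| " + " | ".join(cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join(["---"] * width) + " |"]
    lines += [fmt(r) for r in body]
    return "\n".join(lines)
```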
Keyboard Shortcuts
During capture:
- Esc - Cancel capture
- Space - Toggle crosshair/selection
- Enter - Confirm selection
- Ctrl+Z - Undo annotation
- Ctrl+C - Copy to clipboard
- Ctrl+S - Save
Annotation mode:
- A - Arrow tool
- R - Rectangle tool
- C - Circle tool
- T - Text tool
- H - Highlight tool
- B - Blur tool
- Delete - Remove last annotation
OCR Accuracy Tips
Better results:
- Enhance image first - Increase contrast, denoise
- Correct language - Specify language(s)
- Proper DPI - Higher resolution = better OCR
- Clean background - Remove clutter
- Good lighting - For camera captures
- Straight text - Rotate if needed
Pre-processing:
# Enhance before OCR
python scripts/enhance_image.py \
--input screenshot.png \
--output enhanced.png \
--operations "grayscale,contrast,denoise"
# Then OCR
python scripts/image_to_markdown.py --input enhanced.png
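The `--operations` string suggests a small pipeline of named steps applied in order. The following sketch shows one way to do that with Pillow; the mapping of each name to a specific Pillow call (`ImageOps.grayscale`, `ImageOps.autocontrast`, a 3x3 median filter) is an assumption, as are the helper names.

```python
def parse_operations(spec):
    """Split 'grayscale,contrast,denoise' into a validated list of step names."""
    known = {"grayscale", "contrast", "denoise"}
    steps = [s.strip().lower() for s in spec.split(",") if s.strip()]
    unknown = [s for s in steps if s not in known]
    if unknown:
        raise ValueError(f"unknown operation(s): {', '.join(unknown)}")
    return steps


def enhance(input_path, output_path, spec="grayscale,contrast,denoise"):
    """Apply the requested steps in order with Pillow and save the result."""
    from PIL import Image, ImageFilter, ImageOps  # third-party: pillow

    actions = {
        "grayscale": ImageOps.grayscale,
        "contrast": ImageOps.autocontrast,
        "denoise": lambda im: im.filter(ImageFilter.MedianFilter(size=3)),
    }
    with Image.open(input_path) as img:
        img = img.convert("RGB")
        for step in parse_operations(spec):
            img = actions[step](img)
        img.save(output_path)
```

Validating the step names before opening the image means a typo in the list fails fast instead of partway through processing.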
Multi-Monitor Support
Capture from specific monitor:
# List monitors
python scripts/list_monitors.py
# Capture from monitor 2
python scripts/capture.py --monitor 2
# Capture all monitors
python scripts/capture.py --all-monitors
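With `mss`, the `monitors` list has a convention worth knowing: index 0 is the bounding box of all displays combined, and indexes 1 and up are the individual displays. A sketch of how `--monitor N` and `--all-monitors` might map onto that list follows; the 1-based numbering and the `pick_monitor` helper are assumptions.

```python
def pick_monitor(monitors, number=None):
    """Select a capture box from an mss-style monitor list.

    monitors[0] is the bounding box of all displays combined;
    monitors[1:] are the individual displays, so monitor N is monitors[N].
    """
    if number is None:  # equivalent of --all-monitors
        return monitors[0]
    if not 1 <= number < len(monitors):
        raise ValueError(f"monitor {number} not found; {len(monitors) - 1} available")
    return monitors[number]
```

With a live display this would be used as `sct.grab(pick_monitor(sct.monitors, 2))` inside a `with mss.mss() as sct:` block.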
Save Dialog Options
When save dialog appears:
- Filename: Auto-generated or custom
- Location: Last used or default directory
- Format: .md, .txt, .png, .jpg
- Options:
- Copy to clipboard
- Open in editor
- Share
Skip dialog (auto-save):
# config.yaml
output:
  auto_save: true
  directory: "~/Screenshots"
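When the dialog is skipped, the file name has to come from the configured pattern. The sketch below expands a pattern such as `Screenshot-{date}-{time}` into a full path; the date and time layouts are assumptions (the source does not define them), and `build_output_path` is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path


def build_output_path(directory, pattern, extension, now=None):
    """Expand {date} and {time} in the pattern and return the full path."""
    now = now or datetime.now()
    name = pattern.format(
        date=now.strftime("%Y-%m-%d"),  # assumed date layout
        time=now.strftime("%H-%M-%S"),  # hyphens, since ':' is invalid in Windows file names
    )
    return Path(directory).expanduser() / f"{name}.{extension.lstrip('.')}"
```

Second-level resolution can still collide when several captures happen within one second, so a real implementation would probably add a counter or check for an existing file.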
Integration with Clipboard
Copy to clipboard automatically:
# Text mode - copies markdown
python scripts/capture.py --mode text --clipboard
# Image mode - copies image
python scripts/capture.py --mode image --clipboard
# Both
python scripts/capture.py --clipboard --save
Paste from clipboard:
# Process clipboard image
python scripts/process_clipboard.py --output result.md
Running as Service
Windows
Install as Windows service:
# Install
python scripts/install_windows_service.py
# Service runs on startup
# PrtSc always available
Or use Task Scheduler:
1. Open Task Scheduler
2. Create Basic Task
3. Trigger: At log on
4. Action: Start program
5. Program: python
6. Arguments: path\to\screenshot_service.py
macOS
LaunchAgent setup:
# Install service
python scripts/install_macos_service.py
# Creates ~/Library/LaunchAgents/com.screenshot.service.plist
# Runs on login
Manual:
# Create LaunchAgent plist
# Load with launchctl
launchctl load ~/Library/LaunchAgents/com.screenshot.service.plist
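For the manual route, the plist can be generated with Python's standard `plistlib` rather than written by hand. This is a minimal sketch under assumptions: the label matches the file name shown above, only the `Label`, `ProgramArguments`, and `RunAtLoad` keys are set, and the helper names are hypothetical rather than taken from `install_macos_service.py`.

```python
import plistlib
import sys
from pathlib import Path


def build_launch_agent(script_path, label="com.screenshot.service"):
    """Return LaunchAgent plist bytes that start the service when loaded."""
    agent = {
        "Label": label,
        "ProgramArguments": [sys.executable, str(script_path)],
        "RunAtLoad": True,  # start as soon as the agent is loaded, i.e. at login
    }
    return plistlib.dumps(agent)


def install_launch_agent(script_path, label="com.screenshot.service"):
    """Write the plist under ~/Library/LaunchAgents and return its path."""
    target = Path.home() / "Library" / "LaunchAgents" / f"{label}.plist"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(build_launch_agent(script_path, label))
    return target
```

Using `sys.executable` pins the agent to the interpreter that ran the installer, which avoids depending on the `PATH` that launchd provides.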
Linux
Systemd service:
# Install
python scripts/install_linux_service.py
# Creates ~/.config/systemd/user/screenshot.service
# Enable and start:
systemctl --user enable screenshot
systemctl --user start screenshot
Or use autostart:
# Copy desktop entry
cp screenshot.desktop ~/.config/autostart/
Cloud OCR (Optional)
For highest accuracy:
Azure Computer Vision:
export AZURE_CV_KEY="your-key"
export AZURE_CV_ENDPOINT="https://your-region.api.cognitive.microsoft.com/"
python scripts/capture.py --mode text --ocr-engine azure
Google Vision:
export GOOGLE_APPLICATION_CREDENTIALS="path/to/credentials.json"
python scripts/capture.py --mode text --ocr-engine google
Costs (approximate; confirm on each provider's pricing page):
- Azure: 5,000 transactions/month on the free tier, then about $1 per 1,000
- Google: 1,000 units/month free, then $1.50 per 1,000
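A rough monthly estimate follows directly from a free allowance and a per-1,000 price. The sketch below uses the Google figures quoted above as its example and assumes simple pro-rata billing beyond the free allowance, which may not match how a provider actually rounds or tiers usage.

```python
def monthly_cost(requests, free_units, price_per_1000):
    """Estimate the monthly cost in dollars for a number of OCR requests."""
    billable = max(0, requests - free_units)
    return billable / 1000 * price_per_1000


# Example: 5,000 requests with 1,000 free units at $1.50 per 1,000.
estimate = monthly_cost(5000, 1000, 1.50)
```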
Scripts Reference
Capture:
- capture.py - Main interactive capture
- capture_text.py - Text mode only
- capture_annotate.py - Image with annotations
- capture_sequence.py - Multiple captures
Service:
- screenshot_service.py - Background service
- install_windows_service.py - Windows installer
- install_macos_service.py - macOS installer
- install_linux_service.py - Linux installer
Conversion:
- image_to_markdown.py - Image → Markdown
- pdf_to_markdown.py - PDF → Markdown
- batch_pdf_convert.py - Batch conversion
Processing:
- enhance_image.py - Image enhancement
- process_clipboard.py - Clipboard processing
- extract_tables.py - Table extraction
Utilities:
- list_monitors.py - List displays
- test_ocr.py - Test OCR accuracy
- configure.py - Interactive config
Best Practices
- Run as service - Always available with hotkey
- Configure hotkey - Choose comfortable shortcut
- Enable clipboard - Quick copy-paste workflow
- Enhance first - Better OCR results
- Use appropriate OCR - Tesseract for speed, Cloud for accuracy
- Organize output - Set default directory
- Backup settings - Save config.yaml
- Test thoroughly - Verify OCR accuracy for your use case
Troubleshooting
"Tesseract not found"
# Install Tesseract
# Windows: Download installer
# macOS: brew install tesseract
# Linux: apt install tesseract-ocr
# Check installation
tesseract --version
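The same check can be done from Python before attempting OCR, which gives a clearer error than a failed OCR call. This is a small sketch using only the standard library; the function names are illustrative.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if unavailable."""
    exe = find_tesseract()
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True)
    output = result.stdout or result.stderr  # older builds print the version to stderr
    return output.splitlines()[0] if output else None
```

If Tesseract is installed but not on `PATH` (common on Windows), `pytesseract.pytesseract.tesseract_cmd` can be set to the full path of the executable instead.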
"Permission denied" (screenshot)
Windows: Run as Administrator
macOS: System Settings → Privacy & Security → Screen Recording (on macOS 12 and earlier: System Preferences → Security & Privacy → Privacy → Screen Recording)
Linux: Check X11 permissions
"Keyboard hook failed"
# Requires administrator/root privileges
# Windows: Run as Administrator
# macOS: Grant Accessibility permissions
# Linux: Run with sudo or add user to input group
"Poor OCR quality"
# Enhance image first
python scripts/enhance_image.py --input screenshot.png
# Try different OCR engine
python scripts/capture.py --ocr-engine easyocr
# Specify language
python scripts/capture.py --lang eng+fra
"MarkItDown not working"
pip install --upgrade markitdown --break-system-packages
# Check version
python -c "import markitdown; print(markitdown.__version__)"
Platform-Specific Notes
Windows
- PrtSc key native support
- Windows Ink integration available
- OneDrive sync compatible
- Notification system integration
macOS
- cmd+shift+5 alternative
- Quick Look preview
- iCloud Drive sync
- Notification Center integration
Linux
- Wayland/X11 support
- Various hotkey daemons
- Desktop environment integration
- Screenshot directories vary
Integration Examples
See examples/ for complete workflows:
- examples/documentation-workflow.md - Code docs
- examples/research-notes.md - Paper processing
- examples/meeting-capture.md - Meeting slides
- examples/email-archival.md - Email processing
Reference Documentation
- reference/setup-guide.md - Complete setup
- reference/ocr-engines.md - OCR comparison
- reference/markitdown-guide.md - MarkItDown features
- reference/hotkey-config.md - Keyboard shortcuts
- reference/service-install.md - Service setup