| name | ocr-super-surya |
| description | GPU-optimized OCR using Surya. Use when: (1) Extracting text from images/screenshots, (2) Processing PDFs with embedded images, (3) Multi-language document OCR, (4) Layout analysis and table detection. Supports 90+ languages with higher benchmark accuracy than Tesseract. |
| license | CC BY-NC 4.0 |
OCR Super Surya
GPU-optimized OCR skill using Surya - a modern, high-accuracy OCR engine.
When to Use
- Extracting text from screenshots, photos, or scanned images
- Processing PDFs with embedded images
- Multi-language document OCR (90+ languages including Japanese)
- Layout analysis and table detection
- When GPU acceleration is available and desired
Key Features
| Feature | Description |
|---|---|
| Accuracy | Higher than Tesseract (0.97 vs 0.88 similarity) |
| GPU Support | PyTorch-based, CUDA optimized |
| Languages | 90+ languages including CJK |
| Layout | Document layout analysis, table recognition |
| LaTeX | Inline math equation recognition |
Quick Start
Installation
# Core OCR
pip install surya-ocr
# For PDF processing (optional)
pip install pdf2image
# Windows: Install Poppler from https://github.com/oschwartz10612/poppler-windows/releases
# macOS: brew install poppler
# Linux: sudo apt install poppler-utils
Basic Usage
from PIL import Image
from surya.recognition import RecognitionPredictor
from surya.detection import DetectionPredictor
from surya.foundation import FoundationPredictor
# Load image
image = Image.open("document.png")
# Initialize predictors (auto-detects GPU)
foundation_predictor = FoundationPredictor()
recognition_predictor = RecognitionPredictor(foundation_predictor)
detection_predictor = DetectionPredictor()
# Run OCR
predictions = recognition_predictor([image], det_predictor=detection_predictor)
# Get text
for page in predictions:
    for line in page.text_lines:
        print(line.text)
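Each recognized line in Surya's output also carries a confidence score, so the loop above can be extended to drop low-confidence lines. A minimal sketch, assuming each line object exposes `text` and `confidence` attributes; the `page_to_text` helper and the 0.5 threshold are illustrative and not part of Surya or the helper script:

```python
from types import SimpleNamespace


def page_to_text(page, min_confidence=0.5):
    """Join a page's recognized lines, skipping low-confidence ones.

    Assumes `page.text_lines` is iterable and each line has `.text` and
    `.confidence`; lines without a confidence value are kept.
    """
    kept = []
    for line in page.text_lines:
        confidence = getattr(line, "confidence", None)
        if confidence is None or confidence >= min_confidence:
            kept.append(line.text)
    return "\n".join(kept)


# Stand-in objects so the sketch runs without loading any models.
demo_page = SimpleNamespace(
    text_lines=[
        SimpleNamespace(text="Invoice 2024", confidence=0.98),
        SimpleNamespace(text="~~#@", confidence=0.12),
        SimpleNamespace(text="Total: 42.00", confidence=0.91),
    ]
)
print(page_to_text(demo_page))
```

With real results, the same function could be applied to each entry of `predictions`.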
CLI Usage
# OCR single image
surya_ocr image.png
# OCR with output to JSON
surya_ocr image.png --output_dir ./results
# Launch GUI (requires streamlit)
pip install streamlit
surya_gui
Helper Script CLI
# Basic usage
python scripts/ocr_helper.py image.png
# With verbose logging
python scripts/ocr_helper.py image.png -v
# Specify languages and output file
python scripts/ocr_helper.py document.pdf -l ja en -o result.txt
# Disable OOM auto-retry
python scripts/ocr_helper.py large_image.png --no-retry
GPU Configuration
Surya auto-detects the GPU. Adjust VRAM usage with these environment variables:
| Variable | Default | Description |
|---|---|---|
| RECOGNITION_BATCH_SIZE | 512 | Reduce for lower VRAM (e.g., 256 for 12GB) |
| DETECTOR_BATCH_SIZE | 36 | Reduce if OOM errors occur |
# Linux/macOS
export RECOGNITION_BATCH_SIZE=256
export DETECTOR_BATCH_SIZE=16
surya_ocr image.png
# Windows PowerShell
$env:RECOGNITION_BATCH_SIZE = 256
$env:DETECTOR_BATCH_SIZE = 16
surya_ocr image.png
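The same variables can also be set from Python. A minimal sketch, assuming Surya reads its settings when its modules are first imported, so the values must be in place before any `surya` import; the `configure_batch_sizes` helper is hypothetical:

```python
import os


def configure_batch_sizes(recognition=256, detector=16):
    """Set Surya's batch-size variables for the current process.

    Environment values must be strings. Call this before importing surya,
    on the assumption that settings are read at import time.
    """
    os.environ["RECOGNITION_BATCH_SIZE"] = str(recognition)
    os.environ["DETECTOR_BATCH_SIZE"] = str(detector)
    return {
        name: os.environ[name]
        for name in ("RECOGNITION_BATCH_SIZE", "DETECTOR_BATCH_SIZE")
    }


print(configure_batch_sizes())
# Import surya only after this point, e.g.:
# from surya.detection import DetectionPredictor
```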
OOM Auto-Retry
The helper script automatically retries with reduced batch size on GPU OOM:
# Auto-retry enabled by default
text = ocr_image("large_image.png") # Retries up to 3x
# Disable if you want manual control
text = ocr_image("large_image.png", auto_retry=False)
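The retry behaviour can be pictured as a loop that shrinks the batch size after each out-of-memory failure. The sketch below is an illustration only, not the helper script's actual code: `run_with_oom_retry`, the halving policy, and the message-based OOM check are all assumptions.

```python
def run_with_oom_retry(run, batch_size, max_retries=3):
    """Call run(batch_size), halving batch_size after each OOM failure.

    OOM is detected by looking for "out of memory" in a RuntimeError
    message, which is how PyTorch CUDA OOM errors usually read.
    """
    for attempt in range(max_retries + 1):
        try:
            return run(batch_size)
        except RuntimeError as exc:
            is_oom = "out of memory" in str(exc).lower()
            # Re-raise anything that is not OOM, or when retries run out.
            if not is_oom or attempt == max_retries or batch_size <= 1:
                raise
            batch_size = max(1, batch_size // 2)


# Fake workload that only fits in memory at batch size 64 or below.
def fake_ocr(batch_size):
    if batch_size > 64:
        raise RuntimeError("CUDA out of memory")
    return f"ok at batch size {batch_size}"


print(run_with_oom_retry(fake_ocr, batch_size=256))
```

Halving keeps the number of retries small while still converging on a size that fits; errors unrelated to memory are re-raised immediately so they are not masked.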
Use Cases
| Use Case | Command / Function |
|---|---|
| Screenshot OCR | python scripts/ocr_helper.py screenshot.png |
| PDF Processing | ocr_pdf("document.pdf") → returns list of page texts |
| Batch Processing | ocr_batch(["img1.png", "img2.png"]) → returns dict |
| Japanese/CJK | Auto-detected, no config needed |
Scripts
| Script | Description |
|---|---|
| scripts/ocr_helper.py | Helper functions with OOM auto-retry, verbose logging, batch support |
Helper Script Features
| Feature | Description |
|---|---|
| verbose | Enable detailed logging (-v in CLI) |
| auto_retry | Automatically reduce batch size on OOM (default: on) |
| ocr_image() | Single image OCR |
| ocr_pdf() | PDF OCR (all pages) |
| ocr_batch() | Batch OCR for multiple images |
| set_verbose() | Enable/disable logging programmatically |
Troubleshooting
CUDA Out of Memory
Reduce batch sizes:
export RECOGNITION_BATCH_SIZE=128
export DETECTOR_BATCH_SIZE=8
CPU Fallback
If no GPU is available, Surya automatically falls back to CPU, which is slower but still works.
PDF Processing on Windows
If pdf2image fails, install Poppler:
- Download from https://github.com/oschwartz10612/poppler-windows/releases
- Extract to C:\Program Files\poppler
- Add C:\Program Files\poppler\Library\bin to PATH
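To confirm that Poppler is visible after editing PATH, it can help to look for the `pdftoppm` executable, which pdf2image invokes. A small sketch using only the standard library; `find_poppler` is a hypothetical helper, not part of the helper script:

```python
import shutil


def find_poppler(search_path=None):
    """Return the full path to Poppler's pdftoppm, or None if not found.

    search_path is an os.pathsep-separated directory string; None means
    the current PATH.
    """
    return shutil.which("pdftoppm", path=search_path)


location = find_poppler()
if location:
    print(f"Poppler found: {location}")
else:
    print("Poppler not found on PATH")
```

If the lookup fails even after updating PATH, pdf2image's `convert_from_path` also accepts a `poppler_path` argument pointing at the `bin` directory.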
Model Download
The first run downloads the models (~2GB), so make sure an internet connection is available.
References
- Surya GitHub - Official repository
- Surya Documentation - Usage guide
- Benchmark Results - Accuracy comparisons
License Notice
This skill: CC BY-NC 4.0 (wrapper scripts only)
Surya (underlying OCR engine):
- Code: GPL-3.0
- Models: Free for research, personal use, and startups under $2M funding/revenue
- Commercial use beyond $2M: See Surya Pricing