name	ocr-super-surya
description	GPU-optimized OCR using Surya. Use when: (1) Extracting text from images/screenshots, (2) Processing PDFs with embedded images, (3) Multi-language document OCR, (4) Layout analysis and table detection. Supports 90+ languages and scores higher than Tesseract on text similarity (0.97 vs 0.88).
license	CC BY-NC 4.0

OCR Super Surya

GPU-optimized OCR skill using Surya - a modern, high-accuracy OCR engine.

When to Use

Extracting text from screenshots, photos, or scanned images
Processing PDFs with embedded images
Multi-language document OCR (90+ languages including Japanese)
Layout analysis and table detection
When GPU acceleration is available and desired

Key Features

Feature	Description
Accuracy	Higher than Tesseract (0.97 vs 0.88 similarity score)
GPU Support	PyTorch-based, CUDA optimized
Languages	90+ languages including CJK
Layout	Document layout analysis, table recognition
LaTeX	Inline math equation recognition

Quick Start

Installation

# Core OCR
pip install surya-ocr

# For PDF processing (optional)
pip install pdf2image
# Windows: Install Poppler from https://github.com/oschwartz10612/poppler-windows/releases
# macOS: brew install poppler
# Linux: sudo apt install poppler-utils

Basic Usage

from PIL import Image
from surya.recognition import RecognitionPredictor
from surya.detection import DetectionPredictor
from surya.foundation import FoundationPredictor

# Load image
image = Image.open("document.png")

# Initialize predictors (auto-detects GPU)
foundation_predictor = FoundationPredictor()
recognition_predictor = RecognitionPredictor(foundation_predictor)
detection_predictor = DetectionPredictor()

# Run OCR
predictions = recognition_predictor([image], det_predictor=detection_predictor)

# Get text
for page in predictions:
    for line in page.text_lines:
        print(line.text)
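As a follow-up to the loop above, a small helper can collect the recognized lines into one string per page and drop low-confidence lines. This is a minimal sketch: `page_text` and `min_confidence` are hypothetical names, and it assumes each line object exposes a `confidence` attribute alongside `text` (lines without one are kept). The stand-in objects only mimic the shape of `predictions`, so the snippet runs without Surya installed.

```python
from types import SimpleNamespace


def page_text(page, min_confidence=0.0):
    """Join the recognized lines of one page, skipping low-confidence lines."""
    kept = []
    for line in page.text_lines:
        # Assumption: lines carry a `confidence` score; keep lines that lack one.
        confidence = getattr(line, "confidence", None)
        if confidence is None or confidence >= min_confidence:
            kept.append(line.text)
    return "\n".join(kept)


# Stand-in for one entry of `predictions` (not real OCR output).
demo_page = SimpleNamespace(
    text_lines=[
        SimpleNamespace(text="Invoice 2024-01", confidence=0.98),
        SimpleNamespace(text="~~~", confidence=0.21),
        SimpleNamespace(text="Total: 42.00", confidence=0.95),
    ]
)

print(page_text(demo_page, min_confidence=0.5))
```

With real output, call `page_text(page)` for each `page` in `predictions`.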

CLI Usage

# OCR single image
surya_ocr image.png

# OCR with output to JSON
surya_ocr image.png --output_dir ./results

# Launch GUI (requires streamlit)
pip install streamlit
surya_gui
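The JSON written to `--output_dir` can be post-processed in Python. The sketch below assumes the layout described in Surya's README (a `results.json` mapping each input name to a list of page dictionaries, each with a `text_lines` list whose items have a `text` field); verify this against the version you have installed. `document_texts` is a hypothetical helper, and the sample dictionary stands in for a parsed file.

```python
import json


def document_texts(results):
    """Map each document name to the text of its pages, one string per page."""
    texts = {}
    for name, pages in results.items():
        texts[name] = [
            "\n".join(line["text"] for line in page.get("text_lines", []))
            for page in pages
        ]
    return texts


# Sample shaped like the assumed results.json layout (not real OCR output).
sample = json.loads("""
{
  "image": [
    {"page": 1, "text_lines": [{"text": "Hello"}, {"text": "World"}]}
  ]
}
""")

print(document_texts(sample))
```

For a real run, pass the result of `json.load` on the `results.json` file found under the output directory.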

Helper Script CLI

# Basic usage
python scripts/ocr_helper.py image.png

# With verbose logging
python scripts/ocr_helper.py image.png -v

# Specify languages and output file
python scripts/ocr_helper.py document.pdf -l ja en -o result.txt

# Disable OOM auto-retry
python scripts/ocr_helper.py large_image.png --no-retry

GPU Configuration

Surya auto-detects the GPU. To adjust VRAM usage, set these environment variables:

Variable	Default	Description
`RECOGNITION_BATCH_SIZE`	512	Reduce for lower VRAM (e.g., 256 for 12GB)
`DETECTOR_BATCH_SIZE`	36	Reduce if OOM errors occur

# Linux/macOS
export RECOGNITION_BATCH_SIZE=256
export DETECTOR_BATCH_SIZE=16
surya_ocr image.png

# Windows PowerShell
$env:RECOGNITION_BATCH_SIZE = 256
$env:DETECTOR_BATCH_SIZE = 16
surya_ocr image.png
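The same settings can be applied from Python. This sketch assumes Surya reads these variables when it is first imported, so they are set before any `surya` import; `configure_batch_sizes` is a hypothetical helper, not part of Surya.

```python
import os


def configure_batch_sizes(recognition=256, detector=16):
    """Set Surya's batch-size environment variables for the current process."""
    # Environment values must be strings.
    os.environ["RECOGNITION_BATCH_SIZE"] = str(recognition)
    os.environ["DETECTOR_BATCH_SIZE"] = str(detector)


configure_batch_sizes(recognition=256, detector=16)

# Import Surya only after this point, e.g.:
# from surya.recognition import RecognitionPredictor
```

Setting the values in-process keeps the configuration next to the code that depends on it and behaves the same on Linux, macOS, and Windows.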

OOM Auto-Retry

The helper script automatically retries with a reduced batch size when the GPU runs out of memory (OOM):

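# Assumption: scripts/ is on sys.path; adjust the import to your layout
from ocr_helper import ocr_image

# Auto-retry enabled by default
text = ocr_image("large_image.png")  # Retries up to 3 times

# Disable if you want manual control
text = ocr_image("large_image.png", auto_retry=False)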
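The retry behavior can be pictured roughly as follows. This is an illustrative sketch, not the helper script's actual code: `run_with_oom_retry` is a hypothetical name, halving the batch size is an assumed strategy, and `fake_ocr` simulates a workload so the example runs without a GPU. It relies on PyTorch reporting CUDA OOM as a `RuntimeError` whose message contains "out of memory".

```python
def run_with_oom_retry(run, batch_size, max_retries=3):
    """Call run(batch_size); on a GPU OOM error, halve batch_size and retry."""
    attempt = 0
    while True:
        try:
            return run(batch_size)
        except RuntimeError as exc:
            # Only OOM errors trigger a retry; anything else is re-raised.
            if "out of memory" not in str(exc).lower():
                raise
            # Give up once the retry budget is spent or the batch cannot shrink.
            if attempt >= max_retries or batch_size <= 1:
                raise
            attempt += 1
            batch_size = max(1, batch_size // 2)


# Stand-in workload: pretends anything above 128 items does not fit in VRAM.
def fake_ocr(batch_size):
    if batch_size > 128:
        raise RuntimeError("CUDA out of memory (simulated)")
    return f"ok at batch size {batch_size}"


result = run_with_oom_retry(fake_ocr, batch_size=512)
print(result)
```

Re-raising after the last attempt keeps the original error visible, which is usually easier to debug than returning an empty result.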

Use Cases

Use Case	Command / Function
Screenshot OCR	`python scripts/ocr_helper.py screenshot.png`
PDF Processing	`ocr_pdf("document.pdf")` → returns list of page texts
Batch Processing	`ocr_batch(["img1.png", "img2.png"])` → returns dict
Japanese/CJK	Auto-detected, no config needed

Scripts

Script	Description
`scripts/ocr_helper.py`	Helper functions with OOM auto-retry, verbose logging, batch support

Helper Script Features

Feature	Description
`verbose`	Enable detailed logging (`-v` in CLI)
`auto_retry`	Automatically reduce batch size on OOM (default: on)
`ocr_image()`	Single image OCR
`ocr_pdf()`	PDF OCR (all pages)
`ocr_batch()`	Batch OCR for multiple images
`set_verbose()`	Enable/disable logging programmatically

Troubleshooting

CUDA Out of Memory

Reduce batch sizes:

export RECOGNITION_BATCH_SIZE=128
export DETECTOR_BATCH_SIZE=8

CPU Fallback

If no GPU is available, Surya automatically falls back to the CPU, which is slower but works.
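To see which device is available before starting a long job, PyTorch can be queried directly. A minimal sketch, assuming PyTorch is the backend (as listed under Key Features); `detect_device` is a hypothetical helper that also reports `cpu` when PyTorch is not installed.

```python
def detect_device():
    """Return "cuda" if PyTorch sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU path at all.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"PyTorch device available: {detect_device()}")
```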

PDF Processing on Windows

If pdf2image fails, install Poppler:

Download from https://github.com/oschwartz10612/poppler-windows/releases
Extract to C:\Program Files\poppler
Add C:\Program Files\poppler\Library\bin to PATH
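A quick way to check whether Poppler is reachable is to look for its command-line tools on PATH. The sketch below assumes pdf2image relies on Poppler's `pdftoppm` and `pdfinfo` executables; `missing_poppler_tools` is a hypothetical helper.

```python
import shutil


def missing_poppler_tools(tools=("pdftoppm", "pdfinfo")):
    """Return the Poppler executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_poppler_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("Poppler tools found.")
```

If you prefer not to modify PATH, pdf2image's `convert_from_path` also accepts a `poppler_path` argument pointing at the Poppler `bin` directory.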

Model Download

The first run downloads the models (~2GB), so make sure an internet connection is available.

References

Surya GitHub - Official repository
Surya Documentation - Usage guide
Benchmark Results - Accuracy comparisons

License Notice

This skill: CC BY-NC 4.0 (wrapper scripts only)

Surya (underlying OCR engine):

Code: GPL-3.0
Models: Free for research, personal use, and startups under $2M funding/revenue
Commercial use beyond $2M: See Surya Pricing
