| name | gastrohem-media-processor |
| description | Automatically process unprocessed audio and image files in Gastrohem daily WhatsApp folders. This skill should be used when the user asks to transcribe audio files, perform OCR on images, or process media in daily folders (e.g., "Process media in today's folder", "Transcribe audio and OCR images in 24.10 folder"). Handles audio transcription using insanely-fast-whisper (parallelized, creates .json) and image OCR using Claude's vision capabilities (creates natural .md summaries with Gastrohem-relevant info). |
Gastrohem Media Processor
Overview
Automatski procesira WhatsApp media fajlove (audio i slike) u Gastrohem dnevnim folderima.
Što radi:
- Transkribuje audio fajlove paralelno (3x brže) → kreira
.jsonfajlove - Identifikuje slike koje trebaju OCR → kreira
.mdfajlove sa prirodnim sažetkom - Default: Koristi današnji datum automatski
- Skenira SVE odjele odjednom za dati datum
Performance:
- Audio: Do 3 fajla istovremeno (paralelno)
- Slike: Batch OCR (sve odjednom sa Claude vision)
- Scan: Nalazi sve foldere za datum u <1 sekundi
Note: Skill kreira .json za audio i .md za slike - dodavanje u chat.md je odvojen korak.
When to Use This Skill
User says:
- "Process media"
- "Process today's media"
- "Transcribe audio files"
- "Process media for 24.10"
- "OCR images in today's folders"
Default behavior: Uses today's date, scans all departments automatically.
Workflow
Simple Usage
Process all media for today (DEFAULT):
python .claude/skills/gastrohem-media-processor/scripts/process_media.py
- Uses today's date automatically (26.10 danas)
- Scans ALL departments for folders matching today's date
- Transcribes audio in parallel
- Lists images needing OCR
Process specific date:
python .claude/skills/gastrohem-media-processor/scripts/process_media.py --scan-date 24.10
Process specific folder:
python .claude/skills/gastrohem-media-processor/scripts/process_media.py --folder "gastrohem whatsapp/administracija/20.10 - 27.10/24.10"
What Happens
Audio Processing:
- Finds all audio files (
.mp3,.ogg,.m4a,.wav,.opus) - Skips files that already have
.jsontranscriptions - Transcribes in parallel (3 concurrent processes)
- Creates JSON:
{audio_filename}.json{ "speakers": [], "chunks": [...], "text": "Full transcribed text here" }
Image Processing:
- Finds all images (
.png,.jpg,.jpeg,.webp,.bmp) - Checks which images DON'T have
.mdfiles - Returns list of images needing OCR
- Claude reads images in parallel and creates natural summaries with Gastrohem-relevant info
- Creates markdown:
{image_filename}.md# image.png **Poslao:** Mahir Kadic **Datum:** 26.10.2025 13:58 --- [Natural language summary focusing on Gastrohem-relevant information: contacts, names, emails, phone numbers, business details, etc.]
Script Reference
Three Separate Scripts
The skill now uses three modular scripts for better organization:
1. process_audio.py
Purpose: Audio transcription only (parallelized)
Usage:
python scripts/process_audio.py "path/to/folder" [--max-workers 3] [--output-json results.json]
Arguments:
folder- Path to folder containing audio files--max-workers N- Max parallel processes (default: 3)--no-skip-existing- Re-transcribe files with existing JSON--output-json FILE- Save results to JSON
What it does:
- Finds audio files (
.mp3,.ogg,.m4a,.wav,.opus) - Skips files with existing
.jsontranscriptions - Transcribes in parallel (max 3 concurrent)
- Creates JSON files:
{audio_filename}.json
Requirements: insanely-fast-whisper in PATH
2. process_images.py
Purpose: Image OCR helper functions
Usage:
# List images needing OCR
python scripts/process_images.py "path/to/folder" [--output-json images.json]
# Use in Python for batch processing
from process_images import save_ocr_md, batch_save_ocr, get_images_needing_ocr
Key functions:
get_images_needing_ocr(folder_path)- Returns list of images without.mdfilessave_ocr_md(image_file, summary, sender)- Save natural summary to.mdfilebatch_save_ocr(ocr_results)- Save multiple OCR results at once
Markdown structure:
# image.png
**Poslao:** Mahir Kadic
**Datum:** 26.10.2025 13:58
---
Natural language summary focusing on Gastrohem-relevant information:
contacts, names, emails, phone numbers, business details, etc.
3. process_media.py
Purpose: Master script combining audio + images
Usage:
# Scan all folders for a specific date
python scripts/process_media.py --scan-date DD.MM [--output-json results.json]
# Process a specific folder
python scripts/process_media.py --folder "path/to/folder" [--output-json results.json]
Arguments:
--scan-date DD.MM- Scan all departments for this date--folder PATH- Process specific folder--base-path PATH- Base path (default:gastrohem whatsapp)--no-skip-existing- Re-process all files--output-json FILE- Save results to JSON
What it does:
- Calls
process_audio.pyfor audio files - Calls
process_images.pyto find images needing OCR - Returns combined results
Best Practices
- Use
--scan-datefor daily processing - Automatically finds all folders for a specific date across all departments - Process audio in parallel - The script now handles up to 3 audio files simultaneously for faster processing
- Batch OCR images - Read all images in parallel using Claude's vision for maximum efficiency
- Save results to JSON - Use
--output-jsonto keep a record of processing results - Process regularly - Process media files daily to avoid backlog
- Separate chat.md updates - This skill creates JSON files; updating
chat.mdis a separate workflow
Error Handling
If transcription fails:
- Check that
insanely-fast-whisperis installed - Verify the audio file is not corrupted
- Check that the device (mps) is available
- Manually transcribe if necessary
If image OCR is unclear:
- Request higher quality image from sender
- Focus on extracting key information rather than perfect transcription
- Note any illegible portions in the context field
Example Workflows
User: "Process media"
Claude:
- Runs:
python .claude/skills/gastrohem-media-processor/scripts/process_media.py - Script uses today's date (26.10), scans all departments
- Finds 3 folders: administracija/26.10, finansije/26.10, adis-chat/26.10
- Transcribes 5 audio files in parallel →
.jsonfiles created - Returns list of 3 images needing OCR
- Claude reads all 3 images in parallel, creates natural summaries →
.mdfiles created - Reports: "Processed 5 audio and 3 images across 3 folders."
User: "Process media for 24.10"
Claude:
- Runs:
python .claude/skills/gastrohem-media-processor/scripts/process_media.py --scan-date 24.10 - Same workflow for 24.10 date
Performance Improvements (Nove Optimizacije)
1. Paralelizacija Audio Transkripicja:
- Do 3 audio fajla se transkribuju istovremeno
- Koristi
ThreadPoolExecutorza paralelno izvršavanje - 3x brže nego prije
2. Batch OCR za Slike:
- Sve slike se mogu pročitati odjednom (parallel Read tool calls)
- Svaka slika dobija svoj
.mdfajl:image.png.md - Prirodan sažetak sa fokusom na Gastrohem-relevantne informacije (kontakti, imena, brojevi, poslovni detalji)
- Helper funkcija
save_ocr_md()za lako čuvanje
3. Automatski Scan Svih Foldera:
- Default: Koristi današnji datum (nije potrebno specificirati)
- Automatski pronalazi sve foldere za taj datum u SVIM odjelima
- Procesira sve odjednom
Struktura Skripti:
process_audio.py- Audio only (paralelno)process_images.py- Image OCR helper functionsprocess_media.py- Master skripta (kombinuje oba)
Tipična brzina:
- Audio: ~3-5 sec po fajlu (paralelno, 3 max)
- Slike: ~2-3 sec po slici (batch)
- Scan svih odjela: <1 sekunda