---
name: verify-inference-example
description: Verify a CVlization inference example is properly structured, builds successfully, and runs inference correctly. Use when validating inference example implementations or debugging inference issues.
---
# Verify Inference Example
Systematically verify that a CVlization inference example is complete, properly structured, and functional.
## When to Use
- Validating a new or modified inference example
- Debugging inference pipeline issues
- Ensuring example completeness before commits
- Verifying example works after CVlization updates
## Important Context
**Shared GPU Environment:** This machine may be used by multiple users simultaneously. Before running GPU-intensive inference:
- Check GPU memory availability with `nvidia-smi`
- Wait for sufficient VRAM and low GPU utilization if needed
- Consider stopping other processes if you have permission
- If CUDA OOM errors occur, wait and retry when GPU is less busy
## Verification Checklist
### 1. Structure Verification
Check that the example directory contains all required files:
```bash
# Navigate to example directory
cd examples/<capability>/<task>/<framework>/

# Expected structure:
# .
# ├── example.yaml   # Required: CVL metadata
# ├── Dockerfile     # Required: Container definition
# ├── build.sh       # Required: Build script
# ├── predict.sh     # Required: Inference script
# ├── predict.py     # Required: Inference code
# ├── examples/      # Required: Sample inputs
# ├── outputs/       # Created at runtime
# └── README.md      # Recommended: Documentation
```
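A quick existence check over the required files saves eyeballing the listing (adjust the list if an example deviates from the standard layout):

```bash
# Flag any missing required files in the current example directory
for f in example.yaml Dockerfile build.sh predict.sh predict.py; do
  [ -e "$f" ] && echo "OK       $f" || echo "MISSING  $f"
done
[ -d examples ] && echo "OK       examples/" || echo "MISSING  examples/"
```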
Key files to check:
- `example.yaml` - Must have: name, capability, stability, presets (build, predict/inference)
- `Dockerfile` - Should copy necessary files and install dependencies
- `build.sh` - Must set `SCRIPT_DIR` and call `docker build`
- `predict.sh` - Must mount volumes correctly and call predict.py
- `predict.py` - Main inference script
- `examples/` - Directory with sample input files
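As a rough sketch of the required fields, an example.yaml might look like the following; the values are placeholders and the preset format is an assumption, so confirm against a real example.yaml in the repo:

```yaml
# Hypothetical example.yaml sketch - verify field names against existing examples
name: my-example
capability: perception/my_task   # placeholder capability path
stability: experimental
presets:
  build: ./build.sh              # assumed preset format
  predict: ./predict.sh
```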
### 2. Build Verification
```bash
# Option 1: Build using script directly
./build.sh

# Option 2: Build using CVL CLI (recommended)
cvl run <example-name> build

# Verify image was created
docker images | grep <example-name>
# Expected: Image appears with recent timestamp
```
What to check:
- Build completes without errors (both methods)
- All dependencies install successfully
- Image size is reasonable
- `cvl info <example-name>` shows correct metadata
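To read the image size directly instead of scanning the full `docker images` listing, docker's format template works:

```bash
# Print repository, tag, and human-readable size for the example's image
docker images <example-name> --format "{{.Repository}}:{{.Tag}}  {{.Size}}"
```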
### 3. Inference Verification
Run inference with sample inputs:
```bash
# Option 1: Run inference using script directly
./predict.sh

# Option 2: Run inference using CVL CLI (recommended)
cvl run <example-name> predict

# With custom inputs (if supported)
./predict.sh path/to/custom/input.jpg
```
Immediate checks:
- Container starts without errors
- Model loads successfully (check GPU memory with `nvidia-smi` if using GPU)
- Inference completes (outputs generated)
- Output files created in `outputs/` or a similar directory
- Results look reasonable (open output files to inspect)
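One way to capture both the log and the exit status in a single run, for later inspection (filenames here are arbitrary):

```bash
# Run inference, keep the full log, and report the script's exit status
./predict.sh 2>&1 | tee predict_run.log
echo "exit status: ${PIPESTATUS[0]}"

# Scan the log for obvious failure signatures
grep -iE "error|traceback|exception" predict_run.log || echo "no error lines found"
```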
### 4. Output Verification
Check that inference produces valid outputs:
```bash
# Check outputs directory
ls -la outputs/
# Expected: Output files with recent timestamps

# Inspect output content
cat outputs/output.md                 # For text outputs
# or
python -m json.tool outputs/output.json   # For JSON outputs
```
What to verify:
- Output files are created
- Output format is correct (markdown, JSON, etc.)
- Output contains expected content structure
- Output is non-empty and valid
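For JSON outputs, a small inline check catches missing, invalid, or empty files in one step (the output path is illustrative):

```bash
python - <<'EOF'
import json

# json.load raises on invalid JSON; the assert catches empty results
with open("outputs/output.json") as f:
    data = json.load(f)
assert data, "output.json parsed but is empty"
print("output.json is valid and non-empty")
EOF
```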
### 5. Model Caching Verification
Verify that pretrained models are cached properly:
```bash
# Check HuggingFace cache
ls -la ~/.cache/huggingface/hub/
# Expected: Model files downloaded once and reused

# Run inference twice and verify no re-download
./predict.sh 2>&1 | tee first_run.log

# Second run should reuse cached models
./predict.sh 2>&1 | tee second_run.log

# Verify no download messages in second run
grep -i "downloading" second_run.log
# Expected: No new downloads (models already cached)
```
What to verify:
- Models download to `~/.cache/huggingface/` (or a framework-specific cache)
- Second run reuses cached models without re-downloading
- Check predict.py doesn't set custom cache directories that break caching
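A quick way to confirm reuse is to snapshot the cache size around a repeat run; a large jump suggests models are being re-downloaded rather than reused:

```bash
# Cache size before and after a repeat run should be (nearly) identical
du -sh ~/.cache/huggingface/hub/
./predict.sh > /dev/null 2>&1
du -sh ~/.cache/huggingface/hub/
```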
### 6. Runtime Checks
**GPU VRAM Usage Monitoring** (REQUIRED for GPU models):
Monitor GPU VRAM usage before, during, and after inference:
```bash
# In another terminal, watch GPU memory in real-time
watch -n 1 nvidia-smi

# Or get detailed memory breakdown
nvidia-smi --query-gpu=index,name,memory.used,memory.total,memory.free,utilization.gpu --format=csv,noheader,nounits

# Record peak VRAM usage during inference
nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | awk '{print $1 " MB"}'
```
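The commands above report instantaneous usage; to capture the actual peak across a run, a small sampling loop like this sketch can run in the second terminal (stop it with Ctrl+C once inference finishes):

```bash
# Sample GPU memory once per second and report the peak on Ctrl+C.
# sort/tail takes the max across GPUs on multi-GPU machines.
peak=0
trap 'echo "peak VRAM: ${peak} MiB"; exit' INT
while true; do
  used=$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits | sort -n | tail -1)
  (( used > peak )) && peak=$used
  sleep 1
done
```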
Expected metrics:
- Model loading: VRAM usage increases as model loads into memory
- Inference peak: VRAM spikes during forward pass
- Cleanup: Memory released after inference completes (for short-running containers)
- Temperature: Stable (<85°C)
What to record for verification metadata:
- Peak VRAM usage in GB (e.g., "8.2GB VRAM" or "12.5GB VRAM")
- Percentage of total VRAM (e.g., "52%" for 12.5GB on 24GB GPU)
- Whether 4-bit/8-bit quantization was used (affects VRAM requirements)
Troubleshooting:
- CUDA OOM: Use smaller model variant, enable quantization (4-bit/8-bit), or run on CPU
- High VRAM idle usage: Check if other processes are using GPU
- Memory not released: Container may still be running (check `docker ps`)
**Docker Container Health:**
```bash
# Check container runs and exits cleanly
docker ps -a | head

# Verify mounts (for running container)
docker inspect <container-id> | grep -A 10 Mounts
# Should see: workspace, cvlization_repo, huggingface cache
```
### 7. Quick Validation Test
For fast verification during development:
```bash
# Run with smallest sample input
./predict.sh examples/small_sample.jpg

# Expected runtime: seconds to a few minutes
# Verify: Completes without errors, output generated
```
### 8. Update Verification Metadata
After successful verification, update the example.yaml with verification metadata:
First, check GPU info:
```bash
# Get GPU model and VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
```
Format:
```yaml
verification:
  last_verified: 2025-10-25
  last_verification_note: "Verified build, inference, model caching, and outputs on [GPU_MODEL] ([VRAM]GB VRAM)"
```
What to include in the note:
- What was verified: build, inference, outputs
- Key aspects: model caching, GPU/CPU inference
- GPU info: Dynamically determine GPU model and VRAM using nvidia-smi (e.g., "A10 GPU (24GB VRAM)", "RTX 4090 (24GB)")
- If no GPU: Use "CPU-only"
- VRAM usage: Peak VRAM used during inference (e.g., "Uses 8.2GB VRAM (34%) with 4-bit quantization")
  - Get with: `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`
  - Convert to GB and calculate percentage of total VRAM (see the one-liner after this list)
  - Note if quantization (4-bit/8-bit) was used
- Any limitations: e.g., "Requires 8GB VRAM", "GPU memory constraints"
- Quick notes: e.g., "First run downloads 470MB models"
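The GB figure and percentage can be computed in one line (assumes a single GPU; nvidia-smi reports MiB):

```bash
# Used and total VRAM in GB, plus used/total as a percentage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits \
  | awk -F', ' '{printf "%.1fGB / %.1fGB (%.0f%%)\n", $1/1024, $2/1024, 100*$1/$2}'
```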
Example complete entry:
```yaml
name: pose-estimation-dwpose
docker: dwpose
capability: perception/pose_estimation
# ... other fields ...
verification:
  last_verified: 2025-10-25
  last_verification_note: "Verified build, inference with video/image inputs, model caching (470MB models), and JSON outputs on [detected GPU]."
```
When to update:
- After completing full verification checklist (steps 1-7)
- Only if ALL success criteria pass
- When re-verifying after CVlization updates or fixes
## Common Issues and Fixes
### Build Failures
```bash
# Issue: Dockerfile can't find files
# Fix: Check COPY paths are relative to Dockerfile location

# Issue: Dependency conflicts
# Fix: Check requirements.txt versions, update base image

# Issue: Large build context
# Fix: Add .dockerignore file
```
### Inference Failures
```bash
# Issue: CUDA out of memory
# Fix: Use smaller model variant or CPU inference

# Issue: Model not found
# Fix: Check model name/path in predict.py, ensure internet connection

# Issue: Input file not found
# Fix: Check file paths, ensure examples/ directory exists

# Issue: Permission denied on outputs
# Fix: Ensure output directories exist and are writable
```
### Output Issues
```bash
# Issue: Empty outputs
# Fix: Check model loaded correctly, verify input format

# Issue: Malformed JSON output
# Fix: Check output parsing logic in predict.py

# Issue: Outputs not saved
# Fix: Verify output directory path, check file write permissions
```
## Example Commands
### Document AI - Granite Docling
```bash
cd examples/perception/doc_ai/granite_docling
./build.sh
./predict.sh
# Check: outputs/output.md contains extracted document structure
```
### Vision-Language - Moondream
```bash
cd examples/perception/vision_language/moondream2
./build.sh
./predict.sh examples/demo.jpg
# Check: outputs/ contains image description
```
## CVL Integration
Inference examples integrate with CVL command system:
```bash
# List all available examples
cvl list

# Get example info
cvl info granite-docling

# Run example directly (uses example.yaml presets)
cvl run granite-docling build
cvl run granite-docling predict
```
## Success Criteria
An inference example passes verification when:
- ✅ Structure: All required files present, example.yaml valid
- ✅ Build: Docker image builds without errors (both `./build.sh` and `cvl run <name> build`)
- ✅ Inference: Runs successfully on sample inputs (both `./predict.sh` and `cvl run <name> predict`)
- ✅ Outputs: Valid output files generated in expected format
- ✅ Model Caching: Models cached to `~/.cache/` (typically `~/.cache/huggingface/`), avoiding repeated downloads
- ✅ CVL CLI: `cvl info <name>` shows correct metadata, build and predict presets work
- ✅ Documentation: README explains how to use the example
- ✅ Verification Metadata: example.yaml updated with `verification` field containing `last_verified` date and `last_verification_note`
## Related Files
Check these files for debugging:
- `predict.py` - Core inference logic
- `predict.sh` - Docker run script
- `Dockerfile` - Environment setup
- `example.yaml` - CVL metadata and presets
- `examples/` - Sample input files
- `README.md` - Usage instructions
## Tips
- Use small sample inputs for fast validation
- Monitor GPU memory with `nvidia-smi` if using GPU
- Check `docker logs <container>` if inference hangs
- For HuggingFace models, set `HF_TOKEN` environment variable if needed
- Most examples support custom input paths as arguments to predict.sh
- Check example.yaml for supported parameters and environment variables
- For diffusion/flow matching models: Reduce sampling steps for faster validation (e.g., `--num_steps 5`, or `-i num_steps=5` for Cog). Most models support step parameters:
  - Common parameter names: `num_steps`, `num_inference_steps`, `steps`
  - Typical defaults: 20-50 steps
  - Fast validation: 5-10 steps (lower quality but completes quickly)
  - Production: Full step count for best quality
  - Examples: Stable Diffusion, SVD, FLUX, AnimateDiff, Flow Matching models