| name | Model Manager |
| description | Test, validate, and add new AI models to the eval suite. Use when user asks to add new models, test model access, check pricing, or update models.yml. |
Model Manager
Test API access, validate configurations, and add new AI models to the AILANG eval suite.
Quick Start
Most common usage:
# User says: "Can we add GPT-5.1 to the eval suite?"
# This skill will:
# 1. Test API access to GPT-5.1
# 2. Find the correct API model name
# 3. Look up pricing information
# 4. Update models.yml configuration
# 5. Run a test benchmark to verify
When to Use This Skill
Invoke this skill when:
- User asks to "add a new model" to eval suite
- User mentions checking if a model is "accessible" or "available"
- User wants to "test API access" to a model
- User asks to "update models.yml" or "check pricing"
- User says "can we use [model name]?" for evaluations
Available Scripts
scripts/test_model_access.sh <provider> <model-name>
Test API access to a model and display authentication status.
Usage:
# Test OpenAI model
scripts/test_model_access.sh openai gpt-5.1
# Test Anthropic model
scripts/test_model_access.sh anthropic claude-sonnet-4-5-20250929
# Test Google Gemini via Vertex AI
scripts/test_model_access.sh google gemini-3-pro-preview-11-2025
Output:
Testing: openai/gpt-5.1
✓ OPENAI_API_KEY found
✓ API call successful
✓ Model: gpt-5.1-2025-11-13
✓ Tokens: 13 input, 10 output (10 reasoning)
Ready to add to models.yml
scripts/find_model_info.sh <model-keywords>
Search for model information using web search and return API names + pricing.
Usage:
# Find GPT-5.1 info
scripts/find_model_info.sh "GPT-5.1 API model name pricing"
# Find Gemini 3 Pro info
scripts/find_model_info.sh "Gemini 3 Pro API documentation"
Output:
Searching for: GPT-5.1 API model name pricing
✓ Found API names:
- gpt-5.1 (Thinking mode)
- gpt-5.1-chat-latest (Instant mode)
✓ Pricing:
Input: $1.25 per 1M tokens
Output: $10.00 per 1M tokens
Cached: $0.125 per 1M tokens
scripts/update_models_yml.sh <friendly-name> <api-name> <provider> <input-price> <output-price>
Add a new model to models.yml configuration.
Usage:
# Add GPT-5.1
scripts/update_models_yml.sh \
gpt5-1 \
"gpt-5.1" \
openai \
0.00125 \
0.01
Output:
Adding model to models.yml:
Friendly name: gpt5-1
API name: gpt-5.1
Provider: openai
Pricing: $0.00125 / $0.01 per 1K tokens
✓ Updated models.yml
✓ Validated YAML syntax
✓ Ready to test
scripts/verify_vertex_model.sh <model-name>
Check if a Gemini model is available in Vertex AI.
Usage:
# Check if Gemini 3 Pro is available
scripts/verify_vertex_model.sh gemini-3-pro-preview-11-2025
Output:
Checking Vertex AI for: gemini-3-pro-preview-11-2025
✓ GCP project: multivac-internal-prod
✓ Access token obtained
✗ Model not found (404)
Recommendation: Monitor for availability, check again in 1-2 weeks
scripts/run_test_benchmark.sh <model-name>
Run a small test benchmark to verify model works end-to-end.
Usage:
# Test GPT-5.1 with fizzbuzz benchmark
scripts/run_test_benchmark.sh gpt5-1
Output:
Running test benchmark: fizzbuzz
Model: gpt5-1
✓ Benchmark completed
✓ Result: PASS (100%)
✓ Tokens: 245 input, 89 output
✓ Cost: $0.002
Model is ready for production use
Workflow
1. Test API Access
First, verify you can call the model:
# Use test_model_access.sh
scripts/test_model_access.sh openai gpt-5.1
What to check:
- API key is set (OPENAI_API_KEY, ANTHROPIC_API_KEY, or gcloud auth)
- API call succeeds (not 401/403/404)
- Model returns expected structure
- Token usage is reported
For Gemini models:
- Uses Vertex AI (not public API)
- Requires
gcloud auth application-default login - Check availability with
verify_vertex_model.sh
2. Find Model Information
Search for official documentation:
# Find API model name and pricing
scripts/find_model_info.sh "GPT-5.1 API documentation pricing"
What to gather:
- Exact API model name (e.g.,
gpt-5.1notGPT-5.1) - Provider (openai, anthropic, google)
- Input price per 1K tokens
- Output price per 1K tokens
- Context limits (if relevant)
- Special features (adaptive reasoning, caching, etc.)
Reference: See resources/provider_endpoints.md
3. Update models.yml
Add the model configuration:
# Add to models.yml
scripts/update_models_yml.sh \
<friendly-name> \
<api-name> \
<provider> \
<input-per-1k> \
<output-per-1k>
Naming conventions:
- Friendly name:
gpt5-1,claude-sonnet-4-5,gemini-3-pro - API name: Exact string for API calls
- Use hyphens, lowercase
Also update:
- Model suites (
benchmark_suite,extended_suite,dev_models) - Add notes about special features
- Document agent CLI support (if available)
4. Run Test Benchmark
Verify end-to-end:
# Test with a simple benchmark
scripts/run_test_benchmark.sh <model-name>
What to verify:
- Benchmark completes successfully
- Results are reasonable (not garbage output)
- Token usage matches expectations
- Cost calculation works
- No errors in logs
5. Document the Model
Update relevant documentation:
- Add model to this skill's resource guide
- Note any special parameters (e.g.,
max_completion_tokensfor GPT-5.1) - Document authentication requirements
- Add to teaching prompts if needed
6. Optional: Run Full Eval
If model looks good:
# Run small eval suite
ailang eval-suite --models <model-name> --benchmarks fizzbuzz,recursion_factorial
# Run full suite (expensive!)
make eval-baseline EVAL_VERSION=vX.Y.Z FULL=true
Resources
Provider Endpoints
See resources/provider_endpoints.md for:
- API endpoint URLs for each provider
- Authentication methods
- How to test access manually
- Common errors and fixes
Pricing Guide
See resources/pricing_guide.md for:
- How to find official pricing
- Price conversion (per 1M → per 1K)
- Cost calculation verification
- Caching and discounts
Progressive Disclosure
This skill loads information progressively:
- Always loaded: This SKILL.md file (workflow and script descriptions)
- Execute as needed: Scripts in
scripts/(testing, updating, verification) - Load on demand: Resources (detailed endpoint docs, pricing references)
Notes
Important:
- Always test API access BEFORE updating models.yml
- Vertex AI (Gemini) requires gcloud auth, not API key
- GPT-5.1+ uses
max_completion_tokensinstead ofmax_tokens - New models may not be available in all regions immediately
- Check for preview/beta status before adding to production suites
Prerequisites:
- API keys set in environment (OPENAI_API_KEY, ANTHROPIC_API_KEY)
- For Gemini:
gcloudCLI installed and authenticated - For Gemini: GCP project set (
gcloud config set project PROJECT_ID) curl,python3, andjqavailable in PATH
Files modified by this skill:
internal/eval_harness/models.yml- Model configurations- (Optional)
prompts/vX.Y.Z.md- Teaching prompts - (Optional)
.claude/skills/model-manager/resources/- Local model database