---
name: interview-automation-builder
description: Build AI-powered interview systems with question generation, response evaluation, and session management. Supports FastAPI + OpenAI integration for technical, behavioral, and scenario-based interviews with real-time scoring and feedback.
---
# Interview Automation Builder Skill
Expert assistance for building AI-powered interview and assessment automation systems.
## What This Skill Provides

### Core Tools
- generate_interview_questions.py - Create adaptive interview questions using LLM
- evaluate_responses.py - Score and analyze candidate responses with rubrics
- manage_sessions.py - Track interview progress, session state, and analytics
### Reference Documentation
- assessment_patterns.md - Scoring rubrics, evaluation criteria, bias detection
- voice_processing.md - Audio transcription, speech-to-text integration patterns
- openai_interview_patterns.md - OpenAI API patterns for interview automation
- session_management.md - State tracking, progress monitoring, analytics
- best_practices.md - Interview automation best practices
- troubleshooting.md - Common interview system issues and solutions
### Templates
- Interview question templates (technical, behavioral, scenario-based)
- Scoring rubrics and evaluation forms
- Session state management schemas (see the sketch after this list)
- Real-time feedback components
- FastAPI endpoint templates for interview APIs
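A minimal sketch of what a session state schema might look like, using Pydantic. Field names here are illustrative assumptions, not the actual schema shipped in the templates:

```python
# Illustrative only -- the real schema lives in the skill's templates.
from datetime import datetime
from typing import List, Optional
from pydantic import BaseModel, Field

class ResponseRecord(BaseModel):
    question_id: str
    response_text: str
    score: Optional[float] = None            # filled in after evaluation
    evaluated_at: Optional[datetime] = None

class InterviewSession(BaseModel):
    session_id: str
    candidate_id: str
    role: str
    interview_type: str = "technical"        # technical | behavioral | mixed
    status: str = "in_progress"              # in_progress | completed | abandoned
    responses: List[ResponseRecord] = Field(default_factory=list)
    created_at: datetime = Field(default_factory=datetime.utcnow)
```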
## When to Use This Skill

### Perfect For
- Building AI-powered interview practice platforms
- Creating technical assessment systems
- Automating behavioral interview question generation
- Implementing real-time response evaluation
- Tracking candidate progress across interview sessions
- Integrating OpenAI for adaptive questioning
- Building voice-enabled interview systems
### Not For
- General chatbot development (use llm-integration)
- Customer support automation (different domain)
- Simple form-based surveys (over-engineered)
- Non-AI interview scheduling (use calendar tools)
## Quick Start Workflows

### Workflow 1: Generate Adaptive Technical Interview
```bash
# Step 1: Generate technical questions for a Python developer role
python scripts/generate_interview_questions.py \
  --role "Senior Python Developer" \
  --type technical \
  --difficulty advanced \
  --count 10 \
  --output questions.json

# Step 2: Review generated questions
cat questions.json

# Step 3: Start interview session
python scripts/manage_sessions.py \
  --action create \
  --candidate-id "candidate_123" \
  --questions questions.json
```
### Workflow 2: Evaluate Candidate Responses

```bash
# Step 1: Evaluate a candidate's response
python scripts/evaluate_responses.py \
  --question-id "q_001" \
  --response "I would use asyncio for concurrent operations..." \
  --rubric rubrics/technical_python.json \
  --output scores.json

# Step 2: View detailed scoring
cat scores.json

# Step 3: Update session with score
python scripts/manage_sessions.py \
  --action update \
  --session-id "session_456" \
  --score-file scores.json
```
### Workflow 3: Build Interview API

```bash
# Step 1: Copy FastAPI template
cp templates/interview_api_endpoints.py app/routers/interview.py

# Step 2: Configure question generation
cp templates/question_config.yaml config/questions.yaml

# Step 3: Test API
uvicorn app.main:app --reload

# Step 4: Test question generation endpoint
curl -X POST http://localhost:8000/api/interview/questions \
  -H "Content-Type: application/json" \
  -d '{"role": "Data Scientist", "type": "technical", "count": 5}'
```
## Decision Trees

### When to use which question generation approach?
```
Need interview questions?
│
├─ Pre-defined question bank?
│   └─ Use: Static question templates → templates/question_bank.json
│
├─ Role-specific adaptive questions?
│   └─ Use: generate_interview_questions.py --type technical --adaptive
│
├─ Behavioral assessment?
│   └─ Use: generate_interview_questions.py --type behavioral --framework STAR
│
└─ Mixed technical + behavioral?
    └─ Use: generate_interview_questions.py --type mixed --balance 60/40
```
### When to use which evaluation method?
```
Need to score responses?
│
├─ Objective technical answer?
│   └─ Use: evaluate_responses.py --rubric technical_objective.json
│
├─ Subjective behavioral response?
│   └─ Use: evaluate_responses.py --rubric behavioral_subjective.json
│
├─ Code submission?
│   └─ Use: evaluate_responses.py --mode code --run-tests
│
└─ Voice/audio response?
    └─ Use: Speech-to-text → evaluate_responses.py --input transcription.txt
```
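For the voice/audio branch, transcription happens before scoring. A minimal sketch using OpenAI's Whisper transcription endpoint (an assumption here; see `references/voice_processing.md` for the skill's actual integration patterns):

```python
# Transcribe an audio answer, then hand the text to the evaluator script.
# Assumes the standard openai Python client (>=1.0) and an OPENAI_API_KEY env var.
from openai import OpenAI

client = OpenAI()

with open("candidate_answer.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Write the transcription to the file evaluate_responses.py expects.
with open("transcription.txt", "w") as f:
    f.write(transcript.text)
```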
## Quality Checklist

### Essentials (Required)
- Questions are unbiased and inclusive
- Evaluation rubrics are clearly defined and consistent
- Session state is properly tracked and persisted
- User consent is obtained for recording/transcription
- Candidate data is encrypted and GDPR-compliant
- Error handling for API failures and timeouts
- Rate limiting to prevent API quota exhaustion
### Best Practices (Recommended)
- Questions are adaptive based on previous answers
- Scoring includes confidence levels and reasoning
- Feedback is constructive and actionable
- Multiple evaluators for bias reduction
- Session recovery for interrupted interviews
- Analytics dashboard for hiring team
- A/B testing for question effectiveness (see the sketch after this list)
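For the A/B testing item, one lightweight approach is deterministic variant assignment per candidate. This is an illustrative sketch, not something the skill ships:

```python
# Deterministically assign each candidate to a question variant so results
# are stable across sessions and comparable per variant.
import hashlib

def assign_variant(candidate_id: str, experiment: str = "question_style_v1") -> str:
    digest = hashlib.sha256(f"{experiment}:{candidate_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

# Later: compare average evaluation scores (or completion rates) per variant.
```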
### Advanced (Nice to Have)
- Voice tone analysis for soft skills assessment
- Real-time hints for struggling candidates
- Multi-language support for global hiring
- Integration with ATS (Applicant Tracking System)
- Automated interview report generation
- Video analysis for non-verbal cues
- Custom rubric builder UI
## Common Pitfalls & Solutions

### Pitfall 1: Biased Question Generation
Problem: LLM generates questions that favor certain demographics or backgrounds.
Solution:
```python
# Use bias detection in question generation
questions = generate_questions(
    role="Software Engineer",
    bias_detection=True,
    diversity_filter=True,
    review_mode=True
)

# Review flagged questions before use
for q in questions:
    if q.get('bias_score', 0) > 0.3:
        print(f"Review needed: {q['question']}")
        print(f"Bias reason: {q['bias_reason']}")
```
### Pitfall 2: Inconsistent Scoring
Problem: Same answer gets different scores across evaluation runs.
Solution:
```python
# Use temperature=0 for consistent scoring
evaluation = evaluate_response(
    response=candidate_answer,
    rubric=scoring_rubric,
    temperature=0,              # Deterministic scoring
    use_chain_of_thought=True   # Explain reasoning
)
```
### Pitfall 3: Session State Loss
Problem: Interview session data lost during network interruptions.
Solution:
```python
# Implement auto-save with state recovery
session_manager = SessionManager(
    auto_save_interval=30,  # Save every 30 seconds
    backup_storage="s3://interviews/backups",
    recovery_enabled=True
)

# Graceful recovery
try:
    session = session_manager.resume(session_id)
except SessionNotFound:
    session = session_manager.recover_from_backup(session_id)
```
### Pitfall 4: API Rate Limiting
Problem: OpenAI API rate limits hit during high-volume interviews.
Solution:
```python
# Implement exponential backoff and caching
from openai_utils import RateLimitedClient

client = RateLimitedClient(
    max_retries=5,
    backoff_factor=2,
    cache_responses=True,  # Cache similar questions
    batch_mode=True        # Batch requests when possible
)
```
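If a project-level wrapper like `RateLimitedClient` is not available, a bare-bones fallback sketch using the standard OpenAI client plus the `tenacity` library (both assumptions, not part of this skill's tooling):

```python
# Retry OpenAI calls with exponential backoff when the rate limit is hit.
from openai import OpenAI, RateLimitError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

client = OpenAI()

@retry(
    retry=retry_if_exception_type(RateLimitError),
    wait=wait_exponential(multiplier=1, min=2, max=60),  # 2s, 4s, 8s, ... capped at 60s
    stop=stop_after_attempt(5),
)
def generate_question_text(prompt: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```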
## Pro Tips

### Tip 1: Use Chain-of-Thought for Explainable Scoring
```python
# Generate scores with detailed reasoning
evaluation = evaluate_response(
    response=answer,
    rubric=rubric,
    chain_of_thought=True  # LLM explains its scoring
)

print(f"Score: {evaluation['score']}")
print(f"Reasoning: {evaluation['reasoning']}")
# Share reasoning with candidate for transparency
```
### Tip 2: Implement Progressive Difficulty
```python
# Adjust difficulty based on performance
def get_next_question(session):
    avg_score = session.get_average_score()
    if avg_score > 0.8:
        difficulty = "advanced"
    elif avg_score > 0.5:
        difficulty = "intermediate"
    else:
        difficulty = "beginner"
    return generate_question(difficulty=difficulty)
```
### Tip 3: Cache Common Questions for Cost Optimization
```python
# Cache frequently generated questions
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_common_question(role, level, topic):
    return generate_question(role, level, topic)

# Reduces API costs by 60-80% for common roles
```
### Tip 4: Use STAR Framework for Behavioral Questions
```python
# Generate STAR-formatted behavioral questions
question = generate_behavioral_question(
    framework="STAR",        # Situation, Task, Action, Result
    competency="leadership",
    include_followups=True   # Auto-generate follow-up questions
)
```
### Tip 5: Implement Multi-Evaluator Consensus
```python
# Reduce bias with multiple evaluators
evaluations = [
    evaluate_response(response, rubric, evaluator_id=1),
    evaluate_response(response, rubric, evaluator_id=2),
    evaluate_response(response, rubric, evaluator_id=3)
]

final_score = calculate_consensus(evaluations, method="median")
confidence = calculate_inter_rater_reliability(evaluations)
```
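`calculate_consensus` and `calculate_inter_rater_reliability` are assumed helpers. If you need to roll your own, a rough sketch of median consensus plus a simple spread-based agreement proxy (not a true inter-rater reliability statistic) could look like:

```python
# Median consensus with a crude spread-based agreement measure.
# Assumes each evaluation is a dict exposing a numeric "score".
from statistics import median, pstdev

def calculate_consensus(evaluations, method="median"):
    scores = [e["score"] for e in evaluations]
    if method == "median":
        return median(scores)
    return sum(scores) / len(scores)  # fall back to the mean

def score_spread(evaluations):
    """Lower spread across evaluators means higher agreement."""
    scores = [e["score"] for e in evaluations]
    return pstdev(scores)
```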
## Example Usage

### Complete Interview Session Example
```python
#!/usr/bin/env python3
"""
Complete interview automation example.
Generates questions, manages session, evaluates responses.
"""
from interview_automation import (
    QuestionGenerator,
    ResponseEvaluator,
    SessionManager
)

# Initialize components
generator = QuestionGenerator(model="gpt-4")
evaluator = ResponseEvaluator(model="gpt-4", temperature=0)
session_mgr = SessionManager(storage="database")

# Create interview session
session = session_mgr.create_session(
    candidate_id="candidate_123",
    role="Senior Python Developer",
    interview_type="technical"
)

# Generate first question
question = generator.generate(
    role=session.role,
    difficulty="intermediate",
    topics=["async programming", "data structures"]
)
print(f"Question: {question.text}")

# Simulate candidate response
candidate_response = """
I would use asyncio.gather() to run multiple coroutines concurrently.
This allows efficient I/O operations without blocking.
"""

# Evaluate response
evaluation = evaluator.evaluate(
    question=question,
    response=candidate_response,
    rubric="rubrics/python_technical.json",
    explain=True
)
print(f"Score: {evaluation.score}/10")
print(f"Reasoning: {evaluation.reasoning}")

# Update session
session_mgr.add_response(
    session_id=session.id,
    question_id=question.id,
    response=candidate_response,
    evaluation=evaluation
)

# Generate next question based on performance
if evaluation.score >= 7:
    next_question = generator.generate(
        difficulty="advanced",      # Increase difficulty
        topics=question.topics      # Continue same topic
    )
else:
    next_question = generator.generate(
        difficulty="intermediate",  # Same difficulty
        topics=["different topic"]  # Try different area
    )
print(f"Next Question: {next_question.text}")

# Complete session and generate report
session_mgr.complete_session(session.id)
report = session_mgr.generate_report(session.id)
print(f"Final Report: {report.summary}")
```
## Integration Examples

### FastAPI Integration
```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from interview_automation import QuestionGenerator, ResponseEvaluator

app = FastAPI()

class QuestionRequest(BaseModel):
    role: str
    difficulty: str
    count: int = 5

class EvaluationRequest(BaseModel):
    question_id: str
    response: str

@app.post("/api/interview/questions")
async def generate_questions(request: QuestionRequest):
    try:
        generator = QuestionGenerator()
        questions = generator.generate_batch(
            role=request.role,
            difficulty=request.difficulty,
            count=request.count
        )
        return {"questions": questions}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/api/interview/evaluate")
async def evaluate_response(request: EvaluationRequest):
    try:
        evaluator = ResponseEvaluator()
        result = evaluator.evaluate(
            question_id=request.question_id,
            response=request.response
        )
        return {"evaluation": result}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
## Related Skills
- llm-integration - Core LLM integration patterns
- fastapi-backend-builder - Build FastAPI backends for interview APIs
- validation-pipeline - Multi-stage validation for response quality
- streamlit-app-builder - Build interview admin dashboards
## Next Steps

- Review the reference documentation in `references/`
- Explore templates in `templates/`
- Run example scripts with `--help` to see all options
- Check `references/troubleshooting.md` for common issues
- Review `references/best_practices.md` for production deployment