name	voice-analyze
description	Reverse-engineer voice profiles from sample content by analyzing writing patterns. Use when relevant to the task.

voice-analyze

Reverse-engineer voice profiles from sample content by analyzing writing patterns.

Triggers

"analyze this writing style"
"extract voice from..."
"what voice is this?"
"create profile from this sample"
"match this writing style"

Behavior

When triggered, this skill:

Analyzes text samples for:
- Sentence structure and length patterns
- Vocabulary sophistication and domain
- Tone markers (formality, confidence, warmth)
- Structural patterns (lists, examples, questions)
- Perspective and voice choices
Extracts measurable features:
- Average sentence length
- Vocabulary complexity (syllables, word length)
- Contraction usage
- Personal pronoun frequency
- Question density
- List/bullet usage
Maps features to voice dimensions:
- Statistical analysis → tone scale values (0-1)
- Pattern detection → structure preferences
- Vocabulary extraction → prefer/avoid lists
Generates voice profile matching the analyzed style

Usage Examples

Analyze Existing Documentation

User: "Analyze this writing style" + [paste technical docs]

Analysis:
- Formality: 0.7 (no contractions, structured sentences)
- Confidence: 0.85 (direct statements, few hedges)
- Warmth: 0.25 (impersonal, third-person)
- Complexity: 0.8 (technical vocabulary, long sentences)

Output: analyzed-technical-docs.yaml

Match Brand Voice

User: "Extract voice from our marketing copy" + [paste samples]

Analysis:
- Formality: 0.3 (conversational, contractions)
- Confidence: 0.7 (benefit claims, but some hedging)
- Warmth: 0.85 (second person, friendly tone)
- Energy: 0.8 (exclamation points, action verbs)

Output: brand-marketing-voice.yaml

Capture Personal Style

User: "Create profile from my blog posts" + [paste samples]

Analysis:
- Identifies personal writing quirks
- Extracts signature phrases
- Maps to voice dimensions

Output: personal-blog-voice.yaml

Analysis Methodology

Feature Extraction

Feature	Measurement	Maps To
Sentence length	Avg words/sentence	complexity
Contractions	Frequency per 100 words	formality (inverse)
First person ("I", "we")	Frequency	warmth
Second person ("you")	Frequency	warmth
Passive voice	Percentage of sentences	confidence (inverse)
Questions	Per paragraph	warmth, engagement
Hedging words	"might", "perhaps", "could"	confidence (inverse)
Exclamation marks	Frequency	energy
Technical terms	Domain vocabulary density	complexity

Dimension Calibration

Formality (0-1):

0.0-0.3: Contractions frequent, casual language, fragments okay
0.4-0.6: Mixed style, professional but accessible
0.7-1.0: No contractions, complete sentences, formal structure

Confidence (0-1):

0.0-0.3: Many hedges ("might", "perhaps"), questions, qualifiers
0.4-0.6: Balanced certainty, occasional hedges
0.7-1.0: Direct statements, conclusions first, few qualifiers

Warmth (0-1):

0.0-0.3: Third person, passive voice, clinical tone
0.4-0.6: Professional but personable
0.7-1.0: Second person, inclusive language, empathetic

Energy (0-1):

0.0-0.3: Calm, measured, understated
0.4-0.6: Balanced engagement
0.7-1.0: Exclamation marks, action verbs, dynamic phrasing

Complexity (0-1):

0.0-0.3: Short sentences, simple vocabulary, accessible
0.4-0.6: Moderate complexity, clear but nuanced
0.7-1.0: Long sentences, technical vocabulary, layered ideas

Vocabulary Extraction

Signature phrases - Identified by:

Repeated patterns across samples
Distinctive constructions
Opening/closing patterns

Domain vocabulary - Extracted by:

Technical term frequency
Specialized jargon
Industry-specific language

Avoid patterns - Detected by:

Conspicuous absence of common phrases
Consistent avoidance of certain constructions

Output Format

name: analyzed-sample-voice
version: 1.0.0
description: Voice profile extracted from sample content
analysis_source:
  sample_size: 1500  # words analyzed
  sample_count: 3    # number of samples
  confidence: 0.85   # analysis confidence score
tone:
  formality: 0.65
  confidence: 0.8
  warmth: 0.4
  energy: 0.5
  complexity: 0.7
vocabulary:
  prefer:
    - "extracted signature phrase 1"
    - "detected domain terminology"
  avoid:
    - "patterns not found in samples"
  signature_phrases:
    - "The key point is..."
    - "This demonstrates..."
structure:
  sentence_length: medium    # avg 15-20 words
  paragraph_length: medium   # avg 4-6 sentences
  sentence_variety: high     # varied structure detected
  use_lists: when-appropriate
  use_examples: frequently
  use_questions: rarely
perspective:
  person: third
  voice: active
  tense: present
extracted_patterns:
  opening_style: "context-first"
  closing_style: "conclusion-summary"
  transition_style: "logical-flow"

CLI Usage

# Analyze from file
python voice_analyzer.py --input sample.txt

# Analyze from multiple files
python voice_analyzer.py --input "sample1.txt,sample2.txt,sample3.txt"

# Analyze from stdin (pipe content)
cat sample.txt | python voice_analyzer.py --stdin

# Specify output name
python voice_analyzer.py --input sample.txt --name my-extracted-voice

# Output to specific directory
python voice_analyzer.py --input sample.txt --output .aiwg/voices/

# JSON output for inspection
python voice_analyzer.py --input sample.txt --json

Integration

Output: Creates profiles usable by voice-apply
Chain: voice-analyze → voice-create (to refine) → voice-apply
Chain: voice-analyze + voice-analyze → voice-blend (combine styles)

Accuracy Considerations

Minimum sample: 500+ words for reliable analysis
Multiple samples: 3+ samples improve accuracy
Consistent genre: Mixing genres reduces accuracy
Confidence score: Output includes analysis confidence (0-1)

References

Schema: ../../../schemas/voice-profile.schema.json
Dimensions guide: ../voice-apply/references/voice-dimensions.md
Generator: ../voice-create/scripts/voice_generator.py

voice-analyze

Install Skill

SKILL.md

voice-analyze

Triggers

Behavior

Usage Examples

Analyze Existing Documentation

Match Brand Voice

Capture Personal Style

Analysis Methodology

Feature Extraction

Dimension Calibration

Vocabulary Extraction

Output Format

CLI Usage

Integration

Accuracy Considerations

References