| name | distribution-search |
| description | Guidance for finding probability distributions that satisfy specific statistical constraints such as KL divergence targets. This skill should be used when tasks involve constructing probability distributions with exact numerical properties, optimization over high-dimensional probability spaces, or satisfying multiple simultaneous statistical constraints within tight tolerances. |
Distribution Search
Overview
This skill provides structured guidance for finding probability distributions that satisfy specific statistical constraints. These problems typically involve constructing a discrete probability distribution over a large vocabulary that achieves target values for metrics like KL divergence, entropy, or other information-theoretic quantities.
Mathematical Analysis First
Before writing any code, perform thorough mathematical analysis to constrain the solution space:
Derive analytical relationships between the target metrics and distribution properties
- For KL divergence from uniform: KL(P||U) = -H(P) + log(V) where H(P) is entropy and V is vocabulary size
- For KL divergence to uniform: KL(U||P) = log(V) - (1/V) * Σ log(P(i))
Calculate fixed quantities from the problem specification
- Vocabulary size V determines log(V)
- Target values combined with log(V) constrain feasible entropy ranges
Identify implied constraints from simultaneous requirements
- Multiple target metrics often severely restrict the feasible solution space
- Determine if the problem is over-constrained before attempting optimization
Estimate the solution structure analytically
- Determine the approximate entropy the distribution must have
- Estimate how concentrated or spread the probability mass should be
Parameterization Strategy
High-dimensional optimization over all probability values is impractical. Use reduced parameterizations:
Recommended Parameterizations
Power law with uniform tail: k high-probability elements following p_i ∝ i^(-α), remaining elements share probability equally
- Parameters: k (number of special elements), α (power law exponent), p_rest (probability for tail)
Two-group distribution: k elements with probability p_high, remaining elements with probability p_low
- Parameters: k, p_high (p_low is determined by normalization)
Multi-tier distribution: Several groups of elements with distinct probability levels
- Parameters: group sizes and probability levels
Parameter Selection Guidance
- Start with few parameters (2-3) and add flexibility only if needed
- Derive parameter bounds from mathematical constraints rather than arbitrary choices
- Calculate reasonable initial values from the analytical solution estimates
Implementation Approach
Modular Code Structure
Organize code into separate, tested functions:
- kl_forward(P, V): Compute KL(P||Uniform)
- kl_backward(P, V): Compute KL(Uniform||P)
- create_distribution(params, V): Generate distribution from parameterization
- objective(params): Optimization objective combining constraints
- verify_solution(P, V, targets): Independent verification
Computational Efficiency
- Avoid full array operations when possible - use analytical formulas for symmetric distributions
- For two-group distributions: compute entropy/KL directly from group sizes and probabilities
- Estimate computational cost before running - set appropriate timeouts
Optimization Strategy
- Coarse grid search over reduced parameter space to find promising regions
- Local optimization from multiple starting points in promising regions
- Refinement with tighter tolerances near solutions
Verification
Independent Verification
Always verify solutions with code independent from the optimization:
- Check probability constraints: All values non-negative, sum to 1.0
- Compute metrics directly: Use explicit formulas, not the optimization objective
- Test against known cases: Verify computation on uniform distribution or other known solutions
Common Verification Bugs
- Incorrect handling of log(0) - use appropriate thresholds (e.g., max(p, 1e-30))
- Array indexing errors when elements are counted vs indexed
- Forgetting to normalize after parameter adjustments
- Multiplication errors when computing sums over groups
Common Pitfalls
Computational
- Starting with full-dimensional optimization: Immediately recognize the need for reduced parameterization
- Insufficient timeout estimation: Estimate iterations needed before running
- Rewriting entire scripts: Use modular code to enable targeted fixes
Mathematical
- Incomplete constraint analysis: Fully leverage mathematical relationships before coding
- Arbitrary parameter bounds: Derive bounds from problem constraints
- Poor initial values: Use analytical estimates for starting points
Verification
- Trusting optimization output directly: Always verify with independent computation
- Not testing verification code: Verify the verifier using known solutions first
- Numerical precision issues: Handle small probabilities carefully
Problem-Solving Workflow
- Analyze: Derive analytical relationships and estimate solution structure
- Parameterize: Choose reduced parameterization matching expected structure
- Implement: Write modular, tested code for each component
- Search: Grid search followed by local optimization
- Verify: Independent verification of candidate solutions
- Refine: Adjust parameterization if tolerances not achieved
When Solutions Are Not Found
If optimization fails to find valid solutions:
- Check if the problem is mathematically feasible given constraints
- Verify the parameterization can represent valid solutions
- Expand the parameterization to add flexibility
- Check for bugs in objective function or constraint handling
- Try different optimization algorithms or starting points