| name | mechinterp-crossmodel-matcher |
| description | Match SAE features between Ultra (24K) and Full (2K) models based on activation patterns and token overlap |
MechInterp Cross-Model Matcher
Match features between the Ultra (24K features) and Full (2K features) SAE models to understand feature correspondence and discover monosemantic representations.
Purpose
The cross-model matcher skill:
- Finds corresponding features across models
- Computes similarity based on top token overlap
- Identifies features unique to each model
- Helps validate interpretations across model scales
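The "unique to each model" idea can be illustrated with a small check. This is a minimal sketch, not the FeatureMatcher implementation. It assumes each feature is summarized by the set of its top tokens, and it treats a best overlap below 0.1 as "no counterpart". `jaccard`, `unique_features`, and the toy token sets are hypothetical.

```python
# Hypothetical sketch: flag features with no clear counterpart in the other model.
# Assumption: each feature is summarized by the set of its top tokens, and a
# best overlap below 0.1 (the "weak match" band) means "no counterpart".


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def unique_features(
    source: dict[int, set[str]],
    target: dict[int, set[str]],
    threshold: float = 0.1,
) -> list[int]:
    """Return source feature IDs whose best overlap across all target features is below threshold."""
    unique = []
    for fid, tokens in source.items():
        best = max((jaccard(tokens, t) for t in target.values()), default=0.0)
        if best < threshold:
            unique.append(fid)
    return sorted(unique)


# Toy data (made up): feature ID -> set of top tokens
ultra = {1: {"ink", "splat", "roller"}, 2: {"swim", "speed"}}
full = {10: {"ink", "splat"}, 11: {"run", "jump"}}
print(unique_features(ultra, full))  # [2]
```

Running the check in both directions (Ultra against Full, then Full against Ultra) gives the features unique to each side.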
When to Use
Use this skill when you:
- Have interpreted a feature in one model and want to find its counterpart
- Want to validate that a pattern exists across model scales
- Need to understand which Full features the Ultra model splits into finer-grained features
Usage
Programmatic
```python
from splatnlp.mechinterp.analysis import FeatureMatcher
from splatnlp.mechinterp.skill_helpers import load_context

# Load source context (the model with your known feature)
source_ctx = load_context("ultra")

# Initialize matcher (automatically loads target model)
matcher = FeatureMatcher(source_ctx)

# Find matches for an Ultra feature in the Full model
report = matcher.find_matches(
    source_feature=18712,
    n_candidates=500,  # How many Full features to check
    n_top_matches=10,  # How many matches to return
)

# View results
print(f"Searched {report.n_candidates_tested} candidates")
print(f"Best correlation: {report.best_correlation:.3f}")
for match in report.matches:
    print(f"\nFull feature {match.target_feature}:")
    print(f"  Token overlap: {match.top_token_overlap:.3f}")
    print(f"  Shared tokens: {match.shared_top_tokens[:5]}")
    print(f"  Notes: {match.notes}")
```
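To make the `n_candidates` / `n_top_matches` parameters concrete, here is a minimal sketch of ranking by top-token overlap. It is not the FeatureMatcher implementation. How the real matcher chooses its candidates is unknown here, so the sketch simply takes the first `n_candidates` target IDs in sorted order (an assumption). `Match`, `rank_matches`, and the toy data are hypothetical.

```python
# Hypothetical sketch of ranking target features by top-token overlap.
# This is NOT the FeatureMatcher implementation; candidate selection in
# particular is an assumption (here: the first n_candidates IDs in sorted order).
from dataclasses import dataclass


@dataclass
class Match:
    target_feature: int
    top_token_overlap: float
    shared_top_tokens: list[str]


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_matches(
    source_tokens: set[str],
    target: dict[int, set[str]],
    n_candidates: int = 500,
    n_top_matches: int = 10,
) -> list[Match]:
    candidate_ids = sorted(target)[:n_candidates]
    scored = [
        Match(fid, jaccard(source_tokens, target[fid]), sorted(source_tokens & target[fid]))
        for fid in candidate_ids
    ]
    # Highest overlap first; ties broken by feature ID for a stable order
    scored.sort(key=lambda m: (-m.top_token_overlap, m.target_feature))
    return scored[:n_top_matches]


# Toy data (made up): Full feature ID -> set of top tokens
full = {1024: {"a", "b", "c"}, 7: {"c", "d"}, 3: {"x"}}
for m in rank_matches({"a", "b", "d"}, full, n_top_matches=2):
    print(m.target_feature, round(m.top_token_overlap, 3), m.shared_top_tokens)
```

With a real candidate cap, a true counterpart outside the candidate set can never be returned. Raising `n_candidates` trades speed for recall.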
Detailed Comparison
```python
# Compare two specific features in detail
# (reuses `matcher` from the example above, whose source model is Ultra)
comparison = matcher.compare_features(
    source_fid=18712,  # Ultra feature
    target_fid=1024,   # Full feature
)

print(f"Jaccard similarity: {comparison['jaccard_similarity']:.3f}")
print(f"Shared tokens: {comparison['shared_tokens'][:10]}")
print(f"Ultra-only tokens: {comparison['source_only_tokens'][:10]}")
print(f"Full-only tokens: {comparison['target_only_tokens'][:10]}")
```
Matching Metrics
Token Overlap (Jaccard Similarity)
Treats each feature's top tokens as a set and measures how much the two sets overlap:
overlap = |source_top ∩ target_top| / |source_top ∪ target_top|
- Above 0.3: strong match, likely the same underlying concept
- 0.1 to 0.3: moderate match, related but not identical
- Below 0.1: weak match, probably different concepts
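A worked example of the formula and the bands above, using made-up token sets. The list does not say which band the exact boundary values 0.1 and 0.3 belong to, so treating them as "moderate" is an assumption. `jaccard` and `match_strength` are hypothetical helpers.

```python
# Worked example of the Jaccard overlap and the strength bands above.
# Treating exactly 0.1 and 0.3 as "moderate" is an assumption; the list
# does not say which band the boundaries belong to.


def jaccard(source_top: set[str], target_top: set[str]) -> float:
    union = source_top | target_top
    return len(source_top & target_top) / len(union) if union else 0.0


def match_strength(overlap: float) -> str:
    if overlap > 0.3:
        return "strong"
    if overlap >= 0.1:
        return "moderate"
    return "weak"


source_top = {"a", "b", "c", "d"}  # toy top tokens
target_top = {"c", "d", "e"}
score = jaccard(source_top, target_top)  # 2 shared / 5 distinct = 0.4
print(score, match_strength(score))  # 0.4 strong
```

Because the denominator is the union, a feature with many top tokens is penalized when its counterpart has only a few, even if every one of those few is shared.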
Interpretation
High overlap suggests:
- The features detect the same pattern
- The Ultra feature may be a refinement of the Full feature
- The pair is a good candidate for cross-model validation
Low overlap with similar activation patterns suggests:
- The Ultra model has split the Full feature into several narrower features
- Multiple Ultra features may combine to match one Full feature
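One way to test the "multiple Ultra features combine" case is to compare each Ultra feature's individual overlap with how much of the Full feature's top tokens their union covers. This minimal sketch assumes top-token sets stand in for activation patterns. `coverage` and the toy data are hypothetical and not part of FeatureMatcher.

```python
# Hypothetical sketch: do several Ultra features jointly cover one Full feature?
# Assumes top-token sets stand in for activation patterns; `coverage` is an
# illustrative helper, not part of FeatureMatcher.


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coverage(source_top: set[str], parts: list[set[str]]) -> float:
    """Fraction of the source feature's top tokens found in the union of the parts."""
    if not source_top:
        return 0.0
    covered = set().union(*parts) & source_top
    return len(covered) / len(source_top)


# Toy data (made up): one broad Full feature, three narrower Ultra features
full_top = {"a", "b", "c", "d", "e", "f"}
ultra_parts = [{"a", "b", "p", "q", "r"}, {"c", "d", "s", "t", "u"}, {"e", "f", "v", "w", "x"}]

print([round(jaccard(full_top, p), 3) for p in ultra_parts])  # [0.222, 0.222, 0.222]
print(coverage(full_top, ultra_parts))  # 1.0
```

Each Ultra feature overlaps only partly with the Full feature, but together they cover all of its top tokens. That pattern is what the decomposition reading above describes.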
Example: Finding Ultra Decomposition
```python
from splatnlp.mechinterp.analysis import FeatureMatcher
from splatnlp.mechinterp.skill_helpers import load_context

# Example: a Full model feature that might be polysemantic
full_ctx = load_context("full")
matcher = FeatureMatcher(full_ctx)  # Source = Full, target = Ultra

# Find which Ultra features correspond to Full feature 512
report = matcher.find_matches(source_feature=512)

# If more than three Ultra features match with combined_score > 0.1,
# the Full feature may be polysemantic
strong = [m for m in report.matches if m.combined_score > 0.1]
if len(strong) > 3:
    print("Full feature 512 appears to be polysemantic")
    print("Ultra decomposition:")
    for m in strong[:5]:
        print(f"  Ultra {m.target_feature}: {m.shared_top_tokens[:3]}")
```
Workflow Integration
- Start with an interpreted feature: Begin with a feature you already understand
- Find matches: Use this skill to find its counterparts in the other model
- Validate the interpretation: Check whether the matches behave similarly
- Document the correspondence: Record cross-model links in the research state
- Investigate decomposition: If Ultra splits a Full feature, analyze each part
Limitations
- Token overlap is a proxy; true matching would require shared activation data
- Different expansion factors (24K vs 2K features) mean the two models represent concepts at different granularity
- Some features may not have clear counterparts
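The first limitation notes that true matching would require shared activation data. Suppose you had both features' activations recorded on the same inputs in the same order. This skill is not documented as providing that data. With it, a Pearson correlation is one common way to score the match. This is a minimal sketch under that assumption. `pearson` and the toy activation values are hypothetical.

```python
# Sketch of activation-based matching, assuming both models' activations for
# the two features were recorded on the same inputs, in the same order.
# `pearson` is a hypothetical helper; the toy activations below are made up.
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two activation vectors; 0.0 if either is constant."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty vectors of equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


ultra_acts = [0.0, 1.2, 0.0, 3.4, 0.5]  # Ultra feature on five shared inputs
full_acts = [0.1, 1.0, 0.0, 3.0, 0.7]   # Full feature on the same inputs
print(f"{pearson(ultra_acts, full_acts):.3f}")
```

Unlike top-token overlap, this compares how strongly each feature fires on every shared input, so it can separate features whose top tokens coincide but whose activation profiles differ.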
See Also
- mechinterp-cluster-mapper: Analyze groups of related features
- mechinterp-state: Track cross-model research
- mechinterp-runner: Validate matches with experiments