---
name: notion-llm-config
description: Manages the Notion LLM Config Table database tracking model configurations and parameters across multiple model providers. Use when working with the LLM config table, adding models, updating model parameters, or querying model architecture details.
allowed-tools: mcp__notion__notion-search, mcp__notion__notion-fetch, mcp__notion__notion-create-pages, mcp__notion__notion-update-page, mcp__notion__notion-update-database, Read, Bash
---
# Notion LLM Config Table Management

## Database Information

- Database ID: `ddfd95bd-109a-4ac6-955c-90541cc53d5e`
- Data Source ID: `0a9fafd6-2cc2-4d6b-b6f0-3797ea777421`
- Location: Family Notes workspace → "LLM config table"
- Purpose: Track LLM model configurations across multiple providers (Llama, GPT-2, Qwen, others)
## Data Sources

### Llama Models

Primary source: `~/projects/github/llama-models/models/sku_list.py`

- Contains all registered Llama models with architecture parameters
- Constants: `LLAMA2_VOCAB_SIZE = 32000`, `LLAMA3_VOCAB_SIZE = 128256`
- WARNING: `sku_list.py` has incorrect `ffn_dim_multiplier` values for Llama 2 7B and 13B models
  - `sku_list.py` shows 1.3 for all Llama 2 models, but actual checkpoints differ
- Always verify d_ff values against HuggingFace configs, not just `sku_list.py`
### GPT-2 Models

Primary source: HuggingFace `config.json` files

- Base repos: `openai-community/gpt2`, `gpt2-medium`, `gpt2-large`, `gpt2-xl`
- All variants: vocab_size=50257, max_context=1024, uses BPE tokenizer
- Uses learned positional embeddings (not RoPE)
## User Preferences
- No emojis in any updates or content
- Batch operations preferred over sequential updates
- Focus on practical utility - add "Main" column for filtering to representative models
- Verify calculations - user will catch errors, so double-check formulas
## Database Schema Key Columns

### Identity
- Model Name (title), Model Family, Model Type, Core Model ID
- HuggingFace Repo, Variant
### Architecture Parameters

- d_hidden = model hidden dimension (`dim` from Llama arch_args)
- d_ff = feed-forward hidden dimension (see calculation formulas below)
- d_ff / d_hidden Ratio = d_ff / d_hidden, rounded to 3 decimals
  - Llama 2: 2.688 (7B), 2.7 (13B), 3.5 (70B)
  - Llama 3+: 2.667 (3B), 3.125 (Guard INT4), 3.25 (405B), 3.5 (most common), 4.0 (1B)
  - GPT-2: 4.0 (all variants - standard transformer)
  - Null for models without d_ff (Llama 4 has empty arch_args)
- Gated MLP (checkbox) = gated MLP/activation (SwiGLU for Llama, standard GELU for GPT-2)
- n_layers, n_heads, n_kv_heads, head_dim
- Multiple Of, Norm Eps
### Tokenization
- Tokenizer: Llama 2 Tokenizer (32k vocab) or Llama 3 Tokenizer (128k vocab)
- Vocab Size: 32000 (Llama 2), 128256 (Llama 3+)
### MoE Architecture
- Is MoE (checkbox)
- Num Experts: Llama 4: 16 (Scout), 128 (Maverick)
- Top K Experts: Number of routed experts per token
- MoE Routing: "Token Choice" (Llama 4) vs "Expert Choice"
- Has Shared Expert: Llama 4 has 1 shared + 1 routed per token
- Activated Params (B): Llama 4: 17B
- Total Params (B): Llama 4: 109B (Scout), 400B (Maverick)
### RoPE Configuration
- RoPE Theta, RoPE Freq Base
- Use Scaled RoPE (checkbox)
### Other
- Quantization Format, PTH File Count, Max Context Size
- Main (checkbox) - marks representative models for filtering
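For orientation, here is a hedged example of one fully populated row, expressed as a properties payload in the format used by the update/create calls below. The values are for Llama 3.1 8B Instruct and are taken from its public config; the exact property names and option values should be confirmed against the live database schema before use.

```python
# Hypothetical example row (Llama 3.1 8B Instruct); property names mirror the
# schema above, values come from the model's public config. Verify against the
# live database before creating or updating pages with this payload.
example_row = {
    "Model Name": "Llama 3.1 8B Instruct",
    "Model Family": "llama3_1",
    "Model Type": "instruct",
    "HuggingFace Repo": "meta-llama/Llama-3.1-8B-Instruct",
    "d_hidden": 4096,
    "d_ff": 14336,
    "d_ff / d_hidden Ratio": 3.5,
    "Gated MLP": "__YES__",        # SwiGLU
    "n_layers": 32,
    "n_heads": 32,
    "n_kv_heads": 8,
    "head_dim": 128,               # 4096 / 32
    "Multiple Of": 1024,
    "Norm Eps": 1e-5,
    "Tokenizer": "Llama 3 Tokenizer",
    "Vocab Size": 128256,
    "Is MoE": "__NO__",
    "RoPE Theta": 500000.0,
    "Use Scaled RoPE": "__YES__",
    "Max Context Size": 131072,
    "Main": "__YES__",             # listed as a representative model below
}
```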
## Update Patterns

### Property Updates (Preferred)

```
mcp__notion__notion-update-page({
  "page_id": "page-id-here",
  "command": "update_properties",
  "properties": {
    "Tokenizer": "Llama 3 Tokenizer",
    "Vocab Size": 128256,
    "Is MoE": "__YES__",  # Checkboxes use __YES__/__NO__
    "Num Experts": 16
  }
})
```
### Batch Updates
- Use parallel tool calls when updating multiple independent pages
- Group by model family for logical batching (e.g., all Scout models together)
### Search & Fetch

```
# Search within database
mcp__notion__notion-search({
  "query": "Llama 4",
  "query_type": "internal",
  "data_source_url": "collection://0a9fafd6-2cc2-4d6b-b6f0-3797ea777421"
})

# Fetch page details
mcp__notion__notion-fetch({"id": "page-id-or-url"})
```
## Common Tasks

### Adding New Models

- Parse `~/projects/github/llama-models/models/sku_list.py` to get model data
- Extract arch_args and calculate derived fields (head_dim, d_ff using Llama formula, d_ff/d_hidden ratio) - see the sketch after this list
- Determine tokenizer based on model family (see Llama Model Family Mappings)
- Set MoE fields for Llama 4 models
- Set "Gated MLP" to Yes (all Llama models use SwiGLU)
- Create pages in batches using `mcp__notion__notion-create-pages`
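A minimal sketch of the derived-field step, assuming the model's `arch_args` has already been read out of `sku_list.py` as a plain dict. The helper name and the example values are illustrative, not part of the llama-models repo:

```python
# Hedged sketch: compute the derived columns from an arch_args-style dict.
# compute_derived_fields is a hypothetical helper, not a function in llama-models.
# Note: for Llama 2 7B/13B, apply the verified d_ff overrides (see Model Family
# Mappings) rather than trusting the sku_list.py ffn_dim_multiplier.
def compute_derived_fields(arch_args: dict) -> dict:
    dim = arch_args["dim"]
    n_heads = arch_args["n_heads"]
    multiple_of = arch_args.get("multiple_of", 256)
    ffn_dim_multiplier = arch_args.get("ffn_dim_multiplier")

    # Llama d_ff formula (see Calculation Formulas below)
    hidden_dim = int(2 * (4 * dim) / 3)
    if ffn_dim_multiplier is not None:
        hidden_dim = int(ffn_dim_multiplier * hidden_dim)
    d_ff = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

    return {
        "d_hidden": dim,
        "d_ff": d_ff,
        "head_dim": dim // n_heads,
        "d_ff / d_hidden Ratio": round(d_ff / dim, 3),
    }

# Example: Llama 3 8B-style arch_args (values assumed from the public params.json)
print(compute_derived_fields({
    "dim": 4096, "n_heads": 32, "multiple_of": 1024, "ffn_dim_multiplier": 1.3,
}))
# -> {'d_hidden': 4096, 'd_ff': 14336, 'head_dim': 128, 'd_ff / d_hidden Ratio': 3.5}
```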
### Marking Representative Models
Representative "Main" models (user preference):
- Llama 2: 7b chat, 70b chat
- Llama 3.1: 8b instruct, 70b instruct, 405b instruct (FP8)
- Llama 3.2: 1b instruct, 3b instruct, 11b vision instruct (user trains small models)
- Llama 3.3: 70b instruct
- Llama 4: Scout instruct, Maverick instruct
- GPT-2: All variants (117M, 345M, 774M, 1.5B) marked as Main
### Schema Updates

```
mcp__notion__notion-update-database({
  "database_id": "ddfd95bd-109a-4ac6-955c-90541cc53d5e",
  "properties": {
    "New Column": {"type": "number", "number": {}}
  }
})
```
## Important Notes

### Pitfalls to Avoid
- Don't clear existing fields - Only specify properties you're updating
- Column ordering - Cannot be changed via API (view-level setting in UI)
- Checkbox format - Must use `"__YES__"` or `"__NO__"`, not boolean
- Llama 4 models - Have empty `arch_args={}` in sku_list.py, no d_hidden/d_ff available
### Model Family Mappings

#### Llama

- llama2: Llama 2 Tokenizer, 32k vocab
  - Actual d_ff values (verified from HuggingFace):
    - 7B: d_ff=11008 (NOT 14336 from sku_list.py formula)
    - 13B: d_ff=13824 (NOT 17920 from sku_list.py formula)
    - 70B: d_ff=28672 (correct in sku_list.py)
- llama3, llama3_1, llama3_2, llama3_3, llama4, safety: Llama 3 Tokenizer, 128k vocab
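The same mappings as a small lookup sketch (the family keys are the `sku_list.py` family strings listed above; this mirrors the list, it is not code from the repo):

```python
# Tokenizer / vocab mapping by model family, mirroring the list above.
TOKENIZER_BY_FAMILY = {
    "llama2":   ("Llama 2 Tokenizer", 32000),
    "llama3":   ("Llama 3 Tokenizer", 128256),
    "llama3_1": ("Llama 3 Tokenizer", 128256),
    "llama3_2": ("Llama 3 Tokenizer", 128256),
    "llama3_3": ("Llama 3 Tokenizer", 128256),
    "llama4":   ("Llama 3 Tokenizer", 128256),
    "safety":   ("Llama 3 Tokenizer", 128256),
}

# Verified Llama 2 d_ff values (HuggingFace configs); use these for 7B and 13B
# instead of the values derived from sku_list.py.
LLAMA2_D_FF_OVERRIDES = {"7B": 11008, "13B": 13824, "70B": 28672}
```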
#### GPT-2

- gpt2: BPE tokenizer, 50257 vocab, 1024 max context
- Naming: "GPT-2", "GPT-2 Medium", "GPT-2 Large", "GPT-2 XL"
- Model Type: "base" (all are base models)
- Parameter counts: 117M (base), 345M (medium), 774M (large), 1.5B (XL)
- Config parameter mappings: `n_embd` → d_hidden, `n_layer` → n_layers (see the fetch sketch below)
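A hedged sketch of pulling a GPT-2 `config.json` and mapping it onto the database columns. It assumes the raw config URL pattern noted under Helpful Scripts & References and the standard GPT-2 config field names (`n_embd`, `n_layer`, `n_head`, `n_ctx`, `vocab_size`, `layer_norm_epsilon`); the helper itself is illustrative:

```python
import json
import urllib.request

# Hedged sketch: fetch a GPT-2 config.json from HuggingFace and map its fields
# onto the database columns (n_embd -> d_hidden, n_layer -> n_layers, etc.).
def gpt2_row_from_config(repo: str) -> dict:
    url = f"https://huggingface.co/{repo}/raw/main/config.json"
    with urllib.request.urlopen(url) as resp:
        cfg = json.load(resp)
    d_hidden = cfg["n_embd"]
    return {
        "d_hidden": d_hidden,
        "d_ff": 4 * d_hidden,                    # not in config.json; always 4x for GPT-2
        "d_ff / d_hidden Ratio": 4.0,
        "n_layers": cfg["n_layer"],
        "n_heads": cfg["n_head"],
        "n_kv_heads": cfg["n_head"],             # standard attention, no GQA
        "head_dim": d_hidden // cfg["n_head"],   # 64 for every variant
        "Vocab Size": cfg["vocab_size"],         # 50257
        "Max Context Size": cfg.get("n_ctx", cfg.get("n_positions")),  # 1024
        "Norm Eps": cfg["layer_norm_epsilon"],   # 1e-05
        "Gated MLP": "__NO__",                   # plain GELU MLP
    }

# Usage (base repo from the references section):
# gpt2_row_from_config("openai-community/gpt2")
```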
### Llama MoE Architecture (Llama 4 only)
- Only Llama 4 models are MoE
- Architecture: 1 shared expert (always active) + 1 routed expert (top_k=1)
- Effective: 2 experts per token (shared + routed)
- Routing: Token Choice (each token selects expert via router scores)
### Llama Gated MLP / Activation Function

- All Llama models use gated MLP with SwiGLU activation
- Implementation: `w2(F.silu(w1(x)) * w3(x))` (from `models/llama3/model.py`); see the sketch below
- Uses 3 weight matrices (w1, w2, w3) instead of standard 2-matrix FFN
- SwiGLU = Swish-Gated Linear Unit (Swish is same as SiLU)
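A minimal PyTorch sketch of the gated MLP shape described above; the bias-free linears and dimensions are illustrative, and the real implementation in `models/llama3/model.py` additionally handles the d_ff rounding and tensor parallelism:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative SwiGLU feed-forward block: three matrices (w1/w3 gate-and-up pair,
# w2 down-projection), matching the w2(F.silu(w1(x)) * w3(x)) pattern quoted above.
class SwiGLUFeedForward(nn.Module):
    def __init__(self, d_hidden: int, d_ff: int):
        super().__init__()
        self.w1 = nn.Linear(d_hidden, d_ff, bias=False)  # "gate" projection
        self.w3 = nn.Linear(d_hidden, d_ff, bias=False)  # "up" projection
        self.w2 = nn.Linear(d_ff, d_hidden, bias=False)  # "down" projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

# Example with Llama 3 8B-style dimensions:
# ffn = SwiGLUFeedForward(d_hidden=4096, d_ff=14336)
```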
## Calculation Formulas

### General Derived Fields

```
head_dim = d_hidden / n_heads
d_ff_d_hidden_ratio = round(d_ff / d_hidden, 3)
```
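Worked example, using the Llama 3.1 8B figures from its public config (d_hidden=4096, n_heads=32, d_ff=14336):

```python
# Worked example with Llama 3.1 8B numbers (assumed from its public config)
head_dim = 4096 / 32                           # 128
d_ff_d_hidden_ratio = round(14336 / 4096, 3)   # 3.5
```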
### Llama d_ff Calculation

From actual Llama code (`models/llama3/model.py`):

```python
hidden_dim = 4 * dim
hidden_dim = int(2 * hidden_dim / 3)  # = int(8 * dim / 3)
if ffn_dim_multiplier is not None:
    hidden_dim = int(ffn_dim_multiplier * hidden_dim)
# Round up to a multiple of multiple_of
hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
# Result is d_ff
```
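Worked through for Llama 2 70B, assuming its published params (dim=8192, ffn_dim_multiplier=1.3, multiple_of=4096):

```python
# Llama 2 70B: dim=8192, ffn_dim_multiplier=1.3, multiple_of=4096 (assumed params)
hidden_dim = int(2 * (4 * 8192) / 3)             # 21845
hidden_dim = int(1.3 * hidden_dim)               # 28398
d_ff = 4096 * ((hidden_dim + 4096 - 1) // 4096)  # 28672 -> matches the table above
```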
#### IMPORTANT: Llama 2 Model-Specific Behavior

Llama 2 models use different ffn_dim_multiplier values than what's in sku_list.py:

- 7B & 13B: Use `ffn_dim_multiplier=None` (pure 8d/3 formula)
  - 7B: d_ff=11008, ratio=2.688
  - 13B: d_ff=13824, ratio=2.7
- 70B: Uses `ffn_dim_multiplier=1.3` (as specified in sku_list.py)
  - 70B: d_ff=28672, ratio=3.5

This was verified from actual HuggingFace checkpoint configs. The sku_list.py file incorrectly shows ffn_dim_multiplier=1.3 for all Llama 2 models.
Alternative formula from HuggingFace transformers (equivalent for models with multiple_of=256):

```python
import math

# For Llama 2 7B, 13B (no ffn_dim_multiplier)
def compute_intermediate_size(n, multiple_of=256):
    return int(math.ceil(n * 8 / 3) + multiple_of - 1) // multiple_of * multiple_of
```
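A quick sanity check against the verified Llama 2 d_ff values above, using the standard 7B/13B hidden sizes (4096 and 5120):

```python
# Sanity check against the verified HuggingFace d_ff values above
assert compute_intermediate_size(4096) == 11008  # Llama 2 7B  (dim=4096)
assert compute_intermediate_size(5120) == 13824  # Llama 2 13B (dim=5120)
```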
### GPT-2 d_ff Calculation

Standard transformer architecture:

```
d_ff = 4 * d_hidden  # Always 4x for all GPT-2 variants
```

- Not explicitly in config.json, but defined in model architecture
- All GPT-2 models have d_ff/d_hidden ratio of exactly 4.0
## Helpful Scripts & References

### Llama

Working directory: `~/projects/github/llama-models/`

Scripts:

- Parse models: `parse_llama_models.py`
- Prepare Notion data: `prepare_notion_data.py`
- Update tokenizers: `update_tokenizers.py`
- Bulk updates: `bulk_update_llama3.py`
- Calculate d_ff: `fix_d_ff.py` (correct d_ff calculation using actual Llama formula)
- Calculate ratios: `calculate_ratios.py` (d_ff/d_hidden ratios and gated MLP status)
- Data files: `d_ff_corrections.json`, `complete_notion_updates.json`, `ratio_updates.json`
- Documentation: `COMPLETION_SUMMARY.md` (d_ff update history), `STATUS.md` (current status)

References:

- Llama models repo: `~/projects/github/llama-models/`
- Model definitions: `models/sku_list.py`
- Architecture files: `models/llama{2,3,4}/args.py`
- MoE implementation: `models/llama4/moe.py`
### GPT-2

HuggingFace References:

- Base: https://huggingface.co/openai-community/gpt2
- Medium: https://huggingface.co/gpt2-medium
- Large: https://huggingface.co/gpt2-large
- XL: https://huggingface.co/gpt2-xl
- Config location: `/raw/main/config.json` (append to repo URL)
Architecture Notes:
- Uses learned positional embeddings (not RoPE) - add to Notes field
- Activation: GELU (gelu_new variant) - not gated MLP
- Standard attention (n_kv_heads = n_heads, no GQA)
- All variants have head_dim = 64
- Norm epsilon: 1e-05 (layer_norm_epsilon in config)
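A quick check of the head_dim = 64 claim, using per-variant `n_embd`/`n_head` values assumed from the public config.json files (they are not stated elsewhere in this document):

```python
# head_dim = n_embd / n_head for each GPT-2 variant (config values assumed from
# the public HuggingFace config.json files); all should come out to 64.
GPT2_CONFIGS = {
    "GPT-2":        {"n_embd": 768,  "n_head": 12},
    "GPT-2 Medium": {"n_embd": 1024, "n_head": 16},
    "GPT-2 Large":  {"n_embd": 1280, "n_head": 20},
    "GPT-2 XL":     {"n_embd": 1600, "n_head": 25},
}

for name, cfg in GPT2_CONFIGS.items():
    assert cfg["n_embd"] // cfg["n_head"] == 64, name
```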