Claude Code Plugins

Community-maintained marketplace

Feedback

qlora-quantization-impl

@tzervas/qlora-rs
0
0

Implement and debug 4-bit quantization schemes, GGUF export, and quantized training in qlora-rs

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name qlora-quantization-impl
description Implement and debug 4-bit quantization schemes, GGUF export, and quantized training in qlora-rs

QLoRA Quantization Implementation Skill

When to Use

Invoke when the user asks to:

  • Implement new quantization schemes (Q4_1, Q5_0, etc.)
  • Debug quantization accuracy issues
  • Optimize quantization performance
  • Add new GGUF export features
  • Implement double quantization

Quantization Fundamentals

NF4 Format

  • 16 levels optimized for N(0,1) distribution
  • Block-based with shared scale per block
  • Packed: 2 values per byte

Key Operations

// Quantize single value
fn quantize_value_nf4(normalized: f32) -> u8 {
    // Find nearest NF4 level (0-15)
}

// Dequantize single value
fn dequantize_value_nf4(code: u8, scale: f32) -> f32 {
    NF4_LEVELS[code as usize] * scale
}

Adding New Quantization Scheme

Step 1: Define Levels

// src/quantization/q5_0.rs
pub const Q5_0_LEVELS: [f32; 32] = [
    // 32 uniformly spaced levels
];

Step 2: Implement Quantize/Dequantize

pub fn quantize_q5_0(tensor: &Tensor, block_size: usize) -> Result<QuantizedTensor> {
    // Pack 5-bit values (8 values per 5 bytes)
}

Step 3: Add GGUF Type

// In export.rs
const GGUF_TYPE_Q5_0: u32 = 8;

Step 4: Test Accuracy

#[test]
fn test_q5_0_roundtrip() {
    // Verify bounded error
}

Debug Checklist

  • Verify scale computation (absmax vs std-based)
  • Check packing/unpacking byte order
  • Test edge cases (zeros, very large values)
  • Compare with reference implementation
  • Profile memory usage

Key Files

  • src/quantization.rs - Core NF4 implementation
  • src/export.rs - GGUF export
  • tests/quantization_accuracy.rs - Precision tests