---
name: strix-halo-setup
description: Complete setup for AMD Strix Halo (Ryzen AI MAX+ 395) PyTorch environments. Handles ROCm installation verification, PyTorch community builds (official wheels don't work with gfx1151), GTT memory configuration, and environment setup. Creates ready-to-use ML workspaces for running 30B-parameter models.
license: MIT
---
# Strix Halo Setup
Set up a new PyTorch project optimized for AMD Strix Halo (Ryzen AI MAX+ 395, gfx1151).
## When Claude Should Use This Skill
This skill should be invoked when:
- Setting up PyTorch on AMD Strix Halo (Ryzen AI MAX+ 395, gfx1151) hardware
- User reports "HIP error: invalid device function" with PyTorch on AMD APU
- Configuring environments for running LLMs on AMD integrated graphics
- User mentions needing GTT memory configuration for ML workloads
- Creating a new ML project specifically for Strix Halo hardware
- User asks about running 30B parameter models on AMD Ryzen AI MAX+
## What This Skill Does
- Verifies system configuration (ROCm, GTT, user groups)
- Creates a conda environment with working PyTorch for gfx1151
- Sets up proper environment variables
- Creates test scripts to verify GPU functionality
- Provides a complete project template with best practices
## Critical Information

**PyTorch installation**: Official PyTorch wheels from pytorch.org DO NOT WORK with gfx1151. They detect the GPU but fail at compute time with "HIP error: invalid device function". This skill installs community builds that actually work.
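A quick way to reproduce the failure mode (a minimal sketch; it assumes a HIP-visible device): detection succeeds even on the broken wheels, and only a real kernel launch exposes the error.

```bash
python - <<'EOF'
import torch

# On the broken official wheels, detection still succeeds...
print("GPU visible:", torch.cuda.is_available())

# ...but the first real kernel launch fails with
# "HIP error: invalid device function"
x = torch.randn(1024, 1024, device="cuda")
y = x @ x
torch.cuda.synchronize()
print("Compute OK:", y.sum().item())
EOF
```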
**ROCm installation note**: On Strix Halo APUs, install ROCm with the `--no-dkms` flag so it uses the inbox kernel driver. An installed `amdgpu-dkms` package can cause problems when upgrading kernels.
## Prerequisites Check

Before running setup, verify the system with:

```bash
./scripts/verify_system.sh
```
This checks:

- ROCm installation (6.4.4+ or 7.0.2 required)
- User in `render` and `video` groups
- GTT memory configuration
- Python/Conda availability
If any checks fail, see `.claude/skills/strix-halo-setup/docs/TROUBLESHOOTING.md` for detailed fix instructions.
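If the script is unavailable, a few manual spot-checks cover the same ground (a sketch; the sysfs path assumes the amdgpu driver):

```bash
rocminfo | grep -i gfx                                           # should list gfx1151
groups | grep -E 'render|video'                                  # group membership
cat /sys/class/drm/card*/device/mem_info_gtt_total 2>/dev/null   # GTT size in bytes
```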
## Setup Process

### Step 1: System Verification

Run the verification script:

```bash
cd .claude/skills/strix-halo-setup
./scripts/verify_system.sh
```
Expected output:
- ✓ AMD GPU detected
- ✓ ROCm installed
- ✓ User in render/video groups
- ✓ GTT configured (or warning if not)
If any issues are found, follow the script's instructions to fix them.
### Step 2: Determine Project Name and Backend

Ask the user for:

- **Project name**: If not specified, use `strix-ml-project`
- **Backend choice**: PyTorch (training/custom code) or Vulkan (inference only)

Use the AskUserQuestion tool:

- Question 1: "What would you like to name your project?"
- Question 2: "Which backend do you want to set up?"
- PyTorch with ROCm: For training, custom code, full ML framework (supports transformers, etc.)
- Vulkan: For inference only (llama.cpp, Ollama) - simpler setup, often faster
If PyTorch is chosen, continue with the steps below. If Vulkan, skip to the Vulkan Setup section at the end.
### Step 3: Create Environment

**Using Conda (recommended):**

```bash
# Create a new environment with Python 3.14 (or 3.13)
conda create -n {project_name} python=3.14 -y
conda activate {project_name}
```

**Using uv (alternative):**

```bash
# Create a new environment with Python 3.14 (or 3.13)
uv venv {project_name} --python 3.14
source {project_name}/bin/activate
```
### Step 4: Install PyTorch (Community Build)

**CRITICAL**: Must use community builds, not official wheels.

**Option 1: AMD nightlies (recommended)**

```bash
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchvision torchaudio
```

**Option 2: scottt's stable builds (fallback)**

If the nightlies fail, use pre-built wheels from https://github.com/scottt/rocm-TheRock/releases and install with `pip install <wheel_file>`.

**Verify the installation:**

```bash
python -c "import torch; print('PyTorch:', torch.__version__); print('HIP:', torch.version.hip)"
```

This should show PyTorch 2.7+ and HIP 6.5+.
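A slightly deeper check confirms the build actually targets gfx1151 (a sketch; `gcnArchName` is an attribute exposed by ROCm builds of recent PyTorch):

```bash
python - <<'EOF'
import torch

props = torch.cuda.get_device_properties(0)
print("Device:", props.name)
print("Arch:", props.gcnArchName)  # should report gfx1151
print(f"Memory: {props.total_memory / 1024**3:.1f} GB")
EOF
```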
### Step 5: Configure Environment Variables

Create an activation script in the conda environment:

```bash
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
cat > $CONDA_PREFIX/etc/conda/activate.d/strix_halo_env.sh << 'EOF'
#!/bin/bash
# Core ROCm settings for Strix Halo (gfx1151)
export HSA_OVERRIDE_GFX_VERSION=11.5.1
export PYTORCH_ROCM_ARCH=gfx1151

# Unified memory configuration - CRITICAL for accessing the full memory pool
export HSA_XNACK=1
export HSA_FORCE_FINE_GRAIN_PCIE=1

# Memory allocation settings
export GPU_MAX_HEAP_SIZE=100
export GPU_MAX_ALLOC_PERCENT=100

# Device visibility
export ROCR_VISIBLE_DEVICES=0
export HIP_VISIBLE_DEVICES=0

# Performance optimizations
export ROCBLAS_USE_HIPBLASLT=1
export AMD_LOG_LEVEL=0
export HSA_CU_MASK=0xffffffffffffffff

echo "✓ Strix Halo environment variables set"
EOF
chmod +x $CONDA_PREFIX/etc/conda/activate.d/strix_halo_env.sh
```
Create a matching deactivation script:

```bash
mkdir -p $CONDA_PREFIX/etc/conda/deactivate.d
cat > $CONDA_PREFIX/etc/conda/deactivate.d/strix_halo_env.sh << 'EOF'
#!/bin/bash
unset HSA_OVERRIDE_GFX_VERSION PYTORCH_ROCM_ARCH HSA_XNACK HSA_FORCE_FINE_GRAIN_PCIE
unset GPU_MAX_HEAP_SIZE GPU_MAX_ALLOC_PERCENT ROCR_VISIBLE_DEVICES HIP_VISIBLE_DEVICES
unset ROCBLAS_USE_HIPBLASLT AMD_LOG_LEVEL HSA_CU_MASK
EOF
chmod +x $CONDA_PREFIX/etc/conda/deactivate.d/strix_halo_env.sh
```
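To confirm the hooks work, reactivate and inspect the environment (assumes the environment name chosen above):

```bash
conda deactivate && conda activate {project_name}
# Should print "✓ Strix Halo environment variables set", then show the exports:
env | grep -E 'HSA_|PYTORCH_ROCM_ARCH|GPU_MAX'
```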
### Step 6: Create Project Structure

```bash
mkdir -p {project_name}/{scripts,notebooks,data,models,tests}
cd {project_name}
```
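The resulting layout, for reference:

```
{project_name}/
├── scripts/     # test and utility scripts (copied in the next step)
├── notebooks/
├── data/
├── models/
└── tests/
```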
### Step 7: Copy Test Scripts

Copy the test scripts from the skill directory:

```bash
cp .claude/skills/strix-halo-setup/scripts/*.py scripts/
chmod +x scripts/*.py
```
### Step 8: Create Project README

Create a README with project-specific information:

````bash
cat > README.md << 'EOF'
# {Project Name}

PyTorch project optimized for AMD Strix Halo (gfx1151).

## Environment

- **Hardware**: AMD Strix Halo (gfx1151)
- **ROCm**: 6.4.4+
- **PyTorch**: Community build for gfx1151
- **Python**: 3.14 (or 3.13)

## Setup

```bash
# Activate environment
conda activate {project_name}

# Verify GPU
python scripts/test_gpu_simple.py

# Test memory capacity
python scripts/test_memory.py
```

## Hardware Capabilities

- **Compute**: ~7 TFLOPS FP32, ~12 TFLOPS BF16
- **Memory**: Up to 113 GB GPU-accessible (with GTT configuration)
- **Model capacity**: 30B-parameter models in FP16

## Best Practices

- Use BF16 for a ~1.6x speedup over FP32
- Keep batch sizes small (1-4) for inference
- Data in VRAM is faster than GTT memory
- Monitor memory: `rocm-smi --showmeminfo gtt`

## Troubleshooting

If compute fails with "HIP error: invalid device function", you are on the official PyTorch wheels (which don't work with gfx1151). Reinstall:

```bash
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch
```

For more help, see .claude/skills/strix-halo-setup/docs/COMPLETE_GUIDE.md

Created: {date}
EOF
````
### Step 9: Verify Installation

Reactivate the environment to load the variables:

```bash
conda deactivate
conda activate {project_name}
# Should see: "✓ Strix Halo environment variables set"
```

Run verification:

```bash
python scripts/test_gpu_simple.py
```

Expected output:

```
============================================================
STRIX HALO GPU TEST
============================================================
✓ GPU detected: AMD Radeon Graphics
  Memory: 113.2 GB
✓ Compute test successful
✓ ALL TESTS PASSED
============================================================
```
### Step 10: Final Summary

Tell the user:

```
✓ Setup complete! Your Strix Halo environment is ready.

Project: {project_name}
Location: {full_path}

Next steps:
1. Test GPU: python scripts/test_gpu_simple.py
2. Test memory: python scripts/test_memory.py
3. Try a model: see docs/COMPLETE_GUIDE.md for examples

Hardware capabilities:
- 7-12 TFLOPS compute (FP32/BF16)
- 113 GB GPU-accessible memory
- Can run 30B-parameter models in FP16

Activate anytime with: conda activate {project_name}
```
## Success Criteria

All of these should pass:

- ✓ PyTorch detects the GPU
- ✓ Compute operations succeed (no HIP errors)
- ✓ Can allocate 30 GB+ of memory
- ✓ BF16 operations work (see the sketch below)
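A combined check for the last two criteria (a minimal sketch; the 30 GB allocation assumes GTT is configured):

```bash
python - <<'EOF'
import torch

# Allocate ~30 GiB in a single BF16 tensor (2 bytes per element)
big = torch.empty(30 * 1024**3 // 2, dtype=torch.bfloat16, device="cuda")
print(f"Allocated {big.element_size() * big.nelement() / 1024**3:.0f} GiB")

# BF16 compute check
a = torch.randn(2048, 2048, device="cuda", dtype=torch.bfloat16)
torch.cuda.synchronize()
print("BF16 matmul OK:", (a @ a).float().abs().mean().item())

del big
torch.cuda.empty_cache()
EOF
```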
## Common Issues

### Issue: "HIP error: invalid device function"

**Cause**: Using official PyTorch wheels (they don't work with gfx1151).

**Solution**:

```bash
pip uninstall torch torchvision torchaudio
pip install --index-url https://rocm.nightlies.amd.com/v2/gfx1151/ --pre torch torchvision torchaudio
```

Verify it worked:

```bash
python -c "import torch; a=torch.tensor([1.0]).cuda(); print('✓ Works:', (a+1).item())"
```
### Issue: Out of memory below 30GB

**Cause**: GTT not configured (GPU limited to ~33 GB).

**Solution 1**: Upgrade to kernel 6.16.9+ (no configuration needed).

**Solution 2**: On older kernels, configure GTT:

```bash
.claude/skills/strix-halo-setup/scripts/configure_gtt.sh
```

This adds kernel parameters to GRUB so the GPU can access more system RAM.
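For reference, the resulting GRUB change looks roughly like this (illustrative values for a 128 GB machine; the script derives the actual numbers from installed RAM):

```bash
# /etc/default/grub - amdgpu.gttsize is in MiB, ttm.pages_limit in 4 KiB pages
GRUB_CMDLINE_LINUX_DEFAULT="... amdgpu.gttsize=122880 ttm.pages_limit=31457280"
# Then: sudo update-grub && sudo reboot
```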
### Issue: GPU not detected

**Cause**: User not in the render/video groups.

**Solution**:

```bash
sudo usermod -aG render,video $USER
# Log out and back in (or reboot), then verify:
groups | grep -E "render|video"
```
## References

- Complete Guide: `.claude/skills/strix-halo-setup/docs/STRIX_HALO_COMPLETE_GUIDE.md`
- Troubleshooting: `.claude/skills/strix-halo-setup/docs/TROUBLESHOOTING.md`
- GTT Configuration: `.claude/skills/strix-halo-setup/docs/GTT_MEMORY_FIX.md`
- Community PyTorch: https://github.com/scottt/rocm-TheRock/releases
## Vulkan Setup (Alternative to PyTorch)

If the user chose Vulkan for inference-only workloads:

### Step V1: Install Vulkan Drivers

```bash
sudo apt install mesa-vulkan-drivers vulkan-tools
```

### Step V2: Verify Vulkan

```bash
vulkaninfo | grep "deviceName"
# Should show: AMD Radeon Graphics or similar
```
### Step V3: Install Inference Tools

For llama.cpp (recent versions build with CMake; `-DGGML_VULKAN=ON` enables the Vulkan backend):

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release
```

For Ollama:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```
### Step V4: Test Vulkan

```bash
# With llama.cpp (the Vulkan backend is used automatically when built with it)
./build/bin/llama-cli -m /path/to/model.gguf -ngl 99

# With Ollama
ollama run llama2
```
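To measure throughput, llama.cpp ships a benchmark tool (path assumes the CMake build above):

```bash
# Reports prompt-processing and generation speed; -ngl 99 offloads all layers
./build/bin/llama-bench -m /path/to/model.gguf -ngl 99
```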
### Vulkan Summary

Tell the user:

```
✓ Vulkan setup complete!

Backend: Vulkan (inference only)
Use with: llama.cpp, Ollama, and other Vulkan-enabled tools

Note: Vulkan often provides better inference performance than ROCm/HIP.
For training or custom PyTorch code, set up PyTorch instead.
```
## Notes

- GTT configuration is needed for 30B+ models on kernels before 6.16.9 (kernel 6.16.9+ has automatic UMA support)
- The Vulkan backend often provides better performance for inference
- Use BF16 precision in PyTorch for better performance