
---
name: huggingface
description: Import GGUF models from HuggingFace into Ollama. Pull models directly using the hf.co/ prefix, track download progress, and use imported models for inference.
---

HuggingFace Model Import

Overview

Ollama can directly pull GGUF models from HuggingFace using the hf.co/ prefix. This enables access to thousands of quantized models beyond the official Ollama library.
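
A minimal end-to-end example (each step is broken down in the sections below); the TinyLlama reference comes from the model tables later in this document:

import ollama

MODEL = "hf.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF:Q4_K_M"

# Pull the GGUF model straight from HuggingFace, then run it
ollama.pull(MODEL)
result = ollama.generate(model=MODEL, prompt="Say hello in one sentence.")
print(result["response"])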

Quick Reference

| Action | Syntax |
| --- | --- |
| Pull model | ollama.pull("hf.co/{org}/{repo}:{quantization}") |
| List models | ollama.list() |
| Use model | Same as any Ollama model |
| Delete model | ollama.delete("hf.co/...") |

Model Naming Format

hf.co/{organization}/{repository}:{quantization}

GGUF repositories conventionally end in a -GGUF suffix (it is part of the repository name itself), and the quantization tag is optional: when omitted, Ollama picks a reasonable default (Q4_K_M when the repo provides it).

Examples:

hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M
hf.co/TheBloke/Llama-2-7B-Chat-GGUF:Q4_K_M
hf.co/microsoft/Phi-3-mini-4k-instruct-gguf:Q4_K_M
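
Assembling these references by hand is error-prone; the helper below is a small illustrative sketch (hf_model_ref is our own name, not part of the ollama library):

# Hypothetical helper for building model references
def hf_model_ref(org: str, repo: str, quant: str = "Q4_K_M") -> str:
    return f"hf.co/{org}/{repo}:{quant}"

print(hf_model_ref("TheBloke", "Llama-2-7B-Chat-GGUF"))
# -> hf.co/TheBloke/Llama-2-7B-Chat-GGUF:Q4_K_M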

Common Quantizations

| Quantization | Size | Quality | Use Case |
| --- | --- | --- | --- |
| Q2_K | Smallest | Lowest | Testing only |
| Q4_K_M | Medium | Good | Recommended default |
| Q5_K_M | Larger | Better | Quality-focused |
| Q6_K | Large | High | Near-original quality |
| Q8_0 | Largest | Highest | Maximum quality |
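
When comparing quantizations it helps to see what each one actually costs on disk. A minimal sketch that prints the size of every installed hf.co model (it assumes the candidate models have already been pulled):

import ollama

# Print the disk footprint of each installed hf.co model
for m in ollama.list().get("models", []):
    name = m.get("model", "")
    if name.startswith("hf.co/"):
        size_gb = (m.get("size") or 0) / 1024**3
        print(f"{name}: {size_gb:.1f} GB")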

Pull Model from HuggingFace

With Progress Tracking

import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

print(f"Pulling {HF_MODEL}...")

last_status = ""
for progress in ollama.pull(HF_MODEL, stream=True):
    status = progress.get("status", "")
    digest = progress.get("digest", "")
    total = progress.get("total")

    # Only print when status changes
    if status != last_status:
        if status == "pulling manifest":
            print(f"  {status}")
        elif status.startswith("pulling") and digest:
            short_digest = digest.split(":")[-1][:12] if ":" in digest else digest[:12]
            size_mb = (total / 1024 / 1024) if total else 0
            if size_mb > 100:
                print(f"  pulling {short_digest}... ({size_mb:.0f} MB)")
        elif status in ["verifying sha256 digest", "writing manifest", "success"]:
            print(f"  {status}")

        last_status = status

print("Model pulled successfully!")

Simple Pull

import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

# Non-streaming (blocks until complete)
ollama.pull(HF_MODEL)
print("Model pulled!")

Verify Installation

import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

models = ollama.list()
model_names = [m.get("model", "") for m in models.get("models", [])]

# Check for the HF model
hf_model_installed = any(
    "Nous-Hermes" in name or HF_MODEL in name
    for name in model_names
)

if hf_model_installed:
    print("Model is installed!")
    for name in model_names:
        if "Nous-Hermes" in name or "hf.co" in name:
            print(f"  Name: {name}")
else:
    print("Model not found")

Show Model Details

import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

model_info = ollama.show(HF_MODEL)

print(f"Model: {HF_MODEL}")
if "details" in model_info:
    details = model_info["details"]
    print(f"Family: {details.get('family', 'N/A')}")
    print(f"Parameter Size: {details.get('parameter_size', 'N/A')}")
    print(f"Quantization: {details.get('quantization_level', 'N/A')}")

Use Imported Model

Generate Text

import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

result = ollama.generate(
    model=HF_MODEL,
    prompt="What is the capital of France?"
)
print(result["response"])

Chat Completion

import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

# Nous-Hermes-2 uses ChatML format natively
response = ollama.chat(
    model=HF_MODEL,
    messages=[
        {"role": "system", "content": "You are Hermes 2, a helpful AI assistant."},
        {"role": "user", "content": "Explain quantum computing in two sentences."}
    ]
)
print(response["message"]["content"])
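
For interactive use, the same model can stream tokens as they are generated. A minimal sketch using the chat API's stream parameter:

import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

# stream=True yields chunks, each carrying a partial message
for chunk in ollama.chat(
    model=HF_MODEL,
    messages=[{"role": "user", "content": "Write a haiku about autumn."}],
    stream=True,
):
    print(chunk["message"]["content"], end="", flush=True)
print()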

Delete Imported Model

import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

ollama.delete(HF_MODEL)
print("Model deleted!")

Popular HuggingFace Models

General Purpose

| Model | HuggingFace Path | Size |
| --- | --- | --- |
| Nous-Hermes-2-Mistral | hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M | 4.4 GB |
| Llama-2-7B-Chat | hf.co/TheBloke/Llama-2-7B-Chat-GGUF:Q4_K_M | 4.1 GB |
| Mistral-7B-Instruct | hf.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF:Q4_K_M | 4.4 GB |

Code Models

| Model | HuggingFace Path | Size |
| --- | --- | --- |
| CodeLlama-7B | hf.co/TheBloke/CodeLlama-7B-Instruct-GGUF:Q4_K_M | 4.1 GB |
| Phind-CodeLlama | hf.co/TheBloke/Phind-CodeLlama-34B-v2-GGUF:Q4_K_M | 20 GB |
| WizardCoder | hf.co/TheBloke/WizardCoder-Python-7B-V1.0-GGUF:Q4_K_M | 4.1 GB |

Small/Fast Models

| Model | HuggingFace Path | Size |
| --- | --- | --- |
| Phi-3-mini | hf.co/microsoft/Phi-3-mini-4k-instruct-gguf:Q4_K_M | 2.4 GB |
| TinyLlama | hf.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF:Q4_K_M | 0.7 GB |

Finding Models on HuggingFace

  1. Go to huggingface.co/models
  2. Filter by:
    • Library: GGUF
    • Task: Text Generation
  3. Look for models with -GGUF suffix
  4. Check the "Files" tab for available quantizations
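
The same search can be scripted. A sketch assuming the huggingface_hub package (pip install huggingface_hub) and its list_models API:

from huggingface_hub import HfApi

# Ten most-downloaded GGUF text-generation models
api = HfApi()
for model in api.list_models(library="gguf", task="text-generation",
                             sort="downloads", direction=-1, limit=10):
    print(model.id)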

Troubleshooting

Model Not Found

Symptom: Error pulling model

Check:

  • The repository exists on HuggingFace
  • The repository contains GGUF files
  • The quantization tag matches one listed on the repo's Files tab

# Verify by opening the repository in a browser:
# https://huggingface.co/{org}/{repo}/tree/main
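
To see the server's actual error message rather than a bare failure, wrap the pull in the library's ResponseError (the model reference below is deliberately a placeholder):

import ollama

try:
    ollama.pull("hf.co/SomeOrg/Some-Model-GGUF:Q4_K_M")  # hypothetical reference
except ollama.ResponseError as e:
    print(f"Pull failed ({e.status_code}): {e.error}")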

Download Fails

Symptom: Download interrupted or fails

Fix:

  • Check internet connection
  • Try again (Ollama resumes partial downloads)
  • Check disk space
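
Because Ollama resumes partial downloads, a simple retry loop is often enough. A minimal sketch:

import time
import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

# Retry a few times; each attempt continues where the last one stopped
for attempt in range(3):
    try:
        ollama.pull(HF_MODEL)
        print("Pull complete")
        break
    except (ollama.ResponseError, ConnectionError) as e:
        print(f"Attempt {attempt + 1} failed: {e}")
        time.sleep(5)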

Wrong Prompt Format

Symptom: Model gives poor responses

Fix:

  • Check the model card for the correct prompt template
  • Some models require specific formats (ChatML, Alpaca, etc.)

# ChatML-format example (Nous-Hermes-2)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]

# Ollama applies the chat template stored in the GGUF metadata,
# so ollama.chat() normally handles the formatting automatically
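
If responses still look wrong, the template imported with the model may not match what it was trained on. You can bypass Ollama's template and format the prompt yourself; a sketch using generate's raw mode with a hand-built ChatML prompt (Nous-Hermes-2 expects ChatML):

import ollama

HF_MODEL = "hf.co/NousResearch/Nous-Hermes-2-Mistral-7B-DPO-GGUF:Q4_K_M"

# raw=True skips Ollama's template; the prompt must already be fully formatted
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHello!<|im_end|>\n"
    "<|im_start|>assistant\n"
)
result = ollama.generate(model=HF_MODEL, prompt=prompt, raw=True)
print(result["response"])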

When to Use This Skill

Use when:

  • You need a model not in the official Ollama library
  • Testing specific model variants
  • Using specialized/fine-tuned models
  • Comparing different quantizations

Resources

Cross-References

  • bazzite-ai-ollama:python - Using imported models
  • bazzite-ai-ollama:api - REST API for model management