| name | huggingface-js |
| description | Runs ML models in the browser and Node.js with Transformers.js and the Hugging Face Inference API. Use when adding local inference, embeddings, or calling hosted models without GPU servers. |
Hugging Face JavaScript
Run ML models locally with Transformers.js or via the Inference API. Supports text generation, embeddings, image classification, speech recognition, and more.
Transformers.js (Local Inference)
Run models directly in the browser or in Node.js using ONNX Runtime.
npm install @huggingface/transformers
Text Generation
import { pipeline } from '@huggingface/transformers';
const generator = await pipeline('text-generation', 'Xenova/gpt2');
const result = await generator('The quick brown fox', {
max_new_tokens: 50,
});
console.log(result[0].generated_text);
Text Classification (Sentiment)
import { pipeline } from '@huggingface/transformers';
const classifier = await pipeline(
'text-classification',
'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
);
const result = await classifier('I love this product!');
// [{ label: 'POSITIVE', score: 0.9998 }]
Embeddings
import { pipeline } from '@huggingface/transformers';
const embedder = await pipeline(
'feature-extraction',
'Xenova/all-MiniLM-L6-v2'
);
const result = await embedder('Hello, world!', {
pooling: 'mean',
normalize: true,
});
const embedding = Array.from(result.data);
// [0.123, -0.456, ...] - 384 dimensions
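With normalize: true, the dot product of two embeddings equals their cosine similarity, which is how the vectors are usually compared. A minimal sketch reusing the embedder pipeline above (the sentences are just examples):
const embed = async (text) =>
  Array.from((await embedder(text, { pooling: 'mean', normalize: true })).data);
const a = await embed('How do I reset my password?');
const b = await embed('I forgot my login credentials');
// Dot product of normalized vectors = cosine similarity (closer to 1 = more similar)
const similarity = a.reduce((sum, value, i) => sum + value * b[i], 0);
console.log(similarity);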
Question Answering
import { pipeline } from '@huggingface/transformers';
const qa = await pipeline(
'question-answering',
'Xenova/distilbert-base-cased-distilled-squad'
);
const result = await qa({
question: 'What is the capital of France?',
context: 'France is a country in Europe. Paris is the capital of France.',
});
console.log(result);
// { answer: 'Paris', score: 0.98 }
Translation
import { pipeline } from '@huggingface/transformers';
const translator = await pipeline(
'translation',
'Xenova/nllb-200-distilled-600M'
);
const result = await translator('Hello, how are you?', {
src_lang: 'eng_Latn',
tgt_lang: 'fra_Latn',
});
console.log(result[0].translation_text);
Speech Recognition (Whisper)
import { pipeline } from '@huggingface/transformers';
const transcriber = await pipeline(
'automatic-speech-recognition',
'Xenova/whisper-tiny.en'
);
const result = await transcriber('./audio.mp3');
console.log(result.text);
Image Classification
import { pipeline } from '@huggingface/transformers';
const classifier = await pipeline(
'image-classification',
'Xenova/vit-base-patch16-224'
);
const result = await classifier('https://example.com/cat.jpg');
// [{ label: 'tabby cat', score: 0.95 }, ...]
Object Detection
import { pipeline } from '@huggingface/transformers';
const detector = await pipeline(
'object-detection',
'Xenova/detr-resnet-50'
);
const result = await detector('https://example.com/image.jpg');
// [{ label: 'cat', score: 0.98, box: { xmin, ymin, xmax, ymax } }, ...]
Zero-Shot Classification
import { pipeline } from '@huggingface/transformers';
const classifier = await pipeline(
'zero-shot-classification',
'Xenova/bart-large-mnli'
);
const result = await classifier(
'This is a tutorial about machine learning',
['education', 'politics', 'sports']
);
console.log(result);
// { labels: ['education', ...], scores: [0.95, ...] }
Hugging Face Inference API
Call hosted models without local computation.
npm install @huggingface/inference
Setup
import { HfInference } from '@huggingface/inference';
const hf = new HfInference(process.env.HF_ACCESS_TOKEN);
Text Generation
const result = await hf.textGeneration({
model: 'meta-llama/Llama-2-7b-chat-hf',
inputs: 'What is the meaning of life?',
parameters: {
max_new_tokens: 100,
temperature: 0.7,
},
});
console.log(result.generated_text);
Streaming Text Generation
const stream = hf.textGenerationStream({
model: 'meta-llama/Llama-2-7b-chat-hf',
inputs: 'Tell me a story',
parameters: {
max_new_tokens: 200,
},
});
for await (const chunk of stream) {
process.stdout.write(chunk.token.text);
}
Chat Completion
const result = await hf.chatCompletion({
model: 'meta-llama/Llama-2-7b-chat-hf',
messages: [
{ role: 'user', content: 'Hello!' },
],
max_tokens: 100,
});
console.log(result.choices[0].message.content);
Embeddings
const result = await hf.featureExtraction({
model: 'sentence-transformers/all-MiniLM-L6-v2',
inputs: 'Hello, world!',
});
console.log(result); // embedding vector
Image Generation
const result = await hf.textToImage({
model: 'stabilityai/stable-diffusion-2',
inputs: 'A futuristic city at sunset',
parameters: {
negative_prompt: 'blurry, low quality',
},
});
// result is a Blob; write it to disk (assumes `import fs from 'node:fs'`)
const buffer = Buffer.from(await result.arrayBuffer());
fs.writeFileSync('output.png', buffer);
Image Classification
const result = await hf.imageClassification({
model: 'google/vit-base-patch16-224',
data: await fs.openAsBlob('cat.jpg'), // fs.openAsBlob requires Node 19.8+ ('node:fs')
});
console.log(result);
// [{ label: 'tabby cat', score: 0.95 }, ...]
Speech Recognition
const result = await hf.automaticSpeechRecognition({
model: 'openai/whisper-large-v3',
data: await fs.openAsBlob('audio.mp3'),
});
console.log(result.text);
Inference Endpoints
For models deployed on dedicated Inference Endpoints.
import { InferenceClient } from '@huggingface/inference';
const client = new InferenceClient(process.env.HF_ACCESS_TOKEN);
const endpoint = client.endpoint('https://your-endpoint.endpoints.huggingface.cloud');
const result = await endpoint.textGeneration({
inputs: 'Hello, world!',
});
Next.js Integration
// app/api/generate/route.ts
import { HfInference } from '@huggingface/inference';
import { NextResponse } from 'next/server';
const hf = new HfInference(process.env.HF_ACCESS_TOKEN);
export async function POST(request: Request) {
const { prompt } = await request.json();
const result = await hf.textGeneration({
model: 'meta-llama/Llama-2-7b-chat-hf',
inputs: prompt,
parameters: {
max_new_tokens: 200,
},
});
return NextResponse.json({ text: result.generated_text });
}
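A minimal sketch of calling this route from the client (the /api/generate path follows from the file location above; the prompt text is just an example):
const res = await fetch('/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Write a haiku about the ocean' }),
});
const { text } = await res.json();
console.log(text);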
Streaming Response
// app/api/stream/route.ts
import { HfInference } from '@huggingface/inference';
const hf = new HfInference(process.env.HF_ACCESS_TOKEN);
export async function POST(request: Request) {
const { prompt } = await request.json();
const stream = hf.textGenerationStream({
model: 'meta-llama/Llama-2-7b-chat-hf',
inputs: prompt,
parameters: { max_new_tokens: 200 },
});
const encoder = new TextEncoder();
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of stream) {
controller.enqueue(encoder.encode(chunk.token.text));
}
controller.close();
},
});
return new Response(readable, {
headers: { 'Content-Type': 'text/plain' },
});
}
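On the client, the plain-text stream can be consumed incrementally with the Fetch API; a minimal sketch:
const res = await fetch('/api/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Tell me a story' }),
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
let text = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  text += decoder.decode(value, { stream: true }); // append each chunk as it arrives
}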
Browser Usage
Transformers.js works in the browser with WebGPU acceleration.
<script type="module">
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';
const classifier = await pipeline('text-classification');
const result = await classifier('I love this!');
console.log(result);
</script>
With WebGPU
import { pipeline } from '@huggingface/transformers';
// Select the WebGPU backend for this pipeline
const classifier = await pipeline('text-classification', 'model-name', {
  device: 'webgpu',
});
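WebGPU is not available in every browser, so it is worth feature-detecting before choosing a device; a minimal sketch (the 'wasm' fallback value assumes the default WASM backend):
import { pipeline } from '@huggingface/transformers';
// navigator.gpu only exists in browsers with WebGPU support
const device = typeof navigator !== 'undefined' && 'gpu' in navigator ? 'webgpu' : 'wasm';
const classifier = await pipeline('text-classification', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  device,
});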
Configuration
import { env } from '@huggingface/transformers';
// Cache settings
env.cacheDir = './models';
env.localModelPath = './local-models';
// Disable remote models (offline mode)
env.allowRemoteModels = false;
// Disable local models
env.allowLocalModels = false;
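Model weights are downloaded on first use and then served from the cache on later runs. A progress_callback can surface that first download to users; a minimal sketch (the exact fields on the progress object come from the library's download events and may vary):
import { pipeline } from '@huggingface/transformers';
const classifier = await pipeline(
  'text-classification',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
  {
    // Fires while files download; subsequent runs load straight from the cache
    progress_callback: (info) => console.log(info.status, info.file ?? '', info.progress ?? ''),
  }
);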
Available Tasks
| Task | Pipeline | Example Model |
|---|---|---|
| Text Classification | text-classification | distilbert-base-uncased-finetuned-sst-2-english |
| Text Generation | text-generation | gpt2, llama |
| Question Answering | question-answering | distilbert-base-cased-distilled-squad |
| Summarization | summarization | t5-small |
| Translation | translation | nllb-200-distilled-600M |
| Feature Extraction | feature-extraction | all-MiniLM-L6-v2 |
| Image Classification | image-classification | vit-base-patch16-224 |
| Object Detection | object-detection | detr-resnet-50 |
| Speech Recognition | automatic-speech-recognition | whisper-tiny |
| Zero-Shot Classification | zero-shot-classification | bart-large-mnli |
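Summarization appears in the table but is not shown above; a minimal sketch (Xenova/distilbart-cnn-6-6 is one ONNX-converted summarization model on the Hub):
import { pipeline } from '@huggingface/transformers';
const summarizer = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');
const article = 'The tower is 324 metres tall, about the same height as an 81-storey building, ' +
  'and is the tallest structure in Paris. It was the first structure to reach a height of 300 metres.';
const result = await summarizer(article, { max_new_tokens: 60 });
console.log(result[0].summary_text);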
Environment Variables
HF_ACCESS_TOKEN=hf_xxxxxxxx
Best Practices
- Cache models - Download once, then reuse across runs
- Use WebGPU - Faster inference in supported browsers
- Choose small models - Keep client-side downloads and memory manageable
- Stream responses - Better UX for long generations
- Use the Inference API - For models too large to run locally
- Consider Inference Endpoints - For production workloads needing dedicated capacity
- Use quantized models - Smaller and faster; see the sketch below
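For the quantized-models point, Transformers.js exposes a dtype option when creating a pipeline; a minimal sketch (which precisions are available depends on the ONNX weights the model repository ships):
import { pipeline } from '@huggingface/transformers';
// 'q8' requests 8-bit quantized weights: smaller download, faster CPU inference
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
  dtype: 'q8',
});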