| name | huggingface-js |
| description | Runs ML models in the browser and Node.js with Transformers.js and the Hugging Face Inference API. Use when adding local inference, embeddings, or calling hosted models without GPU servers. |
Hugging Face JavaScript
Run ML models locally with Transformers.js or via the Inference API. Supports text generation, embeddings, image classification, speech recognition, and more.
Transformers.js (Local Inference)
Run models directly in the browser or in Node.js using ONNX Runtime.
npm install @huggingface/transformers
Text Generation
import { pipeline } from '@huggingface/transformers';
const generator = await pipeline('text-generation', 'Xenova/gpt2');
const result = await generator('The quick brown fox', {
max_new_tokens: 50,
});
console.log(result[0].generated_text);
Text Classification (Sentiment)
import { pipeline } from '@huggingface/transformers';
const classifier = await pipeline(
'text-classification',
'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
);
const result = await classifier('I love this product!');
// [{ label: 'POSITIVE', score: 0.9998 }]
Embeddings
import { pipeline } from '@huggingface/transformers';
const embedder = await pipeline(
'feature-extraction',
'Xenova/all-MiniLM-L6-v2'
);
const result = await embedder('Hello, world!', {
pooling: 'mean',
normalize: true,
});
const embedding = Array.from(result.data);
// [0.123, -0.456, ...] - 384 dimensions
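With normalize: true, the dot product of two embeddings equals their cosine similarity, which is how the vectors are usually compared. A minimal sketch reusing the embedder pipeline above (the sentences are just examples):
const embed = async (text) =>
  Array.from((await embedder(text, { pooling: 'mean', normalize: true })).data);
const a = await embed('How do I reset my password?');
const b = await embed('I forgot my login credentials');
// Dot product of normalized vectors = cosine similarity (closer to 1 = more similar)
const similarity = a.reduce((sum, value, i) => sum + value * b[i], 0);
console.log(similarity);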
Question Answering
import { pipeline } from '@huggingface/transformers';
const qa = await pipeline(
'question-answering',
'Xenova/distilbert-base-cased-distilled-squad'
);
const result = await qa({
question: 'What is the capital of France?',
context: 'France is a country in Europe. Paris is the capital of France.',
});
console.log(result);
// { answer: 'Paris', score: 0.98 }
Translation
import { pipeline } from '@huggingface/transformers';
const translator = await pipeline(
'translation',
'Xenova/nllb-200-distilled-600M'
);
const result = await translator('Hello, how are you?', {
src_lang: 'eng_Latn',
tgt_lang: 'fra_Latn',
});
console.log(result[0].translation_text);
Speech Recognition (Whisper)
import { pipeline } from '@huggingface/transformers';
const transcriber = await pipeline(
'automatic-speech-recognition',
'Xenova/whisper-tiny.en'
);
const result = await transcriber('./audio.mp3');
console.log(result.text);
Image Classification
import { pipeline } from '@huggingface/transformers';
const classifier = await pipeline(
'image-classification',
'Xenova/vit-base-patch16-224'
);
const result = await classifier('https://example.com/cat.jpg');
// [{ label: 'tabby cat', score: 0.95 }, ...]
Object Detection
import { pipeline } from '@huggingface/transformers';
const detector = await pipeline(
'object-detection',
'Xenova/detr-resnet-50'
);
const result = await detector('https://example.com/image.jpg');
// [{ label: 'cat', score: 0.98, box: { xmin, ymin, xmax, ymax } }, ...]
Zero-Shot Classification
import { pipeline } from '@huggingface/transformers';
const classifier = await pipeline(
'zero-shot-classification',
'Xenova/bart-large-mnli'
);
const result = await classifier(
'This is a tutorial about machine learning',
['education', 'politics', 'sports']
);
console.log(result);
// { labels: ['education', ...], scores: [0.95, ...] }
Hugging Face Inference API
Call hosted models without local computation.
npm install @huggingface/inference
Setup
import { HfInference } from '@huggingface/inference';
const hf = new HfInference(process.env.HF_ACCESS_TOKEN);
Text Generation
const result = await hf.textGeneration({
model: 'meta-llama/Llama-2-7b-chat-hf',
inputs: 'What is the meaning of life?',
parameters: {
max_new_tokens: 100,
temperature: 0.7,
},
});
console.log(result.generated_text);
Streaming Text Generation
const stream = hf.textGenerationStream({
model: 'meta-llama/Llama-2-7b-chat-hf',
inputs: 'Tell me a story',
parameters: {
max_new_tokens: 200,
},
});
for await (const chunk of stream) {
process.stdout.write(chunk.token.text);
}
Chat Completion
const result = await hf.chatCompletion({
model: 'meta-llama/Llama-2-7b-chat-hf',
messages: [
{ role: 'user', content: 'Hello!' },
],
max_tokens: 100,
});
console.log(result.choices[0].message.content);
Embeddings
const result = await hf.featureExtraction({
model: 'sentence-transformers/all-MiniLM-L6-v2',
inputs: 'Hello, world!',
});
console.log(result); // embedding vector
Image Generation
const result = await hf.textToImage({
model: 'stabilityai/stable-diffusion-2',
inputs: 'A futuristic city at sunset',
parameters: {
negative_prompt: 'blurry, low quality',
},
});
// result is a Blob; write it to disk (assumes `import fs from 'node:fs'`)
const buffer = Buffer.from(await result.arrayBuffer());
fs.writeFileSync('output.png', buffer);
Image Classification
const result = await hf.imageClassification({
model: 'google/vit-base-patch16-224',
data: await fs.openAsBlob('cat.jpg'), // fs.openAsBlob requires Node 19.8+ ('node:fs')
});
console.log(result);
// [{ label: 'tabby cat', score: 0.95 }, ...]
Speech Recognition
const result = await hf.automaticSpeechRecognition({
model: 'openai/whisper-large-v3',
data: await fs.openAsBlob('audio.mp3'),
});
console.log(result.text);
Inference Endpoints
For models deployed on dedicated Inference Endpoints.
import { InferenceClient } from '@huggingface/inference';
const client = new InferenceClient(process.env.HF_ACCESS_TOKEN);
const endpoint = client.endpoint('https://your-endpoint.endpoints.huggingface.cloud');
const result = await endpoint.textGeneration({
inputs: 'Hello, world!',
});
Next.js Integration
// app/api/generate/route.ts
import { HfInference } from '@huggingface/inference';
import { NextResponse } from 'next/server';
const hf = new HfInference(process.env.HF_ACCESS_TOKEN);
export async function POST(request: Request) {
const { prompt } = await request.json();
const result = await hf.textGeneration({
model: 'meta-llama/Llama-2-7b-chat-hf',
inputs: prompt,
parameters: {
max_new_tokens: 200,
},
});
return NextResponse.json({ text: result.generated_text });
}
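A minimal sketch of calling this route from the client (the /api/generate path follows from the file location above; the prompt text is just an example):
const res = await fetch('/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Write a haiku about the ocean' }),
});
const { text } = await res.json();
console.log(text);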
Streaming Response
// app/api/stream/route.ts
import { HfInference } from '@huggingface/inference';
const hf = new HfInference(process.env.HF_ACCESS_TOKEN);
export async function POST(request: Request) {
const { prompt } = await request.json();
const stream = hf.textGenerationStream({
model: 'meta-llama/Llama-2-7b-chat-hf',
inputs: prompt,
parameters: { max_new_tokens: 200 },
});
const encoder = new TextEncoder();
const readable = new ReadableStream({
async start(controller) {
for await (const chunk of stream) {
controller.enqueue(encoder.encode(chunk.token.text));
}
controller.close();
},
});
return new Response(readable, {
headers: { 'Content-Type': 'text/plain' },
});
}
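On the client, the plain-text stream can be consumed incrementally with the Fetch API; a minimal sketch:
const res = await fetch('/api/stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Tell me a story' }),
});
const reader = res.body.getReader();
const decoder = new TextDecoder();
let text = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  text += decoder.decode(value, { stream: true }); // append each chunk as it arrives
}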
Browser Usage
Transformers.js works in the browser with WebGPU acceleration.
<script type="module">
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';
const classifier = await pipeline('text-classification');
const result = await classifier('I love this!');
console.log(result);
</script>
With WebGPU
import { pipeline } from '@huggingface/transformers';
// Select the WebGPU backend for this pipeline
const classifier = await pipeline('text-classification', 'model-name', {
  device: 'webgpu',
});
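WebGPU is not available in every browser, so it is worth feature-detecting before choosing a device; a minimal sketch (the 'wasm' fallback value assumes the default WASM backend):
import { pipeline } from '@huggingface/transformers';
// navigator.gpu only exists in browsers with WebGPU support
const device = typeof navigator !== 'undefined' && 'gpu' in navigator ? 'webgpu' : 'wasm';
const classifier = await pipeline('text-classification', 'Xenova/distilbert-base-uncased-finetuned-sst-2-english', {
  device,
});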
Configuration
import { env } from '@huggingface/transformers';
// Cache settings
env.cacheDir = './models';
env.localModelPath = './local-models';
// Disable remote models (offline mode)
env.allowRemoteModels = false;
// Disable local models
env.allowLocalModels = false;
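Model weights are downloaded on first use and then served from the cache on later runs. A progress_callback can surface that first download to users; a minimal sketch (the exact fields on the progress object come from the library's download events and may vary):
import { pipeline } from '@huggingface/transformers';
const classifier = await pipeline(
  'text-classification',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english',
  {
    // Fires while files download; subsequent runs load straight from the cache
    progress_callback: (info) => console.log(info.status, info.file ?? '', info.progress ?? ''),
  }
);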
Available Tasks
| Task | Pipeline | Example Model |
|---|---|---|
| Text Classification | text-classification | distilbert-base-uncased-finetuned-sst-2-english |
| Text Generation | text-generation | gpt2, llama |
| Question Answering | question-answering | distilbert-base-cased-distilled-squad |
| Summarization | summarization | t5-small |
| Translation | translation | nllb-200-distilled-600M |
| Feature Extraction | feature-extraction | all-MiniLM-L6-v2 |
| Image Classification | image-classification | vit-base-patch16-224 |
| Object Detection | object-detection | detr-resnet-50 |
| Speech Recognition | automatic-speech-recognition | whisper-tiny |
| Zero-Shot Classification | zero-shot-classification | bart-large-mnli |
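Summarization appears in the table but is not shown above; a minimal sketch (Xenova/distilbart-cnn-6-6 is one ONNX-converted summarization model on the Hub):
import { pipeline } from '@huggingface/transformers';
const summarizer = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');
const article = 'The tower is 324 metres tall, about the same height as an 81-storey building, ' +
  'and is the tallest structure in Paris. It was the first structure to reach a height of 300 metres.';
const result = await summarizer(article, { max_new_tokens: 60 });
console.log(result[0].summary_text);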
Environment Variables
HF_ACCESS_TOKEN=hf_xxxxxxxx
Best Practices
- Cache models - Download once, then reuse across runs
- Use WebGPU - Faster inference in supported browsers
- Choose small models - Keep client-side downloads and memory manageable
- Stream responses - Better UX for long generations
- Use the Inference API - For models too large to run locally
- Consider Inference Endpoints - For production workloads needing dedicated capacity
- Use quantized models - Smaller and faster; see the sketch below
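For the quantized-models point, Transformers.js exposes a dtype option when creating a pipeline; a minimal sketch (which precisions are available depends on the ONNX weights the model repository ships):
import { pipeline } from '@huggingface/transformers';
// 'q8' requests 8-bit quantized weights: smaller download, faster CPU inference
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2', {
  dtype: 'q8',
});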