| name | seed-data-generator |
| description | Generate realistic test data for database development, testing, and demos. |
Seed Data Generator Skill
Generate realistic test data for database development, testing, and demos.
Instructions
You are a test data generation expert. When invoked:
Analyze Schema:
- Identify tables and relationships
- Understand column types and constraints
- Detect foreign key dependencies
- Recognize data patterns (email, phone, dates, etc.)
Generate Realistic Data:
- Use faker libraries for realistic data
- Maintain referential integrity
- Follow business logic constraints
- Create diverse but realistic scenarios
Seed Database:
- Insert data in correct order (respect foreign keys)
- Handle different database systems
- Provide both SQL and ORM-based seeders
- Support incremental seeding
Customize Generation:
- Allow quantity specification
- Support different data scenarios (edge cases, happy path)
- Enable data relationships customization
- Provide reproducible seeds (with random seed values)
Supported Tools
- JavaScript/TypeScript: Faker.js, Chance.js, Casual
- Python: Faker, Factory Boy, Mimesis
- Ruby: Faker, FactoryBot
- Raw SQL: Generate INSERT statements
- ORMs: Prisma, TypeORM, Sequelize, Django, Rails
Usage Examples
@seed-data-generator
@seed-data-generator --count 100
@seed-data-generator --table users
@seed-data-generator --scenario e-commerce
@seed-data-generator --realistic-relationships
SQL Seed Data
PostgreSQL - Basic Insert
-- seed/001_users.sql
INSERT INTO users (username, email, password_hash, active, created_at)
VALUES
('john_doe', 'john@example.com', '$2b$10$...', true, '2024-01-15 10:00:00'),
('jane_smith', 'jane@example.com', '$2b$10$...', true, '2024-01-16 11:30:00'),
('bob_wilson', 'bob@example.com', '$2b$10$...', true, '2024-01-17 09:15:00'),
('alice_brown', 'alice@example.com', '$2b$10$...', false, '2024-01-18 14:45:00'),
('charlie_davis', 'charlie@example.com', '$2b$10$...', true, '2024-01-19 16:20:00');
-- seed/002_categories.sql
INSERT INTO categories (name, slug, parent_id)
VALUES
('Electronics', 'electronics', NULL),
('Computers', 'computers', 1),
('Laptops', 'laptops', 2),
('Desktops', 'desktops', 2),
('Accessories', 'accessories', 1),
('Clothing', 'clothing', NULL),
('Men', 'men', 6),
('Women', 'women', 6);
-- seed/003_products.sql
INSERT INTO products (name, description, price, stock_quantity, category_id, created_at)
VALUES
(
'MacBook Pro 16"',
'Powerful laptop with M3 chip, 16GB RAM, 512GB SSD',
2499.99,
15,
3,
NOW()
),
(
'Dell XPS 13',
'Compact laptop with Intel i7, 16GB RAM, 512GB SSD',
1299.99,
20,
3,
NOW()
),
(
'Gaming Desktop',
'High-performance desktop with RTX 4080, 32GB RAM',
2999.99,
8,
4,
NOW()
),
(
'Wireless Mouse',
'Ergonomic wireless mouse with precision tracking',
29.99,
100,
5,
NOW()
),
(
'Mechanical Keyboard',
'RGB mechanical keyboard with Cherry MX switches',
149.99,
45,
5,
NOW()
);
-- seed/004_orders.sql
INSERT INTO orders (user_id, total_amount, status, created_at)
VALUES
(1, 2529.98, 'completed', '2024-01-20 10:30:00'),
(2, 1299.99, 'completed', '2024-01-21 14:15:00'),
(3, 179.98, 'processing', '2024-01-22 09:45:00'),
(1, 2999.99, 'pending', '2024-01-23 16:00:00'),
(4, 29.99, 'completed', '2024-01-24 11:20:00');
-- seed/005_order_items.sql
INSERT INTO order_items (order_id, product_id, quantity, price)
VALUES
-- Order 1
(1, 1, 1, 2499.99),
(1, 4, 1, 29.99),
-- Order 2
(2, 2, 1, 1299.99),
-- Order 3
(3, 4, 1, 29.99),
(3, 5, 1, 149.99),
-- Order 4
(4, 3, 1, 2999.99),
-- Order 5
(5, 4, 1, 29.99);
Generated Seed with Functions
-- Generate random users
CREATE OR REPLACE FUNCTION generate_users(count INTEGER)
RETURNS void AS $$
DECLARE
i INTEGER;
BEGIN
FOR i IN 1..count LOOP
INSERT INTO users (username, email, password_hash, active, created_at)
VALUES (
'user_' || i,
'user' || i || '@example.com',
'$2b$10$fakehashedpassword',
random() > 0.1, -- 90% active
NOW() - (random() * 365 || ' days')::INTERVAL
);
END LOOP;
END;
$$ LANGUAGE plpgsql;
-- Generate random products
CREATE OR REPLACE FUNCTION generate_products(count INTEGER)
RETURNS void AS $$
DECLARE
i INTEGER;
categories INTEGER[];
BEGIN
SELECT ARRAY_AGG(id) INTO categories FROM categories;
FOR i IN 1..count LOOP
INSERT INTO products (name, description, price, stock_quantity, category_id, created_at)
VALUES (
'Product ' || i,
'Description for product ' || i,
(random() * 1000 + 10)::NUMERIC(10,2),
(random() * 100)::INTEGER,
categories[1 + floor(random() * array_length(categories, 1))::INTEGER],
NOW() - (random() * 180 || ' days')::INTERVAL
);
END LOOP;
END;
$$ LANGUAGE plpgsql;
-- Generate random orders
CREATE OR REPLACE FUNCTION generate_orders(count INTEGER)
RETURNS void AS $$
DECLARE
i INTEGER;
user_ids INTEGER[];
product_ids INTEGER[];
order_id INTEGER;
item_count INTEGER;
j INTEGER;
total NUMERIC(10,2);
BEGIN
SELECT ARRAY_AGG(id) INTO user_ids FROM users WHERE active = true;
SELECT ARRAY_AGG(id) INTO product_ids FROM products WHERE stock_quantity > 0;
FOR i IN 1..count LOOP
-- Random number of items (1-5)
item_count := 1 + floor(random() * 5)::INTEGER;
total := 0;
INSERT INTO orders (user_id, total_amount, status, created_at)
VALUES (
user_ids[1 + floor(random() * array_length(user_ids, 1))::INTEGER],
0, -- Will update after adding items
(ARRAY['pending', 'processing', 'completed', 'cancelled'])[1 + floor(random() * 4)::INTEGER],
NOW() - (random() * 90 || ' days')::INTERVAL
)
RETURNING id INTO order_id;
-- Add order items
FOR j IN 1..item_count LOOP
DECLARE
product_price NUMERIC(10,2);
quantity INTEGER;
BEGIN
SELECT price INTO product_price
FROM products
WHERE id = product_ids[1 + floor(random() * array_length(product_ids, 1))::INTEGER];
quantity := 1 + floor(random() * 3)::INTEGER;
INSERT INTO order_items (order_id, product_id, quantity, price)
VALUES (
order_id,
product_ids[1 + floor(random() * array_length(product_ids, 1))::INTEGER],
quantity,
product_price
);
total := total + (product_price * quantity);
END;
END LOOP;
-- Update order total
UPDATE orders SET total_amount = total WHERE id = order_id;
END LOOP;
END;
$$ LANGUAGE plpgsql;
-- Usage
SELECT generate_users(100);
SELECT generate_products(50);
SELECT generate_orders(200);
JavaScript/TypeScript Seeders
Faker.js + Prisma
// prisma/seed.ts
import { PrismaClient } from '@prisma/client';
import { faker } from '@faker-js/faker';
const prisma = new PrismaClient();
async function main() {
console.log('Seeding database...');
// Clear existing data
await prisma.orderItem.deleteMany();
await prisma.order.deleteMany();
await prisma.review.deleteMany();
await prisma.product.deleteMany();
await prisma.category.deleteMany();
await prisma.user.deleteMany();
// Create categories
const categories = await Promise.all([
prisma.category.create({
data: {
name: 'Electronics',
slug: 'electronics',
},
}),
prisma.category.create({
data: {
name: 'Clothing',
slug: 'clothing',
},
}),
prisma.category.create({
data: {
name: 'Home & Garden',
slug: 'home-garden',
},
}),
]);
// Create users
const users = await Promise.all(
Array.from({ length: 20 }, async () => {
const firstName = faker.person.firstName();
const lastName = faker.person.lastName();
return prisma.user.create({
data: {
username: faker.internet.userName({ firstName, lastName }),
email: faker.internet.email({ firstName, lastName }),
passwordHash: faker.string.alphanumeric(60), // In real app, use bcrypt
active: faker.datatype.boolean(0.9), // 90% active
createdAt: faker.date.past({ years: 2 }),
},
});
})
);
console.log(`Created ${users.length} users`);
// Create products
const products = await Promise.all(
Array.from({ length: 50 }, async () => {
return prisma.product.create({
data: {
name: faker.commerce.productName(),
description: faker.commerce.productDescription(),
price: parseFloat(faker.commerce.price({ min: 10, max: 2000 })),
stockQuantity: faker.number.int({ min: 0, max: 100 }),
categoryId: faker.helpers.arrayElement(categories).id,
createdAt: faker.date.past({ years: 1 }),
},
});
})
);
console.log(`Created ${products.length} products`);
// Create orders with items
const activeUsers = users.filter(u => u.active);
for (let i = 0; i < 100; i++) {
const user = faker.helpers.arrayElement(activeUsers);
const orderProducts = faker.helpers.arrayElements(
products,
faker.number.int({ min: 1, max: 5 })
);
const orderItems = orderProducts.map(product => ({
productId: product.id,
quantity: faker.number.int({ min: 1, max: 3 }),
price: product.price,
}));
const totalAmount = orderItems.reduce(
(sum, item) => sum + item.price * item.quantity,
0
);
await prisma.order.create({
data: {
userId: user.id,
totalAmount,
status: faker.helpers.arrayElement([
'pending',
'processing',
'completed',
'cancelled',
]),
createdAt: faker.date.past({ years: 0.5 }),
items: {
create: orderItems,
},
},
});
}
console.log('Created 100 orders with items');
// Create reviews
for (let i = 0; i < 150; i++) {
const user = faker.helpers.arrayElement(activeUsers);
const product = faker.helpers.arrayElement(products);
// Check if user already reviewed this product
const existingReview = await prisma.review.findFirst({
where: {
userId: user.id,
productId: product.id,
},
});
if (!existingReview) {
await prisma.review.create({
data: {
userId: user.id,
productId: product.id,
rating: faker.number.int({ min: 1, max: 5 }),
comment: faker.helpers.maybe(() => faker.lorem.paragraph(), { probability: 0.7 }),
createdAt: faker.date.past({ years: 0.5 }),
},
});
}
}
console.log('Created reviews');
console.log('Seeding completed!');
}
main()
.catch((e) => {
console.error(e);
process.exit(1);
})
.finally(async () => {
await prisma.$disconnect();
});
// package.json
{
"prisma": {
"seed": "ts-node prisma/seed.ts"
}
}
# Run seed
npx prisma db seed
TypeORM Seeder
// src/database/seeds/user.seeder.ts
import { DataSource } from 'typeorm';
import { Seeder } from 'typeorm-extension';
import { faker } from '@faker-js/faker';
import { User } from '../entities/user.entity';
import { Order } from '../entities/order.entity';
import { Product } from '../entities/product.entity';
export default class UserSeeder implements Seeder {
async run(dataSource: DataSource): Promise<void> {
const userRepository = dataSource.getRepository(User);
const productRepository = dataSource.getRepository(Product);
const orderRepository = dataSource.getRepository(Order);
// Create users
const users: User[] = [];
for (let i = 0; i < 50; i++) {
const user = userRepository.create({
username: faker.internet.userName(),
email: faker.internet.email(),
passwordHash: faker.string.alphanumeric(60),
active: faker.datatype.boolean(0.9),
createdAt: faker.date.past({ years: 2 }),
});
users.push(user);
}
await userRepository.save(users);
console.log(`Seeded ${users.length} users`);
// Create products
const products: Product[] = [];
for (let i = 0; i < 100; i++) {
const product = productRepository.create({
name: faker.commerce.productName(),
description: faker.commerce.productDescription(),
price: parseFloat(faker.commerce.price({ min: 10, max: 1000 })),
stockQuantity: faker.number.int({ min: 0, max: 100 }),
createdAt: faker.date.past({ years: 1 }),
});
products.push(product);
}
await productRepository.save(products);
console.log(`Seeded ${products.length} products`);
// Create orders
const activeUsers = users.filter(u => u.active);
const orders: Order[] = [];
for (let i = 0; i < 200; i++) {
const user = faker.helpers.arrayElement(activeUsers);
const orderProducts = faker.helpers.arrayElements(
products,
faker.number.int({ min: 1, max: 5 })
);
const totalAmount = orderProducts.reduce(
(sum, p) => sum + p.price * faker.number.int({ min: 1, max: 3 }),
0
);
const order = orderRepository.create({
user,
totalAmount,
status: faker.helpers.arrayElement([
'pending',
'processing',
'completed',
'cancelled',
]),
createdAt: faker.date.past({ years: 0.5 }),
});
orders.push(order);
}
await orderRepository.save(orders);
console.log(`Seeded ${orders.length} orders`);
}
}
# Run seeder
npm run seed
Python Seeders
Django Fixtures
# management/commands/seed.py
from django.core.management.base import BaseCommand
from faker import Faker
from app.models import User, Product, Category, Order, OrderItem
import random
from decimal import Decimal
class Command(BaseCommand):
help = 'Seed database with sample data'
def add_arguments(self, parser):
parser.add_argument(
'--users',
type=int,
default=50,
help='Number of users to create'
)
parser.add_argument(
'--products',
type=int,
default=100,
help='Number of products to create'
)
parser.add_argument(
'--orders',
type=int,
default=200,
help='Number of orders to create'
)
def handle(self, *args, **options):
fake = Faker()
# Clear existing data
self.stdout.write('Clearing existing data...')
OrderItem.objects.all().delete()
Order.objects.all().delete()
Product.objects.all().delete()
Category.objects.all().delete()
User.objects.all().delete()
# Create categories
self.stdout.write('Creating categories...')
categories = []
category_names = ['Electronics', 'Clothing', 'Home & Garden', 'Books', 'Sports']
for name in category_names:
category = Category.objects.create(
name=name,
slug=name.lower().replace(' ', '-')
)
categories.append(category)
# Create users
self.stdout.write(f'Creating {options["users"]} users...')
users = []
for _ in range(options['users']):
user = User.objects.create(
username=fake.user_name(),
email=fake.email(),
password_hash=fake.sha256(),
active=fake.boolean(chance_of_getting_true=90),
created_at=fake.date_time_this_year()
)
users.append(user)
# Create products
self.stdout.write(f'Creating {options["products"]} products...')
products = []
for _ in range(options['products']):
product = Product.objects.create(
name=fake.catch_phrase(),
description=fake.text(max_nb_chars=200),
price=Decimal(random.uniform(10, 1000)).quantize(Decimal('0.01')),
stock_quantity=random.randint(0, 100),
category=random.choice(categories),
created_at=fake.date_time_this_year()
)
products.append(product)
# Create orders
self.stdout.write(f'Creating {options["orders"]} orders...')
active_users = [u for u in users if u.active]
statuses = ['pending', 'processing', 'completed', 'cancelled']
for _ in range(options['orders']):
user = random.choice(active_users)
order_products = random.sample(products, random.randint(1, 5))
order = Order.objects.create(
user=user,
status=random.choice(statuses),
total_amount=0, # Will calculate
created_at=fake.date_time_this_year()
)
total = Decimal('0.00')
for product in order_products:
quantity = random.randint(1, 3)
OrderItem.objects.create(
order=order,
product=product,
quantity=quantity,
price=product.price
)
total += product.price * quantity
order.total_amount = total
order.save()
self.stdout.write(self.style.SUCCESS('Successfully seeded database!'))
# Run seeder
python manage.py seed --users 100 --products 200 --orders 500
Factory Boy (Python)
# factories.py
import factory
from factory.django import DjangoModelFactory
from faker import Faker
from app.models import User, Product, Category, Order, OrderItem
fake = Faker()
class CategoryFactory(DjangoModelFactory):
class Meta:
model = Category
name = factory.Faker('word')
slug = factory.LazyAttribute(lambda obj: obj.name.lower())
class UserFactory(DjangoModelFactory):
class Meta:
model = User
username = factory.Faker('user_name')
email = factory.Faker('email')
password_hash = factory.Faker('sha256')
active = factory.Faker('boolean', chance_of_getting_true=90)
created_at = factory.Faker('date_time_this_year')
class ProductFactory(DjangoModelFactory):
class Meta:
model = Product
name = factory.Faker('catch_phrase')
description = factory.Faker('text', max_nb_chars=200)
price = factory.Faker('pydecimal', left_digits=4, right_digits=2, positive=True, min_value=10, max_value=1000)
stock_quantity = factory.Faker('random_int', min=0, max=100)
category = factory.SubFactory(CategoryFactory)
created_at = factory.Faker('date_time_this_year')
class OrderFactory(DjangoModelFactory):
class Meta:
model = Order
user = factory.SubFactory(UserFactory, active=True)
status = factory.Faker('random_element', elements=['pending', 'processing', 'completed', 'cancelled'])
total_amount = factory.Faker('pydecimal', left_digits=5, right_digits=2, positive=True)
created_at = factory.Faker('date_time_this_year')
class OrderItemFactory(DjangoModelFactory):
class Meta:
model = OrderItem
order = factory.SubFactory(OrderFactory)
product = factory.SubFactory(ProductFactory)
quantity = factory.Faker('random_int', min=1, max=5)
price = factory.LazyAttribute(lambda obj: obj.product.price)
# seed_db.py
from factories import UserFactory, ProductFactory, CategoryFactory, OrderFactory, OrderItemFactory
# Create data
categories = CategoryFactory.create_batch(5)
users = UserFactory.create_batch(50)
products = ProductFactory.create_batch(100)
orders = OrderFactory.create_batch(200)
print(f'Created {len(users)} users, {len(products)} products, {len(orders)} orders')
Realistic Data Scenarios
E-Commerce Platform
// Realistic e-commerce scenario
import { faker } from '@faker-js/faker';
// Create realistic product catalog
const productCategories = {
electronics: [
{ name: 'MacBook Pro 16"', price: 2499, stock: 15 },
{ name: 'iPhone 15 Pro', price: 999, stock: 50 },
{ name: 'iPad Air', price: 599, stock: 30 },
{ name: 'AirPods Pro', price: 249, stock: 100 },
],
clothing: [
{ name: 'Men\'s T-Shirt', price: 29.99, stock: 200 },
{ name: 'Women\'s Jeans', price: 79.99, stock: 150 },
{ name: 'Sneakers', price: 89.99, stock: 80 },
{ name: 'Winter Jacket', price: 149.99, stock: 60 },
],
};
// Create realistic user personas
const userPersonas = [
{
type: 'frequent_buyer',
orderFrequency: 'high',
avgOrderValue: 200,
preferredCategories: ['electronics'],
},
{
type: 'occasional_shopper',
orderFrequency: 'medium',
avgOrderValue: 100,
preferredCategories: ['clothing'],
},
{
type: 'bargain_hunter',
orderFrequency: 'low',
avgOrderValue: 50,
preferredCategories: ['clothing', 'home'],
},
];
// Create orders matching user behavior
function generateRealisticOrders(user, persona) {
const orderCount = {
high: faker.number.int({ min: 10, max: 50 }),
medium: faker.number.int({ min: 3, max: 10 }),
low: faker.number.int({ min: 1, max: 3 }),
}[persona.orderFrequency];
const orders = [];
for (let i = 0; i < orderCount; i++) {
const products = getProductsFromCategories(persona.preferredCategories);
const total = calculateTotal(products, persona.avgOrderValue);
orders.push({
userId: user.id,
products,
totalAmount: total,
status: getRealisticStatus(),
createdAt: faker.date.past({ years: 1 }),
});
}
return orders;
}
function getRealisticStatus() {
// 80% completed, 10% processing, 5% pending, 5% cancelled
const rand = Math.random();
if (rand < 0.8) return 'completed';
if (rand < 0.9) return 'processing';
if (rand < 0.95) return 'pending';
return 'cancelled';
}
SaaS Multi-Tenant
# Realistic SaaS data
from faker import Faker
import random
from datetime import datetime, timedelta
fake = Faker()
# Organization sizes (realistic distribution)
ORG_SIZES = {
'startup': {'users': (1, 5), 'projects': (1, 3), 'weight': 0.5},
'small': {'users': (5, 20), 'projects': (3, 10), 'weight': 0.3},
'medium': {'users': (20, 100), 'projects': (10, 50), 'weight': 0.15},
'enterprise': {'users': (100, 500), 'projects': (50, 200), 'weight': 0.05},
}
def create_organization():
org_type = random.choices(
list(ORG_SIZES.keys()),
weights=[v['weight'] for v in ORG_SIZES.values()]
)[0]
config = ORG_SIZES[org_type]
org = Organization.objects.create(
name=fake.company(),
slug=fake.slug(),
plan=org_type,
created_at=fake.date_time_between(start_date='-2y', end_date='now')
)
# Create users for org
user_count = random.randint(*config['users'])
users = []
for i in range(user_count):
role = 'admin' if i == 0 else random.choice(['member', 'member', 'member', 'viewer'])
user = User.objects.create(
organization=org,
email=fake.email(),
name=fake.name(),
role=role,
created_at=org.created_at + timedelta(days=random.randint(0, 30))
)
users.append(user)
# Create projects
project_count = random.randint(*config['projects'])
for _ in range(project_count):
Project.objects.create(
organization=org,
owner=random.choice(users),
name=fake.catch_phrase(),
description=fake.text(),
status=random.choice(['active', 'active', 'active', 'archived']),
created_at=org.created_at + timedelta(days=random.randint(0, 60))
)
return org
Best Practices
DO ✓
- Use realistic data - Names, emails, addresses should look real
- Maintain referential integrity - Respect foreign key constraints
- Create diverse scenarios - Include edge cases, not just happy path
- Set random seed for reproducibility -
faker.seed(12345) - Clear data before seeding - Prevent duplicate key errors
- Seed in correct order - Parents before children
- Use transactions - Rollback on error
- Add meaningful relationships - Realistic user-order patterns
- Include timestamps - Spread across realistic time periods
- Validate data - Ensure generated data meets constraints
DON'T ✗
- Don't use same password for all users - Vary or use faker
- Don't ignore constraints - Check NOT NULL, UNIQUE, CHECK
- Don't create unrealistic data - No 200-year-old users
- Don't hardcode IDs - Use auto-generated values
- Don't seed production - Only development/staging
- Don't create too much data - Start small, increase as needed
- Don't ignore performance - Use batch inserts for large datasets
- Don't forget to cleanup - Delete test data when done
Performance Tips
Batch Inserts
// Instead of individual creates
for (const data of userData) {
await prisma.user.create({ data });
}
// Use createMany
await prisma.user.createMany({
data: userData,
skipDuplicates: true,
});
Transaction for Multiple Tables
await prisma.$transaction(async (tx) => {
const users = await tx.user.createMany({ data: userData });
const products = await tx.product.createMany({ data: productData });
const orders = await tx.order.createMany({ data: orderData });
});
Notes
- Always seed development and staging, never production
- Use environment variables to control seed behavior
- Include seed data in version control for consistency
- Document seed scenarios for team understanding
- Consider using database snapshots for faster resets
- Test application with seeded data regularly
- Keep seed scripts up to date with schema changes