| name | database-data-generation |
| description | Generate realistic database seed data and test fixtures for development, testing, and demonstrations. Creates realistic users, products, orders, and custom schemas using Faker libraries while maintaining relational integrity and data consistency. Use when populating databases, creating test fixtures, seeding development environments, or generating demo data. |
| allowed-tools | Read, Write, Edit, Grep, Glob, Bash |
| version | 1.0.0 |
Database Data Generation
Automate creation of realistic seed data and test fixtures for databases, ensuring relational integrity and data consistency across development and testing environments.
Overview
This skill empowers you to:
- Generate seed data: Create production-like data for development databases
- Create test fixtures: Generate diverse test data for automated testing
- Maintain relationships: Ensure foreign key integrity across related tables
- Customize volume: Control the amount of data generated
- Support multiple formats: Export as SQL, JSON, JavaScript, Python, or framework-specific formats
Part 1: Database Seeding
How Seeding Works
- Analyze Schema: Examine database structure, tables, columns, data types, and relationships
- Generate Realistic Data: Use Faker libraries to create believable data matching your schema
- Maintain Relationships: Ensure foreign keys link correctly between related tables
- Create Seed Scripts: Generate executable scripts (SQL, migration files, ORM seeds)
When to Use Database Seeding
- Populate development databases with realistic data
- Create initial data for new application installations
- Reset database to known state for testing
- Demonstrate application features with pre-populated data
- Train new team members with realistic examples
Seeding Examples
Example 1: Simple User Seeding (SQL)
User request: "Create a seed script to populate my users table with 50 realistic users."
Generated SQL:
-- Users seed data
INSERT INTO users (name, email, password, created_at) VALUES
('John Doe', 'john.doe@example.com', '$2y$10$...', '2025-01-15 10:23:45'),
('Jane Smith', 'jane.smith@example.com', '$2y$10$...', '2025-01-16 14:12:30'),
('Michael Johnson', 'michael.j@example.com', '$2y$10$...', '2025-01-17 09:45:10'),
-- ... 47 more users
('Sarah Williams', 'sarah.w@example.com', '$2y$10$...', '2025-02-10 16:30:22');
Example 2: E-Commerce Database Seeding (Laravel)
User request: "Generate test data for my e-commerce database, including users, products, categories, and orders."
Generated Seeder (Laravel):
<?php
namespace Database\Seeders;
use Illuminate\Database\Seeder;
use App\Models\User;
use App\Models\Category;
use App\Models\Product;
use App\Models\Order;
class EcommerceSeeder extends Seeder
{
public function run(): void
{
// Create categories first (no dependencies)
$categories = [
Category::create(['name' => 'Electronics', 'slug' => 'electronics']),
Category::create(['name' => 'Clothing', 'slug' => 'clothing']),
Category::create(['name' => 'Home & Garden', 'slug' => 'home-garden']),
];
// Create users (no dependencies)
$users = User::factory(50)->create();
// Create products (depends on categories)
$products = [];
foreach ($categories as $category) {
$products = array_merge(
$products,
Product::factory(20)
->for($category)
->create()
->toArray()
);
}
// Create orders (depends on users and products)
foreach ($users as $user) {
// Each user has 1-5 orders
$orderCount = rand(1, 5);
for ($i = 0; $i < $orderCount; $i++) {
$order = Order::factory()->for($user)->create();
// Attach 1-4 random products to each order
$orderProducts = collect($products)
->random(rand(1, 4))
->pluck('id');
$order->products()->attach($orderProducts);
}
}
}
}
Example 3: Blog Database with Relationships (TypeScript/Prisma)
User request: "Seed my blog database with users, posts, comments, and tags."
Generated Seeder (Prisma):
import { PrismaClient } from '@prisma/client';
import { faker } from '@faker-js/faker';
const prisma = new PrismaClient();
async function main() {
// Create users
const users = await Promise.all(
Array.from({ length: 20 }, async () => {
return prisma.user.create({
data: {
name: faker.person.fullName(),
email: faker.internet.email(),
bio: faker.lorem.paragraph(),
avatar: faker.image.avatar(),
},
});
})
);
// Create tags
const tags = await Promise.all(
['JavaScript', 'Python', 'React', 'Node.js', 'TypeScript'].map(name =>
prisma.tag.create({ data: { name, slug: name.toLowerCase() } })
)
);
// Create posts with comments
for (const user of users) {
const postCount = faker.number.int({ min: 2, max: 10 });
for (let i = 0; i < postCount; i++) {
const post = await prisma.post.create({
data: {
title: faker.lorem.sentence(),
slug: faker.helpers.slugify(faker.lorem.words(5)),
content: faker.lorem.paragraphs(5),
published: faker.datatype.boolean(0.8),
userId: user.id,
tags: {
connect: faker.helpers.arrayElements(tags, { min: 1, max: 3 })
.map(tag => ({ id: tag.id })),
},
},
});
// Add comments to post
const commentCount = faker.number.int({ min: 0, max: 8 });
for (let j = 0; j < commentCount; j++) {
await prisma.comment.create({
data: {
content: faker.lorem.paragraph(),
postId: post.id,
userId: faker.helpers.arrayElement(users).id,
},
});
}
}
}
console.log(`✅ Seeded ${users.length} users with posts and comments`);
}
main()
.catch((e) => {
console.error(e);
process.exit(1);
})
.finally(async () => {
await prisma.$disconnect();
});
Part 2: Test Data Generation
How Test Data Generation Works
- Identify Requirements: Determine data types, volumes, and specific attributes needed
- Generate Diverse Data: Create varied test cases including edge cases
- Format Output: Structure data for easy consumption (JSON, CSV, SQL, fixtures)
- Support Custom Schemas: Generate data matching your specific data models
When to Use Test Data Generation
- Populate test databases for automated testing
- Create fixtures for unit/integration tests
- Generate load testing data
- Simulate production-like scenarios
- Test edge cases and boundary conditions
- Create demo datasets for presentations
Test Data Examples
Example 1: User Test Fixtures (JSON)
User request: "Generate 100 test users with realistic names, emails, and addresses in JSON format."
Generated JSON:
[
{
"id": 1,
"name": "Alice Johnson",
"email": "alice.johnson@example.com",
"phone": "+1-555-0123",
"address": {
"street": "123 Main St",
"city": "New York",
"state": "NY",
"zipCode": "10001",
"country": "USA"
},
"dateOfBirth": "1985-03-15",
"createdAt": "2025-01-10T08:30:00Z"
},
{
"id": 2,
"name": "Bob Martinez",
"email": "bob.martinez@example.com",
"phone": "+1-555-0456",
"address": {
"street": "456 Oak Ave",
"city": "Los Angeles",
"state": "CA",
"zipCode": "90001",
"country": "USA"
},
"dateOfBirth": "1992-07-22",
"createdAt": "2025-01-11T14:15:00Z"
}
// ... 98 more users
]
Example 2: Product Test Data (Python Pytest Fixtures)
User request: "Create product test data including name, description, price, and category for automated tests."
Generated Pytest Fixtures:
import pytest
from faker import Faker
from decimal import Decimal
fake = Faker()
@pytest.fixture
def sample_products():
"""Generate sample products for testing"""
categories = ['Electronics', 'Clothing', 'Books', 'Home & Garden', 'Sports']
return [
{
'id': i + 1,
'name': fake.catch_phrase(),
'description': fake.text(max_nb_chars=200),
'price': Decimal(fake.pydecimal(left_digits=3, right_digits=2, positive=True)),
'category': fake.random_element(elements=categories),
'sku': fake.bothify(text='??-####'),
'stock': fake.random_int(min=0, max=500),
'is_active': fake.boolean(chance_of_getting_true=85),
'created_at': fake.date_time_this_year(),
}
for i in range(100)
]
@pytest.fixture
def product_edge_cases():
"""Edge cases for product testing"""
return [
{'id': 1, 'name': '', 'price': Decimal('0.00')}, # Empty name, zero price
{'id': 2, 'name': 'A' * 255, 'price': Decimal('999999.99')}, # Max length, max price
{'id': 3, 'name': 'Test', 'price': Decimal('0.01')}, # Minimum price
{'id': 4, 'name': None, 'price': None}, # Null values
]
def test_product_creation(sample_products):
"""Test product creation with realistic data"""
for product in sample_products:
# Your test logic here
assert product['price'] > 0
assert len(product['name']) > 0
Example 3: Order Transaction Test Data (Custom Schema)
User request: "Generate order test data with payment info, shipping addresses, and line items."
Generated Test Data:
// testData/orders.js
const { faker } = require('@faker-js/faker');
function generateOrders(count = 50) {
const orders = [];
for (let i = 0; i < count; i++) {
const lineItems = [];
const itemCount = faker.number.int({ min: 1, max: 5 });
for (let j = 0; j < itemCount; j++) {
const quantity = faker.number.int({ min: 1, max: 3 });
const price = faker.number.float({ min: 10, max: 500, precision: 0.01 });
lineItems.push({
productId: faker.string.uuid(),
productName: faker.commerce.productName(),
quantity,
price,
subtotal: quantity * price,
});
}
const subtotal = lineItems.reduce((sum, item) => sum + item.subtotal, 0);
const tax = subtotal * 0.08;
const shipping = faker.number.float({ min: 5, max: 20, precision: 0.01 });
const total = subtotal + tax + shipping;
orders.push({
orderId: faker.string.uuid(),
orderNumber: faker.string.alphanumeric(8).toUpperCase(),
customerId: faker.string.uuid(),
status: faker.helpers.arrayElement(['pending', 'processing', 'shipped', 'delivered', 'cancelled']),
// Line items
items: lineItems,
// Pricing
subtotal,
tax,
shipping,
total,
// Shipping address
shippingAddress: {
name: faker.person.fullName(),
street: faker.location.streetAddress(),
city: faker.location.city(),
state: faker.location.state(),
zipCode: faker.location.zipCode(),
country: 'USA',
},
// Payment info (sanitized for testing)
payment: {
method: faker.helpers.arrayElement(['credit_card', 'paypal', 'bank_transfer']),
lastFourDigits: faker.string.numeric(4),
cardType: faker.helpers.arrayElement(['Visa', 'Mastercard', 'Amex']),
},
// Timestamps
createdAt: faker.date.recent({ days: 30 }),
updatedAt: faker.date.recent({ days: 7 }),
});
}
return orders;
}
module.exports = { generateOrders };
Supported Data Types & Faker Methods
Personal Information
faker.person.fullName() // "John Doe"
faker.person.firstName() // "Jane"
faker.person.lastName() // "Smith"
faker.internet.email() // "john.doe@example.com"
faker.phone.number() // "+1-555-0123"
faker.date.birthdate() // 1985-03-15
Location Data
faker.location.streetAddress() // "123 Main St"
faker.location.city() // "New York"
faker.location.state() // "California"
faker.location.zipCode() // "90210"
faker.location.country() // "United States"
Business Data
faker.company.name() // "Acme Corporation"
faker.commerce.productName() // "Handcrafted Steel Shoes"
faker.commerce.department() // "Electronics"
faker.commerce.price() // "123.45"
faker.finance.creditCardNumber()// "4532-1234-5678-9010"
Internet & Tech
faker.internet.url() // "https://example.com"
faker.internet.userName() // "john_doe_123"
faker.internet.password() // "aB3$xYz9!"
faker.internet.ip() // "192.168.1.1"
faker.string.uuid() // "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
Text Content
faker.lorem.sentence() // "Lorem ipsum dolor sit amet."
faker.lorem.paragraph() // Full paragraph
faker.lorem.words(5) // "lorem ipsum dolor sit amet"
faker.hacker.phrase() // "Try to compress the RAM circuit!"
Best Practices
Data Volume
- ✅ Start small (10-50 records) and scale up gradually
- ✅ Consider database performance limits
- ✅ Use pagination for large datasets
- ❌ Don't generate millions of records without need
Data Consistency
- ✅ Use appropriate Faker methods for data types
- ✅ Respect database constraints (NOT NULL, UNIQUE)
- ✅ Match data formats (dates, phone numbers, currencies)
- ✅ Use consistent locale settings
- ❌ Don't mix data formats (e.g., US vs EU dates)
Relationship Integrity
- ✅ Create parent records before child records
- ✅ Use valid foreign key references
- ✅ Test referential integrity constraints
- ❌ Don't create orphan records
Idempotency
- ✅ Design seeds to run multiple times safely
- ✅ Clear existing data before seeding
- ✅ Use transactions for atomic operations
- ✅ Handle unique constraint violations
- ❌ Don't assume empty database
Testing Focus
- ✅ Include edge cases (empty, null, max values)
- ✅ Generate diverse data for comprehensive testing
- ✅ Use seed values for reproducible tests
- ✅ Separate test data from seed data
- ❌ Don't rely solely on "happy path" data
Framework-Specific Examples
Laravel Seeders
// database/seeders/UserSeeder.php
class UserSeeder extends Seeder {
public function run(): void {
User::factory(100)->create();
}
}
Django Fixtures
# management/commands/seed.py
from django.core.management.base import BaseCommand
from myapp.factories import UserFactory
class Command(BaseCommand):
def handle(self, *args, **options):
UserFactory.create_batch(100)
Rails Seeds
# db/seeds.rb
100.times do
User.create!(
name: Faker::Name.name,
email: Faker::Internet.email
)
end
Prisma Seeds
// prisma/seed.ts
import { PrismaClient } from '@prisma/client'
const prisma = new PrismaClient()
async function main() {
await prisma.user.createMany({
data: users
})
}
Integration with Other Tools
- Testing Frameworks: Jest, Pytest, PHPUnit, Mocha
- ORMs: Prisma, TypeORM, Sequelize, Eloquent, Django ORM
- Migration Tools: Flyway, Liquibase, Alembic
- CI/CD Pipelines: Automate seeding in test environments
- Load Testing: Generate large datasets for performance testing
Remember: Good test data and seeds make development and testing faster, more reliable, and closer to production scenarios.