| name | data-augmentation-strategies |
| description | Data augmentation - techniques (vision, NLP, audio), strength tuning, validation safety |
Data Augmentation Strategies
Overview
Data augmentation artificially increases training data diversity by applying transformations that preserve labels. This is one of the most cost-effective ways to improve model robustness and reduce overfitting, but it requires domain knowledge and careful strength tuning.
Core Principle: Augmentation is NOT a universal technique. The right augmentations depend on your domain, task, data distribution, and model capacity. Wrong augmentations can hurt more than help.
Critical Rule: Augment ONLY training data. Validation and test data must remain unaugmented to provide accurate performance estimates.
Why Augmentation Matters:
- Creates label-preserving variations, teaching invariance
- Reduces overfitting by preventing memorization
- Improves robustness to distribution shift
- Essentially "free" data—no labeling cost
- Can outperform adding more labeled data in some domains
When to Use This Skill
Load this skill when:
- Training on a limited dataset (< 10,000 examples) and seeing overfitting
- Addressing distribution shift or robustness concerns
- Selecting augmentations for vision, NLP, audio, or tabular tasks
- Designing augmentation pipelines and strength tuning
- Troubleshooting training issues (accuracy drop with augmentation)
- Implementing test-time augmentation (TTA) or augmentation policies
- Choosing between weak augmentation (100% prob) vs strong (lower prob)
Don't use for: General training debugging (use using-training-optimization), optimization algorithm selection (use optimization-algorithms), regularization without domain context (augmentation is domain-specific)
Part 1: Augmentation Decision Framework
The Core Question: "When should I augment?"
WRONG ANSWER: "Use augmentation for all datasets."
RIGHT APPROACH: Use this decision framework.
Clarifying Questions
"How much training data do you have?"
- < 1,000 examples → Strong augmentation needed
- 1,000-10,000 examples → Medium augmentation
- 10,000-100,000 examples → Light augmentation often sufficient
- > 100,000 examples → Augmentation helps but not critical
- Rule: Smaller dataset = more aggressive augmentation
"What's your train/validation accuracy gap?"
- Train 90%, val 70% (20% gap) → Overfitting, augmentation will help
- Train 85%, val 83% (2% gap) → Well-regularized, augmentation optional
- Train 60%, val 58% (2% gap) → Underfitting, augmentation won't help (need more capacity)
- Rule: Large gap indicates augmentation will help
"How much distribution shift is expected at test time?"
- Same domain, clean images → Light augmentation (rotation ±15°, crop 90%, brightness ±10%)
- Real-world conditions → Medium augmentation (rotation ±30°, crop 75%, brightness ±20%)
- Extreme conditions (weather, blur) → Strong augmentation + robust architectures
- Rule: Augment for expected shift, not beyond
"What's your domain?"
- Vision → Rich augmentation toolkit available
- NLP → Limited augmentations (preserve syntax/semantics)
- Audio → Time/frequency domain transforms
- Tabular → SMOTE, feature dropout, noise injection
- Rule: Domain determines augmentation types
"Do you have compute budget for increased training time?"
- Yes → Stronger augmentation possible
- No → Lighter augmentation to save training time
- Rule: Online augmentation adds ~10-20% training time
Decision Tree
START: Should I augment?
├─ Is your training data < 10,000 examples?
│ ├─ YES → Augmentation will likely help. Go to Part 2 (domain selection).
│ │
│ └─ NO → Check train/validation gap...
├─ Is your train-validation accuracy gap > 10%?
│ ├─ YES → Augmentation will likely help. Go to Part 2.
│ │
│ └─ NO → Continue...
├─ Are you in a domain where distribution shift is expected?
│ │ (medical imaging varies by scanner, autonomous driving weather varies,
│ │ satellite imagery has seasonal changes, etc.)
│ ├─ YES → Augmentation will help. Go to Part 2.
│ │
│ └─ NO → Continue...
├─ Do you have compute budget for 10-20% extra training time?
│ ├─ YES, but data is ample → Optional: light augmentation helps margins
│ │ May improve generalization even with large data.
│ │
│ └─ NO → Skip augmentation or use very light augmentation.
└─ DEFAULT: Apply light-to-medium augmentation for target domain.
Start with conservative parameters.
Measure impact before increasing strength.
Part 2: Domain-Specific Augmentation Catalogs
Vision Augmentations (Image Classification, Detection, Segmentation)
Key Principle: Preserve semantic content while varying appearance and geometry.
Geometric Transforms (Preserve Class)
Rotation:
from torchvision import transforms
transform = transforms.RandomRotation(degrees=15)
# ±15° for most tasks (natural objects rotate ±15°)
# ±30° for synthetic/manufactured objects
# ±45° for symmetric objects (digits, logos)
# Avoid: ±180° (completely unrecognizable)
When to use: All vision tasks. Rotation-invariance is common.
Strength tuning:
- Light: ±5° to ±15° (most conservative)
- Medium: ±15° to ±30°
- Strong: ±30° to ±45° (only for symmetric classes)
- Never: ±180° (makes label ambiguous)
Domain exceptions:
- Medical imaging: ±10° maximum (anatomy is not rotation-invariant)
- Satellite: ±5° maximum (geographic north is meaningful)
- Handwriting: ±15° okay (natural variation)
- OCR: ±10° maximum (upside-down is different class)
Crop (Random Crop + Resize):
transform = transforms.RandomResizedCrop(224, scale=(0.8, 1.0))
# Crops 80-100% of original, resizes to 224x224
# Teaches invariance to framing and zoom
When to use: Classification, detection (with care), segmentation.
Strength tuning:
- Light: scale=(0.9, 1.0) - crop 90-100%
- Medium: scale=(0.8, 1.0) - crop 80-100%
- Strong: scale=(0.5, 1.0) - crop 50-100% (can lose important features)
Domain considerations:
- Detection: Minimum scale should keep objects ≥50px
- Segmentation: Crops must preserve mask validity
- Medical: Center-biased crops (avoid cutting off pathology)
Horizontal Flip:
transform = transforms.RandomHorizontalFlip(p=0.5)
# Mirrors image left-right
When to use: Most vision tasks WHERE LEFT-RIGHT SYMMETRY IS NATURAL.
CRITICAL EXCEPTION:
- ❌ Medical imaging (L/R markers mean something)
- ❌ Text/documents (flipped text is unreadable)
- ❌ Objects with semantic left/right (cars facing direction)
- ❌ Faces (though some datasets use it)
Safe domains:
- ✅ Natural scene classification
- ✅ Animal classification (except directional animals)
- ✅ Generic object detection (not vehicles)
Vertical Flip (Use Rarely):
transform = transforms.RandomVerticalFlip(p=0.5)
VERY LIMITED USE: Most natural objects are not up-down symmetric.
- ❌ Most natural images (horizon has direction)
- ❌ Medical imaging (anatomical direction matters)
- ✅ Texture classification (some textures rotationally symmetric)
Perspective Transform (Affine):
transform = transforms.RandomAffine(
degrees=0,
translate=(0.1, 0.1), # ±10% translation
scale=(0.9, 1.1), # ±10% scaling
shear=(-15, 15) # ±15° shear
)
When to use: Scene understanding, 3D object detection, autonomous driving.
Caution: Shear and extreme perspective can make images unrecognizable. Use conservatively.
Color and Brightness Transforms (Appearance Variance)
Color Jitter:
transform = transforms.ColorJitter(
brightness=0.2, # ±20% brightness
contrast=0.2, # ±20% contrast
saturation=0.2, # ±20% saturation
hue=0.1 # ±10% hue shift
)
When to use: All vision tasks (teaches color-invariance).
Strength tuning:
- Light: brightness=0.1, contrast=0.1, saturation=0.1, hue=0.05
- Medium: brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1
- Strong: brightness=0.5, contrast=0.5, saturation=0.5, hue=0.3
Domain exceptions:
- Medical imaging: brightness/contrast only (color is artificial)
- Satellite: All channels safe (handles weather/season)
- Thermal imaging: Only brightness meaningful
Gaussian Blur:
from torchvision import transforms
transform = transforms.GaussianBlur(kernel_size=(3, 7), sigma=(0.1, 2.0))
When to use: Makes model robust to soft focus, mimics unfocused camera.
Strength tuning:
- Light: sigma=(0.1, 0.5)
- Medium: sigma=(0.1, 1.0)
- Strong: sigma=(0.5, 2.0)
Domain consideration: Don't blur medical/satellite (loses diagnostic/geographic detail).
Grayscale:
transform = transforms.RandomGrayscale(p=0.2)  # Converts to grayscale with 20% probability
When to use: When color information is redundant or unreliable.
Domain exceptions:
- Medical imaging: Apply selectively (preserve when color is diagnostic)
- Satellite: Don't apply (multi-spectral bands are essential)
- Natural scene: Safe to apply
Mixing Augmentations (Mixup, Cutmix, Cutout)
Mixup: Linear interpolation of images and labels
import numpy as np
import torch

def mixup(x, y, alpha=1.0):
    """Mixup augmentation: blend two images and their labels."""
    batch_size = x.size(0)
    index = torch.randperm(batch_size)
    lam = np.random.beta(alpha, alpha)  # Sample mixing ratio from Beta(alpha, alpha)
    mixed_x = lam * x + (1 - lam) * x[index]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam
# Use with soft labels during training:
# loss = lam * loss_fn(pred, y_a) + (1-lam) * loss_fn(pred, y_b)
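A minimal training-step sketch of the interpolated loss (model, optimizer, loss_fn, and train_loader are placeholders):
for images, labels in train_loader:
    mixed_x, y_a, y_b, lam = mixup(images, labels, alpha=1.0)
    pred = model(mixed_x)
    # Interpolate the loss with the same ratio used to mix the inputs
    loss = lam * loss_fn(pred, y_a) + (1 - lam) * loss_fn(pred, y_b)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()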
When to use: All image classification tasks.
Strength tuning:
- Light: alpha=0.2 (Beta(0.2, 0.2) is U-shaped, so lam lands near 0 or 1 and mixes stay close to one original)
- Medium: alpha=1.0 (uniform mixing ratios)
- Strong: alpha=2.0 (lam concentrates near 0.5, heavy blending)
Effectiveness: One of the best modern augmentations, ~1-2% accuracy improvement typical.
Cutmix: Replace rectangular region with another image
import numpy as np
import torch

def cutmix(x, y, alpha=1.0):
    """CutMix augmentation: replace a rectangular patch with another image's patch."""
    batch_size = x.size(0)
    index = torch.randperm(batch_size)
    lam = np.random.beta(alpha, alpha)
    height, width = x.size(2), x.size(3)
    # Sample patch coordinates; patch area is proportional to (1 - lam)
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h = int(height * cut_ratio)
    cut_w = int(width * cut_ratio)
    cx = np.random.randint(0, width)
    cy = np.random.randint(0, height)
    bbx1 = np.clip(cx - cut_w // 2, 0, width)
    bby1 = np.clip(cy - cut_h // 2, 0, height)
    bbx2 = np.clip(cx + cut_w // 2, 0, width)
    bby2 = np.clip(cy + cut_h // 2, 0, height)
    # Paste the shuffled batch's patch into the original batch
    x[:, :, bby1:bby2, bbx1:bbx2] = x[index, :, bby1:bby2, bbx1:bbx2]
    # Adjust lam to the exact patch area after clipping
    lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1)) / (height * width)
    return x, y, y[index], lam
When to use: Image classification (especially effective).
Advantage over Mixup: Preserves spatial structure better, more realistic.
Typical improvement: 1-3% accuracy increase.
Cutout: Remove rectangular patch (fill with zero/mean)
import numpy as np
import torch

def cutout(x, patch_size=32, p=0.5):
    """Cutout: zero out a random square region in each image of the batch."""
    if np.random.rand() > p:
        return x
    batch_size, _, height, width = x.size()
    for i in range(batch_size):
        cx = np.random.randint(0, width)
        cy = np.random.randint(0, height)
        x1 = np.clip(cx - patch_size // 2, 0, width)
        y1 = np.clip(cy - patch_size // 2, 0, height)
        x2 = np.clip(cx + patch_size // 2, 0, width)
        y2 = np.clip(cy + patch_size // 2, 0, height)
        x[i, :, y1:y2, x1:x2] = 0
    return x
When to use: Regularization effect, teaches local invariance.
Typical improvement: 0.5-1% accuracy increase.
AutoAugment and Learned Policies
RandAugment: Random selection from augmentation space
from torchvision.transforms import RandAugment
transform = RandAugment(num_ops=2, magnitude=9)
# Apply 2 random augmentations from 14 operation space
# Magnitude 0-30 controls strength
When to use: When unsure about augmentation selection.
Advantage: Removes manual hyperparameter tuning.
Typical improvement: 1-2% accuracy compared to manual selection.
AutoAugment: Data-dependent learned policy
from torchvision.transforms import AutoAugment, AutoAugmentPolicy
transform = AutoAugment(AutoAugmentPolicy.IMAGENET)
# Predefined policy for ImageNet-like tasks
# Policies: IMAGENET, CIFAR10, SVHN
Pre-trained policies:
- IMAGENET: General-purpose, vision tasks
- CIFAR10: Smaller images (32x32), high regularization
- SVHN: Street view house numbers
Typical improvement: 0.5-1% accuracy.
NLP Augmentations (Text Classification, QA, Generation)
Key Principle: Preserve meaning while varying surface form. Syntax and semantics must be preserved.
Rule-Based Augmentations
Back-Translation:
def back_translate(text: str, src_lang='en', inter_lang='fr') -> str:
    """Translate to an intermediate language and back to create a paraphrase."""
    # English -> French -> English
    # Example: "The cat sat on mat" -> "Le chat s'assit sur le tapis" -> "The cat sat on the mat"
    # In practice, load the models once outside this function (loading per call is slow)
    from transformers import MarianMTModel, MarianTokenizer
    # Translate src -> intermediate (canonical Helsinki-NLP model ids are lowercase)
    model_name = f"Helsinki-NLP/opus-mt-{src_lang}-{inter_lang}"
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs)
    intermediate = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]
    # Translate intermediate -> src
    model_name_back = f"Helsinki-NLP/opus-mt-{inter_lang}-{src_lang}"
    tokenizer_back = MarianTokenizer.from_pretrained(model_name_back)
    model_back = MarianMTModel.from_pretrained(model_name_back)
    inputs_back = tokenizer_back(intermediate, return_tensors="pt")
    outputs_back = model_back.generate(**inputs_back)
    return tokenizer_back.batch_decode(outputs_back, skip_special_tokens=True)[0]
When to use: Text classification, sentiment analysis, intent detection.
Strength tuning:
- Use 1-2 intermediate languages
- Probability 0.3-0.5 (paraphrases, not all data)
Advantage: Creates natural paraphrases.
Disadvantage: Slow (requires neural translation model).
Synonym Replacement (EDA):
import random
import nltk
from nltk.corpus import wordnet
# nltk.download('wordnet')  # One-time download

def synonym_replacement(text: str, n=2):
    """Replace up to n random words with WordNet synonyms."""
    words = text.split()
    new_words = words.copy()
    random_word_list = list(set(word for word in words if word.isalnum()))
    random.shuffle(random_word_list)
    num_replaced = 0
    for random_word in random_word_list:
        synonyms = get_synonyms(random_word)
        if len(synonyms) > 0:
            synonym = random.choice(synonyms)
            new_words = [synonym if word == random_word else word for word in new_words]
            num_replaced += 1
        if num_replaced >= n:
            break
    return ' '.join(new_words)

def get_synonyms(word):
    """Find synonyms using WordNet."""
    synonyms = set()
    for syn in wordnet.synsets(word):
        for lemma in syn.lemmas():
            synonyms.add(lemma.name().replace('_', ' '))  # Multi-word lemmas use underscores
    return list(synonyms - {word})
When to use: Text classification, low-resource languages.
Strength tuning:
- n=1-3 synonyms per sentence
- Probability 0.5 (replace in half of training data)
Typical improvement: 1-2% for small datasets.
Random Insertion:
def random_insertion(text: str, n=2):
    """Insert n random synonyms of random words."""
    words = text.split()
    new_words = words.copy()
    for _ in range(n):
        add_word(new_words)
    return ' '.join(new_words)

def add_word(new_words):
    synonyms = []
    counter = 0
    while len(synonyms) < 1:
        if counter >= 10:
            return
        random_word = new_words[random.randint(0, len(new_words) - 1)]
        synonyms = get_synonyms(random_word)
        counter += 1
    random_synonym = synonyms[random.randint(0, len(synonyms) - 1)]
    random_idx = random.randint(0, len(new_words) - 1)
    new_words.insert(random_idx, random_synonym)
When to use: Text classification, paraphrase detection.
Random Swap:
def random_swap(text: str, n=2):
    """Randomly swap positions of n word pairs."""
    words = text.split()
    new_words = words.copy()
    for _ in range(n):
        new_words = swap_word(new_words)
    return ' '.join(new_words)

def swap_word(new_words):
    random_idx_1 = random.randint(0, len(new_words) - 1)
    random_idx_2 = random_idx_1
    counter = 0
    while random_idx_2 == random_idx_1:
        random_idx_2 = random.randint(0, len(new_words) - 1)
        counter += 1
        if counter > 3:
            return new_words
    new_words[random_idx_1], new_words[random_idx_2] = new_words[random_idx_2], new_words[random_idx_1]
    return new_words
When to use: Robustness to word order variations.
Random Deletion:
def random_deletion(text: str, p=0.2):
    """Randomly delete words with probability p."""
    words = text.split()
    if len(words) == 1:
        return text
    new_words = [word for word in words if random.uniform(0, 1) > p]
    if len(new_words) == 0:
        return random.choice(words)
    return ' '.join(new_words)
When to use: Robustness to missing/incomplete input.
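A minimal sketch combining the four EDA operations above, choosing one at random per call (parameters are illustrative):
def eda(text: str, n: int = 2, p_delete: float = 0.2):
    """Apply one randomly chosen EDA operation from the four above."""
    op = random.choice([
        lambda t: synonym_replacement(t, n),
        lambda t: random_insertion(t, n),
        lambda t: random_swap(t, n),
        lambda t: random_deletion(t, p_delete),
    ])
    return op(text)

# Generate a few augmented variants per training sentence
variants = [eda("the quick brown fox jumps over the lazy dog") for _ in range(4)]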
Sentence-Level Augmentations
Paraphrase Generation:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
def paraphrase(text: str):
    """Generate a paraphrase using a pretrained model."""
    model_name = "Vamsi/T5_Paraphrase_Paws"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    # This T5 checkpoint is typically prompted with a "paraphrase: " prefix
    input_ids = tokenizer.encode(f"paraphrase: {text}", return_tensors="pt")
    outputs = model.generate(input_ids, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
When to use: Text classification with limited data.
Advantage: High-quality semantic paraphrases.
Disadvantage: Model-dependent, can be slow.
Audio Augmentations (Speech Recognition, Music)
Key Principle: Preserve content while varying acoustic conditions.
Pitch Shift:
import librosa
import numpy as np
def pitch_shift(waveform: np.ndarray, sr: int, steps: int):
    """Shift pitch without changing speed."""
    # Shift by ±2-4 semitones typically
    return librosa.effects.pitch_shift(waveform, sr=sr, n_steps=steps)

# Usage:
audio, sr = librosa.load('audio.wav')
augmented = pitch_shift(audio, sr, steps=np.random.randint(-4, 5))
When to use: Speech recognition (speaker variation).
Strength tuning:
- Light: ±2 semitones
- Medium: ±4 semitones
- Strong: ±8 semitones (avoid, changes phone identity)
Time Stretching:
def time_stretch(waveform: np.ndarray, rate: float):
    """Speed up/slow down without changing pitch."""
    return librosa.effects.time_stretch(waveform, rate=rate)

# Usage:
augmented = time_stretch(audio, rate=np.random.uniform(0.9, 1.1))  # ±10% speed
When to use: Speech recognition (speech rate variation).
Strength tuning:
- Light: 0.95-1.05 (±5% speed)
- Medium: 0.9-1.1 (±10% speed)
- Strong: 0.8-1.2 (±20% speed, too aggressive)
Background Noise Injection:
def add_background_noise(waveform: np.ndarray, noise: np.ndarray, snr_db: float):
    """Add noise at a specified SNR (signal-to-noise ratio, in dB)."""
    if len(noise) < len(waveform):
        # Tile the noise so it covers the whole signal
        noise = np.tile(noise, int(np.ceil(len(waveform) / len(noise))))
    signal_power = np.mean(waveform ** 2)
    snr_linear = 10 ** (snr_db / 10)
    noise_power = signal_power / snr_linear
    noise_scaled = noise * np.sqrt(noise_power / np.mean(noise ** 2))
    augmented = waveform + noise_scaled[:len(waveform)]
    return np.clip(augmented, -1, 1)  # Prevent clipping

# Usage:
noise, _ = librosa.load('background_noise.wav', sr=sr)
augmented = add_background_noise(audio, noise, snr_db=np.random.uniform(15, 30))
When to use: Speech recognition, robustness to noisy environments.
Strength tuning:
- Light: SNR 30-40 dB (minimal noise)
- Medium: SNR 20-30 dB (moderate noise)
- Strong: SNR 10-20 dB (very noisy, challenging)
SpecAugment: Augmentation in spectrogram space
def spec_augment(mel_spec: np.ndarray, freq_mask_width: int, time_mask_width: int):
    """Apply frequency and time masking to a mel-spectrogram."""
    mel_spec = mel_spec.copy()  # Don't mutate the caller's array
    freq_axis_size = mel_spec.shape[0]
    time_axis_size = mel_spec.shape[1]
    # Frequency masking
    f0 = np.random.randint(0, max(1, freq_axis_size - freq_mask_width))
    mel_spec[f0:f0 + freq_mask_width, :] = 0
    # Time masking
    t0 = np.random.randint(0, max(1, time_axis_size - time_mask_width))
    mel_spec[:, t0:t0 + time_mask_width] = 0
    return mel_spec

# Usage:
mel_spec = librosa.feature.melspectrogram(y=audio, sr=sr)
augmented = spec_augment(mel_spec, freq_mask_width=30, time_mask_width=40)
When to use: Speech recognition (standard for ASR).
Tabular Augmentations (Regression, Classification on Structured Data)
Key Principle: Preserve relationships between features while adding noise/variation.
SMOTE (Synthetic Minority Over-sampling):
from imblearn.over_sampling import SMOTE
# Balance imbalanced classification
X_train = your_features # shape: (n_samples, n_features)
y_train = your_labels
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X_train, y_train)
# Now X_resampled has balanced classes with synthetic minority examples
When to use: Imbalanced classification (rare class oversampling).
Advantage: Addresses class imbalance by creating synthetic examples.
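Consistent with the augment-training-only rule, resample after the train/validation split so validation keeps the true class distribution; a minimal sketch:
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
# Resample ONLY the training fold; validation stays at the original distribution
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)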
Feature-wise Noise Injection:
def add_noise_to_features(X: np.ndarray, noise_std: float):
    """Add Gaussian noise scaled to a fraction of each feature's std."""
    feature_stds = np.std(X, axis=0)
    # Unit noise scaled per feature; noise_std is a fraction of each feature's std
    noise = np.random.normal(0, 1, X.shape) * feature_stds * noise_std
    return X + noise
When to use: Robustness to measurement noise.
Strength tuning:
- Light: noise_std=0.01 (1% of feature std)
- Medium: noise_std=0.05 (5% of feature std)
- Strong: noise_std=0.1 (10% of feature std)
Feature Dropout:
def feature_dropout(X: np.ndarray, p: float):
    """Randomly zero out features with probability p."""
    mask = np.random.binomial(1, 1 - p, X.shape)
    return X * mask
When to use: Robustness to missing/unavailable features.
Strength tuning:
- p=0.1 (drop 10% of features)
- p=0.2 (drop 20%)
- Avoid p>0.3 (too much information loss)
Mixup for Tabular Data:
def mixup_tabular(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Apply mixup to tabular features."""
    batch_size = X.shape[0]
    index = np.random.permutation(batch_size)
    lam = np.random.beta(alpha, alpha)
    X_mixed = lam * X + (1 - lam) * X[index]
    y_a, y_b = y, y[index]
    return X_mixed, y_a, y_b, lam
When to use: Regression and classification on tabular data.
Part 3: Augmentation Strength Tuning
Conservative vs Aggressive Augmentation
Principle: Start conservative, increase gradually. Test impact.
Weak Augmentation (100% probability)
Apply light augmentation to ALL training data, EVERY epoch.
weak_augmentation = transforms.Compose([
transforms.RandomRotation(degrees=10),
transforms.ColorJitter(brightness=0.1, contrast=0.1),
transforms.RandomHorizontalFlip(p=0.5),
transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),
])
Typical improvement: +1-2% accuracy.
Pros:
- Consistent, no randomness in augmentation strength
- Easier to reproduce
- Less prone to catastrophic augmentation
Cons:
- Every image is augmented every epoch (the model rarely sees clean originals)
- Less diversity per image
Strong Augmentation (Lower Probability)
Apply strong augmentations with 30-50% probability.
import numpy as np
from torchvision import transforms

strong_augmentation = transforms.Compose([
    transforms.RandomRotation(degrees=45),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),
    transforms.RandomAffine(degrees=0, translate=(0.15, 0.15), shear=(-15, 15)),
    transforms.RandomPerspective(distortion_scale=0.3),
])

class StrongAugmentationWrapper:
    def __init__(self, transform, p=0.3):
        self.transform = transform
        self.p = p

    def __call__(self, x):
        # Apply the strong pipeline with probability p, else pass through
        if np.random.rand() < self.p:
            return self.transform(x)
        return x

aug_wrapper = StrongAugmentationWrapper(strong_augmentation, p=0.3)
Typical improvement: +2-3% accuracy.
Pros:
- More diversity
- Better robustness to extreme conditions
Cons:
- Risk of too-aggressive augmentation
- Requires careful strength tuning
Finding Optimal Strength
Algorithm:
- Start with weak augmentation (parameters at ~50% of the expected range)
- Sanity-check for 1 epoch that training loss decreases, then train fully and record validation accuracy as the baseline
- Increase strength by ~25% and retrain
- Compare final validation accuracies
- If accuracy improved, increase further; if it dropped, decrease
- Stop when accuracy plateaus or decreases, keeping the previous best parameters
Example:
# Start: rotation ±10°, brightness ±0.1
# After test 1: accuracy improves, try rotation ±15°, brightness ±0.15
# After test 2: accuracy improves, try rotation ±20°, brightness ±0.2
# After test 3: accuracy decreases, revert to rotation ±15°, brightness ±0.15
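A hedged sketch of this sweep loop, assuming hypothetical build_transform, train, and evaluate helpers:
best_acc, best_strength = 0.0, None
for strength in (0.5, 0.75, 1.0, 1.25):  # Fraction of the expected parameter range
    transform = build_transform(strength)  # e.g., rotation = strength * 20 degrees
    model = train(transform, epochs=10)
    acc = evaluate(model, val_loader)
    if acc <= best_acc:
        break  # Accuracy stopped improving: keep the previous best
    best_acc, best_strength = acc, strength
print(f"Best strength: {best_strength} (val acc {best_acc:.3f})")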
Part 4: Test-Time Augmentation (TTA)
Definition: Apply augmentation at inference time, average predictions.
import torch
from torchvision import transforms

# Light, label-preserving augmentation for TTA
tta_augmentation = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=5),
])

def predict_with_tta(model, image, num_augmentations=8):
    """Make predictions with test-time augmentation."""
    model.eval()
    predictions = []
    for _ in range(num_augmentations):
        augmented = tta_augmentation(image)
        with torch.no_grad():
            pred = model(augmented.unsqueeze(0))
        predictions.append(pred.softmax(dim=1))
    # Average predictions over the augmented views
    return torch.stack(predictions).mean(dim=0)
When to use:
- Final evaluation (test set submission)
- Robustness testing
- Post-training calibration
Don't use for:
- Validation (metrics must reflect single-pass performance)
- Production inference (too slow, accuracy not worth inference latency)
Typical improvement: +0.5-1% accuracy.
Computational cost: 8-10x slower inference.
Part 5: Common Pitfalls and Rationalization
Pitfall 1: Augmenting Validation/Test Data
Symptom: Validation accuracy inflated, test performance poor.
User Says: "More diversity helps, so augment everywhere"
Why It Fails: Validation measures true performance on ORIGINAL data, not augmented.
Fix:
# WRONG:
val_transform = transforms.Compose([
transforms.RandomRotation(20),
transforms.ToTensor(),
])
# RIGHT:
val_transform = transforms.Compose([
    # Deterministic preprocessing (Resize/CenterCrop) is fine; random transforms are not
    transforms.ToTensor(),
])
Pitfall 2: Over-Augmentation (Unrecognizable Images)
Symptom: Training loss doesn't decrease, accuracy worse with augmentation.
User Says: "More augmentation = more robustness"
Why It Fails: If image unrecognizable, model cannot learn the class.
Fix: Start conservative. Test incrementally.
Pitfall 3: Wrong Domain Augmentations
Symptom: Accuracy drops with augmentation.
User Says: "These augmentations work for images, why not text?"
Why It Fails: Flipped text is unreadable. Domain-specific invariances differ.
Fix: Use augmentations designed for your domain.
Pitfall 4: Augmentation Inconsistency Across Train/Val
Symptom: Model overfits, ignores augmentation benefit.
User Says: "I normalize images, so different augmentation pipelines okay"
Why It Fails: Train augmentation must be intentional; val must not have it.
Fix: Explicitly separate training and validation transforms.
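For example, two explicit pipelines wired into separate datasets (illustrated with torchvision's ImageFolder; the paths are placeholders):
from torchvision import datasets, transforms

train_transform = transforms.Compose([
    transforms.RandomRotation(15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
])
val_transform = transforms.Compose([
    transforms.ToTensor(),  # Deterministic preprocessing only
])

train_ds = datasets.ImageFolder("data/train", transform=train_transform)
val_ds = datasets.ImageFolder("data/val", transform=val_transform)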
Pitfall 5: Ignoring Label Semantics
Symptom: Model predicts wrong class after augmentation.
User Says: "The label is preserved, so any transformation okay"
Why It Fails: Extreme transformations obscure discriminative features.
Example: Medical image rotated 180° may have artifacts that change diagnosis.
Fix: Consider label semantics, not just label preservation.
Pitfall 6: No Augmentation on Small Dataset
Symptom: Severe overfitting, poor generalization.
User Says: "My data is unique, standard augmentations won't help"
Why It Fails: Overfitting still happens, augmentation reduces it.
Fix: Use domain-appropriate augmentations even on small datasets.
Pitfall 7: Augmentation Not Reproducible
Symptom: Different training runs give different results.
User Says: "Random augmentation is fine, natural variation"
Why It Fails: Makes debugging impossible, non-reproducible research.
Fix: Set random seeds for reproducible augmentation.
import random
import numpy as np
import torch
random.seed(42)
np.random.seed(42)
torch.manual_seed(42)
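With a multi-worker DataLoader, each worker also needs deterministic seeding, or augmentation randomness will still differ across runs; a sketch following the standard PyTorch recipe (dataset is a placeholder):
from torch.utils.data import DataLoader

def seed_worker(worker_id):
    # Derive per-worker seeds from the base seed PyTorch assigns to each worker
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

g = torch.Generator()
g.manual_seed(42)
train_loader = DataLoader(dataset, batch_size=32, shuffle=True,
                          num_workers=4, worker_init_fn=seed_worker, generator=g)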
Pitfall 8: Using One Augmentation Policy for All Tasks
Symptom: Augmentation works for classification, hurts for detection.
User Says: "Augmentation is general, works everywhere"
Why It Fails: Detection needs different augmentations (preserve boxes).
Fix: Domain AND task-specific augmentation selection.
Pitfall 9: Augmentation Overhead Too High
Symptom: Training 2x slower, minimal accuracy improvement.
User Says: "Augmentation is worth the overhead"
Why It Fails: Sometimes it is, sometimes not. Measure impact.
Fix: Profile training time. Balance overhead vs accuracy gain.
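A rough profiling sketch (aug_loader and plain_loader are assumed DataLoaders with and without the augmentation pipeline):
import time

def time_one_epoch(loader):
    """Wall-clock time for one full pass over the data pipeline."""
    start = time.perf_counter()
    for images, labels in loader:
        pass  # Measures loading + augmentation only; add a training step for end-to-end cost
    return time.perf_counter() - start

overhead = time_one_epoch(aug_loader) - time_one_epoch(plain_loader)
print(f"Augmentation overhead per epoch: {overhead:.1f}s")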
Pitfall 10: Mixing Incompatible Augmentations
Symptom: Unexpected behavior, degraded performance.
User Says: "Combining augmentations = better diversity"
Why It Fails: Some augmentations conflict or overlap.
Example: CutMix + random crop can create strange patches.
Fix: Design augmentation pipelines carefully, test combinations.
Part 6: Augmentation Policy Design
Step-by-Step Augmentation Design
Step 1: Identify invariances in your domain
What transformations preserve the class label?
- Vision: Rotation ±15° (natural), flip (depends), color jitter (yes)
- Text: Synonym replacement (yes), flip sentence (no)
- Audio: Pitch shift ±4 semitones (yes), time stretch ±20% (yes)
- Tabular: Feature noise (yes), feature permutation (no)
Step 2: Select weak augmentations
Choose conservative parameters.
weak_aug = transforms.Compose([
transforms.RandomRotation(degrees=15),
transforms.ColorJitter(brightness=0.1),
])
Step 3: Measure impact
Train with/without augmentation, compare validation accuracy.
# Without augmentation
model_no_aug = train(no_aug_transforms, epochs=10)
val_acc_no_aug = evaluate(model_no_aug, val_loader)
# With weak augmentation
model_weak_aug = train(weak_aug, epochs=10)
val_acc_weak_aug = evaluate(model_weak_aug, val_loader)
print(f"Without augmentation: {val_acc_no_aug}")
print(f"With weak augmentation: {val_acc_weak_aug}")
Step 4: Increase gradually if beneficial
If augmentation helped, increase strength 25%.
medium_aug = transforms.Compose([
transforms.RandomRotation(degrees=20), # ±20° vs ±15°
transforms.ColorJitter(brightness=0.15), # 0.15 vs 0.1
])
model_medium = train(medium_aug, epochs=10)
val_acc_medium = evaluate(model_medium, val_loader)
Step 5: Stop when improvement plateaus
When accuracy no longer improves, use previous best parameters.
Augmentation for Different Dataset Sizes
< 1,000 examples: Heavy augmentation needed
heavy_aug = transforms.Compose([
transforms.RandomRotation(degrees=30),
transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),
transforms.ColorJitter(brightness=0.3, contrast=0.3),
transforms.RandomAffine(degrees=0, shear=15),
transforms.RandomHorizontalFlip(p=0.5),
])
1,000-10,000 examples: Medium augmentation
medium_aug = transforms.Compose([
transforms.RandomRotation(degrees=15),
transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
transforms.ColorJitter(brightness=0.2, contrast=0.2),
transforms.RandomHorizontalFlip(p=0.5),
])
10,000-100,000 examples: Light augmentation
light_aug = transforms.Compose([
transforms.RandomRotation(degrees=10),
transforms.ColorJitter(brightness=0.1),
transforms.RandomHorizontalFlip(p=0.3),
])
> 100,000 examples: Minimal augmentation (optional)
minimal_aug = transforms.Compose([
transforms.ColorJitter(brightness=0.05),
])
Part 7: Augmentation Composition Strategies
Sequential vs Compound Augmentation
Sequential (Apply transforms in sequence, each has independent probability):
# Sequential: each transform independent
sequential = transforms.Compose([
transforms.RandomRotation(degrees=15), # 100% probability
transforms.ColorJitter(brightness=0.2), # 100% probability
transforms.RandomHorizontalFlip(p=0.5), # 50% probability
])
# Result: Always rotate and color jitter, sometimes flip
# Most common approach
Compound (Random selection of augmentation combinations):
# Compound: choose one from alternatives
def compound_augmentation(image):
    choice = np.random.choice(['light', 'medium', 'heavy'])
    if choice == 'light':
        return light_aug(image)
    elif choice == 'medium':
        return medium_aug(image)
    return heavy_aug(image)
When to use compound:
- When augmentations conflict
- When you want balanced diversity
- When computational resources limited
Augmentation Order Matters
Some augmentations should be applied in specific order:
Optimal order:
- Geometric transforms first (rotation, shear, perspective)
- Cropping (RandomResizedCrop)
- Flipping (horizontal, vertical)
- Color/intensity transforms (brightness, contrast, hue)
- Final normalization
optimal_order = transforms.Compose([
transforms.RandomRotation(15),
transforms.RandomAffine(degrees=0, shear=10),
transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
transforms.RandomHorizontalFlip(p=0.5),
transforms.ColorJitter(brightness=0.2, contrast=0.2),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
])
Why: Geometric first (operate on pixel coordinates), then color (invariant to coordinate changes).
Probability-Based Augmentation Control
Weak augmentation (apply to all data):
# Weak: always apply
weak = transforms.Compose([
transforms.RandomRotation(degrees=10),
transforms.ColorJitter(brightness=0.1),
transforms.RandomHorizontalFlip(p=0.5),
])
# Apply to every training image. In practice, pass the transform to the
# Dataset so each sample draws independent random parameters; calling it
# on a whole batch applies ONE random draw to every image in the batch.
for epoch in range(epochs):
    for images, labels in train_loader:
        images = weak(images)
        # ... train
Strong augmentation with probability:
class ProbabilisticAugmentation:
    def __init__(self, transform, p: float):
        self.transform = transform
        self.p = p

    def __call__(self, x):
        if np.random.rand() < self.p:
            return self.transform(x)
        return x
# Use strong augmentation with 30% probability
strong = transforms.Compose([
transforms.RandomRotation(degrees=45),
transforms.ColorJitter(brightness=0.4),
])
probabilistic = ProbabilisticAugmentation(strong, p=0.3)
# Each image: 70% unaugmented (training signal), 30% strongly augmented
Part 8: Augmentation for Specific Tasks
Augmentation for Object Detection
Challenge: Must preserve bounding boxes after augmentation.
Strategy: Use augmentations that preserve geometry or can remap boxes.
from albumentations import (
    BboxParams, ColorJitter, Compose, HorizontalFlip, Rotate
)

# Albumentations remaps bounding boxes automatically
detection_augmentation = Compose([
    HorizontalFlip(p=0.5),
    Rotate(limit=15, p=0.5),
    ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, p=0.5),
], bbox_params=BboxParams(format='pascal_voc', label_fields=['labels']))
# Usage:
image, boxes, labels = detection_sample
augmented = detection_augmentation(
image=image,
bboxes=boxes,
labels=labels
)
Safe augmentations:
- ✅ Horizontal flip (adjust box x-coordinates)
- ✅ Crop (clip boxes to cropped region)
- ✅ Rotate ±15° (remaps box corners)
- ✅ Color jitter (no box changes)
Avoid:
- ❌ Vertical flip (semantic meaning changes for many objects)
- ❌ Perspective distortion (complex box remapping)
- ❌ Large rotation (hard to remap boxes)
Augmentation for Semantic Segmentation
Challenge: Masks must be transformed identically to images.
Strategy: Apply same transform to both image and mask.
from albumentations import (
    ColorJitter, Compose, HorizontalFlip, RandomCrop, Rotate
)

# Masks passed via mask= are transformed identically to the image;
# no extra params object is needed for masks
segmentation_augmentation = Compose([
    HorizontalFlip(p=0.5),
    Rotate(limit=15, p=0.5),
    RandomCrop(height=256, width=256),
    ColorJitter(brightness=0.2, contrast=0.2, p=0.5),
])
# Usage:
image, mask = segmentation_sample
augmented = segmentation_augmentation(image=image, mask=mask)
image_aug, mask_aug = augmented['image'], augmented['mask']
Key requirement: Image and mask transformed identically.
Augmentation for Fine-Grained Classification
Challenges: Small objects, subtle differences between classes.
Strategy: Use conservative geometric transforms, aggressive color/texture.
# Fine-grained: preserve structure, vary appearance
fine_grained = transforms.Compose([
transforms.RandomRotation(degrees=5), # Conservative rotation
transforms.RandomResizedCrop(224, scale=(0.9, 1.0)), # Minimal crop
transforms.ColorJitter(brightness=0.3, contrast=0.3), # Aggressive color
transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),
])
Avoid:
- Large crops (lose discriminative details)
- Extreme rotations (change object orientation)
- Perspective distortion (distorts fine structures)
Augmentation for Medical Imaging
Critical requirements: Domain-specific, label-preserving, anatomically valid.
# Medical imaging augmentation (conservative)
medical_aug = transforms.Compose([
transforms.RandomRotation(degrees=10), # Max ±10°
transforms.ColorJitter(brightness=0.1, contrast=0.1),
# Avoid: vertical flip (anatomical direction), excessive crop
])
# Never apply:
# - Vertical flip (anatomy has direction)
# - Random crops cutting off pathology
# - Extreme color transforms (diagnostic colors matter)
# - Perspective distortion (can distort anatomy)
Domain-specific augmentations for medical:
- ✅ Elastic deformation (models anatomical variation; see the sketch after this list)
- ✅ Rotation ±10° (patient positioning variation)
- ✅ Small brightness/contrast (scanner variation)
- ✅ Gaussian blur (image quality variation)
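Elastic deformation is not in the basic torchvision transform set used above; a minimal sketch of the classic displacement-field approach (Simard et al.) using scipy, assuming a 2D grayscale array, with illustrative alpha (displacement strength) and sigma (smoothness) values:
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image: np.ndarray, alpha: float = 34.0, sigma: float = 4.0):
    """Elastic deformation of a 2D image via a smoothed random displacement field."""
    shape = image.shape
    # Random displacement fields, smoothed so neighboring pixels move coherently
    dx = gaussian_filter(np.random.uniform(-1, 1, shape), sigma) * alpha
    dy = gaussian_filter(np.random.uniform(-1, 1, shape), sigma) * alpha
    y, x = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing='ij')
    coords = np.array([y + dy, x + dx])
    return map_coordinates(image, coords, order=1, mode='reflect')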
Augmentation for Time Series / Sequences
For 1D sequences (signal processing, ECG, EEG):
import numpy as np
from scipy.interpolate import interp1d

def jitter(x: np.ndarray, std: float = 0.01):
    """Add small random noise to the sequence."""
    return x + np.random.normal(0, std, x.shape)

def scaling(x: np.ndarray, scale: float = 0.1):
    """Scale the magnitude of the sequence by a random factor."""
    return x * np.random.uniform(1 - scale, 1 + scale)

def rotation(x: np.ndarray):
    """Rotate in 2D space (assumes a 2-channel sequence of shape (T, 2))."""
    theta = np.random.uniform(-np.pi / 4, np.pi / 4)
    rotation_matrix = np.array([
        [np.cos(theta), -np.sin(theta)],
        [np.sin(theta), np.cos(theta)]
    ])
    return x @ rotation_matrix.T

def magnitude_warping(x: np.ndarray, sigma: float = 0.2):
    """Apply smooth, slowly-varying scaling along the time axis."""
    knots = np.linspace(0, len(x), 5)
    values = np.random.normal(1, sigma, len(knots))
    smooth_scale = interp1d(knots, values, kind='cubic')(np.arange(len(x)))
    return x * smooth_scale[:, np.newaxis]

def window_slicing(x: np.ndarray, window_ratio: float = 0.9):
    """Slice a random window (fraction window_ratio of the sequence, ~0.9 is
    typical; very small values distort heavily), then interpolate back to the
    original length."""
    window_size = int(len(x) * window_ratio)
    start = np.random.randint(0, max(1, len(x) - window_size))
    x_sliced = x[start:start + window_size]
    f = interp1d(np.arange(len(x_sliced)), x_sliced, axis=0, kind='linear',
                 fill_value='extrapolate')
    return f(np.linspace(0, len(x_sliced) - 1, len(x)))
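A brief usage sketch, applying a random subset of these transforms to a dummy 2-channel sequence:
# Dummy 2-channel sequence, e.g. a 2-lead ECG segment of 1000 samples
x = np.random.randn(1000, 2)
for fn in (jitter, scaling, magnitude_warping):
    if np.random.rand() < 0.5:  # Each transform applied with 50% probability
        x = fn(x)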
Part 9: Augmentation Red Flags and Troubleshooting
Red Flags: When Augmentation Is Hurting
Validation accuracy DECREASES with augmentation
- Likely: Too aggressive augmentation
- Solution: Reduce augmentation strength by 50%, retrain
Training loss doesn't decrease
- Likely: Images too distorted to learn
- Solution: Visualize augmented images, check if recognizable
Test accuracy much worse than validation
- Likely: Validation data accidentally augmented
- Solution: Check transform pipelines, ensure validation/test unaugmented
High variance in results across runs
- Likely: Augmentation randomness not seeded
- Solution: Set random seeds for reproducibility
Specific class performance drops with augmentation
- Likely: Augmentation inappropriate for that class
- Solution: Design class-specific augmentation (or disable for that class)
Memory usage doubled
- Likely: Applying augmentation twice (in data loader and training)
- Solution: Remove duplicate augmentation pipeline
Model never converges to baseline
- Likely: Augmentation too strong, label semantics lost
- Solution: Use weak augmentation first, increase gradually
Overfitting still severe despite augmentation
- Likely: Augmentation too weak or wrong type
- Solution: Increase strength, try different augmentations, use regularization too
Troubleshooting Checklist
Before concluding augmentation doesn't help:
- Validation transform pipeline has NO augmentations
- Training transform pipeline has only desired augmentations
- Random seed set for reproducibility
- Augmented images are visually recognizable (not noise) — spot-check them; see the sketch after this checklist
- Augmentation applied consistently across epochs
- Baseline training tested (no augmentation) for comparison
- Accuracy impact measured on same hardware/compute
- Computational cost justified by accuracy improvement
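To support the "visually recognizable" check above, a small matplotlib sketch (assuming the dataset yields (PIL image, label) pairs and the transform pipeline ends in ToTensor()):
import matplotlib.pyplot as plt

def show_augmented(dataset, transform, n=8):
    """Plot n independently augmented versions of one training sample."""
    image, _ = dataset[0]  # Assumes dataset returns (PIL image, label)
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2))
    for ax in axes:
        aug = transform(image)  # Assumes transform ends with ToTensor()
        ax.imshow(aug.permute(1, 2, 0).clamp(0, 1))  # CHW tensor -> HWC for plotting
        ax.axis('off')
    plt.show()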
Part 10: Rationalization Table (What Users Say vs Reality)
| User Statement | Reality | Evidence | Fix |
|---|---|---|---|
| "Augmentation is overhead, skip it" | Augmentation prevents overfitting on small data | +5-10% accuracy on <5K examples | Enable augmentation, measure impact |
| "Use augmentation on validation too" | Validation measures true performance on original data | Metrics misleading if augmented | Remove augmentation from val transforms |
| "More augmentation always better" | Extreme augmentation creates label noise | Accuracy drops with too-aggressive transforms | Start conservative, increase gradually |
| "Same augmentation for all domains" | Each domain has different invariances | Text upside-down ≠ same class | Use domain-specific augmentations |
| "Augmentation takes too long" | ~10-20% training overhead, usually worth it | Depends on accuracy gain vs compute cost | Profile: measure accuracy/time tradeoff |
| "Flip works for everything" | Vertical flip changes anatomy/semantics | Medical imaging, some objects not symmetric | Know when flip is appropriate |
| "Random augmentation same as fixed" | Randomness prevents memorization, fixed is repetitive | Stochastic variation teaches invariance | Use random, not fixed transforms |
| "My data is too unique for standard augmentations" | Even unique data benefits from domain-appropriate augmentation | Overfitting still happens with small unique datasets | Adapt augmentations to your domain |
| "Augmentation is regularization" | Augmentation and regularization different; both help together | Dropout+BatchNorm+Augmentation > any single one | Use augmentation AND regularization |
| "TTA means augment validation" | TTA is optional post-training, not validation practice | TTA averaged over multiple forward passes | Use TTA only at final inference |
Summary: Quick Reference
| Domain | Light Augmentations | Medium Augmentations | Strong Augmentations |
|---|---|---|---|
| Vision | ±10° rotation, ±10% brightness, 0.5 H-flip | ±20° rotation, ±20% brightness, CutMix | ±45° rotation, ±30% jitter, strong perspective |
| NLP | Synonym replacement (1 word) | Back-translation, EDA | Multiple paraphrases, sentence reordering |
| Audio | Pitch ±2 semitones, noise SNR 30dB | Pitch ±4, noise SNR 20dB | Pitch ±8, noise SNR 10dB |
| Tabular | Feature noise 1%, SMOTE | Feature noise 5%, feature dropout | Feature noise 10%, heavy SMOTE |
Critical Rules
- Augment training data ONLY. Validation and test data must be unaugmented.
- Start conservative, increase gradually. Measure impact at each step.
- Domain matters. No universal augmentation strategy exists.
- Preserve labels. Do not apply transformations that change the class.
- Test incrementally. Add one augmentation at a time, measure impact.
- Reproducibility. Set random seeds for ablation studies.
- Avoid extremes. If images/text unrecognizable, augmentation too strong.
- Know your domain. Understand what invariances matter for your task.
- Measure impact. Profile training time and accuracy improvement.
- Combine with regularization. Augmentation works best with dropout, batch norm, weight decay.