name	outlier-detective
description	Detect anomalies and outliers in datasets using statistical and ML methods. Use for data cleaning, fraud detection, or quality control analysis.

Outlier Detective

Name: outlier-detective
Author: dkyazzentwatwa

Detect anomalies and outliers in numeric data using multiple methods.

Features

Statistical Methods: Z-score, IQR, Modified Z-score
ML Methods: Isolation Forest, LOF, DBSCAN
Visualization: Box plots, scatter plots, distribution plots
Multi-Column: Analyze multiple variables
Reports: Detailed outlier reports
Flexible Thresholds: Configurable sensitivity

Quick Start

from outlier_detective import OutlierDetective

detective = OutlierDetective()
detective.load_csv("sales_data.csv")

# Detect outliers in a column
outliers = detective.detect("revenue", method="iqr")
print(f"Found {len(outliers)} outliers")

# Get full report
report = detective.analyze("revenue")
print(report)

CLI Usage

# Detect outliers using IQR method
python outlier_detective.py --input data.csv --column sales --method iqr

# Use Z-score with custom threshold
python outlier_detective.py --input data.csv --column price --method zscore --threshold 3

# Analyze all numeric columns
python outlier_detective.py --input data.csv --all

# Generate visualization
python outlier_detective.py --input data.csv --column revenue --plot boxplot.png

# Export outliers to CSV
python outlier_detective.py --input data.csv --column value --output outliers.csv

# Use Isolation Forest (ML)
python outlier_detective.py --input data.csv --method isolation_forest

API Reference

OutlierDetective Class

class OutlierDetective:
    def __init__(self)

    # Data loading
    def load_csv(self, filepath: str, **kwargs) -> 'OutlierDetective'
    def load_dataframe(self, df: pd.DataFrame) -> 'OutlierDetective'

    # Detection (single column)
    def detect(self, column: str, method: str = "iqr", **kwargs) -> pd.DataFrame
    def analyze(self, column: str) -> dict

    # Detection (multi-column)
    def detect_multivariate(self, columns: list = None, method: str = "isolation_forest", **kwargs) -> pd.DataFrame
    def analyze_all(self) -> dict

    # Visualization
    def plot_boxplot(self, column: str, output: str) -> str
    def plot_scatter(self, col1: str, col2: str, output: str) -> str
    def plot_distribution(self, column: str, output: str) -> str

    # Export
    def get_outliers(self, column: str, method: str = "iqr") -> pd.DataFrame
    def get_clean_data(self, column: str, method: str = "iqr") -> pd.DataFrame

Detection Methods

Statistical Methods

IQR (Interquartile Range)

Default method; makes no normality assumption, and the quartiles are barely affected by the outliers themselves
Outliers: values below Q1 - 1.5×IQR or above Q3 + 1.5×IQR
Multiplier configurable (default: 1.5)

outliers = detective.detect("price", method="iqr", multiplier=1.5)
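For reference, the IQR rule can be reproduced in plain pandas. This is a minimal sketch, not OutlierDetective's implementation. The helper `iqr_outliers` is hypothetical, and the sketch assumes pandas' default linear quantile interpolation; the tool may compute quartiles differently.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, multiplier: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside the IQR fences."""
    q1 = values.quantile(0.25)  # pandas default: linear interpolation
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return (values < lower) | (values > upper)


prices = pd.Series([10, 12, 11, 13, 12, 11, 95])
mask = iqr_outliers(prices)
print(prices[mask])  # Q1 = 11, Q3 = 12.5, fences 8.75 and 14.75: only 95 is flagged
```

Raising `multiplier` widens the fences and flags fewer values. The value 3.0 is a common choice for "extreme" outliers.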

Z-Score

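Measures distance from the mean in standard deviations
Assumes roughly normally distributed data
Mean and standard deviation are inflated by the outliers themselves, so extreme values can mask each other
Threshold configurable (default: 3)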

outliers = detective.detect("price", method="zscore", threshold=3)
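A comparable sketch of the z-score rule. The helper `zscore_outliers` is hypothetical. The sketch assumes the sample standard deviation (pandas' default `ddof=1`); the source does not document which one the tool uses. With n rows, a sample z-score can never exceed (n - 1)/sqrt(n), so on very small columns a threshold of 3 flags nothing.

```python
import pandas as pd


def zscore_outliers(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Flag values more than `threshold` standard deviations from the mean."""
    z = (values - values.mean()) / values.std()  # .std() is the sample std (ddof=1)
    return z.abs() > threshold


readings = pd.Series([49] * 10 + [51] * 10 + [100])
mask = zscore_outliers(readings)
print(readings[mask])  # the 100 reading has |z| of about 4.3; all others stay below 0.4
```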

Modified Z-Score

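Replaces mean and standard deviation with the median and MAD (Median Absolute Deviation)
More robust when the data already contains outliers
Score: 0.6745 × (x - median) / MAD; 3.5 is the conventional threshold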

outliers = detective.detect("price", method="modified_zscore", threshold=3.5)
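The 0.6745 constant makes MAD comparable to a standard deviation for normal data. The 3.5 cutoff is the one suggested by Iglewicz and Hoaglin.

The sketch below is not the tool's code, and the helper `modified_zscore_outliers` is hypothetical. It flags the 95 in a seven-value column. A plain z-score cannot do this, because a seven-row sample caps |z| at about 2.27. When more than half the values equal the median, MAD is 0; reporting no outliers in that case is this sketch's own choice.

```python
import pandas as pd


def modified_zscore_outliers(values: pd.Series, threshold: float = 3.5) -> pd.Series:
    """Flag values whose modified z-score, 0.6745 * (x - median) / MAD, exceeds `threshold`."""
    median = values.median()
    mad = (values - median).abs().median()  # Median Absolute Deviation
    if mad == 0:
        # More than half the values equal the median, so the score is undefined;
        # this sketch simply reports no outliers in that case.
        return pd.Series(False, index=values.index)
    modified_z = 0.6745 * (values - median) / mad
    return modified_z.abs() > threshold


prices = pd.Series([10, 12, 11, 13, 12, 11, 95])
print(prices[modified_zscore_outliers(prices)])  # median 12, MAD 1: only 95 is flagged
```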

ML Methods

Isolation Forest

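Ensemble of random isolation trees; anomalies are separated with fewer random splits than normal points
Suited to high-dimensional data
Contamination parameter sets the expected fraction of outliers, which fixes the decision threshold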

outliers = detective.detect_multivariate(
    method="isolation_forest",
    contamination=0.1
)
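scikit-learn is a listed dependency, so `method="isolation_forest"` presumably wraps `sklearn.ensemble.IsolationForest`; the source does not confirm this. Below is a standalone sketch on synthetic two-column data. With `contamination=0.1`, scikit-learn places the score threshold so that about 10% of the training rows are labeled `-1`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic two-column data: 200 normal readings plus one planted anomaly.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(0, 1, 200), "y": rng.normal(0, 1, 200)})
df.loc[len(df)] = [8.0, 8.0]

model = IsolationForest(contamination=0.1, random_state=0)
labels = model.fit_predict(df[["x", "y"]])  # -1 = outlier, 1 = inlier
outliers = df[labels == -1]
print(f"{len(outliers)} of {len(df)} rows flagged")  # roughly 10% by construction
```

Because `contamination` forces a fixed share of rows to be flagged, set it close to the outlier rate you actually expect.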

Local Outlier Factor (LOF)

Density-based method
Compares each point's local density with that of its n_neighbors nearest neighbors; points in much sparser regions score as outliers

outliers = detective.detect_multivariate(
    method="lof",
    n_neighbors=20
)
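This sketch assumes `method="lof"` wraps `sklearn.neighbors.LocalOutlierFactor`, which the source does not confirm. The LOF score compares the neighbors' local density with the point's own. Values near 1 are typical; values well above 1 mark points in sparser regions. With the default `contamination="auto"`, scikit-learn labels points whose score exceeds 1.5 as outliers.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: 200 points from a 2-D standard normal plus one isolated point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)), [[8.0, 8.0]]])

lof = LocalOutlierFactor(n_neighbors=20)  # contamination="auto" by default
labels = lof.fit_predict(X)                # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_     # LOF: ~1 is typical, larger = sparser than neighbors
print(f"planted point: label {labels[-1]}, LOF {scores[-1]:.1f}")
```

LOF is distance-based, so columns on very different scales should usually be standardized first.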

Output Format

detect() Result

# Returns DataFrame of outlier rows with additional columns:
#   - outlier_score: How extreme the value is
#   - outlier_reason: Description of why it's an outlier

   index  value  outlier_score  outlier_reason
0     15   5000          4.2    Above Q3 + 1.5×IQR
1     42  -1000         -3.8    Below Q1 - 1.5×IQR

analyze() Result

{
    "column": "revenue",
    "total_rows": 1000,
    "outlier_count": 23,
    "outlier_percent": 2.3,
    "methods": {
        "iqr": {"count": 23, "indices": [...]},
        "zscore": {"count": 18, "indices": [...]},
        "modified_zscore": {"count": 20, "indices": [...]}
    },
    "stats": {
        "mean": 5432.10,
        "median": 4890.00,
        "std": 1234.56,
        "min": -1000.00,
        "max": 15000.00,
        "q1": 3500.00,
        "q3": 6200.00,
        "iqr": 2700.00
    },
    "bounds": {
        "lower": -550.00,
        "upper": 10250.00
    }
}
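The sample `bounds` are consistent with the IQR rule at the default 1.5 multiplier, and `outlier_percent` follows from `outlier_count` and `total_rows`:

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 = 6200 - 3500 = 2700 \\
\text{lower} &= Q_1 - 1.5 \times \mathrm{IQR} = 3500 - 4050 = -550 \\
\text{upper} &= Q_3 + 1.5 \times \mathrm{IQR} = 6200 + 4050 = 10250 \\
\text{outlier\_percent} &= 100 \times 23 / 1000 = 2.3
\end{aligned}
```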

Example Workflows

Data Cleaning Pipeline

detective = OutlierDetective()
detective.load_csv("raw_data.csv")

# Analyze and visualize
report = detective.analyze("price")
print(f"Found {report['outlier_count']} outliers ({report['outlier_percent']:.1f}%)")
detective.plot_boxplot("price", "price_boxplot.png")

# Get clean data
clean_data = detective.get_clean_data("price", method="iqr")
clean_data.to_csv("clean_data.csv", index=False)

Fraud Detection

detective = OutlierDetective()
detective.load_csv("transactions.csv")

# Use multiple methods for consensus
iqr_outliers = set(detective.detect("amount", method="iqr").index)
zscore_outliers = set(detective.detect("amount", method="zscore").index)

# Transactions flagged by both methods
high_confidence = iqr_outliers & zscore_outliers
print(f"High-confidence anomalies: {len(high_confidence)}")
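The set intersection above generalizes to a k-of-n vote across methods. In the sketch below, `consensus` is a hypothetical helper, and the IQR and z-score masks are computed directly in pandas rather than through OutlierDetective. The value 56 is flagged by IQR alone, while 100 is flagged by both methods.

```python
import pandas as pd

amounts = pd.Series([49] * 10 + [51] * 10 + [56, 100])

# IQR rule (multiplier 1.5): Q1 = 49, Q3 = 51, upper fence 54
q1, q3 = amounts.quantile([0.25, 0.75])
iqr = q3 - q1
iqr_mask = (amounts < q1 - 1.5 * iqr) | (amounts > q3 + 1.5 * iqr)

# Z-score rule (threshold 3, sample standard deviation)
z_mask = ((amounts - amounts.mean()) / amounts.std()).abs() > 3


def consensus(masks: dict, min_votes: int = 2) -> list:
    """Return the index labels flagged by at least `min_votes` of the boolean masks."""
    votes = pd.DataFrame(masks).sum(axis=1)  # number of methods flagging each row
    return votes[votes >= min_votes].index.tolist()


print(consensus({"iqr": iqr_mask, "zscore": z_mask}))               # [21]
print(consensus({"iqr": iqr_mask, "zscore": z_mask}, min_votes=1))  # [20, 21]
```

Requiring more votes trades recall for precision. A union (`min_votes=1`) casts the widest net for manual review.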

Multi-Variable Analysis

detective = OutlierDetective()
detective.load_csv("sensors.csv")

# Detect multivariate outliers
outliers = detective.detect_multivariate(
    columns=["temp", "pressure", "humidity"],
    method="isolation_forest",
    contamination=0.05
)
print(f"Anomalous readings: {len(outliers)}")

Visualization Examples

# Box plot with outliers highlighted
detective.plot_boxplot("revenue", "revenue_boxplot.png")

# Distribution with bounds
detective.plot_distribution("price", "price_dist.png")

# Scatter plot (2D outliers)
detective.plot_scatter("feature1", "feature2", "scatter.png")
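The plotting helpers presumably use Matplotlib, a listed dependency. Below is a minimal sketch of a `plot_boxplot`-style function; it is hypothetical and not the tool's code. Matplotlib's default `whis=1.5` draws points beyond 1.5×IQR from the box as fliers, which is the same rule as the IQR detection method.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: write image files, no display needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_boxplot(values: pd.Series, output: str) -> str:
    """Save a box plot of `values` to `output` and return the path."""
    fig, ax = plt.subplots(figsize=(4, 5))
    # whis=1.5 (Matplotlib's default): points beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR become fliers
    ax.boxplot(values.dropna().to_numpy(), whis=1.5)
    ax.set_title(str(values.name or "value"))
    fig.savefig(output, dpi=100, bbox_inches="tight")
    plt.close(fig)  # free the figure when generating many plots
    return output


revenue = pd.Series([10, 12, 11, 13, 12, 11, 95], name="revenue")
path = plot_boxplot(revenue, os.path.join(tempfile.gettempdir(), "revenue_boxplot.png"))
print(path)
```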

Dependencies

pandas>=2.0.0
numpy>=1.24.0
scipy>=1.10.0
scikit-learn>=1.3.0
matplotlib>=3.7.0
