| name | xsv |
| description | Use xsv for fast CSV data processing with selection, filtering, statistics, joining, sorting, and indexing for high-performance data manipulation. |
xsv - CSV Command Line Toolkit Skill
You are a CSV data manipulation specialist using xsv, a fast command-line CSV toolkit written in Rust. This skill provides comprehensive guidance for processing, analyzing, and transforming CSV data efficiently.
Why xsv?
xsv is designed for high-performance CSV operations:
- Extremely fast: Rust-based, optimized for speed
- Low memory: Streaming operations when possible
- Rich features: 20+ commands for CSV manipulation
- Indexing: Create indexes for faster random access
- Composable: Unix philosophy - pipe commands together
Core Capabilities
- Selection: select, headers, slice, sample
- Searching: search, frequency
- Analysis: stats, frequency, count
- Transformation: fmt, fixlengths, flatten
- Combination: join, cat
- Sorting: sort
- Display: table
- Splitting: split
- Performance: index
Quick Start
View CSV Data as Table
xsv table data.csv
xsv table data.csv | less -S
# Limit field width for readability
xsv table -c 20 data.csv
Count Rows
xsv count data.csv
View Headers
xsv headers data.csv
Select Columns
xsv select 1,3,5 data.csv
xsv select Name,Email,Age data.csv
Essential Commands
headers - View Column Names
# Show all column names with indices
xsv headers data.csv
# Output format:
# 1 Name
# 2 Email
# 3 Age
Use case: Understand CSV structure before operations
count - Count Records
# Count records (excluding header)
xsv count data.csv
# Count with an index present (much faster); pass the CSV itself, not the .idx file
xsv index data.csv
xsv count data.csv
Performance: O(1) with index, O(n) without
select - Select Columns
# By index (1-based)
xsv select 1,3,5 data.csv
# By name
xsv select Name,Email,Age data.csv
# Column ranges
xsv select 1-4 data.csv
xsv select Name-Age data.csv
# From column to end
xsv select 3- data.csv
# Exclude columns
xsv select '!1-2' data.csv
# Reorder and duplicate
xsv select 3,1,2,1 data.csv
# Disambiguate duplicate column names
xsv select 'Name[0],Name[1],Name[2]' data.csv
# Quote names with special characters
xsv select '"Date - Opening","Date - Closing"' data.csv
Common options:
- -o, --output <file>: Write to file
- -n, --no-headers: Treat first row as data
- -d, --delimiter <char>: Input delimiter (default: ,)
search - Filter Rows by Regex
# Basic search
xsv search "pattern" data.csv
# Case insensitive
xsv search -i "pattern" data.csv
# Search specific columns
xsv search -s Email "gmail.com" data.csv
xsv search -s 1,3,5 "pattern" data.csv
# Invert match (exclude matching rows)
xsv search -v "pattern" data.csv
# Save results
xsv search "active" data.csv -o active_users.csv
Use case: Filter CSV rows like grep
slice - Extract Row Ranges
# First 10 rows
xsv slice -l 10 data.csv
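# Records 100 through 199 (indices are zero-based; -e is exclusive)
xsv slice -s 100 -e 200 data.csv
# Start at record 50, take 20 records
xsv slice -s 50 -l 20 data.csv
# Last 10 rows (slice takes no negative offsets, so compute the start from the count)
xsv slice -s $(( $(xsv count data.csv) - 10 )) data.csv
# Single record (zero-based index)
xsv slice -i 42 data.csv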
Performance: Much faster with index
stats - Compute Statistics
# Basic stats (mean, min, max, stddev)
xsv stats data.csv
# All statistics (includes median, mode, cardinality)
xsv stats --everything data.csv
# Specific columns
xsv stats -s Age,Salary data.csv
# Include median (requires memory)
xsv stats --median data.csv
# Include mode
xsv stats --mode data.csv
# Include cardinality (unique count)
xsv stats --cardinality data.csv
# Parallel processing
xsv stats -j 4 data.csv
# Output as table
xsv stats data.csv | xsv table
Output fields: field, type, sum, min, max, min_length, max_length, mean, stddev, plus median, mode, and cardinality when requested
frequency - Frequency Tables
# Top 10 values per column
xsv frequency data.csv
# Specific columns
xsv frequency -s Status,Category data.csv
# Top 20 values
xsv frequency -l 20 data.csv
# All values (no limit)
xsv frequency -l 0 data.csv
# Ascending order
xsv frequency --asc data.csv
# Exclude nulls
xsv frequency --no-nulls data.csv
# View as table
xsv frequency -s Status data.csv | xsv table
Output: field, value, count
Use case: Value distribution analysis
sort - Sort Records
# Sort by first column
xsv sort data.csv
# Sort by specific columns
xsv sort -s Age data.csv
xsv sort -s LastName,FirstName data.csv
# Numeric sort
xsv sort -s Age -N data.csv
# Reverse order
xsv sort -s Age -R data.csv
# Numeric + reverse
xsv sort -s Salary -N -R data.csv
# Save sorted
xsv sort -s Name data.csv -o sorted.csv
Note: Requires reading entire file into memory
join - Join Two CSV Files
# Inner join
xsv join ID users.csv ID orders.csv
# Left outer join
xsv join --left ID users.csv ID orders.csv
# Right outer join
xsv join --right ID users.csv ID orders.csv
# Full outer join
xsv join --full ID users.csv ID orders.csv
# Case-insensitive join
xsv join --no-case Email users.csv Email contacts.csv
# Join on multiple columns
xsv join 'ID,Date' file1.csv 'ID,Date' file2.csv
# Include nulls in join
xsv join --nulls ID file1.csv ID file2.csv
# Cross join (cartesian product)
xsv join --cross 1 file1.csv 1 file2.csv
# Save result
xsv join ID users.csv ID orders.csv -o joined.csv
Join types:
- Default: Inner join (intersection)
- --left: Left outer join
- --right: Right outer join
- --full: Full outer join
- --cross: Cartesian product (use with caution)
table - Format as Aligned Table
# Basic table
xsv table data.csv
# With pager
xsv table data.csv | less -S
# Minimum column width
xsv table -w 10 data.csv
# Padding between columns
xsv table -p 4 data.csv
# Limit field length
xsv table -c 20 data.csv
# Combine limits
xsv slice -l 50 data.csv | xsv table -c 30
Note: Requires buffering entire file into memory
sample - Random Sampling
# Sample 100 rows
xsv sample 100 data.csv
# Sample 10% of large file
xsv count data.csv # e.g., 1000000
xsv sample 100000 data.csv
# Save sample
xsv sample 1000 large.csv -o sample.csv
# Sample then analyze
xsv sample 10000 huge.csv | xsv stats --everything
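Performance: When an index exists and the sample is under 10% of the total records, xsv uses random access through the index; otherwise it streams the whole file once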
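The two-step "count, then sample 10%" approach above can be wrapped in a small helper. This is a minimal sketch assuming a bash-like shell; `sample_percent` is a hypothetical name, not an xsv command, and it relies only on `xsv count` and `xsv sample` as already shown.

```shell
# sample_percent PCT FILE: sample roughly PCT percent of FILE's records.
# Hypothetical helper; uses only `xsv count` and `xsv sample`.
sample_percent() {
  local pct=$1 file=$2 total size
  total=$(xsv count "$file") || return 1
  size=$(( total * pct / 100 ))      # integer arithmetic, rounds down
  xsv sample "$size" "$file"
}

# Usage (file name is a placeholder):
#   sample_percent 10 data.csv | xsv stats --everything
```

Because the size is computed with integer division, very small files can yield a sample size of 0; pick a fixed row count in that case.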
fmt - Format Output
# Convert to TSV
xsv fmt -t '\t' data.csv -o data.tsv
# Convert to pipe-delimited
xsv fmt -t '|' data.csv
# Add CRLF line endings
xsv fmt --crlf data.csv -o windows.csv
# Quote all fields
xsv fmt --quote-always data.csv
# Custom quote character
xsv fmt --quote "'" data.csv
# Custom escape character
xsv fmt --escape '\' data.csv
cat - Concatenate Files
# Concatenate by rows (vertically)
xsv cat rows file1.csv file2.csv file3.csv
# Concatenate by columns (horizontally)
xsv cat columns file1.csv file2.csv
# Pad shorter files with empty values when concatenating columns
xsv cat columns -p file1.csv file2.csv
split - Split Into Multiple Files
# Split into files of 1000 rows each
xsv split -s 1000 output_dir data.csv
# Creates: output_dir/0.csv, output_dir/1000.csv, etc. (named by the index of each chunk's first record)
flatten - Show One Field Per Line
# Flatten first record
xsv slice -i 0 data.csv | xsv flatten
# Output format (one aligned "field  value" pair per line, no header row):
# Name   John Doe
# Email  john@example.com
# Age    30
fixlengths - Fix Inconsistent Row Lengths
# Ensure all rows have same number of fields
xsv fixlengths data.csv -o fixed.csv
# Pads short rows with empty fields
# Useful for malformed CSVs
index - Create Index for Performance
# Create index
xsv index data.csv
# Creates: data.csv.idx
# Now operations are faster:
xsv count data.csv # O(1) instead of O(n)
xsv slice -i 1000 data.csv # Direct access
xsv sample 100 data.csv # Fast random access
When to index:
- Large files (>100MB)
- Multiple operations on same file
- Random access patterns (slice, sample)
- Statistics with parallel processing
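When several commands run against the same file, the index only needs to be built once. The sketch below, assuming a bash-like shell, rebuilds the index only when it is missing or older than the data file; `ensure_index` is a hypothetical helper name, and the only xsv call it makes is `xsv index`.

```shell
# ensure_index FILE: (re)build FILE.idx only when it is missing or older than FILE.
# Hypothetical helper; the only xsv command used is `xsv index`.
ensure_index() {
  local file=$1
  if [ ! -f "$file.idx" ] || [ "$file" -nt "$file.idx" ]; then
    xsv index "$file"
  fi
}

# Usage (file name is a placeholder):
#   ensure_index large.csv && xsv count large.csv
```

Checking the modification time avoids reindexing on every run while still refreshing the index after the data changes.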
Common Workflows
Workflow 1: Data Exploration
# 1. Understand structure
xsv headers data.csv
# 2. Count records
xsv count data.csv
# 3. View sample
xsv slice -l 10 data.csv | xsv table
# 4. Get statistics
xsv stats data.csv | xsv table
# 5. Check value distributions
xsv frequency -s Status data.csv | xsv table
Workflow 2: Data Filtering and Selection
# 1. Select relevant columns
xsv select Name,Email,Age,Status data.csv |
# 2. Filter active users
xsv search -s Status "active" |
# 3. Filter by age (30-99) and save the result
xsv search -s Age "^[3-9][0-9]$" -o active_users_30plus.csv
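The age regex in this workflow matches two-digit values from 30 to 99 only, because `xsv search` compares text rather than numbers. For a true numeric threshold, one option is to hand the comparison to awk. The following is a minimal sketch; `filter_min` is a hypothetical helper, and it assumes simple fields without quoted commas, since awk splits on every comma.

```shell
# filter_min COL MIN: keep the header plus rows whose COL-th field (1-based) is >= MIN.
# Hypothetical helper; splits on every comma, so it assumes no quoted commas.
filter_min() {
  awk -F, -v col="$1" -v min="$2" 'NR == 1 || ($col + 0) >= (min + 0)'
}

# Demo on inline data; in a pipeline the input would come from xsv, e.g.
#   xsv select Name,Age data.csv | filter_min 2 30
printf 'Name,Age\nAnn,29\nBob,30\nCy,105\n' | filter_min 2 30
```

The demo prints the header followed by the rows for Bob and Cy, including the three-digit age that the regex would miss.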
Workflow 3: Data Analysis Pipeline
# 1. Create index for performance
xsv index large_data.csv
# 2. Sample for quick analysis
xsv sample 10000 large_data.csv |
# 3. Select columns of interest
xsv select Revenue,Region,Product |
# 4. Get statistics
xsv stats --everything |
# 5. View as table
xsv table
Workflow 4: Data Joining
# 1. Join users with orders
xsv join UserID users.csv UserID orders.csv |
# 2. Select relevant columns
xsv select 'UserName,Email,OrderID,OrderDate,Amount' |
# 3. Sort by amount
xsv sort -s Amount -N -R |
# 4. Top 100 orders
xsv slice -l 100 |
# 5. Format and save
xsv table -o top_orders.txt
Workflow 5: Data Cleaning
# 1. Fix row lengths
xsv fixlengths messy.csv |
# 2. Select valid columns
xsv select 1-10 |
# 3. Remove rows with empty email
xsv search -s Email '.+' |
# 4. Sort, then drop identical adjacent rows (uniq compares whole lines)
xsv sort -s Email |
# 5. Save cleaned data
uniq > cleaned.csv
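Note that `uniq` removes only rows that are identical in every field. To keep a single row per email address instead, the key column has to be compared on its own. The sketch below does this with awk; `dedupe_by_column` is a hypothetical helper, and it assumes fields contain no quoted commas.

```shell
# dedupe_by_column COL: keep the header and the first row seen for each value
# of the COL-th field (1-based). Hypothetical helper; assumes no quoted commas.
dedupe_by_column() {
  awk -F, -v col="$1" 'NR == 1 || !seen[$col]++'
}

# Demo on inline data; in a pipeline the input would come from xsv, e.g.
#   xsv select Name,Email messy.csv | dedupe_by_column 2
printf 'Name,Email\nAnn,a@x.com\nBob,b@x.com\nAnna,a@x.com\n' | dedupe_by_column 2
```

Unlike `uniq`, this approach does not need sorted input, and it keeps the first occurrence in the original order.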
Workflow 6: Data Transformation
# 1. Select and reorder columns
xsv select 'LastName,FirstName,Email,Phone' data.csv |
# 2. Convert to TSV and save
xsv fmt -t '\t' -o output.tsv
Workflow 7: Large File Processing
# 1. Create index first
xsv index huge_file.csv
# 2. Get quick count
xsv count huge_file.csv
# 3. Sample for analysis
xsv sample 50000 huge_file.csv |
# 4. Analyze sample
xsv stats --everything |
# 5. View results
xsv table
Performance Tips
1. Create Indexes for Large Files
# One-time cost, speeds up many operations
xsv index large.csv
Speeds up:
- count (O(1) instead of O(n))
- slice (direct access)
- sample (efficient random access)
- stats -j (parallel processing)
2. Use Streaming Operations
These don't require reading entire file into memory:
- select
- search
- slice (with index)
- headers
- count (with index)
3. Avoid Memory-Intensive Operations on Large Files
These require full file in memory:
- sort
- table
- stats --median
- stats --mode
- frequency
Solution: Use sample or slice first:
xsv sample 100000 huge.csv | xsv stats --everything
4. Parallel Processing
# Use multiple cores for stats
xsv stats -j 0 data.csv # Auto-detect CPUs
# Specific job count
xsv stats -j 4 data.csv
Requires: Indexed file for best performance
5. Chain Commands Efficiently
# Good: streaming pipeline
xsv select Name,Age data.csv | xsv search -s Age "^[3-9]" | xsv table
# Less efficient: multiple file reads
xsv select Name,Age data.csv -o temp1.csv
xsv search -s Age "^[3-9]" temp1.csv -o temp2.csv
xsv table temp2.csv
Advanced Patterns
Pattern 1: Top N Analysis
# Top 10 customers by revenue
xsv sort -s Revenue -N -R customers.csv | xsv slice -l 10 | xsv table
Pattern 2: Conditional Aggregation
# Count by status
xsv frequency -s Status -l 0 data.csv | xsv table
# Average age by region (requires external tools)
xsv select Region,Age data.csv | xsv sort -s Region | ...
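Since xsv has no group-by aggregation, the per-group average has to come from another tool. The following is one possible sketch using awk; `avg_by_group` is a hypothetical helper, and it assumes a two-column input (group, value) with a header row and no quoted commas.

```shell
# avg_by_group: read a two-column CSV (group,value) with a header row on stdin
# and print "group,average" per group, sorted by group name.
# Hypothetical helper; awk splits on every comma, so quoted commas are not handled.
avg_by_group() {
  awk -F, '
    NR == 1 { next }                      # skip the header row
    { sum[$1] += $2; n[$1]++ }            # accumulate per-group totals and counts
    END { for (g in sum) printf "%s,%.2f\n", g, sum[g] / n[g] }
  ' | sort
}

# Demo on inline data; in a pipeline the input would come from xsv, e.g.
#   xsv select Region,Age data.csv | avg_by_group
printf 'Region,Age\nEast,30\nWest,40\nEast,50\n' | avg_by_group
```

The trailing `sort` is there because awk iterates over array keys in no guaranteed order.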
Pattern 3: Multi-Column Search
# Search across multiple columns (keeps every column in the output)
xsv search -s Name,Email,Phone "pattern" data.csv
Pattern 4: Data Validation
# Find rows with missing email
xsv search -s Email -v '.+' data.csv
# Find duplicates (by email)
xsv select Email data.csv | xsv sort | uniq -d
Pattern 5: Data Comparison
# Find differences between two files (sort by ID first so diff compares matching rows)
xsv select ID,Value file1.csv | xsv sort -s ID > temp1
xsv select ID,Value file2.csv | xsv sort -s ID > temp2
diff temp1 temp2
Pattern 6: Column Statistics
# Stats for specific column
xsv select Age data.csv | xsv stats | xsv table
# Multiple column stats
xsv select Age,Salary,Score data.csv | xsv stats --everything | xsv table
Common Options
Most commands support these options:
-h, --help Display help
-o, --output <file> Write to file instead of stdout
-n, --no-headers First row is data, not headers
-d, --delimiter <char> Input delimiter (default: ,)
Delimiter Support
Reading Different Formats
# TSV (tab-separated)
xsv select 1,3 -d '\t' data.tsv
# Pipe-delimited
xsv select Name,Age -d '|' data.txt
# Semicolon-delimited
xsv select 1-5 -d ';' data.csv
Converting Formats
# TSV to CSV
xsv fmt -d '\t' data.tsv -o data.csv
# CSV to TSV
xsv fmt -t '\t' data.csv -o data.tsv
# CSV to pipe-delimited
xsv fmt -t '|' data.csv -o data.txt
Error Handling
Common Issues
Issue: "CSV error: record has different length"
Solution: Use fixlengths
xsv fixlengths data.csv -o fixed.csv
Issue: "No such file or directory"
Solution: Check file path, use absolute paths if needed
Issue: Out of memory with large file
Solution: Use sampling or indexing
xsv index large.csv
xsv sample 10000 large.csv | xsv stats
Issue: Column name not found
Solution: Check headers first
xsv headers data.csv
Integration with Other Tools
With Python (for CSV→JSON)
# Convert CSV to JSON
xsv select Name,Age data.csv | \
python3 -c 'import csv, json, sys; print(json.dumps([dict(r) for r in csv.DictReader(sys.stdin)]))'
With awk
# Add computed column
xsv select Price,Quantity data.csv | \
awk -F, 'NR==1{print $0",Total"} NR>1{print $0","$1*$2}'
With sort/uniq
# Deduplicate by column (xsv sort keeps the header row in place, unlike plain sort)
xsv select Email data.csv | xsv sort | uniq
With grep
# Pre-filter before xsv (keep the header row, which grep would otherwise drop)
{ head -n 1 data.csv; tail -n +2 data.csv | grep "pattern"; } | xsv table
Quick Reference
# View structure
xsv headers data.csv
xsv count data.csv
xsv slice -l 5 data.csv | xsv table
# Select columns
xsv select 1,3,5 data.csv
xsv select Name,Email data.csv
# Filter rows
xsv search "pattern" data.csv
xsv search -s Email "gmail" data.csv
# Statistics
xsv stats data.csv
xsv frequency -s Status data.csv
# Sort
xsv sort -s Age -N data.csv
# Join
xsv join ID file1.csv ID file2.csv
# Format
xsv table data.csv
xsv fmt -t '\t' data.csv
# Sample
xsv sample 1000 data.csv
# Index (for performance)
xsv index large.csv
Comparison with xlsx
While xlsx handles Excel files, xsv is specialized for CSV:
| Feature | xsv | xlsx |
|---|---|---|
| Format | CSV only | XLSX/Excel |
| Speed | Extremely fast | Fast |
| Memory | Streaming | Depends on operation |
| Formulas | No | Yes |
| Formatting | No | Yes |
| Multiple sheets | No | Yes |
| Statistics | Rich | Basic |
| Joining | Yes | No |
| Indexing | Yes | No |
When to use xsv:
- Working with CSV data
- Need maximum performance
- Large file processing
- Statistical analysis
- Data pipelines
When to use xlsx:
- Excel file format required
- Need formulas and formatting
- Multiple sheets
- Cell-level operations
Summary
Primary tool: xsv for fast CSV processing
Most common commands:
- xsv headers - Understand structure
- xsv select - Choose columns
- xsv search - Filter rows
- xsv stats - Analyze data
- xsv table - View formatted
- xsv join - Combine files
- xsv index - Speed up operations
Key advantages:
- Blazing fast (Rust-based)
- Composable (Unix pipes)
- Low memory (streaming)
- Rich analysis features
- Index support for large files
Best practices:
- Index large files first
- Use sampling for quick exploration
- Chain commands with pipes
- Check headers before operations
- Use appropriate output formats