---
name: processing-data-type
description: Use this skill when processing [data type], transforming datasets, or analyzing [domain] data. This includes [specific operations like filtering, aggregating, converting]. Keywords: [data type], data, process, transform, analyze, [format names].
---
# [Data Type] Processing

Process, transform, and analyze [data type] using CLI tools and Node.js.
## Supported Formats
| Format | Read | Write | Tools |
|---|---|---|---|
| JSON | Yes | Yes | jq, Node.js |
| CSV | Yes | Yes | csvkit, Node.js |
| [Other] | Yes | Yes | [tools] |
## CLI Tools

```bash
# JSON processing
jq '.field' file.json
jq '.[] | select(.active == true)' file.json   # filter records in a top-level array

# CSV processing (if using csvkit)
csvcut -c column1,column2 file.csv
csvgrep -c column -m "value" file.csv

# Text processing
awk '{print $1}' file.txt
sort file.txt | uniq -c | sort -rn
```
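These tools compose well in pipelines. A couple of hedged one-liners, assuming csvkit is installed and `category` stands in for whatever field you care about:

```bash
# CSV -> JSON (csvkit)
csvjson data.csv > data.json

# Count how often each value of a field appears (jq + coreutils)
jq -r '.[].category' data.json | sort | uniq -c | sort -rn
```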
## Processing Approach
### 1. Data Exploration

First, understand the data structure:

```bash
# JSON: view structure
jq 'keys' data.json
jq '.[0]' data.json   # first record

# CSV: view headers and a sample
head -5 data.csv
```
Node.js:

```javascript
#!/usr/bin/env node
import { readFile } from 'fs/promises';

const data = JSON.parse(await readFile('data.json', 'utf-8'));
console.log('Keys:', Object.keys(data[0]));
console.log('Sample:', data.slice(0, 3));
```
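When the dataset is large or irregular, a quick pass over every record shows which fields are optional and which hold mixed types. A minimal sketch (the field-type tally is an illustrative addition, not part of any particular API):

```javascript
#!/usr/bin/env node
import { readFile } from 'fs/promises';

const data = JSON.parse(await readFile('data.json', 'utf-8'));

// Tally the type of each field across all records to spot optional or mixed-type fields
const fieldTypes = {};
for (const record of data) {
  for (const [key, value] of Object.entries(record)) {
    const type = value === null ? 'null' : typeof value;
    fieldTypes[key] = fieldTypes[key] || {};
    fieldTypes[key][type] = (fieldTypes[key][type] || 0) + 1;
  }
}
console.table(fieldTypes);
```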
### 2. Data Transformation

Transform the data into the required format:

Filtering:

```javascript
#!/usr/bin/env node
import { readFile, writeFile } from 'fs/promises';

const data = JSON.parse(await readFile('input.json', 'utf-8'));
const filtered = data.filter(item => item.active && item.value > 100);
await writeFile('output.json', JSON.stringify(filtered, null, 2));
```

Mapping:

```javascript
const transformed = data.map(item => ({
  id: item.id,
  name: item.firstName + ' ' + item.lastName,
  total: item.price * item.quantity
}));
```

Aggregating:

```javascript
const summary = data.reduce((acc, item) => {
  acc.total += item.amount;
  acc.count += 1;
  acc.byCategory[item.category] = (acc.byCategory[item.category] || 0) + 1;
  return acc;
}, { total: 0, count: 0, byCategory: {} });
```
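Duplicates are a common edge case during transformation (see the validation checklist below). A short sketch of one way to deduplicate, assuming each record carries a unique `id`:

```javascript
// Keep the first record seen for each id; later duplicates are dropped
const seen = new Set();
const deduplicated = data.filter(item => {
  if (seen.has(item.id)) return false;
  seen.add(item.id);
  return true;
});
```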
### 3. Data Validation

Verify that the processed data meets requirements:

```javascript
#!/usr/bin/env node
import { readFile } from 'fs/promises';

const data = JSON.parse(await readFile('output.json', 'utf-8'));

// Validation checks
if (!Array.isArray(data)) {
  console.error('Validation error: data must be an array');
  process.exit(1);
}

const errors = [];
for (const [i, item] of data.entries()) {
  if (!item.id) errors.push(`Record ${i}: missing id`);
  if (typeof item.value !== 'number') errors.push(`Record ${i}: value must be a number`);
}

if (errors.length > 0) {
  console.error('Validation errors:', errors);
  process.exit(1);
}
console.log(`Validated ${data.length} records successfully`);
```
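For datasets with more than a couple of required fields, a declarative field spec keeps the checks readable. A minimal sketch (the `schema` object and its field names are hypothetical):

```javascript
// Hypothetical field spec: field name -> expected `typeof` result
const schema = { id: 'string', value: 'number', active: 'boolean' };

const schemaErrors = [];
for (const [i, item] of data.entries()) {
  for (const [field, expectedType] of Object.entries(schema)) {
    if (typeof item[field] !== expectedType) {
      schemaErrors.push(`Record ${i}: ${field} must be a ${expectedType}`);
    }
  }
}
```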
### 4. Output Generation

Write results in the desired format:

```javascript
#!/usr/bin/env node
import { readFile, writeFile } from 'fs/promises';

// `data` is the processed dataset from the previous steps
const data = JSON.parse(await readFile('output.json', 'utf-8'));

// JSON output
await writeFile('output.json', JSON.stringify(data, null, 2));

// CSV output (naive: values containing commas or quotes are not escaped)
const headers = Object.keys(data[0]).join(',');
const rows = data.map(item => Object.values(item).join(','));
await writeFile('output.csv', [headers, ...rows].join('\n'));
```
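The CSV writer above does not quote values, so a comma or quote inside a field would corrupt the row. A hedged sketch of a quoting helper that could replace the naive join (the `toCsvValue` name is an illustrative choice):

```javascript
// Quote a value per RFC 4180 rules: wrap in double quotes when it contains
// a comma, quote, or newline, and double any embedded quotes
function toCsvValue(value) {
  const text = value == null ? '' : String(value);
  return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

const safeHeaders = Object.keys(data[0]).map(toCsvValue).join(',');
const safeRows = data.map(item => Object.values(item).map(toCsvValue).join(','));
await writeFile('output.csv', [safeHeaders, ...safeRows].join('\n'));
```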
## Examples
### Example 1: Filter and Aggregate

User Query: "Filter orders over $100 and sum by category"

Solution:

```javascript
#!/usr/bin/env node
import { readFile, writeFile } from 'fs/promises';

const orders = JSON.parse(await readFile('orders.json', 'utf-8'));

const result = orders
  .filter(order => order.total > 100)
  .reduce((acc, order) => {
    acc[order.category] = (acc[order.category] || 0) + order.total;
    return acc;
  }, {});

console.log('Totals by category:', result);
await writeFile('category-totals.json', JSON.stringify(result, null, 2));
```
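The same result can often be produced without a script, using jq from the CLI Tools section; a sketch, assuming `orders.json` is a top-level array of flat objects:

```bash
jq 'map(select(.total > 100))
    | group_by(.category)
    | map({ (.[0].category): (map(.total) | add) })
    | add' orders.json
```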
### Example 2: Join Datasets

User Query: "Combine users and orders by user_id"

Solution:

```javascript
#!/usr/bin/env node
import { readFile, writeFile } from 'fs/promises';

const users = JSON.parse(await readFile('users.json', 'utf-8'));
const orders = JSON.parse(await readFile('orders.json', 'utf-8'));

// Index users by id for O(1) lookup during the join
const userMap = new Map(users.map(u => [u.id, u]));

const enrichedOrders = orders.map(order => ({
  ...order,
  user: userMap.get(order.user_id)
}));

await writeFile('enriched-orders.json', JSON.stringify(enrichedOrders, null, 2));
```
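This is effectively a left join: orders whose `user_id` has no match get `user: undefined`, which `JSON.stringify` silently drops from the output. If missing matches should be surfaced instead, a sketch of one option:

```javascript
// Collect orders that reference a user_id not present in users.json
const orphaned = orders.filter(order => !userMap.has(order.user_id));
if (orphaned.length > 0) {
  console.warn(`${orphaned.length} orders reference unknown users`);
}
```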
### Example 3: CSV to JSON Conversion

User Query: "Convert this CSV to JSON"

Solution:

```javascript
#!/usr/bin/env node
import { readFile, writeFile } from 'fs/promises';

const csv = await readFile('data.csv', 'utf-8');
const lines = csv.trim().split('\n');

// Naive parse: assumes no quoted fields containing commas or newlines
const headers = lines[0].split(',');
const data = lines.slice(1).map(line => {
  const values = line.split(',');
  return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
});

await writeFile('data.json', JSON.stringify(data, null, 2));
console.log(`Converted ${data.length} records`);
```
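Every value produced this way is a string, which conflicts with the best practice below of preserving data types. A hedged follow-up sketch that coerces obvious numerics and booleans (the `coerce` helper is an illustrative addition):

```javascript
// Convert numeric and boolean strings to native types; leave everything else as-is
function coerce(value) {
  if (value === 'true') return true;
  if (value === 'false') return false;
  if (value !== '' && !Number.isNaN(Number(value))) return Number(value);
  return value;
}

const typed = data.map(record =>
  Object.fromEntries(Object.entries(record).map(([key, value]) => [key, coerce(value)]))
);
```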
## Best Practices

- Explore data structure before processing
- Validate input data before transformation
- Handle edge cases (empty arrays, null values)
- Use streaming for large files
- Preserve data types during conversion
- Include error handling in scripts (see the sketch after this list)
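A minimal error-handling pattern for the scripts above, assuming they are run standalone from the shell (the `main` wrapper is an illustrative convention, not something Node.js requires):

```javascript
#!/usr/bin/env node
import { readFile } from 'fs/promises';

async function main() {
  const data = JSON.parse(await readFile('data.json', 'utf-8'));
  // ... transformation steps go here ...
  console.log(`Processed ${data.length} records`);
}

main().catch(err => {
  // Surface read/parse failures with a non-zero exit code instead of an unhandled rejection
  console.error('Processing failed:', err.message);
  process.exit(1);
});
```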
## Validation Checklist
- Input data loaded successfully
- Transformation produces expected structure
- No data loss during processing
- Output format is correct
- Edge cases handled (empty, null, duplicates)
## Troubleshooting

### Issue: JSON Parse Error

Symptoms: `SyntaxError: Unexpected token`

Solution: Validate the JSON structure:

```bash
jq '.' file.json   # will show parse errors
```
### Issue: Memory Error on Large Files

Symptoms: `JavaScript heap out of memory`

Solution: Use streaming:

```javascript
import { createReadStream } from 'fs';
import { createInterface } from 'readline';

const rl = createInterface({
  input: createReadStream('large-file.jsonl')
});

for await (const line of rl) {
  const record = JSON.parse(line);
  // Process one record at a time
}
```
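This streaming approach assumes newline-delimited JSON (one record per line); a single large JSON array would still need a streaming parser. A sketch of accumulating an aggregate while streaming, reusing the summary shape from the Aggregating step above:

```javascript
import { createReadStream } from 'fs';
import { createInterface } from 'readline';

const rl = createInterface({ input: createReadStream('large-file.jsonl') });

// Accumulate totals without ever holding the whole dataset in memory
const summary = { total: 0, count: 0 };
for await (const line of rl) {
  if (!line.trim()) continue;   // skip blank lines
  const record = JSON.parse(line);
  summary.total += record.amount;
  summary.count += 1;
}
console.log(summary);
```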