---
name: python-json-parsing
description: Python JSON parsing best practices covering performance optimization (orjson/msgspec), handling large files (streaming/JSONL), security (injection prevention), and advanced querying (JSONPath/JMESPath). Use when working with JSON data, parsing APIs, handling large JSON files, or optimizing JSON performance.
---
# Python JSON Parsing Best Practices

Comprehensive guide to JSON parsing in Python, with a focus on performance, security, and scalability.

## Quick Start

### Basic JSON Parsing
```python
import json

# Parse a JSON string
data = json.loads('{"name": "Alice", "age": 30}')

# Parse a JSON file
with open("data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Write a JSON file
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)
```

**Key Rule:** Always specify `encoding="utf-8"` when reading or writing JSON files.
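A quick self-check of the rule above, sketched with a temporary file so it is safe to run anywhere:

```python
import json
import os
import tempfile

# Round-trip a non-ASCII value through a file. encoding="utf-8" makes the
# bytes on disk unambiguous; ensure_ascii=False keeps the text readable
# instead of \u-escaped.
data = {"city": "Zürich"}

path = os.path.join(tempfile.mkdtemp(), "output.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)
with open(path, "r", encoding="utf-8") as f:
    assert json.load(f) == data
```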
## When to Use This Skill

Use this skill when:
- Working with JSON APIs or data interchange
- Optimizing JSON performance in high-throughput applications
- Handling large JSON files (> 100MB)
- Securing applications against JSON injection
- Extracting data from complex nested JSON structures
## Performance: Choose the Right Library

### Library Comparison (10,000 records benchmark)
| Library | Serialize (s) | Deserialize (s) | Best For |
|---|---|---|---|
| orjson | 0.42 | 1.27 | FastAPI, web APIs (3.9x faster) |
| msgspec | 0.49 | 0.93 | Maximum performance (1.7x faster deserialization) |
| json (stdlib) | 1.62 | 1.62 | Universal compatibility |
| ujson | 1.41 | 1.85 | Drop-in replacement (2x faster) |
**Recommendation:**
- Use orjson for FastAPI/web APIs (native support, fastest serialization)
- Use msgspec for data pipelines (fastest overall, typed validation)
- Use json when compatibility is critical
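A minimal usage sketch of the recommendation: prefer orjson when it is installed and fall back to the stdlib otherwise. Note that orjson works with `bytes`, not `str`, so the fallback mirrors that contract:

```python
import json

try:
    import orjson

    def dumps(obj) -> bytes:
        return orjson.dumps(obj)  # returns bytes, compact by default

    def loads(raw):
        return orjson.loads(raw)
except ImportError:  # stdlib fallback with the same bytes-based contract
    def dumps(obj) -> bytes:
        return json.dumps(obj, separators=(",", ":")).encode("utf-8")

    def loads(raw):
        return json.loads(raw)

record = {"name": "Alice", "age": 30}
raw = dumps(record)
assert isinstance(raw, bytes)
assert loads(raw) == record
```

Wrapping the library behind two small functions keeps the rest of the codebase independent of which backend is installed.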
### Installation

```shell
# High-performance libraries
pip install orjson msgspec ujson

# Advanced querying
pip install jsonpath-ng jmespath

# Streaming large files
pip install ijson

# Schema validation
pip install jsonschema
```
## Large Files: Streaming Strategies

For files larger than 100 MB, avoid loading the whole document into memory at once.

### Strategy 1: JSONL (JSON Lines)

Convert large JSON arrays to line-delimited format, then process one record per line:
```python
import json

# Stream-process JSONL: constant memory, one record per line
with open("large.jsonl", "r", encoding="utf-8") as infile, \
     open("output.jsonl", "w", encoding="utf-8") as outfile:
    for line in infile:
        obj = json.loads(line)
        obj["processed"] = True
        outfile.write(json.dumps(obj) + "\n")
```
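If the source data starts out as one big JSON array, a one-time conversion produces the JSONL file used above. This sketch loads the array once, so it still needs memory for the source file; for inputs too large even for that, use ijson (Strategy 2):

```python
import json

def array_to_jsonl(src_path: str, dst_path: str) -> None:
    """Convert a file containing a top-level JSON array to JSONL."""
    with open(src_path, "r", encoding="utf-8") as src:
        records = json.load(src)  # expects a top-level JSON array
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            dst.write(json.dumps(record) + "\n")
```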
### Strategy 2: Streaming with ijson

```python
import ijson

# Process a large JSON document without loading it into memory.
# "products.item" yields each element of the top-level "products" array.
with open("huge.json", "rb") as f:
    for item in ijson.items(f, "products.item"):
        process(item)  # handle one item at a time (user-defined handler)
```

See: `patterns/streaming-large-json.md`
## Security: Prevent JSON Injection

**Critical Rules:**

- Always use `json.loads()`, never `eval()`
- Validate input with `jsonschema`
- Sanitize user input before serialization
- Escape special characters (`"` and `\`)
**Vulnerable Code:**

```python
# NEVER DO THIS: building JSON via string formatting
username = request.GET["username"]  # User input: admin", "role": "admin
json_string = f'{{"user":"{username}","role":"user"}}'
# Result: the injected quote closes the string, enabling privilege escalation
```

**Secure Code:**

```python
# Use json.dumps for serialization
data = {"user": username, "role": "user"}
json_string = json.dumps(data)  # special characters are properly escaped
```
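To see the escaping in action, feed the hostile value from the vulnerable example through `json.dumps` (a small self-check, not production code):

```python
import json

# The hostile input from the vulnerable example above
username = 'admin", "role": "admin'

payload = json.dumps({"user": username, "role": "user"})
parsed = json.loads(payload)

# The embedded quotes were escaped, so no extra key was injected:
# the role stays "user" and the whole hostile string is just a value.
assert parsed == {"user": 'admin", "role": "admin', "role": "user"}
```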
See: `anti-patterns/security-json-injection.md`, `anti-patterns/eval-usage.md`
## Advanced: JSONPath for Complex Queries

Extract data from nested JSON without writing nested loops:
```python
# Filter expressions require the extended parser (jsonpath_ng.ext),
# not the base jsonpath_ng.parse.
from jsonpath_ng.ext import parse

data = {
    "products": [
        {"name": "Apple", "price": 12.88},
        {"name": "Peach", "price": 27.25},
    ]
}

# Filter by price
query = parse("$.products[?price > 20].name")
results = [match.value for match in query.find(data)]
# Output: ["Peach"]
```
**Key Operators:**

- `$` - Root selector
- `..` - Recursive descent
- `*` - Wildcard
- `[?<predicate>]` - Filter (e.g., `[?price > 20]`)
- `[start:end:step]` - Array slicing

See: `patterns/jsonpath-querying.md`
## Custom Objects: Serialization

Handle `datetime`, `UUID`, `Decimal`, and custom classes:
```python
import json
from datetime import datetime

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called only for types the stdlib encoder cannot serialize
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)

# Usage
data = {"timestamp": datetime.now(), "tags": {"python", "json"}}
json_str = json.dumps(data, cls=CustomEncoder)
```

See: `patterns/custom-object-serialization.md`
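Decoding is the mirror image. The stdlib has no per-type hook on the way in, but `object_hook` can revive known fields; the key name `timestamp` here is an assumption for illustration:

```python
import json
from datetime import datetime

def revive(obj):
    # Called for every decoded JSON object; convert known fields back.
    value = obj.get("timestamp")
    if isinstance(value, str):
        try:
            obj["timestamp"] = datetime.fromisoformat(value)
        except ValueError:
            pass  # leave non-ISO strings untouched
    return obj

doc = json.loads('{"timestamp": "2025-06-01T12:00:00"}', object_hook=revive)
assert doc["timestamp"] == datetime(2025, 6, 1, 12, 0)
```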
## Performance Checklist

- Use orjson/msgspec for high-throughput applications
- Specify UTF-8 encoding when reading/writing files
- Use streaming (ijson/JSONL) for files > 100 MB
- Minify JSON for production (`separators=(',', ':')`)
- Pretty-print for development (`indent=2`)
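The minify and pretty-print items can be verified directly; both forms decode to the same data:

```python
import json

data = {"name": "Alice", "tags": ["python", "json"]}

compact = json.dumps(data, separators=(",", ":"))  # production: no whitespace
pretty = json.dumps(data, indent=2)                # development: human-readable

assert len(compact) < len(pretty)
assert json.loads(compact) == json.loads(pretty) == data
```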
## Security Checklist

- Never use `eval()` for JSON parsing
- Validate input with `jsonschema`
- Sanitize user input before serialization
- Use `json.dumps()` to prevent injection
- Escape special characters in user data
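jsonschema is the right tool for real schema validation; as a dependency-free illustration of the first three rules, here is a minimal hand-rolled check (the field names and length limit are assumptions for the example):

```python
import json

ALLOWED_ROLES = {"user", "admin"}

def validate_payload(raw: str) -> dict:
    data = json.loads(raw)  # never eval()
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != {"user", "role"}:
        raise ValueError("unexpected keys")
    if not isinstance(data["user"], str) or not (1 <= len(data["user"]) <= 64):
        raise ValueError("bad user")
    if data["role"] not in ALLOWED_ROLES:
        raise ValueError("bad role")
    return data

assert validate_payload('{"user": "alice", "role": "user"}')["role"] == "user"
```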
## Reference Documentation

**Performance:**

- `reference/python-json-parsing-best-practices-2025.md` - Comprehensive research with benchmarks

**Patterns:**

- `patterns/streaming-large-json.md` - ijson and JSONL strategies
- `patterns/custom-object-serialization.md` - Handle datetime, UUID, custom classes
- `patterns/jsonpath-querying.md` - Advanced nested data extraction

**Security:**

- `anti-patterns/security-json-injection.md` - Prevent injection attacks
- `anti-patterns/eval-usage.md` - Why never to use eval()

**Examples:**

- `examples/high-performance-parsing.py` - orjson and msgspec code
- `examples/large-file-streaming.py` - Streaming with ijson
- `examples/secure-validation.py` - jsonschema validation

**Tools:**

- `tools/json-performance-benchmark.py` - Benchmark different libraries