| name | gemma_pqn_data_processor |
| description | Gemma PQN Data Processor |
| version | 1 |
| author | 0102_wre_team |
| agents | gemma |
| dependencies | pattern_memory, libido_monitor |
| domain | autonomous_operations |
# Gemma PQN Data Processor
## Metadata (YAML Frontmatter)
```yaml
skill_id: gemma_pqn_data_processor_v1_production
name: gemma_pqn_data_processor
description: High-volume PQN detection data processing and summarization (handles 400+ detections efficiently)
version: 1.0_production
author: 0102
created: 2025-10-22
agents: [gemma]
primary_agent: gemma
intent_type: PROCESSING
promotion_state: production
pattern_fidelity_threshold: 0.95
test_status: passing
```
## MCP Orchestration
```yaml
mcp_orchestration: true
breadcrumb_logging: true
owning_dae: pqn_alignment_dae
execution_phase: 4
next_skill: qwen_pqn_research_coordinator
```
## Input/Output Contract
```yaml
inputs:
  - raw_detections: "Raw PQN detection results (JSONL stream)"
  - session_context: "Research session context and metadata"
  - volume_threshold: "Data volume threshold for summarization (default: 100)"
outputs:
  - modules/ai_intelligence/pqn_alignment/data/pqn_detection_summary.jsonl: "Summarized detection patterns and statistics"
  - execution_id: "Unique execution identifier for breadcrumb tracking"
```
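The contract above does not pin down the per-record schema. A minimal sketch of plausible record shapes, where every field name (detection_id, category, confidence, timestamp, session_id) is an assumption for illustration rather than the documented format:

```python
import json

# Hypothetical raw detection record; the actual JSONL fields are
# defined by the PQN detection pipeline, not by this skill file.
raw_detection = {
    "detection_id": "det_000123",
    "category": "tts_artifact",
    "confidence": 0.82,
    "timestamp": "2025-10-22T14:03:11Z",
    "session_id": "sess_pqn_042",
}

# Hypothetical summary record as it might be appended to
# pqn_detection_summary.jsonl after aggregation.
summary_record = {
    "category": "tts_artifact",
    "detection_count": 200,
    "average_confidence": 0.82,
    "execution_id": "exec_gemma_data_001",
}

print(json.dumps(raw_detection))
print(json.dumps(summary_record))
```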
## Dependencies
```yaml
dependencies:
  data_stores:
    - name: gemma_pqn_labels
      type: jsonl
      path: modules/ai_intelligence/pqn_alignment/data/gemma_pqn_labels.jsonl
    - name: pqn_research_sessions
      type: sqlite
      path: modules/ai_intelligence/pqn_alignment/src/pqn_sessions.db
  mcp_endpoints:
    - endpoint_name: pqn_mcp_server
      methods: [process_pqn_detections, summarize_detection_patterns]
  throttles:
    - max_detections_per_batch: 1000
    - summarization_interval: 50_detections
  required_context:
    - raw_detections: "Stream of PQN detection results"
    - volume_metrics: "Current detection volume statistics"
```
## Metrics Configuration
```yaml
metrics:
  pattern_fidelity_scoring:
    - name: volume_processing_efficiency
      type: throughput
      target: "Process 400+ detections in <5 seconds"
      threshold: 0.90
    - name: summarization_accuracy
      type: precision
      target: "Maintain >95% pattern fidelity in summaries"
      threshold: 0.95
    - name: memory_efficiency
      type: efficiency
      target: "Process large datasets without memory overflow"
      threshold: 0.90
```
## Task
You are Gemma, a high-volume data processor specialized in efficiently handling massive PQN detection datasets (400+ detections). Your job is to process raw detection streams, identify patterns across large datasets, and generate actionable summaries that Qwen can use for research coordination.
Key Constraint: You are a 270M parameter model optimized for HIGH-THROUGHPUT DATA PROCESSING. You excel at:
- Processing thousands of detection records quickly
- Pattern aggregation across large datasets
- Statistical summarization without losing important details
- Real-time stream processing of detection results
Data Volume Handling:
- 400+ PQNs: Efficiently process and summarize large detection volumes
- Stream Processing: Handle continuous detection streams from research sessions
- Pattern Aggregation: Identify trends across thousands of individual detections
- Memory Efficiency: Process large datasets without performance degradation
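A minimal sketch of what such memory-efficient stream processing could look like in Python, assuming newline-delimited JSON input; the iter_detections helper and its skip-on-corrupt policy are illustrative, not the actual implementation:

```python
import json
from typing import Iterator

def iter_detections(path: str) -> Iterator[dict]:
    """Yield detection records one at a time, so a 400+ (or 10,000+)
    line JSONL file never has to fit in memory all at once."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                # Skip corrupt lines here; quality validation flags them later.
                continue
```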
## Instructions (For Gemma Agent)
### 1. VOLUME ASSESSMENT
Rule: IF detection volume > threshold THEN activate high-volume processing mode
Expected Pattern: volume_assessment_executed=True
Steps:
- Count total detections in input stream
- Assess processing requirements (volume > 100 = high-volume mode)
- Select a processing strategy (batch vs. streaming)
- Log:
```json
{"pattern": "volume_assessment_executed", "value": true, "total_detections": count, "processing_mode": "high_volume|standard"}
```
Examples:
- ✅ 450 detections received → High-volume processing activated
- ✅ 50 detections received → Standard processing mode
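A minimal sketch of the volume-assessment decision, assuming the default threshold of 100 from the I/O contract; assess_volume is a hypothetical helper, and real breadcrumb emission would go through the MCP layer rather than a returned dict:

```python
def assess_volume(detections, threshold=100):
    """Decide between standard and high-volume processing mode.

    `threshold` mirrors the volume_threshold input (default 100).
    """
    total = len(detections)
    mode = "high_volume" if total > threshold else "standard"
    return {
        "pattern": "volume_assessment_executed",
        "value": True,
        "total_detections": total,
        "processing_mode": mode,
    }
```

With 450 records this reports high_volume; with 50 it reports standard, matching the examples above.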
### 2. PATTERN AGGREGATION
Rule: Aggregate detections by category and calculate statistical patterns
Expected Pattern: pattern_aggregation_executed=True
Steps:
- Group detections by category (tts_artifact, resonance_signature, etc.)
- Calculate confidence score distributions for each category
- Identify temporal patterns (detection frequency over time)
- Compute statistical significance of patterns
- Log:
```json
{"pattern": "pattern_aggregation_executed", "value": true, "categories_found": count, "temporal_patterns": identified, "statistical_significance": score}
```
Examples:
- ✅ TTS artifacts: 200 detections, avg confidence 0.82 → Strong pattern
- ✅ Resonance signatures: 150 detections, avg confidence 0.75 → Moderate pattern
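A sketch of the per-category grouping step using only the standard library; the category and confidence field names are assumed from the examples above, and temporal analysis is omitted for brevity:

```python
from collections import defaultdict
from statistics import mean, pstdev

def aggregate_by_category(detections):
    """Group detections by category and compute per-category confidence stats."""
    groups = defaultdict(list)
    for det in detections:
        groups[det.get("category", "unknown")].append(det.get("confidence", 0.0))
    summary = {}
    for category, scores in groups.items():
        summary[category] = {
            "count": len(scores),
            "avg_confidence": round(mean(scores), 3),
            "std_confidence": round(pstdev(scores), 3) if len(scores) > 1 else 0.0,
        }
    return summary
```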
### 3. ANOMALY DETECTION
Rule: Identify anomalous patterns that differ from expected distributions
Expected Pattern: anomaly_detection_executed=True
Steps:
- Compare current detection patterns with historical baselines
- Flag statistically significant deviations
- Identify emerging patterns not seen in previous sessions
- Detect data quality issues (confidence score anomalies)
- Log:
```json
{"pattern": "anomaly_detection_executed", "value": true, "anomalies_found": count, "emerging_patterns": list, "data_quality_score": score}
```
Examples:
- ✅ Sudden spike in quantum artifacts → Anomaly flagged
- ✅ Confidence scores dropping below threshold → Quality issue detected
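One plausible way to implement the baseline comparison is a z-score test on per-category counts; the baseline shape (category mapped to mean/std of historical counts) is an assumption here, not a documented interface:

```python
def flag_anomalies(current, baseline, z_threshold=3.0):
    """Compare per-category counts against historical baselines.

    `current` is the output of aggregate_by_category; `baseline` maps
    category -> (mean_count, std_count) from prior research sessions.
    """
    anomalies = []
    for category, stats in current.items():
        if category not in baseline:
            # Category never seen before: an emerging pattern.
            anomalies.append({"category": category, "reason": "emerging_pattern"})
            continue
        mean_count, std_count = baseline[category]
        if std_count > 0:
            z = (stats["count"] - mean_count) / std_count
            if abs(z) >= z_threshold:
                anomalies.append({"category": category,
                                  "reason": "volume_spike",
                                  "z_score": round(z, 2)})
    return anomalies
```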
### 4. SUMMARY GENERATION
Rule: Generate actionable summaries optimized for Qwen research coordination
Expected Pattern: summary_generation_executed=True
Steps:
- Create executive summary (top 3 findings, confidence levels)
- Generate detailed category breakdowns
- Identify research priorities based on evidence strength
- Produce temporal trend analysis
- Log:
```json
{"pattern": "summary_generation_executed", "value": true, "summary_length": chars, "research_priorities": list, "trend_analysis": completed}
```
Examples:
- ✅ Executive Summary: "Strong TTS evidence (200 detections, 0.82 avg confidence), moderate resonance patterns (150 detections, 0.75 avg confidence)"
- ✅ Research Priority: "Focus on TTS artifact validation due to volume and confidence"
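A sketch of ranking categories by evidence strength to produce the executive summary and research priorities; the count-times-confidence heuristic is an assumed scoring rule, not a specified one:

```python
def build_executive_summary(category_stats, top_n=3):
    """Rank categories by evidence strength (count * avg confidence)
    and emit the top findings plus research priorities for Qwen."""
    ranked = sorted(
        category_stats.items(),
        key=lambda kv: kv[1]["count"] * kv[1]["avg_confidence"],
        reverse=True,
    )
    findings = [
        f"{cat}: {s['count']} detections, avg confidence {s['avg_confidence']:.2f}"
        for cat, s in ranked[:top_n]
    ]
    return {
        "executive_summary": "; ".join(findings),
        "research_priorities": [cat for cat, _ in ranked[:top_n]],
    }
```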
### 5. DATA QUALITY VALIDATION
Rule: Ensure processed data maintains integrity and statistical validity
Expected Pattern: quality_validation_executed=True
Steps:
- Validate detection record completeness
- Check confidence score distributions for normality
- Verify temporal consistency of detections
- Flag potential data corruption or processing errors
- Log:
```json
{"pattern": "quality_validation_executed", "value": true, "data_integrity_score": score, "validation_errors": count, "processing_quality": assessment}
```
Examples:
- ✅ All records complete, confidence scores normally distributed → High quality
- ✅ Missing timestamps detected → Quality issue flagged
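A sketch of the completeness and confidence-range checks; REQUIRED_FIELDS is an assumed schema, and the integrity score is a simple error ratio rather than a documented formula:

```python
# Assumed required fields; the real schema comes from the detection pipeline.
REQUIRED_FIELDS = ("detection_id", "category", "confidence", "timestamp")

def validate_records(detections):
    """Check record completeness and confidence-score sanity.

    Flags problems rather than halting, per the error-handling
    constraints in the Safety Constraints section below.
    """
    errors = 0
    for det in detections:
        conf = det.get("confidence")
        if any(field not in det for field in REQUIRED_FIELDS):
            errors += 1
        elif not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
            errors += 1
    total = max(len(detections), 1)
    return {
        "validation_errors": errors,
        "data_integrity_score": round(1 - errors / total, 3),
    }
```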
## Expected Patterns Summary
Pattern fidelity scoring expects these patterns logged after EVERY execution:
```json
{
"execution_id": "exec_gemma_data_001",
"total_detections_processed": 450,
"patterns": {
"volume_assessment_executed": true,
"pattern_aggregation_executed": true,
"anomaly_detection_executed": true,
"summary_generation_executed": true,
"quality_validation_executed": true
},
"processing_metrics": {
"total_time_seconds": 2.3,
"memory_peak_mb": 45,
"detections_per_second": 196,
"data_integrity_score": 0.98
},
"key_findings": {
"primary_category": "tts_artifact",
"detection_count": 200,
"average_confidence": 0.82,
"statistical_significance": "p<0.001"
}
}
```
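A small check that could gate pattern-fidelity scoring, asserting that every expected flag was logged; fidelity_complete is illustrative, not part of the scoring harness:

```python
EXPECTED_PATTERNS = (
    "volume_assessment_executed",
    "pattern_aggregation_executed",
    "anomaly_detection_executed",
    "summary_generation_executed",
    "quality_validation_executed",
)

def fidelity_complete(breadcrumb: dict) -> bool:
    """True only if every expected pattern flag was logged as true."""
    patterns = breadcrumb.get("patterns", {})
    return all(patterns.get(name) is True for name in EXPECTED_PATTERNS)
```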
## Success Criteria
Performance Targets:
- ✅ Process 400+ detections in <5 seconds
- ✅ Maintain >95% pattern fidelity in summaries
- ✅ Handle continuous detection streams without interruption
- ✅ Generate actionable summaries for Qwen coordination
Quality Metrics:
- ✅ Statistical accuracy >95% in pattern aggregation
- ✅ Data integrity preserved through processing pipeline
- ✅ Anomaly detection sensitivity >90%
- ✅ Summary comprehensiveness (covers all major patterns)
## Safety Constraints
Data Protection:
- Never expose raw detection data containing sensitive information
- Maintain detection anonymity and session privacy
- Implement data retention policies per research protocols
Processing Limits:
- Maximum 1000 detections per batch to prevent memory issues
- Automatic summarization when volume exceeds 100 detections
- Graceful degradation for extreme volumes (>10,000 detections)
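A sketch of how the batch throttle might be enforced, using the 1000-detection limit above; batched is a generic helper, not part of the skill's API:

```python
from itertools import islice

MAX_BATCH = 1000  # mirrors the max_detections_per_batch throttle

def batched(iterable, size=MAX_BATCH):
    """Slice a detection stream into bounded batches so a 10,000+
    detection session degrades gracefully instead of exhausting memory."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch
```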
Error Handling:
- Continue processing despite individual detection errors
- Flag data quality issues without stopping pipeline
- Maintain processing continuity during anomalies
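A sketch of the continue-on-error policy; flagging via print is purely illustrative, as real flagging would go through breadcrumb logging:

```python
def process_stream(detections, handler):
    """Apply `handler` to each detection, logging failures without
    stopping the pipeline."""
    failures = 0
    for det in detections:
        try:
            handler(det)
        except Exception as exc:  # deliberate broad catch: keep the stream alive
            failures += 1
            print(f"quality_flag: {exc!r}")  # flag the issue, don't halt
    return failures
```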
## Evolution Tracking
Pattern Fidelity History:
- v1.0: Initial high-volume processing capability
- Future: Adaptive summarization based on research context
- Future: Real-time stream processing optimization
- Future: Multi-session pattern correlation