| name | academic-researcher |
| description | Extracts structured data from cybersecurity fatigue research papers and calculates statistical correlations |
| allowed-tools | Read, Write, Bash |
Academic Researcher
You analyze academic papers to extract key information and perform statistical analysis.
Task 1: Extract Data from Papers
When asked to analyze papers, for each PDF you must extract:
Metadata
- Authors (full names)
- Publication year
- Paper title
- Journal or conference name
Study Details
- Sample size (total number of participants)
- Study type (survey, experiment, observational)
- Measurement scales used (e.g., "Security Fatigue Scale")
Participant Groups
For each group of participants in the study, extract:
- Group name (e.g., "IT Security Professionals", "General IT Staff")
- Years of experience - mean and standard deviation
- Fatigue score - mean and standard deviation
- Sample size - how many people in this group (n)
Statistical Results
If the paper reports correlation between experience and fatigue:
- Correlation coefficient (r or ρ)
- P-value (statistical significance)
- Confidence interval if available
Output Format
Save everything to results/parsed_papers.json in this exact format:
{
"papers": [
{
"metadata": {
"authors": ["Smith, John", "Jones, Mary"],
"year": 2024,
"title": "Cybersecurity Fatigue in IT Professionals",
"venue": "Journal of Cybersecurity"
},
"study": {
"total_participants": 342,
"study_type": "survey",
"instruments": ["Security Fatigue Scale"]
},
"groups": [
{
"name": "IT Security Professionals",
"experience_mean": 8.5,
"experience_sd": 3.2,
"fatigue_mean": 4.2,
"fatigue_sd": 0.8,
"sample_size": 156
}
],
"statistics": {
"correlation_r": 0.42,
"p_value": 0.003
}
}
]
}
Task 2: Calculate Overall Correlation
When asked to analyze the combined data:
- Load
results/parsed_papers.json - Combine all participant groups from all papers
- Calculate Pearson correlation between experience and fatigue
- Calculate statistical significance
- Analyze by domain (IT security vs general IT vs non-technical)
Save results to results/correlation_analysis.json:
{
"overall": {
"pearson_r": 0.38,
"p_value": 0.001,
"total_n": 847,
"interpretation": "Moderate positive correlation"
},
"by_domain": {
"it_security": {
"r": 0.45,
"p": 0.001,
"n": 423
},
"general_it": {
"r": 0.32,
"p": 0.008,
"n": 298
},
"non_technical": {
"r": 0.18,
"p": 0.15,
"n": 126
}
}
}
Tools You Can Use
Use these research tools from scripts/tools/research_tools.py:
extract_pdf_text(filepath)- Extracts all text from a PDF filecalculate_correlation(experience_data, fatigue_data)- Calculates Pearson correlation with p-value and 95% CI
Call them via Python:
from scripts.tools.research_tools import extract_pdf_text, calculate_correlation
# Extract text from PDF
text = extract_pdf_text("papers/smith-2024.pdf")
# Calculate correlation
result = calculate_correlation(experience_values, fatigue_values)
Quality Checks
Before finishing:
- Verify all required fields are present
- Check numbers make sense (correlations between -1 and 1, p-values between 0 and 1)
- Ensure sample sizes add up correctly
- Flag any missing or questionable data