---
name: Traversing Citation Networks
description: Smart backward and forward citation following via Semantic Scholar, with relevance filtering and deduplication
when_to_use: After finding relevant paper. When need to find related work. When following references or citations. When building citation graph. When exploring paper connections.
version: 1.0.0
---
# Traversing Citation Networks
## Overview
Intelligently follow citations backward (references) and forward (citing papers) using the Semantic Scholar API.

Core principle: Only follow citations relevant to the user's query. Avoid exponential explosion by filtering before traversing.
## When to Use
Use this skill when:
- Found a highly relevant paper (score ≥ 7)
- Need to find related work
- User asks "what papers cite this?"
- Building comprehensive understanding of a topic
When NOT to use:
- Paper scored < 7 (not relevant enough to follow)
- Already at 50 papers (check with user first)
- Citations look off-topic from abstract
## Citation Traversal Strategy
### 1. Get Paper ID from Semantic Scholar
Lookup by DOI:

```bash
curl "https://api.semanticscholar.org/graph/v1/paper/DOI:10.1234/example.2023?fields=paperId,title,year"
```

Response:

```json
{
  "paperId": "abc123def456",
  "title": "Paper Title",
  "year": 2023
}
```
Save the `paperId`; it is needed for the references and citations queries.
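For illustration, here's a minimal Python sketch of the same lookup. Using the `requests` library and this particular error handling is an assumption, not something the API docs mandate:

```python
import requests

S2_API = "https://api.semanticscholar.org/graph/v1"

def get_paper_id(doi: str) -> str | None:
    """Resolve a DOI to a Semantic Scholar paperId."""
    resp = requests.get(
        f"{S2_API}/paper/DOI:{doi}",
        params={"fields": "paperId,title,year"},
    )
    if resp.status_code != 200:
        return None  # Unknown DOI or API error
    return resp.json()["paperId"]
```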
### 2. Backward Traversal (References)
Get references from paper:

```bash
curl "https://api.semanticscholar.org/graph/v1/paper/abc123def456/references?fields=contexts,intents,title,year,abstract,externalIds&limit=100"
```
Response format:

```json
{
  "data": [
    {
      "citedPaper": {
        "paperId": "xyz789",
        "title": "Referenced Paper Title",
        "year": 2020,
        "abstract": "...",
        "externalIds": {
          "DOI": "10.5678/referenced.2020",
          "PubMed": "87654321"
        }
      },
      "contexts": [
        "...as described in previous work [15]...",
        "...we used the method from [15] to..."
      ],
      "intents": ["methodology", "background"]
    }
  ]
}
```
Filter for relevance:

For each reference, check:
- Context keywords: Do the citation contexts mention the user's query terms?
  - Example: If the user asks about "IC50 values", look for contexts mentioning "IC50", "activity", "potency"
- Title match: Does the title contain relevant keywords?
- Intent: Is the intent "methodology" or "result" (more relevant) vs. "background" (less relevant)?
Scoring:
- Context keywords match: +3 points
- Title keywords match: +2 points
- Intent is methodology/result: +2 points
- Recent (< 5 years old): +1 point
Only add to the queue if the score is ≥ 5.
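A minimal sketch of this scoring heuristic, assuming `ref` is one entry from the /references response and `query_terms` comes from the user's question:

```python
from datetime import date

def score_reference(ref: dict, query_terms: list[str]) -> int:
    """Score one /references entry against the user's query."""
    cited = ref["citedPaper"]
    contexts = " ".join(ref.get("contexts") or []).lower()
    title = (cited.get("title") or "").lower()
    score = 0
    if any(t.lower() in contexts for t in query_terms):
        score += 3  # Citation contexts mention query terms
    if any(t.lower() in title for t in query_terms):
        score += 2  # Title keywords match
    if {"methodology", "result"} & set(ref.get("intents") or []):
        score += 2  # Cited for methods/results, not just background
    year = cited.get("year")
    if year and date.today().year - year < 5:
        score += 1  # Recent
    return score  # Queue only if >= 5
```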
### 3. Forward Traversal (Citations)
Get papers citing this one:

```bash
curl "https://api.semanticscholar.org/graph/v1/paper/abc123def456/citations?fields=title,year,abstract,externalIds&limit=100"
```
Response format:

```json
{
  "data": [
    {
      "citingPaper": {
        "paperId": "def456ghi",
        "title": "Newer Paper Citing This",
        "year": 2024,
        "abstract": "We extended the work of [original paper]...",
        "externalIds": {
          "DOI": "10.9012/citing.2024"
        }
      }
    }
  ]
}
```
Filter for relevance:

For each citing paper, check:
- Title match: Are keywords present in the title?
- Abstract match: Are the user's query terms in the abstract?
- Recency: Newer papers often build on the findings (prioritize < 2 years)
- Citation count: If Semantic Scholar provides it, highly cited papers are more likely to be relevant
Scoring:
- Title keywords match: +3 points
- Abstract keywords match: +2 points
- Recent (< 2 years): +2 points
- Moderate recency (2-5 years): +1 point
Only add to the queue if the score is ≥ 5.
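A matching sketch for the forward direction, using the title/abstract/recency rules above (`cit` is assumed to be one entry from the /citations response):

```python
from datetime import date

def score_citing_paper(cit: dict, query_terms: list[str]) -> int:
    """Score one /citations entry against the user's query."""
    paper = cit["citingPaper"]
    title = (paper.get("title") or "").lower()
    abstract = (paper.get("abstract") or "").lower()
    score = 0
    if any(t.lower() in title for t in query_terms):
        score += 3  # Title keywords match
    if any(t.lower() in abstract for t in query_terms):
        score += 2  # Abstract keywords match
    year = paper.get("year")
    if year:
        age = date.today().year - year
        if age < 2:
            score += 2  # Very recent: likely builds on the findings
        elif age <= 5:
            score += 1  # Moderately recent
    return score  # Queue only if >= 5
```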
### 4. Deduplication
Before adding to the queue, check papers-reviewed.json:

```python
for paper in candidates:
    doi = (paper.get("externalIds") or {}).get("DOI")
    if doi in papers_reviewed:
        continue  # Already processed -- skip it
    queue.append(paper)
```
CRITICAL: After evaluating any paper from citation traversal, add it to papers-reviewed.json regardless of score. This prevents re-processing the same paper from multiple sources.
Track citation relationships in citations/citation-graph.json:

```json
{
  "10.1234/example.2023": {
    "references": ["10.5678/ref1.2020", "10.5678/ref2.2021"],
    "cited_by": ["10.9012/cite1.2024", "10.9012/cite2.2024"]
  }
}
```
CRITICAL: Use ONLY citation-graph.json for citation tracking. Do NOT create custom files like forward_citation_pmids.txt or citation_analysis.md. All findings go in SUMMARY.md.
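One way to maintain that file is a small read-merge-write helper. The function below is a sketch; only the JSON shape above is prescribed by this skill:

```python
import json
from pathlib import Path

GRAPH_PATH = Path("citations/citation-graph.json")

def record_edges(source_doi: str, ref_dois: list[str], citing_dois: list[str]) -> None:
    """Merge newly found reference/citation DOIs into citation-graph.json."""
    graph = json.loads(GRAPH_PATH.read_text()) if GRAPH_PATH.exists() else {}
    entry = graph.setdefault(source_doi, {"references": [], "cited_by": []})
    entry["references"] = sorted(set(entry["references"]) | set(ref_dois))
    entry["cited_by"] = sorted(set(entry["cited_by"]) | set(citing_dois))
    GRAPH_PATH.parent.mkdir(parents=True, exist_ok=True)
    GRAPH_PATH.write_text(json.dumps(graph, indent=2))
```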
### 5. Process Queue
Add relevant citations to processing queue:

```json
{
  "doi": "10.5678/referenced.2020",
  "title": "Referenced Paper",
  "relevance_score": 7,
  "source": "backward_from:10.1234/example.2023",
  "context": "Method citation - describes IC50 measurement protocol"
}
```
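A sketch of building such an entry with the source-tracking prefix this skill prescribes (`direction` is either "backward" or "forward"; the helper name is hypothetical):

```python
def make_queue_entry(paper: dict, score: int, source_doi: str,
                     direction: str, note: str) -> dict:
    """Build a queue entry with source tracking."""
    return {
        "doi": paper["externalIds"]["DOI"],
        "title": paper["title"],
        "relevance_score": score,
        "source": f"{direction}_from:{source_doi}",
        "context": note,
    }
```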
Then:
- Evaluate using the evaluating-paper-relevance skill
- If relevant, extract data and potentially traverse its citations too
## Smart Traversal Limits
To avoid explosion:
- Only traverse papers scoring ≥ 7 in initial evaluation
- Only follow citations scoring ≥ 5 in relevance filtering
- Limit traversal depth to 2 levels (original → references → references of references)
- Check with user after every 50 papers total
Breadth-first strategy:
- Get all references + citations for current paper
- Filter and score them
- Add high-scoring ones to queue
- Process next paper in queue
- Repeat until queue empty or hit limit
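A minimal sketch of that loop, reusing the scoring functions above. `fetch_references`, `fetch_citations`, `already_reviewed`, and `process_paper` are hypothetical helpers, not part of any API:

```python
from collections import deque

def traverse(seed: dict, query_terms: list[str], max_papers: int = 50) -> None:
    """Breadth-first citation traversal with relevance filtering and a depth cap."""
    queue = deque([(seed, 0)])  # (paper, depth); depth 0 = original paper
    processed = 0
    while queue and processed < max_papers:
        paper, depth = queue.popleft()
        process_paper(paper)  # Evaluate, extract, record in papers-reviewed.json
        processed += 1
        if depth >= 2:
            continue  # Don't expand beyond 2 levels
        for ref in fetch_references(paper["paperId"]):
            if score_reference(ref, query_terms) >= 5 and not already_reviewed(ref["citedPaper"]):
                queue.append((ref["citedPaper"], depth + 1))
        for cit in fetch_citations(paper["paperId"]):
            if score_citing_paper(cit, query_terms) >= 5 and not already_reviewed(cit["citingPaper"]):
                queue.append((cit["citingPaper"], depth + 1))
    # If the queue still has papers at the limit, check with the user before continuing
```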
## Progress Reporting
Report as you traverse:

```
🔗 Analyzing citations for: "Original Paper Title"
   → Found 45 references, 12 look relevant
   → Found 23 citing papers, 8 look relevant
   → Adding 20 papers to queue

📄 [51/127] Following reference: "Method for measuring IC50"
   Source: Referenced by original paper in Methods section
   Abstract score: 7 → Fetching full text...
```
## API Rate Limiting
Semantic Scholar limits:
- Free tier: 100 requests per 5 minutes
- With API key: 1000 requests per 5 minutes
Be efficient:
- Request multiple fields in one call (`?fields=title,abstract,externalIds,year`)
- Use `limit=100` to get more results per request
- Cache responses; don't re-fetch the same paper
If rate limited:
- Wait 5 minutes
- Report to user: "⏸️ Rate limited by Semantic Scholar API. Waiting 5 minutes..."
- Consider getting an API key for higher limits
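A sketch of that retry behavior (the fixed 5-minute sleep mirrors the guidance above; `x-api-key` is the header Semantic Scholar uses for keyed access):

```python
import time
import requests

def get_with_backoff(url: str, params: dict, api_key: str | None = None) -> dict:
    """GET a Semantic Scholar endpoint, waiting out HTTP 429 responses."""
    headers = {"x-api-key": api_key} if api_key else {}
    while True:
        resp = requests.get(url, params=params, headers=headers)
        if resp.status_code == 429:
            print("⏸️ Rate limited by Semantic Scholar API. Waiting 5 minutes...")
            time.sleep(300)
            continue
        resp.raise_for_status()
        return resp.json()
```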
## Integration with Other Skills
After traversing citations:
- Queue now has N new papers to evaluate
- For each, use the evaluating-paper-relevance skill
- If relevant, extract to SUMMARY.md
- If highly relevant (≥9), traverse its citations too
- Update citation-graph.json to track relationships
## Quick Reference
| Task | API Endpoint |
|---|---|
| Get paper by DOI | `GET /graph/v1/paper/DOI:{doi}?fields=paperId,title` |
| Get references | `GET /graph/v1/paper/{paperId}/references?fields=contexts,title,abstract,externalIds` |
| Get citations | `GET /graph/v1/paper/{paperId}/citations?fields=title,abstract,externalIds` |
| Check if processed | Look up DOI in papers-reviewed.json |
| Filter relevance | Score based on context/title/intent/recency |
## Relevance Filtering Checklist
Before adding citation to queue:
- Check if already in papers-reviewed.json (skip if yes)
- Score based on context/title keywords (need ≥ 5)
- Verify external ID (DOI or PMID) exists
- Add source tracking ("backward_from:DOI" or "forward_from:DOI")
- Add to queue with metadata
## Common Mistakes
- Not tracking all evaluated papers: Only adding relevant papers to papers-reviewed.json → Add EVERY paper after evaluation to prevent re-review
- Creating custom analysis files: Making forward_citation_pmids.txt, CITATION_ANALYSIS.md, etc. → Use ONLY citation-graph.json and SUMMARY.md
- Following all citations: Exponential explosion → Filter before adding to queue
- Ignoring context: Citation might be tangential → Read context strings
- Not deduplicating: Re-processing the same papers → Always check papers-reviewed.json before and after evaluation
- Too deep: Following 5+ levels → Limit to 2 levels, check with user
- Missing forward citations: Only checking references → Use both backward and forward
- No rate limiting awareness: API blocks you → Add delays, handle 429 errors
## Example Workflow
1. User asks: "Find selectivity data for BTK inhibitors"
2. Search finds Paper A (score: 9, has great IC50 data)
3. Traverse citations for Paper A:
   - References: 45 total, 12 relevant (mention "selectivity", "IC50")
   - Citations: 23 total, 8 relevant (newer papers on BTK)
4. Add 20 papers to queue
5. Evaluate first queued paper (score: 8)
6. Extract data, traverse its citations (add 5 more)
7. Continue until queue empty or user says stop
## Next Steps
After traversing citations:
- Process queued papers with the evaluating-paper-relevance skill
- Update SUMMARY.md with new findings
- Check if you've reached a checkpoint (50 papers or 5 minutes)
- If at a checkpoint, ask the user to continue or stop