SKILL.md

name: Traversing Citation Networks
description: Smart backward and forward citation following via Semantic Scholar, with relevance filtering and deduplication
when_to_use: After finding a relevant paper. When you need to find related work. When following references or citations. When building a citation graph. When exploring paper connections.
version: 1.0.0

Traversing Citation Networks

Overview

Intelligently follow citations backward (references) and forward (citing papers) using the Semantic Scholar API.

Core principle: Only follow citations relevant to the user's query. Avoid exponential explosion by filtering before traversing.

When to Use

Use this skill when:

  • Found a highly relevant paper (score ≥ 7)
  • Need to find related work
  • User asks "what papers cite this?"
  • Building comprehensive understanding of a topic

When NOT to use:

  • Paper scored < 7 (not relevant enough to follow)
  • Already at 50 papers (check with user first)
  • Citations look off-topic from abstract

Citation Traversal Strategy

1. Get Paper ID from Semantic Scholar

Lookup by DOI:

curl "https://api.semanticscholar.org/graph/v1/paper/DOI:10.1234/example.2023?fields=paperId,title,year"

Response:

{
  "paperId": "abc123def456",
  "title": "Paper Title",
  "year": 2023
}

Save the paperId; it is needed for the references and citations queries.
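
A minimal lookup sketch in Python (assumes the requests library; the helper name get_paper_id is illustrative):

import requests

def get_paper_id(doi):
    """Resolve a DOI to a Semantic Scholar paperId."""
    url = f"https://api.semanticscholar.org/graph/v1/paper/DOI:{doi}"
    resp = requests.get(url, params={"fields": "paperId,title,year"})
    resp.raise_for_status()
    return resp.json()["paperId"]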

2. Backward Traversal (References)

Get references from paper:

curl "https://api.semanticscholar.org/graph/v1/paper/abc123def456/references?fields=contexts,intents,title,year,abstract,externalIds&limit=100"

Response format:

{
  "data": [
    {
      "citedPaper": {
        "paperId": "xyz789",
        "title": "Referenced Paper Title",
        "year": 2020,
        "abstract": "...",
        "externalIds": {
          "DOI": "10.5678/referenced.2020",
          "PubMed": "87654321"
        }
      },
      "contexts": [
        "...as described in previous work [15]...",
        "...we used the method from [15] to..."
      ],
      "intents": ["methodology", "background"]
    }
  ]
}

Filter for relevance:

For each reference, check:

  1. Context keywords: Do the citation contexts mention the user's query terms?
    • Example: If the user asks about "IC50 values", look for contexts mentioning "IC50", "activity", or "potency"
  2. Title match: Does the title contain relevant keywords?
  3. Intent: Is the intent "methodology" or "result" (more relevant) rather than "background" (less relevant)?

Scoring:

  • Context keywords match: +3 points
  • Title keywords match: +2 points
  • Intent is methodology/result: +2 points
  • Recent (< 5 years old): +1 point

Only add to queue if score ≥ 5
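
One way to implement this scoring (query_terms stands for keywords drawn from the user's query; the function name is illustrative):

from datetime import date

def score_reference(ref, query_terms):
    """Score one entry from the /references response against the user's query."""
    paper = ref["citedPaper"]
    contexts = " ".join(ref.get("contexts", [])).lower()
    title = (paper.get("title") or "").lower()
    intents = ref.get("intents", [])
    score = 0
    if any(t.lower() in contexts for t in query_terms):
        score += 3  # citation contexts mention query terms
    if any(t.lower() in title for t in query_terms):
        score += 2  # title keywords match
    if "methodology" in intents or "result" in intents:
        score += 2  # methodology/result intents are more relevant
    year = paper.get("year")
    if year and date.today().year - year < 5:
        score += 1  # recent (< 5 years old)
    return score  # add to the queue only if score >= 5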

3. Forward Traversal (Citations)

Get papers citing this one:

curl "https://api.semanticscholar.org/graph/v1/paper/abc123def456/citations?fields=title,year,abstract,externalIds&limit=100"

Response format:

{
  "data": [
    {
      "citingPaper": {
        "paperId": "def456ghi",
        "title": "Newer Paper Citing This",
        "year": 2024,
        "abstract": "We extended the work of [original paper]...",
        "externalIds": {
          "DOI": "10.9012/citing.2024"
        }
      }
    }
  ]
}

Filter for relevance:

For each citing paper:

  1. Title match: Keywords present in title?
  2. Abstract match: User's query terms in abstract?
  3. Recency: Newer papers often build on findings (prioritize < 2 years)
  4. Citation count: If Semantic Scholar provides it, highly cited papers are more likely to be relevant

Scoring:

  • Title keywords match: +3 points
  • Abstract keywords match: +2 points
  • Recent (< 2 years): +2 points
  • Moderate recency (2-5 years): +1 point

Only add to queue if score ≥ 5
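
Forward scoring follows the same pattern (again, query_terms and the function name are illustrative):

def score_citing_paper(entry, query_terms, current_year):
    """Score one entry from the /citations response against the user's query."""
    paper = entry["citingPaper"]
    title = (paper.get("title") or "").lower()
    abstract = (paper.get("abstract") or "").lower()
    score = 0
    if any(t.lower() in title for t in query_terms):
        score += 3  # title keywords match
    if any(t.lower() in abstract for t in query_terms):
        score += 2  # abstract keywords match
    year = paper.get("year")
    if year is not None:
        if current_year - year < 2:
            score += 2  # recent (< 2 years)
        elif current_year - year <= 5:
            score += 1  # moderate recency (2-5 years)
    return score  # add to the queue only if score >= 5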

4. Deduplication

Before adding to queue:

Check papers-reviewed.json:

doi = paper["externalIds"].get("DOI")
if doi in papers_reviewed:
    pass  # already processed: skip
else:
    queue.append(paper)  # not yet reviewed: add to the processing queue

CRITICAL: After evaluating any paper from citation traversal, add it to papers-reviewed.json regardless of score. This prevents re-processing the same paper from multiple sources.

Track citation relationship in citations/citation-graph.json:

{
  "10.1234/example.2023": {
    "references": ["10.5678/ref1.2020", "10.5678/ref2.2021"],
    "cited_by": ["10.9012/cite1.2024", "10.9012/cite2.2024"]
  }
}

CRITICAL: Use ONLY citation-graph.json for citation tracking. Do NOT create custom files like forward_citation_pmids.txt or citation_analysis.md. All findings go in SUMMARY.md.
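
A sketch of updating that file (the path citations/citation-graph.json follows the layout above; merging via sets keeps the DOI lists deduplicated):

import json
import os

def record_citations(graph_path, doi, reference_dois, citing_dois):
    """Merge new reference/citation DOIs into citations/citation-graph.json."""
    graph = {}
    if os.path.exists(graph_path):
        with open(graph_path) as f:
            graph = json.load(f)
    entry = graph.setdefault(doi, {"references": [], "cited_by": []})
    entry["references"] = sorted(set(entry["references"]) | set(reference_dois))
    entry["cited_by"] = sorted(set(entry["cited_by"]) | set(citing_dois))
    with open(graph_path, "w") as f:
        json.dump(graph, f, indent=2)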

5. Process Queue

Add relevant citations to processing queue:

{
  "doi": "10.5678/referenced.2020",
  "title": "Referenced Paper",
  "relevance_score": 7,
  "source": "backward_from:10.1234/example.2023",
  "context": "Method citation - describes IC50 measurement protocol"
}
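
A sketch of building that entry and appending it to a queue file (queue_path and the enqueue helper are assumptions about this workflow's layout, not a fixed convention):

import json

def enqueue(queue_path, paper, score, source_doi, direction, context):
    """Append one scored citation to the processing queue file."""
    entry = {
        "doi": paper["externalIds"].get("DOI"),
        "title": paper["title"],
        "relevance_score": score,
        "source": f"{direction}_from:{source_doi}",  # direction is "backward" or "forward"
        "context": context,
    }
    try:
        with open(queue_path) as f:
            queue = json.load(f)
    except FileNotFoundError:
        queue = []
    queue.append(entry)
    with open(queue_path, "w") as f:
        json.dump(queue, f, indent=2)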

Then:

  • Evaluate using evaluating-paper-relevance skill
  • If relevant, extract data and potentially traverse its citations too

Smart Traversal Limits

To avoid explosion:

  • Only traverse papers scoring ≥ 7 in initial evaluation
  • Only follow citations scoring ≥ 5 in relevance filtering
  • Limit traversal depth to 2 levels (original → references → references of references)
  • Check with user after every 50 papers total

Breadth-first strategy:

  1. Get all references + citations for current paper
  2. Filter and score them
  3. Add high-scoring ones to queue
  4. Process next paper in queue
  5. Repeat until queue empty or hit limit
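
A breadth-first sketch tying these limits together (fetch_and_score is a stand-in for the /references and /citations calls plus the scoring above; the checkpoint behavior is simplified):

from collections import deque

def traverse(seed_paper, query_terms, papers_reviewed, max_depth=2, checkpoint=50):
    """Breadth-first citation traversal with depth and volume limits."""
    queue = deque([(seed_paper, 0)])  # (paper, depth)
    processed = 0
    while queue:
        paper, depth = queue.popleft()
        processed += 1
        if processed >= checkpoint:
            break  # checkpoint reached: confirm with the user before continuing
        if depth >= max_depth:
            continue  # do not expand beyond 2 levels
        for candidate, score in fetch_and_score(paper, query_terms):
            doi = (candidate.get("externalIds") or {}).get("DOI")
            if not doi:
                continue
            already_seen = doi in papers_reviewed
            papers_reviewed.add(doi)  # record every evaluated paper
            if score >= 5 and not already_seen:
                queue.append((candidate, depth + 1))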

Progress Reporting

Report as you traverse:

🔗 Analyzing citations for: "Original Paper Title"
   → Found 45 references, 12 look relevant
   → Found 23 citing papers, 8 look relevant
   → Adding 20 papers to queue

📄 [51/127] Following reference: "Method for measuring IC50"
   Source: Referenced by original paper in Methods section
   Abstract score: 7 → Fetching full text...

API Rate Limiting

Semantic Scholar limits:

  • Free tier: 100 requests per 5 minutes
  • With API key: 1000 requests per 5 minutes

Be efficient:

  • Request multiple fields in one call (?fields=title,abstract,externalIds,year)
  • Use limit=100 to get more results per request
  • Cache responses; don't re-fetch the same paper

If rate limited:

  • Wait 5 minutes
  • Report to user: "⏸️ Rate limited by Semantic Scholar API. Waiting 5 minutes..."
  • Consider getting API key for higher limits
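
A simple retry sketch for 429 responses (the 5-minute wait mirrors the guidance above; x-api-key is the header Semantic Scholar uses for API keys):

import time
import requests

def fetch_with_retry(url, params=None, api_key=None, wait_seconds=300):
    """GET with basic handling for Semantic Scholar rate limiting (HTTP 429)."""
    headers = {"x-api-key": api_key} if api_key else {}
    while True:
        resp = requests.get(url, params=params, headers=headers)
        if resp.status_code == 429:
            print("⏸️ Rate limited by Semantic Scholar API. Waiting 5 minutes...")
            time.sleep(wait_seconds)
            continue
        resp.raise_for_status()
        return resp.json()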

Integration with Other Skills

After traversing citations:

  1. Queue now has N new papers to evaluate
  2. For each, use evaluating-paper-relevance skill
  3. If relevant, extract to SUMMARY.md
  4. If highly relevant (≥9), traverse its citations too
  5. Update citation-graph.json to track relationships

Quick Reference

  • Get paper by DOI: GET /graph/v1/paper/DOI:{doi}?fields=paperId,title
  • Get references: GET /graph/v1/paper/{paperId}/references?fields=contexts,title,abstract,externalIds
  • Get citations: GET /graph/v1/paper/{paperId}/citations?fields=title,abstract,externalIds
  • Check if processed: Look up DOI in papers-reviewed.json
  • Filter relevance: Score based on context/title/intent/recency

Relevance Filtering Checklist

Before adding citation to queue:

  • Check if already in papers-reviewed.json (skip if yes)
  • Score based on context/title keywords (need ≥ 5)
  • Verify external ID (DOI or PMID) exists
  • Add source tracking ("backward_from:DOI" or "forward_from:DOI")
  • Add to queue with metadata

Common Mistakes

  • Not tracking all evaluated papers: Only adding relevant papers to papers-reviewed.json → Add EVERY paper after evaluation to prevent re-review
  • Creating custom analysis files: Making forward_citation_pmids.txt, CITATION_ANALYSIS.md, etc. → Use ONLY citation-graph.json and SUMMARY.md
  • Following all citations: Exponential explosion → Filter before adding to queue
  • Ignoring context: Citation might be tangential → Read context strings
  • Not deduplicating: Re-processing same papers → Always check papers-reviewed.json before and after evaluation
  • Too deep: Following 5+ levels → Limit to 2 levels, check with user
  • Missing forward citations: Only checking references → Use both backward and forward
  • No rate limiting awareness: API blocks you → Add delays, handle 429 errors

Example Workflow

1. User asks: "Find selectivity data for BTK inhibitors"
2. Search finds Paper A (score: 9, has great IC50 data)
3. Traverse citations for Paper A:
   - References: 45 total, 12 relevant (mention "selectivity", "IC50")
   - Citations: 23 total, 8 relevant (newer papers on BTK)
4. Add 20 papers to queue
5. Evaluate first queued paper (score: 8)
6. Extract data, traverse its citations (add 5 more)
7. Continue until queue empty or user says stop

Next Steps

After traversing citations:

  • Process queued papers with evaluating-paper-relevance
  • Update SUMMARY.md with new findings
  • Check if reached checkpoint (50 papers or 5 minutes)
  • If checkpoint: ask user to continue or stop