SKILL.md

---
name: comp-scout-scrape
description: Scrape competition websites, extract structured data, and auto-persist to GitHub issues. Creates issues for new competitions, adds comments for duplicates.
---

Competition Scraper

Scrape creative writing competitions from Australian aggregator sites and automatically persist to GitHub.

What This Skill Does

  1. Scrapes competitions.com.au and netrewards.com.au
  2. Extracts structured data (dates, prompts, prizes)
  3. Checks for duplicates against existing GitHub issues (by URL and title similarity)
  4. Creates issues for NEW competitions only
  5. Adds comments to existing issues when same competition found on another site
  6. Skips competitions that are already tracked

The scraper already filters out sponsored/lottery ads. Your job is to check for duplicates, then persist only new competitions.

What Counts as "New"

A competition is NEW if:

  • Its URL is not found in any existing issue body (check the full body text, not just the primary URL field)
  • AND its normalized title is <80% similar to all existing issue titles

A competition is a DUPLICATE if:

  • Its URL appears anywhere in an existing issue (body text, comments) → already tracked, skip
  • Its normalized title is 80% or more similar to an existing issue title → likely the same competition, skip
  • Same competition found on a different aggregator site → add comment to existing issue noting the alternate URL

Note: An issue body may contain multiple URLs (one per aggregator site). When checking for duplicates, search the entire issue body for the scraped URL, not just a specific field.
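
A minimal sketch of these two checks, assuming the scraped competition dicts from Step 2 and the issue JSON from Step 3 below. difflib.SequenceMatcher from the Python standard library is one way to approximate the 80% threshold; ideally both titles are normalized with the rules under "Title Normalization" later in this document, but the sketch only lowercases the issue title:

from difflib import SequenceMatcher

def title_similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; 0.8 corresponds to the 80% threshold above
    return SequenceMatcher(None, a, b).ratio()

def is_new(comp: dict, issues: list[dict]) -> bool:
    # Tracked if the scraped URL appears anywhere in an existing issue body
    # (the Step 3 JSON does not include comments)
    url_tracked = any(comp["url"] in (issue["body"] or "") for issue in issues)
    # Likely the same competition if titles are 80%+ similar
    title_match = any(
        title_similarity(comp["normalized_title"], issue["title"].lower()) >= 0.8
        for issue in issues
    )
    return not url_tracked and not title_match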

Word Limit Clarification

"25WOL" is a category name, NOT a filter. Competitions with 25, 50, or 100 word limits are all valid creative writing competitions - persist them all (if new).

Prerequisites

pip install playwright
playwright install chromium

Also requires:

  • gh CLI authenticated
  • Target repository for competition data (not this skills repo)

Workflow

Step 1: Determine Target Repository

The target repo stores competition issues. Specify it explicitly or read it from workspace config:

# From workspace config (if hiivmind-pulse-gh initialized)
TARGET_REPO=$(yq '.repositories[0].full_name' .hiivmind/github/config.yaml 2>/dev/null)

# Or use default/specified
TARGET_REPO="${TARGET_REPO:-discreteds/competition-data}"

Step 2: Scrape Listings

Run the scraper to get structured competition data:

python skills/comp-scout-scrape/scraper.py listings

Output:

{
  "competitions": [
    {
      "url": "https://competitions.com.au/win-example/",
      "site": "competitions.com.au",
      "title": "Win a $500 Gift Card",
      "normalized_title": "500 gift card",
      "brand": "Example Brand",
      "prize_summary": "$500",
      "prize_value": 500,
      "closing_date": "2024-12-31"
    }
  ],
  "scrape_date": "2024-12-09",
  "errors": []
}

Step 3: Check for Existing Issues

For each scraped competition, check if it already exists:

# Get all open competition issues
gh issue list -R "$TARGET_REPO" \
  --label "competition" \
  --state open \
  --json number,title,body \
  --limit 200

Match by:

  1. URL in issue body (exact match = definite duplicate)
  2. Normalized title similarity (80% or more = likely duplicate)
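
Tying this step together, a hedged sketch that runs the scraper and the gh command above via subprocess, then partitions results with the is_new() predicate sketched earlier (the repo name is the example default from Step 1):

import json
import subprocess

def run_json(args):
    # Run a command and parse its stdout as JSON
    out = subprocess.run(args, capture_output=True, text=True, check=True).stdout
    return json.loads(out)

listings = run_json(["python", "skills/comp-scout-scrape/scraper.py", "listings"])
issues = run_json([
    "gh", "issue", "list", "-R", "discreteds/competition-data",
    "--label", "competition", "--state", "open",
    "--json", "number,title,body", "--limit", "200",
])

new_comps = [c for c in listings["competitions"] if is_new(c, issues)]
dupes = [c for c in listings["competitions"] if not is_new(c, issues)]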

Step 4: Fetch Details for New Competitions

For competitions not already tracked, get full details:

python skills/comp-scout-scrape/scraper.py detail "https://competitions.com.au/win-example/"

For multiple new competitions, use batch mode:

echo '{"urls": ["url1", "url2", ...]}' | python skills/comp-scout-scrape/scraper.py details-batch

Step 4.5: Apply Auto-Tagging Rules (NOT Filtering)

IMPORTANT: Auto-tagging is for LABELING issues, not for skipping/excluding competitions.

Check competitions against user preferences from the data repo's CLAUDE.md to determine which labels to apply.

  1. Fetch preferences:

gh api repos/$TARGET_REPO/contents/CLAUDE.md -H "Accept: application/vnd.github.raw" 2>/dev/null

  2. Parse the Detection Keywords section for tagging rules.

  3. For each competition, check whether the title or prize summary matches any keyword. A runnable version of the matching logic (tag_rules is assumed to be parsed from CLAUDE.md):

# tag_rules shape, e.g.: [{"label": "for-kids", "keywords": ["Lego"]},
#                         {"label": "cruise", "keywords": ["P&O"]}]
for rule in tag_rules:
    haystack = (comp["title"] + " " + comp["prize_summary"]).lower()
    if any(kw.lower() in haystack for kw in rule["keywords"]):
        labels.append(rule["label"])

  4. ALL competitions are ALWAYS persisted as issues. Tagged competitions:
    • Get the relevant label applied (e.g., for-kids, cruise)
    • Are closed immediately with an explanation comment
    • But they ARE STILL CREATED as issues (for record-keeping and potential review)

Step 5: Auto-Persist Results

For New Competitions → Create Issue

gh issue create -R "$TARGET_REPO" \
  --title "$TITLE" \
  --label "competition" \
  --label "25wol" \
  --body "$(cat <<'EOF'
## Competition Details

**URL:** {url}
**Brand:** {brand}
**Prize:** {prize_summary}
**Word Limit:** {word_limit} words
**Closes:** {closing_date}
**Draw Date:** {draw_date}
**Winners Notified:** {notification_info}

## Prompt

> {prompt}

---
*Scraped from {site} on {scrape_date}*
EOF
)"

Then set milestone by closing month:

gh issue edit $ISSUE_NUMBER -R "$TARGET_REPO" --milestone "December 2024"

For Duplicates → Add Comment

If competition URL found on another site:

gh issue comment $EXISTING_ISSUE -R "$TARGET_REPO" --body "$(cat <<'EOF'
### Also found on {other_site}

**URL:** {url}
**Title on this site:** {title}
*Discovered: {date}*
EOF
)"

For Filtered Competitions → Create Issue + Close

If competition matched auto-filter keywords:

# Create the issue first (for record-keeping)
ISSUE_URL=$(gh issue create -R "$TARGET_REPO" \
  --title "$TITLE" \
  --label "competition" \
  --label "25wol" \
  --label "$FILTER_LABEL" \
  --body "...")

# Extract issue number
ISSUE_NUMBER=$(echo "$ISSUE_URL" | grep -oE '[0-9]+$')

# Close with explanation (heredoc delimiter unquoted so $KEYWORD and $FILTER_RULE expand)
gh issue close $ISSUE_NUMBER -R "$TARGET_REPO" --comment "$(cat <<EOF
Auto-filtered: matches '$KEYWORD' in $FILTER_RULE preferences.

See CLAUDE.md in this repository for filter settings.
EOF
)"

Step 6: Report Results

Present confirmation to user:

✅ Scrape complete!

**Created 3 new issues:**
- #42: Win a $500 Coles Gift Card (closes Dec 31)
- #43: Win a Trip to Bali (closes Jan 15)
- #44: Win a Year's Supply of Coffee (closes Dec 20)

**Auto-filtered 2 (created + closed):**
- #45: Win Lego Set (for-kids: matched "Lego")
- #46: Win P&O Cruise (cruise: matched "P&O")

**Found 2 duplicates (added as comments):**
- #38: Win Woolworths Gift Cards (also on netrewards.com.au)
- #39: Win Dreamworld Experience (also on netrewards.com.au)

**Skipped 7 already tracked**

IMPORTANT: Do NOT ask "Would you like me to analyze these?" at the end. When invoked by comp-scout-daily, the workflow will automatically invoke analyze/compose skills next. Report results and stop.

Output Fields

Listing Output

| Field | Type | Description |
|---|---|---|
| url | string | Full URL to competition detail page |
| site | string | Source site (competitions.com.au or netrewards.com.au) |
| title | string | Competition title as displayed |
| normalized_title | string | Lowercase, prefixes stripped, for matching |
| brand | string | Sponsor/brand name (if available) |
| prize_summary | string | Prize description or value badge |
| prize_value | int/null | Numeric value in dollars |
| closing_date | string/null | YYYY-MM-DD format |
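
For reference, the same listing schema as a Python TypedDict (a convenience sketch, not something the scraper itself exports):

from typing import Optional, TypedDict

class Listing(TypedDict):
    url: str                     # full URL to the detail page
    site: str                    # competitions.com.au or netrewards.com.au
    title: str                   # title as displayed
    normalized_title: str        # lowercased, prefixes stripped
    brand: str                   # sponsor/brand name
    prize_summary: str           # prize description or value badge
    prize_value: Optional[int]   # numeric value in dollars
    closing_date: Optional[str]  # YYYY-MM-DD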

Detail Output

All listing fields plus:

| Field | Type | Description |
|---|---|---|
| prompt | string | The actual competition question/prompt |
| word_limit | int | Maximum words (default 25) |
| entry_method | string | How to submit an entry |
| winner_notification | object/null | Notification details from JSON-LD |
| scraped_at | string | ISO timestamp of the scrape |

Winner Notification Object

| Field | Type | Description |
|---|---|---|
| notification_text | string | Raw notification text |
| notification_date | string/null | Specific date if mentioned |
| notification_days | int/null | Days after close/draw |
| selection_text | string | How winners are selected |
| selection_date | string/null | When judging occurs |

Title Normalization

Titles are normalized for deduplication:

  1. Lowercase
  2. Strip prefixes: "Win ", "Win a ", "Win an ", "Win the ", "Win 1 of "
  3. Remove punctuation
  4. Collapse whitespace

Example:

Original: "Win a $500 Coles Gift Card"
Normalized: "500 coles gift card"

Example Session

User: Scrape competitions

Claude: I'll scrape competitions and persist new ones to GitHub.

[Runs: python skills/comp-scout-scrape/scraper.py listings]

Found 12 competitions from both sites.

[Runs: gh issue list -R discreteds/competition-data --label competition --json number,title,body]

Checking against 45 existing issues...
- 3 are new
- 2 are duplicates (same competition, different source)
- 7 already tracked

Fetching details for 3 new competitions...

[Creates issues and adds comments]

✅ Scrape complete!

**Created 3 new issues:**
- #46: Win a $500 Coles Gift Card (closes Dec 31)
  - Milestone: December 2024
- #47: Win a Trip to Bali (closes Jan 15)
  - Milestone: January 2025
- #48: Win a Year's Supply of Coffee (closes Dec 20)
  - Milestone: December 2024

**Added 2 duplicate comments:**
- #38: Also found on netrewards.com.au
- #39: Also found on netrewards.com.au

CLI Commands Reference

# Scrape all listing pages
python skills/comp-scout-scrape/scraper.py listings

# Get full details for one competition
python skills/comp-scout-scrape/scraper.py detail "URL"

# Get full details for multiple competitions (batch mode)
echo '{"urls": ["url1", "url2"]}' | python skills/comp-scout-scrape/scraper.py details-batch

# Debug: just get URLs
python skills/comp-scout-scrape/scraper.py urls

Batch Details Output

{
  "details": [
    {
      "url": "...",
      "title": "...",
      "prompt": "Tell us in 25 words...",
      "word_limit": 25,
      ...
    }
  ],
  "scrape_date": "2024-12-09",
  "errors": []
}

Persistence Details

This skill handles all GitHub persistence. The separate comp-scout-persist skill is deprecated; its functionality has been merged here.

Issue Creation Template

## Competition Details

**URL:** {url}
**Brand:** {brand}
**Prize:** {prize_summary}
**Word Limit:** {word_limit} words
**Closes:** {closing_date}
**Draw Date:** {draw_date}
**Winners Notified:** {notification_info}

## Prompt

> {prompt}

---
*Scraped from {site} on {scrape_date}*

Labels

| Label | Description | Auto-applied |
|---|---|---|
| competition | All competition issues | Always |
| 25wol | 25 words or less type | Always |
| for-kids | Auto-filtered (kids competitions) | When keyword matches |
| cruise | Auto-filtered (cruise competitions) | When keyword matches |
| closing-soon | Closes within 3 days | By separate check |
| entry-drafted | Entry has been composed | By comp-scout-compose |
| entry-submitted | Entry has been submitted | Manually |

Milestones

Issues are assigned to milestones by closing date month:

  • "December 2024"
  • "January 2025"
  • etc.

# Create milestone if needed
gh api repos/$TARGET_REPO/milestones \
  --method POST \
  --field title="$MONTH_YEAR" \
  --field due_on="$LAST_DAY_OF_MONTH"

# Assign to issue
gh issue edit $ISSUE_NUMBER -R "$TARGET_REPO" --milestone "$MONTH_YEAR"
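
The milestone title and due date can be derived from the competition's closing_date. A small helper, assuming the YYYY-MM-DD format from the scraper output (the T23:59:59Z suffix for due_on is an assumption about rounding):

from calendar import monthrange
from datetime import date

def milestone_for(closing_date: str) -> tuple[str, str]:
    d = date.fromisoformat(closing_date)        # e.g. "2024-12-31"
    last_day = monthrange(d.year, d.month)[1]   # number of days in that month
    title = d.strftime("%B %Y")                 # "December 2024"
    due_on = f"{d.year:04d}-{d.month:02d}-{last_day:02d}T23:59:59Z"
    return title, due_on

# milestone_for("2024-12-31") -> ("December 2024", "2024-12-31T23:59:59Z")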

Duplicate Comment Template

### Also found on {other_site}

**URL:** {url}
**Title on this site:** {title}
*Discovered: {date}*

Filtered Issue Handling

When a competition matches filter keywords:

  1. Issue is created (for record-keeping)
  2. Filter label is applied (e.g., for-kids)
  3. Issue is immediately closed with explanation
gh issue close $ISSUE_NUMBER -R "$TARGET_REPO" \
  --comment "Auto-filtered: matches '$KEYWORD' in $FILTER_RULE preferences."

Integration

This skill is invoked by comp-scout-daily as the first step in the workflow.

After scraping, you can:

  • Use comp-scout-analyze to generate entry strategies
  • Use comp-scout-compose to write actual entries
  • Both will auto-persist their results as comments on the issue