named-entity-extractor

@dkyazzentwatwa/chatgpt-skills

Extract named entities (people, organizations, locations, dates) from text using NLP. Use for document analysis, information extraction, or data enrichment.

Install Skill

1. Download the skill.
2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section.
3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file.

Note: Review the skill's instructions before using it.

SKILL.md

name: named-entity-extractor
description: Extract named entities (people, organizations, locations, dates) from text using NLP. Use for document analysis, information extraction, or data enrichment.

Named Entity Extractor

Extract named entities from text including people, organizations, locations, dates, and more.

Features

  • Entity Types: People, organizations, locations, dates, money, percentages
  • Multiple Models: spaCy for accuracy, regex for speed
  • Batch Processing: Process multiple documents
  • Entity Linking: Group mentions of the same entity across the text
  • Export: JSON, CSV output formats
  • Visualization: Entity highlighting

Quick Start

from entity_extractor import EntityExtractor

extractor = EntityExtractor()

text = "Apple Inc. was founded by Steve Jobs in Cupertino, California in 1976."

entities = extractor.extract(text)
for entity in entities:
    print(f"{entity['text']}: {entity['type']}")

# Output:
# Apple Inc.: ORG
# Steve Jobs: PERSON
# Cupertino: GPE
# California: GPE
# 1976: DATE

CLI Usage

# Extract from text
python entity_extractor.py --text "Steve Jobs founded Apple in California."

# Extract from file
python entity_extractor.py --input document.txt

# Batch process folder
python entity_extractor.py --input ./documents/ --output entities.csv

# Filter by entity type
python entity_extractor.py --input document.txt --types PERSON,ORG

# Use regex mode (faster, less accurate)
python entity_extractor.py --input document.txt --mode regex

# JSON output
python entity_extractor.py --input document.txt --json
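
The flags above could be wired up with argparse roughly as follows. This is a minimal sketch, not the skill's actual implementation: `build_parser`, the mutually exclusive `--text`/`--input` group, and the defaults are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(prog="entity_extractor.py")

    # Assumption: exactly one input source is given per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--text", help="Raw text to analyze")
    source.add_argument("--input", help="Path to a file or folder")

    parser.add_argument("--output", help="Destination file, e.g. entities.csv")
    parser.add_argument(
        "--types",
        # Turn "PERSON,ORG" into ["PERSON", "ORG"].
        type=lambda s: [t.strip().upper() for t in s.split(",") if t.strip()],
        default=None,
        help="Comma-separated entity types to keep",
    )
    parser.add_argument("--mode", choices=["spacy", "regex"], default="spacy")
    parser.add_argument("--json", action="store_true", help="Emit JSON output")
    return parser


args = build_parser().parse_args(
    ["--input", "document.txt", "--types", "PERSON,ORG"]
)
print(args.input, args.types, args.mode, args.json)
```

Parsing `--types` into a list at the argument level keeps the filtering code free of string handling.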

API Reference

EntityExtractor Class

class EntityExtractor:
    def __init__(self, mode: str = "spacy", model: str = "en_core_web_sm")

    # Extraction
    def extract(self, text: str) -> list
    def extract_file(self, filepath: str) -> list
    def extract_batch(self, folder: str) -> dict

    # Filtering
    def filter_entities(self, entities: list, types: list) -> list
    def get_unique_entities(self, entities: list) -> list
    def group_by_type(self, entities: list) -> dict

    # Analysis
    def entity_frequency(self, text: str) -> dict
    def find_relationships(self, text: str) -> list

    # Export
    def to_csv(self, entities: list, output: str) -> str
    def to_json(self, entities: list, output: str) -> str
    def highlight_text(self, text: str) -> str

Entity Types

Standard Entity Types (spaCy)

Type        | Description                          | Example
------------|--------------------------------------|-------------------
PERSON      | People, including fictional          | "Steve Jobs"
ORG         | Companies, agencies, institutions    | "Apple Inc."
GPE         | Countries, cities, states            | "California"
LOC         | Non-GPE locations, mountains, water  | "Pacific Ocean"
DATE        | Dates, periods                       | "January 2024"
TIME        | Times                                | "3:30 PM"
MONEY       | Monetary values                      | "$1.5 million"
PERCENT     | Percentages                          | "20%"
PRODUCT     | Products                             | "iPhone"
EVENT       | Events                               | "World Cup"
WORK_OF_ART | Books, songs, etc.                   | "The Great Gatsby"
LAW         | Laws, regulations                    | "GDPR"
LANGUAGE    | Languages                            | "English"
NORP        | Nationalities, groups                | "American"

Regex Mode Entities

Faster extraction with regex patterns:

Type       | Description
-----------|--------------------
EMAIL      | Email addresses
PHONE      | Phone numbers
URL        | Web URLs
DATE       | Common date formats
MONEY      | Currency amounts
PERCENTAGE | Percentages
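
To make the regex mode concrete, here is a rough sketch of pattern-based extraction for a few of the types listed above. The patterns and the `regex_extract` helper are illustrative assumptions; the skill's real patterns are not shown in this document and are likely more thorough.

```python
import re

# Illustrative patterns only (assumed, not taken from the skill's source).
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "URL": re.compile(r"https?://[^\s<>]+"),
    "PHONE": re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"),
    "PERCENTAGE": re.compile(r"\d+(?:\.\d+)?%"),
}


def regex_extract(text: str) -> list:
    """Return entity dicts (text, type, start, end) sorted by position."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(
                {
                    "text": match.group(),
                    "type": label,
                    "start": match.start(),
                    "end": match.end(),
                }
            )
    return sorted(entities, key=lambda e: e["start"])


sample = "Contact John Smith at john.smith@example.com or call (555) 123-4567."
found = regex_extract(sample)
for entity in found:
    print(f"{entity['text']}: {entity['type']}")
```

Note that "John Smith" is not reported: regex patterns only catch entities with a predictable surface form, which is the accuracy trade-off the regex mode makes for speed.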

Output Format

Entity Result

{
    "text": "Steve Jobs",
    "type": "PERSON",
    "start": 10,
    "end": 20,
    "confidence": 0.95
}

Full Extraction Result

{
    "text": "Original text...",
    "entities": [
        {"text": "Steve Jobs", "type": "PERSON", "start": 10, "end": 20},
        {"text": "Apple Inc.", "type": "ORG", "start": 30, "end": 40}
    ],
    "summary": {
        "total_entities": 5,
        "unique_entities": 4,
        "by_type": {
            "PERSON": 2,
            "ORG": 1,
            "GPE": 2
        }
    }
}
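
The summary block can be derived from the entity list alone. The sketch below shows one way to compute it; `summarize` is a hypothetical helper, and treating two entities as identical when both text and type match is an assumption.

```python
from collections import Counter


def summarize(entities: list) -> dict:
    """Build a summary dict shaped like the one in the result above."""
    by_type = Counter(e["type"] for e in entities)
    # Assumption: uniqueness is decided by the (text, type) pair.
    unique = {(e["text"], e["type"]) for e in entities}
    return {
        "total_entities": len(entities),
        "unique_entities": len(unique),
        "by_type": dict(by_type),
    }


entities = [
    {"text": "Steve Jobs", "type": "PERSON"},
    {"text": "Apple Inc.", "type": "ORG"},
    {"text": "Cupertino", "type": "GPE"},
    {"text": "California", "type": "GPE"},
    {"text": "Steve Jobs", "type": "PERSON"},
]
summary = summarize(entities)
print(summary)
```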

Filtering and Grouping

Filter by Type

entities = extractor.extract(text)

# Get only people and organizations
filtered = extractor.filter_entities(entities, ["PERSON", "ORG"])
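
Conceptually, type filtering is a membership test on each entity's "type" field. A minimal standalone sketch, assuming the entity dict format shown under Output Format (`filter_by_type` is a hypothetical stand-in, not the skill's method):

```python
def filter_by_type(entities: list, types: list) -> list:
    """Keep only entities whose type is in the requested list."""
    wanted = set(types)  # set lookup keeps the filter linear in len(entities)
    return [e for e in entities if e["type"] in wanted]


entities = [
    {"text": "Steve Jobs", "type": "PERSON"},
    {"text": "Apple Inc.", "type": "ORG"},
    {"text": "California", "type": "GPE"},
]
people_and_orgs = filter_by_type(entities, ["PERSON", "ORG"])
print([e["text"] for e in people_and_orgs])
```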

Get Unique Entities

# Remove duplicates, keep first occurrence
unique = extractor.get_unique_entities(entities)
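
De-duplication that keeps the first occurrence can be sketched with a set of already-seen keys. This is an illustration only: `unique_entities` is a hypothetical helper, and the exact, case-sensitive comparison of text and type is an assumption about how duplicates are defined.

```python
def unique_entities(entities: list) -> list:
    """Drop repeated entities, preserving the order of first occurrences."""
    seen = set()
    result = []
    for entity in entities:
        key = (entity["text"], entity["type"])  # assumed identity of an entity
        if key not in seen:
            seen.add(key)
            result.append(entity)
    return result


entities = [
    {"text": "Apple", "type": "ORG", "start": 0, "end": 5},
    {"text": "Steve Jobs", "type": "PERSON", "start": 21, "end": 31},
    {"text": "Apple", "type": "ORG", "start": 40, "end": 45},
]
deduped = unique_entities(entities)
print([(e["text"], e["start"]) for e in deduped])
```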

Group by Type

grouped = extractor.group_by_type(entities)

# Returns:
{
    "PERSON": ["Steve Jobs", "Tim Cook"],
    "ORG": ["Apple Inc."],
    "GPE": ["California", "Cupertino"]
}
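
A grouping like the one above can be built in a single pass over the entity list. The following sketch uses a hypothetical `group_by_type` function; listing each entity text only once per type is an assumption inferred from the example output.

```python
from collections import defaultdict


def group_by_type(entities: list) -> dict:
    """Map each entity type to the distinct texts found for it, in order."""
    grouped = defaultdict(list)
    for entity in entities:
        texts = grouped[entity["type"]]
        if entity["text"] not in texts:  # assumed: no repeats within a type
            texts.append(entity["text"])
    return dict(grouped)


entities = [
    {"text": "Steve Jobs", "type": "PERSON"},
    {"text": "Apple Inc.", "type": "ORG"},
    {"text": "Tim Cook", "type": "PERSON"},
    {"text": "Steve Jobs", "type": "PERSON"},
    {"text": "California", "type": "GPE"},
]
grouped = group_by_type(entities)
print(grouped)
```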

Entity Frequency

frequency = extractor.entity_frequency(text)

# Returns:
{
    "Steve Jobs": {"count": 5, "type": "PERSON"},
    "Apple": {"count": 8, "type": "ORG"},
    "California": {"count": 2, "type": "GPE"}
}
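
The frequency mapping can be thought of as a count per entity text, tagged with a type. Here is a small sketch that derives it from an already extracted entity list; `frequency_from_entities` is hypothetical, and recording the type of the first occurrence is an assumption.

```python
def frequency_from_entities(entities: list) -> dict:
    """Count how often each entity text appears."""
    freq = {}
    for entity in entities:
        # Assumption: the type of the first occurrence is the one reported.
        entry = freq.setdefault(entity["text"], {"count": 0, "type": entity["type"]})
        entry["count"] += 1
    return freq


entities = [
    {"text": "Apple", "type": "ORG"},
    {"text": "Steve Jobs", "type": "PERSON"},
    {"text": "Apple", "type": "ORG"},
    {"text": "California", "type": "GPE"},
    {"text": "Apple", "type": "ORG"},
    {"text": "Steve Jobs", "type": "PERSON"},
]
freq = frequency_from_entities(entities)

# Rank entities from most to least frequent.
ranked = sorted(freq.items(), key=lambda item: item[1]["count"], reverse=True)
for name, info in ranked:
    print(f"{name} ({info['type']}): {info['count']}")
```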

Batch Processing

Process Folder

results = extractor.extract_batch("./documents/")

# Returns:
{
    "doc1.txt": {
        "entities": [...],
        "summary": {...}
    },
    "doc2.txt": {
        "entities": [...],
        "summary": {...}
    }
}
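
A folder run boils down to reading each file and keying the results by filename. The sketch below illustrates that loop with a pluggable extraction function; `extract_folder`, the `*.txt` pattern, and the toy `find_years` extractor are all assumptions made so the example runs without spaCy.

```python
import re
import tempfile
from pathlib import Path


def extract_folder(folder: str, extract_fn, pattern: str = "*.txt") -> dict:
    """Apply extract_fn to every matching file and key results by filename."""
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        entities = extract_fn(text)
        results[path.name] = {
            "entities": entities,
            "summary": {"total_entities": len(entities)},
        }
    return results


def find_years(text: str) -> list:
    """Toy stand-in for a real extractor: tags four-digit years as DATE."""
    return [
        {"text": m.group(), "type": "DATE", "start": m.start(), "end": m.end()}
        for m in re.finditer(r"\b(?:19|20)\d{2}\b", text)
    ]


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "doc1.txt").write_text("Apple was founded in 1976.", encoding="utf-8")
    Path(tmp, "doc2.txt").write_text("No dates here.", encoding="utf-8")
    results = extract_folder(tmp, find_years)

print(results)
```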

Export to CSV

# Export the batch results returned by extract_batch()
extractor.to_csv(results, "entities.csv")

# Creates CSV with columns:
# filename, entity_text, entity_type, start, end
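
For reference, the column layout above can be produced with the standard csv module by flattening the per-file results into one row per entity. This is a sketch under assumptions: `write_entities_csv` is hypothetical and writes to any text stream rather than a path.

```python
import csv
import io


def write_entities_csv(results: dict, stream) -> None:
    """Write one CSV row per entity, using the column layout listed above."""
    writer = csv.writer(stream)
    writer.writerow(["filename", "entity_text", "entity_type", "start", "end"])
    for filename, doc in results.items():
        for e in doc["entities"]:
            writer.writerow([filename, e["text"], e["type"], e["start"], e["end"]])


results = {
    "doc1.txt": {
        "entities": [
            {"text": "Steve Jobs", "type": "PERSON", "start": 10, "end": 20},
            {"text": "Apple Inc.", "type": "ORG", "start": 30, "end": 40},
        ]
    },
    "doc2.txt": {"entities": []},
}

buffer = io.StringIO()
write_entities_csv(results, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file instead of a buffer, open it with `newline=""` as the csv module documentation recommends.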

Text Highlighting

Generate HTML with highlighted entities:

html = extractor.highlight_text(text)

# Returns HTML with colored spans for each entity type
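
One way such highlighting could work is to walk the entity offsets in order and wrap each span in a colored element, escaping everything else. The sketch below is an assumption about the approach, not the skill's code: `highlight`, the color table, and the markup are illustrative, and unlike `highlight_text` it takes the entity list as an explicit argument.

```python
import html

# Hypothetical color table; unknown types fall back to grey.
COLORS = {"PERSON": "#ffd966", "ORG": "#9fc5e8", "GPE": "#b6d7a8"}


def highlight(text: str, entities: list) -> str:
    """Wrap each entity span in a colored <span>, escaping all other text."""
    parts = []
    cursor = 0
    for e in sorted(entities, key=lambda e: e["start"]):
        if e["start"] < cursor:
            continue  # skip spans that overlap one already emitted
        parts.append(html.escape(text[cursor:e["start"]]))
        color = COLORS.get(e["type"], "#dddddd")
        label = html.escape(e["type"])
        snippet = html.escape(text[e["start"]:e["end"]])
        parts.append(
            f'<span data-type="{label}" style="background:{color}">{snippet}</span>'
        )
        cursor = e["end"]
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


text = "Steve Jobs founded Apple."
entities = [
    {"text": "Steve Jobs", "type": "PERSON", "start": 0, "end": 10},
    {"text": "Apple", "type": "ORG", "start": 19, "end": 24},
]
markup = highlight(text, entities)
print(markup)
```

Escaping the surrounding text matters because source documents may contain characters such as `<` or `&` that would otherwise break the generated HTML.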

Example Workflows

Document Analysis

from entity_extractor import EntityExtractor

extractor = EntityExtractor()

# Read the document to analyze
with open("article.txt", encoding="utf-8") as f:
    text = f.read()
entities = extractor.extract(text)

# Get key people mentioned
people = extractor.filter_entities(entities, ["PERSON"])
print(f"People mentioned: {len(people)}")

# Rank the ten most frequent entities
freq = extractor.entity_frequency(text)
top_entities = sorted(freq.items(), key=lambda x: x[1]["count"], reverse=True)[:10]

Contact Information Extraction

from entity_extractor import EntityExtractor

extractor = EntityExtractor(mode="regex")

text = """
Contact John Smith at john.smith@example.com
or call (555) 123-4567.
"""

entities = extractor.extract(text)
# Finds: EMAIL, PHONE entities

Content Tagging

from entity_extractor import EntityExtractor

extractor = EntityExtractor()

articles = ["article1.txt", "article2.txt", "article3.txt"]
tags = {}

for article in articles:
    entities = extractor.extract_file(article)
    tags[article] = extractor.get_unique_entities(entities)

Dependencies

  • spacy>=3.7.0
  • pandas>=2.0.0
  • en_core_web_sm (spaCy model)

Note: Run python -m spacy download en_core_web_sm to install the model.
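
Before running the extractor, it can help to check that the packages listed above are importable. The following is a small sketch using only the standard library; `missing_dependencies` is a hypothetical helper, and it relies on spaCy models being installed as regular Python packages.

```python
import importlib.util


def missing_dependencies(packages=("spacy", "pandas", "en_core_web_sm")) -> list:
    """Return the names of packages that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = missing_dependencies()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All dependencies found.")
```

This only confirms that the packages are present; it does not verify the version constraints listed above.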