company-product-context

name	company-product-context
description	Compiles comprehensive company product context from PDF documents, web research, and industry knowledge

Company Product Context Compiler

This skill extracts information from company PDF documents, conducts web research, and synthesizes industry knowledge to create a comprehensive company product context report.

Copy this checklist and track your progress:

Company Product Context Progress:
- [ ] Step 1: Gather company materials and identify sources
- [ ] Step 2: Extract information from PDF documents
- [ ] Step 3: Structure extracted data
- [ ] Step 4: Conduct web research and validation
- [ ] Step 5: Synthesize industry knowledge
- [ ] Step 6: Compile comprehensive product context
- [ ] Step 7: Generate final report
- [ ] Step 8: Export deliverables

Step 1: Gather company materials and identify sources

Collect all available company information:

Required Inputs:

Company PDF documents (annual reports, product sheets, presentations, etc.)
Company name and website URL
Industry/sector information
Specific products or services to focus on (if applicable)

Actions:

Request all relevant PDF files from user
Confirm company name, website, and primary industry
Ask about specific focus areas or products of interest
Identify any competitive context needed

Expected in INPUT_DIR:

*.pdf - Company documents
company_info.txt - Basic company details (optional)

Step 2: Extract information from PDF documents

Extract structured information from all provided PDF files.

Use the Python script for PDF extraction:

import os
import re
from pathlib import Path
import PyPDF2
import json

def extract_pdf_content(pdf_path):
    """Extract text content from PDF file."""
    text_content = []
    metadata = {}
    
    try:
        with open(pdf_path, 'rb') as file:
            pdf_reader = PyPDF2.PdfReader(file)
            
            # Extract metadata
            if pdf_reader.metadata:
                metadata = {
                    'title': pdf_reader.metadata.get('/Title', ''),
                    'author': pdf_reader.metadata.get('/Author', ''),
                    'subject': pdf_reader.metadata.get('/Subject', ''),
                    'pages': len(pdf_reader.pages)
                }
            else:
                metadata = {'pages': len(pdf_reader.pages)}
            
            # Extract text from all pages
            for page_num, page in enumerate(pdf_reader.pages, 1):
                try:
                    text = page.extract_text()
                    if text.strip():
                        text_content.append({
                            'page': page_num,
                            'text': text
                        })
                except Exception as e:
                    print(f"Error extracting page {page_num}: {e}")
                    
    except Exception as e:
        print(f"Error reading PDF {pdf_path}: {e}")
        return None
    
    return {
        'filename': os.path.basename(pdf_path),
        'metadata': metadata,
        'content': text_content
    }

def extract_key_sections(text):
    """Extract key sections from text based on common headers."""
    sections = {
        'company_overview': [],
        'products_services': [],
        'business_model': [],
        'market_position': [],
        'financials': [],
        'technology': [],
        'customers': [],
        'strategy': [],
        'other': []
    }
    
    # Keywords for section identification
    keywords = {
        'company_overview': ['about us', 'company overview', 'who we are', 'introduction', 'history'],
        'products_services': ['products', 'services', 'solutions', 'offerings', 'portfolio'],
        'business_model': ['business model', 'revenue model', 'how we work', 'operations'],
        'market_position': ['market', 'industry', 'competitive', 'position', 'landscape'],
        'financials': ['financial', 'revenue', 'earnings', 'profit', 'growth'],
        'technology': ['technology', 'platform', 'infrastructure', 'technical', 'innovation'],
        'customers': ['customers', 'clients', 'partners', 'case study', 'testimonial'],
        'strategy': ['strategy', 'vision', 'mission', 'goals', 'objectives', 'roadmap']
    }
    
    lines = text.split('\n')
    current_section = 'other'
    
    for line in lines:
        line_lower = line.lower().strip()
        
        # Check if line is a section header
        for section, section_keywords in keywords.items():
            if any(keyword in line_lower for keyword in section_keywords):
                if len(line_lower) < 100:  # Likely a header
                    current_section = section
                    break
        
        if line.strip():
            sections[current_section].append(line)
    
    return sections

def analyze_company_info(extracted_data):
    """Analyze extracted data for key company information."""
    analysis = {
        'company_name': '',
        'industry': '',
        'products': [],
        'key_terms': [],
        'metrics': [],
        'urls': [],
        'emails': []
    }
    
    all_text = ''
    for doc in extracted_data:
        for page in doc['content']:
            all_text += page['text'] + '\n'
    
    # Extract URLs
    url_pattern = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
    analysis['urls'] = list(set(re.findall(url_pattern, all_text)))
    
    # Extract emails
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b'
    analysis['emails'] = list(set(re.findall(email_pattern, all_text)))
    
    # Extract potential metrics (numbers with units/context)
    metrics_pattern = r'\$?\d+\.?\d*\s*(?:million|billion|trillion|k|M|B|%|percent|users|customers|employees)'
    analysis['metrics'] = re.findall(metrics_pattern, all_text, re.IGNORECASE)
    
    return analysis

def main():
    input_dir = os.environ.get('INPUT_DIR', '/tmp')
    output_dir = '/tmp/extracted_data'
    os.makedirs(output_dir, exist_ok=True)
    
    # Find all PDF files
    pdf_files = list(Path(input_dir).glob('*.pdf'))
    
    if not pdf_files:
        print("No PDF files found in input directory")
        return
    
    print(f"Found {len(pdf_files)} PDF file(s)")
    
    extracted_data = []
    
    for pdf_file in pdf_files:
        print(f"\nProcessing: {pdf_file.name}")
        data = extract_pdf_content(str(pdf_file))
        
        if data:
            extracted_data.append(data)
            
            # Extract sections from content
            all_text = '\n'.join([page['text'] for page in data['content']])
            sections = extract_key_sections(all_text)
            
            # Save individual file data
            output_file = output_dir + f"/{pdf_file.stem}_extracted.json"
            with open(output_file, 'w', encoding='utf-8') as f:
                json.dump({
                    'metadata': data['metadata'],
                    'sections': {k: '\n'.join(v) for k, v in sections.items() if v},
                    'full_text': all_text
                }, f, indent=2, ensure_ascii=False)
            
            print(f"✓ Extracted {len(data['content'])} pages")
            print(f"✓ Saved to: {output_file}")
    
    # Analyze all extracted data
    if extracted_data:
        analysis = analyze_company_info(extracted_data)
        
        analysis_file = output_dir + '/company_analysis.json'
        with open(analysis_file, 'w', encoding='utf-8') as f:
            json.dump(analysis, f, indent=2, ensure_ascii=False)
        
        print(f"\n✓ Company analysis saved to: {analysis_file}")
        print(f"✓ Found {len(analysis['urls'])} URLs")
        print(f"✓ Found {len(analysis['emails'])} email addresses")
        print(f"✓ Found {len(analysis['metrics'])} metrics")
    
    print(f"\n✓ Extraction complete. All data saved to: {output_dir}")

if __name__ == '__main__':
    main()

Execute the extraction:

python3 /tmp/company-product-context/extract_pdfs.py

Outputs:

/tmp/extracted_data/[filename]_extracted.json - Structured data per PDF
/tmp/extracted_data/company_analysis.json - Aggregated analysis

Step 3: Structure extracted data

Organize the extracted information into a structured format.

Review extracted data:

# List all extracted files
ls -la /tmp/extracted_data/

# Review company analysis
cat /tmp/extracted_data/company_analysis.json | jq '.'

# Review individual extractions
for file in /tmp/extracted_data/*_extracted.json; do
    echo "=== $(basename $file) ==="
    cat "$file" | jq '.metadata, .sections | keys'
done

Manually review and note:

Company name and full legal name
Core products and services
Business model and revenue streams
Target customers and market segments
Key differentiators
Technology stack or platform details
Financial highlights
Strategic initiatives

Step 4: Conduct web research and validation

Note: This step requires web search capabilities. Based on extracted information:

Research focus areas:

Company verification: Confirm company details, recent news, press releases
Product information: Latest product updates, feature sets, pricing
Market position: Industry reports, analyst coverage, competitive landscape
Customer base: Case studies, testimonials, major clients
Technology: Tech stack, integrations, API documentation
Recent developments: Funding rounds, partnerships, acquisitions

Search queries to execute:

"[Company Name] official website"
"[Company Name] products and services"
"[Company Name] company overview"
"[Company Name] industry analysis"
"[Company Name] competitors"
"[Company Name] case studies"
"[Company Name] recent news"
"[Company Name] technology stack"

Document findings in:

# Create research notes file
cat > /tmp/extracted_data/web_research.md << 'EOF'
# Web Research Findings

## Official Sources
- Website: [URL]
- LinkedIn: [URL]
- Documentation: [URL]

## Company Overview
[Key findings from official sources]

## Products & Services
[Detailed product information]

## Market Position
[Industry context and competitive landscape]

## Recent Developments
[News, funding, partnerships]

## Technology Details
[Technical architecture, integrations]

## Customer Information
[Target market, case studies, testimonials]

## Additional Insights
[Other relevant findings]
EOF

Edit this file with your research findings.

Step 5: Synthesize industry knowledge

Apply industry expertise and context to enrich the company profile.

Industry analysis framework:

Create an industry context document:

cat > /tmp/extracted_data/industry_context.md << 'EOF'
# Industry Context Analysis

## Industry Overview
- Industry: [Name]
- Market size: [Data]
- Growth rate: [Data]
- Key trends: [List]

## Competitive Landscape
- Major players: [List]
- Market segments: [Description]
- Competitive dynamics: [Analysis]

## Industry Challenges
1. [Challenge 1]
2. [Challenge 2]
3. [Challenge 3]

## Innovation Trends
1. [Trend 1]
2. [Trend 2]
3. [Trend 3]

## Regulatory Environment
[Relevant regulations and compliance requirements]

## Future Outlook
[Industry predictions and trajectory]

## Company Position in Industry
- Market segment: [Position]
- Competitive advantages: [List]
- Challenges faced: [List]
- Opportunities: [List]
EOF

Key considerations:

Industry-specific terminology and concepts
Regulatory requirements and compliance standards
Common business models in the industry
Typical customer pain points
Standard technology solutions
Industry best practices

Step 6: Compile comprehensive product context

Synthesize all gathered information into a structured product context document.

Use the compilation script:

import json
import os
from datetime import datetime
from pathlib import Path

def load_json_files(directory):
    """Load all JSON files from directory."""
    data = {}
    json_files = Path(directory).glob('*.json')
    
    for file in json_files:
        with open(file, 'r', encoding='utf-8') as f:
            data[file.stem] = json.load(f)
    
    return data

def load_markdown_files(directory):
    """Load all markdown files from directory."""
    data = {}
    md_files = Path(directory).glob('*.md')
    
    for file in md_files:
        with open(file, 'r', encoding='utf-8') as f:
            data[file.stem] = f.read()
    
    return data

def compile_product_context(json_data, markdown_data):
    """Compile comprehensive product context."""
    
    context = {
        'metadata': {
            'generated_date': datetime.now().isoformat(),
            'sources': list(json_data.keys()) + list(markdown_data.keys())
        },
        'company_profile': {},
        'products_and_services': {},
        'business_model': {},
        'market_analysis': {},
        'technology_platform': {},
        'customer_information': {},
        'strategic_context': {},
        'key_insights': []
    }
    
    # Extract company profile
    if 'company_analysis' in json_data:
        analysis = json_data['company_analysis']
        context['company_profile'] = {
            'name': analysis.get('company_name', ''),
            'industry': analysis.get('industry', ''),
            'website': analysis['urls'][0] if analysis.get('urls') else '',
            'contact': analysis['emails'][0] if analysis.get('emails') else '',
            'key_metrics': analysis.get('metrics', [])
        }
    
    # Compile product information
    products = []
    for key, data in json_data.items():
        if '_extracted' in key and 'sections' in data:
            sections = data['sections']
            if 'products_services' in sections:
                products.append(sections['products_services'])
    
    context['products_and_services'] = {
        'description': '\n\n'.join(products) if products else '',
        'categories': []
    }
    
    # Add web research
    if 'web_research' in markdown_data:
        context['web_research'] = markdown_data['web_research']
    
    # Add industry context
    if 'industry_context' in markdown_data:
        context['industry_analysis'] = markdown_data['industry_context']
    
    return context

def generate_narrative_report(context):
    """Generate narrative report from context data."""
    
    report = f"""# Company Product Context Report

**Generated:** {context['metadata']['generated_date']}

---

## Executive Summary

[This section provides a high-level overview of the company, its products, and market position.]

---

## Company Profile

### Overview
{context['company_profile'].get('name', 'Company Name')} operates in the {context['company_profile'].get('industry', 'industry')} sector.

**Key Details:**
- **Website:** {context['company_profile'].get('website', 'N/A')}
- **Industry:** {context['company_profile'].get('industry', 'N/A')}
- **Contact:** {context['company_profile'].get('contact', 'N/A')}

### Key Metrics
"""
    
    metrics = context['company_profile'].get('key_metrics', [])
    if metrics:
        for metric in metrics[:10]:  # Top 10 metrics
            report += f"- {metric}\n"
    else:
        report += "- [No metrics extracted]\n"
    
    report += """

---

## Products and Services

### Product Portfolio
"""
    
    products_desc = context['products_and_services'].get('description', '')
    if products_desc:
        report += products_desc
    else:
        report += "[Product information to be populated from extracted data]\n"
    
    report += """

### Service Offerings
[Service details from extracted information]

---

## Business Model

### Revenue Streams
[Revenue model and monetization strategy]

### Value Proposition
[Core value delivered to customers]

### Key Partnerships
[Strategic partnerships and ecosystem]

---

## Market Analysis

### Target Market
[Primary customer segments and market focus]

### Competitive Landscape
[Key competitors and market positioning]

### Market Opportunity
[Market size, growth potential, and trends]

---

## Technology Platform

### Technical Architecture
[Technology stack and infrastructure]

### Integration Capabilities
[APIs, integrations, and interoperability]

### Innovation Focus
[R&D initiatives and technological advantages]

---

## Customer Information

### Customer Profile
[Ideal customer profile and segments]

### Use Cases
[Common use cases and applications]

### Case Studies
[Notable customer implementations]

---

## Strategic Context

### Vision and Mission
[Company vision and mission statements]

### Strategic Priorities
[Current strategic initiatives and focus areas]

### Growth Strategy
[Expansion plans and growth initiatives]

---

## Industry Context

"""
    
    if 'industry_analysis' in context:
        report += context['industry_analysis']
    else:
        report += "[Industry analysis to be added]\n"
    
    report += """

---

## Web Research Findings

"""
    
    if 'web_research' in context:
        report += context['web_research']
    else:
        report += "[Web research findings to be added]\n"
    
    report += """

---

## Key Insights and Recommendations

### Strengths
1. [Key strength 1]
2. [Key strength 2]
3. [Key strength 3]

### Opportunities
1. [Opportunity 1]
2. [Opportunity 2]
3. [Opportunity 3]

### Challenges
1. [Challenge 1]
2. [Challenge 2]
3. [Challenge 3]

### Recommendations
1. [Recommendation 1]
2. [Recommendation 2]
3. [Recommendation 3]

---

## Appendix

### Sources
"""
    
    for source in context['metadata']['sources']:
        report += f"- {source}\n"
    
    report += """

### Methodology
This report was compiled using:
1. PDF document extraction and analysis
2. Web research and validation
3. Industry knowledge synthesis
4. Structured data compilation

---

*Report generated by Company Product Context Compiler*
"""
    
    return report

def main():
    data_dir = '/tmp/extracted_data'
    
    # Load all data
    print("Loading extracted data...")
    json_data = load_json_files(data_dir)
    markdown_data = load_markdown_files(data_dir)
    
    print(f"✓ Loaded {len(json_data)} JSON files")
    print(f"✓ Loaded {len(markdown_data)} Markdown files")
    
    # Compile context
    print("\nCompiling product context...")
    context = compile_product_context(json_data, markdown_data)
    
    # Save structured context
    context_file = '/tmp/product_context.json'
    with open(context_file, 'w', encoding='utf-8') as f:
        json.dump(context, f, indent=2, ensure_ascii=False)
    
    print(f"✓ Structured context saved to: {context_file}")
    
    # Generate narrative report
    print("\nGenerating narrative report...")
    report = generate_narrative_report(context)
    
    report_file = '/tmp/product_context_report.md'
    with open(report_file, 'w', encoding='utf-8') as f:
        f.write(report)
    
    print(f"✓ Narrative report saved to: {report_file}")
    
    print("\n✓ Product context compilation complete!")

if __name__ == '__main__':
    main()

Execute compilation:

python3 /tmp/company-product-context/compile_context.py

Outputs:

/tmp/product_context.json - Structured data format
/tmp/product_context_report.md - Narrative report

Step 7: Generate final report

Review and enhance the generated report with manual insights and analysis.

Review the report:

cat /tmp/product_context_report.md

Enhancement steps:

Fill in placeholders: Replace bracketed placeholders with actual information
Add analysis: Include your expert analysis and insights
Verify accuracy: Cross-reference with source materials
Add context: Include industry-specific context and implications
Enhance narrative: Improve flow and readability
Add visualizations: Consider adding diagrams or charts (describe them textually)

Create enhanced version:

Edit /tmp/product_context_report.md to add:

Executive summary with key takeaways
Deeper analysis of competitive positioning
Strategic recommendations
Risk assessment
Opportunity identification
Implementation considerations

Step 8: Export deliverables

Package and export all deliverables for the user.

Export script:

#!/bin/bash

OUTPUT_DIR=${OUTPUT_DIR:-/tmp/output}
EXPORT_DIR="/tmp/company_context_deliverables"

# Create export directory
mkdir -p "$EXPORT_DIR"

# Copy main deliverables
echo "Packaging deliverables..."

cp /tmp/product_context_report.md "$EXPORT_DIR/"
cp /tmp/product_context.json "$EXPORT_DIR/"

# Copy extracted data
mkdir -p "$EXPORT_DIR/raw_data"
cp -r /tmp/extracted_data/* "$EXPORT_DIR/raw_data/"

# Create summary document
cat > "$EXPORT_DIR/README.md" << 'EOF'
# Company Product Context Deliverables

## Contents

### Main Reports
1. **product_context_report.md** - Comprehensive narrative report
2. **product_context.json** - Structured data format

### Raw Data
- **raw_data/** - All extracted and intermediate data
  - PDF extractions (JSON format)
  - Company analysis
  - Web research notes
  - Industry context analysis

## How to Use

### The Narrative Report
Open `product_context_report.md` in any markdown viewer or text editor.
This is your primary deliverable with comprehensive analysis.

### The Structured Data
`product_context.json` contains machine-readable structured data
that can be imported into other systems or databases.

### Raw Data
The raw_data folder contains all intermediate processing files
for reference and verification purposes.

## Next Steps

1. Review the product context report
2. Validate information against your knowledge
3. Share with relevant stakeholders
4. Use as basis for strategic planning
5. Update as company evolves

---

Generated by Company Product Context Compiler
EOF

# Create archive
cd /tmp
tar -czf company_context_deliverables.tar.gz company_context_deliverables/

# Copy to output
cp -r "$EXPORT_DIR" "$OUTPUT_DIR/"
cp company_context_deliverables.tar.gz "$OUTPUT_DIR/"

echo "✓ Deliverables packaged"
echo "✓ Location: $EXPORT_DIR"
echo "✓ Archive: company_context_deliverables.tar.gz"
echo ""
echo "Files ready for download:"
ls -lh "$EXPORT_DIR"

Execute export:

bash /tmp/company-product-context/export_deliverables.sh

Final deliverables:

📄 Product Context Report (Markdown)
📊 Structured Data (JSON)
📁 Raw Extracted Data
📦 Complete Archive (tar.gz)
📖 README with usage instructions

Usage Examples

Example 1: Technology Company

Inputs: Annual report PDF, product documentation PDF Focus: SaaS products, B2B market positioning Output: Comprehensive context with tech stack analysis

Example 2: Manufacturing Company

Inputs: Company brochure PDF, investor presentation Focus: Product lines, supply chain, market segments Output: Detailed product portfolio and market analysis

Example 3: Consulting Firm

Inputs: Service offerings PDF, case studies PDF Focus: Service capabilities, client types, differentiators Output: Service context with competitive positioning

Tips for Best Results

Provide multiple PDFs: More sources = richer context
Include diverse documents: Annual reports, product sheets, presentations, case studies
Specify focus areas: Guide the analysis to your needs
Review and enhance: Generated report is a starting point for your expert analysis
Update web research: Manually add recent information not in PDFs
Validate metrics: Cross-check extracted numbers for accuracy
Add industry context: Leverage your domain expertise
Customize sections: Tailor the report structure to your needs

Troubleshooting

PDF extraction issues

Ensure PDFs are text-based (not scanned images)
For image-based PDFs, OCR preprocessing may be needed
Large PDFs may take longer to process

Missing information

Not all PDFs contain all sections
Use web research to fill gaps
Add manual notes to placeholder sections

Data accuracy

Always verify extracted metrics
Cross-reference multiple sources
Use web research for validation

Customization Options

Modify extraction patterns

Edit extract_pdfs.py to customize:

Section identification keywords
Metric extraction patterns
Data structure organization

Customize report template

Edit compile_context.py to modify:

Report sections and structure
Analysis framework
Output formatting

Add industry-specific sections

Extend the report template with:

Compliance and regulatory analysis
Technology architecture deep-dive
Financial modeling
Risk assessment frameworks

Integration Possibilities

This skill can be integrated with:

CRM systems (import structured company data)
Sales enablement platforms
Competitive intelligence databases
Market research tools
Strategic planning frameworks

Export the JSON format for easy integration with other systems.

Install Skill

SKILL.md