| name | deep-research |
| description | Conduct comprehensive web research with source fetching, content extraction, and synthesis into structured reports |
| version | 1.0.0 |
| dependencies | python>=3.8, requests>=2.28.0 |
| tags | research, web, analysis, reports |
Deep Research Skill
This skill conducts comprehensive research by searching the web, fetching full page content from multiple sources, extracting key data points, and preparing structured data for synthesis into detailed reports with citations.
When to Use
- User asks for research, analysis, or investigation of a topic
- Questions require verification across multiple sources
- Market data, price trends, or financial analysis is needed
- Reports with proper source citations are required
- Qualitative reasoning across multiple data points is needed
Quick Start
# Research silver prices over the last 30 days and return a markdown report
result = deep_research(
    query="silver prices India last 30 days",
    min_sources=10,
    output_format="markdown",
)
Resources
- Methodology Guide: See methodology.md for detailed research methodology
- URL Validator: Use validate_sources.py to pre-validate URLs
Process
- Search Phase: Execute web search to find relevant sources (minimum 10 URLs)
- Rank Phase: Score URLs by relevance - see methodology.md
- Fetch Phase: Retrieve full page content from top sources via Firecrawl
- Extract Phase: Parse and extract key data points from each source
- Prepare Phase: Structure extracted data for LLM synthesis (see the pipeline sketch below)
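The five phases compose into a single pipeline. The sketch below is illustrative only: search_web, rank_urls, fetch_page, extract_facts, and prepare_synthesis_context are hypothetical helper names standing in for the skill's internal search, ranking, Firecrawl fetch, extraction, and preparation steps.
# Illustrative pipeline sketch; the helper functions are hypothetical placeholders.
def run_research(query, min_sources=10, max_sources=15, search_results=20):
    urls = search_web(query, limit=search_results)              # Search Phase
    ranked = rank_urls(urls, query)[:max_sources]               # Rank Phase
    sources, facts = [], []
    for url in ranked:
        page = fetch_page(url)                                  # Fetch Phase (e.g. via Firecrawl)
        if page is None:                                        # skip sources that fail to fetch
            continue
        sources.append(page)
        facts.extend(extract_facts(page, query))                # Extract Phase
    if len(sources) < min_sources:
        raise RuntimeError(f"Fetched {len(sources)} sources, below the minimum of {min_sources}")
    return prepare_synthesis_context(query, sources, facts)     # Prepare Phase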
Configuration
The following environment variables control behavior:
- DEEP_RESEARCH_MIN_SOURCES: Minimum sources to fetch (default: 10)
- DEEP_RESEARCH_MAX_SOURCES: Maximum sources to fetch (default: 15)
- DEEP_RESEARCH_SEARCH_RESULTS: Initial search results to retrieve (default: 20)
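A minimal sketch of how these variables might be read, assuming plain os.environ lookups with the documented defaults:
import os

# Read configuration from the environment, falling back to the documented defaults.
MIN_SOURCES = int(os.environ.get("DEEP_RESEARCH_MIN_SOURCES", "10"))
MAX_SOURCES = int(os.environ.get("DEEP_RESEARCH_MAX_SOURCES", "15"))
SEARCH_RESULTS = int(os.environ.get("DEEP_RESEARCH_SEARCH_RESULTS", "20"))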
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | The research question or topic |
| min_sources | integer | No | 10 | Minimum sources to fetch |
| max_sources | integer | No | 15 | Maximum sources to fetch |
| output_format | string | No | "markdown" | "markdown", "json", or "summary" |
| search_depth | string | No | "standard" | "quick", "standard", or "thorough" |
| include_raw_content | boolean | No | false | Include raw fetched content |
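For example, a thorough run returning JSON might look like the call below; the query and values are illustrative, not prescriptive:
# Illustrative call exercising the optional parameters.
result = deep_research(
    query="lithium-ion battery recycling market trends 2024",
    min_sources=10,
    max_sources=15,
    output_format="json",
    search_depth="thorough",
    include_raw_content=False,
)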
Output Schema
{
"success": true,
"query": "original query",
"sources": [
{
"url": "https://...",
"title": "Page Title",
"domain": "example.com",
"content_summary": "Extracted content...",
"relevance_score": 0.85,
"fetch_status": "success"
}
],
"source_count": 10,
"failed_sources": 0,
"extracted_facts": [
{"fact": "...", "source_index": 0, "confidence": 0.9}
],
"synthesis_context": {
"combined_content": "...",
"source_citations": ["[1]", "[2]"],
"synthesis_instructions": "..."
}
}
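A caller might consume this structure roughly as follows; the field names come from the schema above, while the report-building logic is a sketch rather than part of the skill:
# Sketch: turn a successful result into a cited list of facts plus a source list.
def format_citations(result):
    if not result["success"]:
        raise RuntimeError(f"Research failed for query: {result['query']}")
    lines = []
    for fact in result["extracted_facts"]:
        ref = fact["source_index"] + 1          # source_index is 0-based in the schema
        lines.append(f"- {fact['fact']} [{ref}]")
    for i, source in enumerate(result["sources"], start=1):
        lines.append(f"[{i}] {source['title']} ({source['url']})")
    return "\n".join(lines)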
Scripts
URL Validation Script
Before fetching, validate URLs to avoid blocked domains:
python scripts/validate_sources.py https://example.com https://another.com
The script returns JSON validation results, including blocked-domain and paywall warnings.
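To call the validator programmatically rather than from the shell, one option is to run it as a subprocess and parse its JSON output (the exact result fields depend on the script):
import json
import subprocess

# Run the validator on candidate URLs and parse its JSON output.
urls = ["https://example.com", "https://another.com"]
proc = subprocess.run(
    ["python", "scripts/validate_sources.py", *urls],
    capture_output=True, text=True, check=True,
)
validation = json.loads(proc.stdout)
print(validation)  # e.g. blocked-domain and paywall warnings per URL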
Example Usage
Query: "Find silver prices in India for the last 30 days with qualitative analysis"
Expected behavior:
- Search for "silver prices India December 2024 30 days trends"
- Fetch content from top 10 financial/commodity websites
- Extract price data, dates, trends, expert opinions
- Return structured data for comprehensive report generation
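In code, the same request and a quick sanity check of the expected behavior might look like this (the assertions illustrate the guarantees described above, not a required calling pattern):
# Illustrative only: run the example query and check the result shape.
result = deep_research(
    query="silver prices India last 30 days with qualitative analysis",
    min_sources=10,
)
assert result["source_count"] >= 10              # top financial/commodity sources fetched
price_facts = [f for f in result["extracted_facts"] if f["confidence"] >= 0.8]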
Advanced Topics
For detailed information on:
- Query expansion strategies - see methodology.md#search-query-generation
- Source ranking algorithm - see methodology.md#source-ranking-algorithm
- Fact extraction patterns - see methodology.md#fact-extraction-patterns
- Error handling - see methodology.md#error-handling
Safety Considerations
This skill makes external network requests to:
- Web search APIs (DuckDuckGo/Tavily)
- Firecrawl API for content scraping
- Target websites for content retrieval
Rate limiting and respectful crawling practices are enforced.
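As one illustration of what respectful crawling can look like (not necessarily the skill's exact implementation), requests to the same domain can be spaced out by a minimum interval:
import time
from urllib.parse import urlparse

# Simple per-domain rate limiter: wait at least min_interval seconds
# between consecutive requests to the same domain.
class DomainRateLimiter:
    def __init__(self, min_interval=2.0):
        self.min_interval = min_interval
        self.last_request = {}

    def wait(self, url):
        domain = urlparse(url).netloc
        elapsed = time.monotonic() - self.last_request.get(domain, 0.0)
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_request[domain] = time.monotonic()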