Text Summarizer
Create concise summaries from long text documents using extractive summarization. Identifies and extracts the most important sentences while preserving meaning.
Quick Start
from scripts.text_summarizer import TextSummarizer
# Summarize text
summarizer = TextSummarizer()
summary = summarizer.summarize(long_text, ratio=0.2) # 20% of original
print(summary)
# Summarize file
summary = summarizer.summarize_file("article.txt", num_sentences=5)
Features
- Extractive Summarization: Selects key sentences from original text
- Length Control: By ratio, sentence count, or word count
- Multiple Algorithms: TextRank, LSA, frequency-based
- Key Points: Extract bullet-point summaries
- Batch Processing: Summarize multiple documents
- Preserve Structure: Maintains sentence order option
API Reference
Initialization
summarizer = TextSummarizer(
method="textrank", # textrank, lsa, frequency
language="english"
)
Summarization
# By ratio (20% of original length)
summary = summarizer.summarize(text, ratio=0.2)
# By sentence count
summary = summarizer.summarize(text, num_sentences=5)
# By word count
summary = summarizer.summarize(text, max_words=100)
Key Points Extraction
# Get bullet points
points = summarizer.extract_key_points(text, num_points=5)
for point in points:
print(f"• {point}")
Batch Processing
# Summarize multiple texts
texts = [text1, text2, text3]
summaries = summarizer.summarize_batch(texts, ratio=0.2)
# Summarize files in directory
summaries = summarizer.summarize_directory("./articles/", ratio=0.3)
Options
# Preserve original sentence order
summary = summarizer.summarize(text, preserve_order=True)
# Include title/first sentence
summary = summarizer.summarize(text, include_first=True)
# Minimum sentence length filter
summarizer.min_sentence_length = 10
CLI Usage
# Summarize text file
python text_summarizer.py --input article.txt --ratio 0.2
# Specific sentence count
python text_summarizer.py --input article.txt --sentences 5
# Extract key points
python text_summarizer.py --input article.txt --points 5
# Batch process
python text_summarizer.py --input-dir ./docs --output-dir ./summaries --ratio 0.3
# Output to file
python text_summarizer.py --input article.txt --output summary.txt --ratio 0.2
CLI Arguments
| Argument |
Description |
Default |
--input |
Input file path |
Required |
--output |
Output file path |
stdout |
--input-dir |
Directory of files |
- |
--output-dir |
Output directory |
- |
--ratio |
Summary ratio (0.0-1.0) |
0.2 |
--sentences |
Number of sentences |
- |
--words |
Maximum words |
- |
--points |
Extract N key points |
- |
--method |
Algorithm to use |
textrank |
--preserve-order |
Keep sentence order |
False |
Examples
News Article Summary
summarizer = TextSummarizer()
article = """
[Long news article text...]
"""
# Get a 3-sentence summary
summary = summarizer.summarize(article, num_sentences=3)
print("Summary:")
print(summary)
# Get key points
points = summarizer.extract_key_points(article, num_points=5)
print("\nKey Points:")
for i, point in enumerate(points, 1):
print(f"{i}. {point}")
Research Paper Abstract
summarizer = TextSummarizer(method="lsa")
paper = open("research_paper.txt").read()
# Create abstract-length summary
abstract = summarizer.summarize(paper, max_words=250)
print(abstract)
Meeting Notes Summary
summarizer = TextSummarizer()
notes = """
Meeting started at 2pm. John presented Q3 results showing 15% growth.
Sarah raised concerns about supply chain delays affecting Q4 projections.
The team discussed mitigation strategies including dual-sourcing.
Budget allocation for marketing was approved at $50k.
Next steps include vendor outreach by Friday.
Follow-up meeting scheduled for next Tuesday.
"""
summary = summarizer.summarize(notes, num_sentences=3)
points = summarizer.extract_key_points(notes, num_points=4)
print("Summary:", summary)
print("\nAction Items:")
for point in points:
print(f"• {point}")
Batch Document Summarization
summarizer = TextSummarizer()
import os
for filename in os.listdir("./documents"):
if filename.endswith(".txt"):
text = open(f"./documents/{filename}").read()
summary = summarizer.summarize(text, ratio=0.2)
with open(f"./summaries/{filename}", "w") as f:
f.write(summary)
print(f"Summarized: {filename}")
Algorithm Comparison
| Algorithm |
Speed |
Quality |
Best For |
| TextRank |
Medium |
High |
General text |
| LSA |
Fast |
Good |
Technical docs |
| Frequency |
Fast |
Medium |
Quick summaries |
Dependencies
nltk>=3.8.0
numpy>=1.24.0
scikit-learn>=1.2.0
Limitations
- Extractive only (doesn't paraphrase or generate new text)
- Works best with well-structured text (paragraphs, clear sentences)
- Very short texts may not summarize well
- Doesn't understand context deeply (may miss nuance)