| Field | Value |
|---|---|
| name | survey-analyzer |
| description | Analyze survey responses with Likert scale analysis, cross-tabulations, sentiment scoring, and frequency distributions with visualizations. |
Survey Analyzer
Comprehensive survey data analysis with Likert scales, cross-tabs, and sentiment analysis.
Features
- Likert Scale Analysis: Agreement scale scoring and visualization
- Cross-Tabulation: Relationship analysis between categorical variables
- Frequency Analysis: Response distributions and percentages
- Sentiment Scoring: Text response sentiment analysis
- Open-Ended Analysis: Theme extraction from text responses
- Statistical Tests: Chi-square, correlations, significance testing
- Visualizations: Bar charts, heatmaps, word clouds, distribution plots
- Report Generation: Comprehensive PDF/HTML reports
Quick Start
```python
from survey_analyzer import SurveyAnalyzer

analyzer = SurveyAnalyzer()

# Load survey data
analyzer.load_csv('survey_responses.csv')

# Analyze a Likert scale question
results = analyzer.likert_analysis('satisfaction', scale_type='agreement')
print(f"Mean score: {results['mean_score']:.2f}")

# Cross-tabulation
crosstab = analyzer.crosstab('age_group', 'product_preference')
print(crosstab)

# Generate report
analyzer.generate_report('survey_report.pdf')
```
CLI Usage
```bash
# Analyze Likert scale
python survey_analyzer.py --data survey.csv --likert satisfaction --output results.pdf

# Cross-tabulation
python survey_analyzer.py --data survey.csv --crosstab age_group product --output crosstab.png

# Sentiment analysis
python survey_analyzer.py --data survey.csv --sentiment comments --output sentiment.html

# Full report
python survey_analyzer.py --data survey.csv --report --output full_report.pdf
```
API Reference
SurveyAnalyzer Class
```python
from typing import Dict, List

import pandas as pd


class SurveyAnalyzer:
    def __init__(self) -> None: ...

    # Data Loading
    def load_csv(self, filepath, **kwargs) -> 'SurveyAnalyzer': ...
    def load_data(self, data: pd.DataFrame) -> 'SurveyAnalyzer': ...

    # Likert Scale Analysis
    def likert_analysis(self, column, scale_type='agreement') -> Dict: ...
    def likert_comparison(self, columns: List[str]) -> pd.DataFrame: ...
    def plot_likert(self, column, output, scale_type='agreement') -> str: ...

    # Frequency Analysis
    def frequency_table(self, column) -> pd.DataFrame: ...
    def multiple_choice(self, column, delimiter=',') -> pd.DataFrame: ...
    def plot_frequencies(self, column, output, top_n=None) -> str: ...

    # Cross-Tabulation
    def crosstab(self, row_var, col_var, normalize=None) -> pd.DataFrame: ...
    def chi_square_test(self, row_var, col_var) -> Dict: ...
    def plot_crosstab(self, row_var, col_var, output) -> str: ...

    # Sentiment Analysis
    def sentiment_analysis(self, column) -> pd.DataFrame: ...
    def sentiment_summary(self, column) -> Dict: ...
    def plot_sentiment(self, column, output) -> str: ...

    # Open-Ended Analysis
    def word_frequency(self, column, top_n=20) -> pd.DataFrame: ...
    def word_cloud(self, column, output) -> str: ...
    def extract_themes(self, column, n_themes=5) -> List[str]: ...

    # Statistics
    def satisfaction_score(self, columns: List[str]) -> Dict: ...
    def response_rate(self) -> Dict: ...
    def demographics_summary(self, columns: List[str]) -> pd.DataFrame: ...

    # Reporting
    def generate_report(self, output, format='pdf') -> str: ...
    def summary(self) -> str: ...
```
Likert Scale Analysis
Standard Scales
```python
# 5-point agreement scale
analyzer.likert_analysis('satisfaction', scale_type='agreement')
# 1=Strongly Disagree, 2=Disagree, 3=Neutral, 4=Agree, 5=Strongly Agree

# 5-point frequency scale
analyzer.likert_analysis('usage', scale_type='frequency')
# 1=Never, 2=Rarely, 3=Sometimes, 4=Often, 5=Always

# Custom scale
analyzer.likert_analysis('rating', scale_type='custom',
                         labels=['Poor', 'Fair', 'Good', 'Excellent'])
```
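Exported survey files often store Likert answers as text labels rather than numeric codes. The following is a minimal sketch of the label-to-score mapping listed above, assuming answers arrive as strings; `encode_likert`, `AGREEMENT` and `FREQUENCY` are hypothetical names, not part of the `survey_analyzer` API:

```python
# Hypothetical label lists in scale order (score = position + 1),
# mirroring the agreement and frequency scales listed above.
AGREEMENT = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
FREQUENCY = ["Never", "Rarely", "Sometimes", "Often", "Always"]


def encode_likert(responses, labels):
    """Map text answers to 1-based scores in the order of `labels`.

    Matching ignores case and surrounding whitespace. Blank or
    unrecognized answers become None so they can be excluded later.
    """
    lookup = {label.strip().lower(): score for score, label in enumerate(labels, start=1)}
    return [lookup.get(str(answer).strip().lower()) for answer in responses]


print(encode_likert(["Agree", " strongly agree", "Neutral", "n/a"], AGREEMENT))
# [4, 5, 3, None]
print(encode_likert(["Good", "Poor"], ["Poor", "Fair", "Good", "Excellent"]))
# [3, 1]
```

Passing the custom labels in order yields a 4-point scale, as in the `rating` example.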
Results
```python
results = analyzer.likert_analysis('satisfaction')
# {
#     'mean_score': 4.07,
#     'median': 4,
#     'mode': 4,
#     'distribution': {1: 2, 2: 5, 3: 15, 4: 40, 5: 38},
#     'percentages': {1: 2%, 2: 5%, 3: 15%, 4: 40%, 5: 38%},
#     'top_2_box': 78%,     # % Agree + Strongly Agree
#     'bottom_2_box': 7%    # % Disagree + Strongly Disagree
# }
```
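The fields in this result can be derived from raw scores with the standard library alone. A sketch, assuming scores coded 1 to N with `None` for skipped answers; `likert_summary` is a hypothetical helper, and the library's own rules (for example, how it breaks ties for the mode) may differ. The example rebuilds 100 responses from the distribution shown above:

```python
import statistics
from collections import Counter


def likert_summary(scores, scale_points=5):
    """Summarize Likert scores coded 1..scale_points (None = skipped).

    Top-2-box and bottom-2-box are the percentages of answers in the two
    highest and two lowest scale points respectively.
    """
    valid = [s for s in scores if s is not None]
    counts = Counter(valid)
    distribution = {k: counts.get(k, 0) for k in range(1, scale_points + 1)}
    percentages = {k: 100 * v / len(valid) for k, v in distribution.items()}
    return {
        "mean_score": statistics.mean(valid),
        "median": statistics.median(valid),
        "mode": statistics.mode(valid),
        "distribution": distribution,
        "percentages": percentages,
        "top_2_box": percentages[scale_points] + percentages[scale_points - 1],
        "bottom_2_box": percentages[1] + percentages[2],
    }


# Rebuild raw scores from the example distribution above (100 respondents).
example = {1: 2, 2: 5, 3: 15, 4: 40, 5: 38}
scores = [score for score, n in example.items() for _ in range(n)]
summary = likert_summary(scores)
print(summary["mean_score"], summary["median"], summary["mode"], summary["top_2_box"])
# 4.07 4.0 4 78.0
```

In Python 3.8 and later, `statistics.mode` returns the first value encountered when several values tie for most common.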
Visualization
```python
# Stacked bar chart
analyzer.plot_likert('satisfaction', 'likert_chart.png')

# Compare multiple questions
analyzer.likert_comparison(['quality', 'value', 'service'])
analyzer.plot_likert_comparison(['quality', 'value', 'service'],
                                'comparison.png')
```
Frequency Analysis
Single Choice
```python
freq = analyzer.frequency_table('age_group')
#         Count  Percentage
# 18-24      45       22.5%
# 25-34      78       39.0%
# 35-44      52       26.0%
# 45+        25       12.5%

# Plot
analyzer.plot_frequencies('age_group', 'age_distribution.png')
```
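A count-and-percentage table like this one needs only `collections.Counter`. A minimal sketch; `frequency_table` here is a hypothetical standalone function, not the library method, and basing percentages on non-missing answers is an assumption:

```python
from collections import Counter


def frequency_table(values, by_count=True):
    """Return (category, count, percent) rows; None values are skipped.

    by_count=True orders rows by descending count; by_count=False sorts
    by category label instead.
    """
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    items = counts.most_common() if by_count else sorted(counts.items())
    return [(category, n, round(100 * n / total, 1)) for category, n in items]


ages = ["18-24"] * 45 + ["25-34"] * 78 + ["35-44"] * 52 + ["45+"] * 25
for row in frequency_table(ages, by_count=False):
    print(row)
# ('18-24', 45, 22.5)
# ('25-34', 78, 39.0)
# ('35-44', 52, 26.0)
# ('45+', 25, 12.5)
```

Sorting by label happens to give the right order for these age bands, but ordinal categories such as `'5-10'` and `'10-15'` need an explicit ordering, since plain string sorting would put `'10-15'` first.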
Multiple Choice
For questions allowing multiple selections:
```python
# Data format: "Option A, Option B, Option C"
results = analyzer.multiple_choice('features_liked', delimiter=',')
#             Count  Percentage
# Price         120       60.0%
# Quality        95       47.5%
# Design         80       40.0%
# Durability     70       35.0%

analyzer.plot_frequencies('features_liked', 'features.png', top_n=10)
```
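The percentages above add up to 182.5%, which is expected: each respondent can pick several options, so each option's share is taken over respondents rather than over all selections. A sketch of that counting with the standard library, assuming a non-empty answer marks one respondent and an option repeated inside one answer counts once; `multiple_choice` here is a hypothetical standalone function:

```python
from collections import Counter


def multiple_choice(answers, delimiter=","):
    """Count options in delimited multi-select answers.

    Assumptions of this sketch: the percentage base is the number of
    non-empty answers (respondents), and an option repeated inside one
    answer counts once for that respondent.
    """
    counts = Counter()
    respondents = 0
    for answer in answers:
        # dict.fromkeys de-duplicates while keeping first-seen order
        options = list(dict.fromkeys(
            opt.strip() for opt in (answer or "").split(delimiter) if opt.strip()
        ))
        if not options:
            continue
        respondents += 1
        counts.update(options)
    return [(opt, n, round(100 * n / respondents, 1)) for opt, n in counts.most_common()]


answers = ["Price, Quality", "Price", "Design, Price, Price", ""]
for row in multiple_choice(answers):
    print(row)
# ('Price', 3, 100.0)
# ('Quality', 1, 33.3)
# ('Design', 1, 33.3)
```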
Cross-Tabulation
Basic Cross-Tab
```python
crosstab = analyzer.crosstab('age_group', 'satisfaction')
#         Satisfied  Neutral  Dissatisfied
# 18-24          30       10             5
# 25-34          60       15             3
# 35-44          40        8             4
# 45+            18        5             2

# With percentages
crosstab_pct = analyzer.crosstab('age_group', 'satisfaction',
                                 normalize='index')  # Row percentages
```
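In pandas the same table comes from `pd.crosstab(df['age_group'], df['satisfaction'])`, and `normalize='index'` rescales each row to proportions that sum to 1. The sketch below shows the underlying counting in plain Python, expressing row shares as percentages; `crosstab` here is a hypothetical standalone function, not the library method:

```python
from collections import Counter


def crosstab(row_values, col_values, normalize=None):
    """Cross-tabulate two parallel sequences of categories.

    Returns (row_labels, col_labels, table) with table[i][j] the count for
    row_labels[i] and col_labels[j]. normalize="index" turns each row into
    percentages of its own total.
    """
    pairs = Counter(zip(row_values, col_values))
    row_labels = sorted(set(row_values))
    col_labels = sorted(set(col_values))
    table = [[pairs[(r, c)] for c in col_labels] for r in row_labels]
    if normalize == "index":
        table = [[100 * count / sum(row) for count in row] for row in table]
    return row_labels, col_labels, table


ages = ["18-24", "18-24", "25-34", "25-34", "25-34"]
sat = ["Satisfied", "Neutral", "Satisfied", "Satisfied", "Neutral"]
print(crosstab(ages, sat))
# (['18-24', '25-34'], ['Neutral', 'Satisfied'], [[1, 1], [1, 2]])
print(crosstab(ages, sat, normalize="index")[2])
```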
Statistical Testing
```python
result = analyzer.chi_square_test('age_group', 'satisfaction')
# For the counts in the cross-tab above (4 x 3 table, 6 degrees of freedom):
# {
#     'statistic': 3.38,
#     'p_value': 0.760,
#     'significant': False,
#     'interpretation': 'There is no significant relationship between
#                        age_group and satisfaction (p=0.760)'
# }
```
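For a table with r rows and c columns the test has (r - 1)(c - 1) degrees of freedom, so the 4 x 3 age-by-satisfaction table has 6. `scipy.stats.chi2_contingency` is the usual way to run the test; the stdlib sketch below spells out the arithmetic instead (expected counts from row and column totals, the Pearson statistic, and a series-based p-value). `chi2_sf` and `chi_square_test` are hypothetical helpers, applied to the counts from the cross-tab above:

```python
import math


def chi2_sf(x, dof):
    """P(X >= x) for a chi-square variable with `dof` degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma
    function P(dof/2, x/2); fine for the moderate statistics typical of
    survey cross-tabs.
    """
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_test(table, alpha=0.05):
    """Pearson chi-square test of independence on a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    p_value = chi2_sf(statistic, dof)
    return {"statistic": statistic, "dof": dof, "p_value": p_value,
            "significant": p_value < alpha}


# Counts from the age_group x satisfaction cross-tab above
counts = [[30, 10, 5], [60, 15, 3], [40, 8, 4], [18, 5, 2]]
result = chi_square_test(counts)
print(round(result["statistic"], 2), result["dof"], round(result["p_value"], 3))
# 3.38 6 0.76
```

Several expected counts in that table fall below 5 (for example 25 x 14 / 200 = 1.75 for 45+ / Dissatisfied), so the chi-square approximation is rough here; merging sparse categories or using an exact test is a common remedy.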
Visualization
```python
# Heatmap
analyzer.plot_crosstab('age_group', 'satisfaction', 'crosstab_heatmap.png')
```
Sentiment Analysis
Analyze open-ended text responses:
```python
# Analyze all comments
sentiment_df = analyzer.sentiment_analysis('comments')
#    comment              polarity  sentiment
# 0  "Great product!"          0.8  Positive
# 1  "Could be better"         0.1  Neutral
# 2  "Very disappointed"      -0.6  Negative

# Summary
summary = analyzer.sentiment_summary('comments')
# {
#     'positive': 65%,
#     'neutral': 20%,
#     'negative': 15%,
#     'avg_polarity': 0.35
# }

# Visualize
analyzer.plot_sentiment('comments', 'sentiment_distribution.png')
```
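The labels and summary percentages depend on how polarity scores are bucketed. A sketch, assuming scores in [-1, 1] such as those from TextBlob's `TextBlob(text).sentiment.polarity`, and hypothetical cut-offs of ±0.1 chosen so that the 0.1 example above counts as Neutral; the library's actual thresholds may differ. The functions take precomputed scores, so the example runs without TextBlob:

```python
# Hypothetical cut-offs: polarity strictly above 0.1 is Positive, strictly
# below -0.1 is Negative, anything in between is Neutral.
POSITIVE_CUTOFF = 0.1
NEGATIVE_CUTOFF = -0.1


def label_polarity(polarity):
    """Bucket a polarity score in [-1, 1] into a sentiment label."""
    if polarity > POSITIVE_CUTOFF:
        return "Positive"
    if polarity < NEGATIVE_CUTOFF:
        return "Negative"
    return "Neutral"


def sentiment_summary(polarities):
    """Percentage of each label plus the mean polarity."""
    labels = [label_polarity(p) for p in polarities]
    n = len(labels)
    return {
        "positive": 100 * labels.count("Positive") / n,
        "neutral": 100 * labels.count("Neutral") / n,
        "negative": 100 * labels.count("Negative") / n,
        "avg_polarity": sum(polarities) / n,
    }


scores = [0.8, 0.1, -0.6]  # polarity of the three example comments above
print([label_polarity(p) for p in scores])
# ['Positive', 'Neutral', 'Negative']
```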
Open-Ended Analysis
Word Frequency
```python
words = analyzer.word_frequency('comments', top_n=20)
#    Word     Frequency
# 0  great           45
# 1  quality         38
# 2  price           32
# ...
```
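A rough sketch of word counting with a regular expression and `collections.Counter`. The tokenizer and the small stop-word set are assumptions for illustration; a fuller list, such as `STOPWORDS` from the `wordcloud` package, would normally be used. `word_frequency` here is a hypothetical standalone function:

```python
import re
from collections import Counter

# Small illustrative stop-word set (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "but", "for", "i", "is", "it", "of",
              "the", "this", "to", "very", "was"}


def word_frequency(comments, top_n=20):
    """Top `top_n` (word, count) pairs across free-text comments."""
    counts = Counter()
    for text in comments:
        if not text:
            continue
        # Lowercase and keep runs of letters/apostrophes as tokens
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 1)
    return counts.most_common(top_n)


comments = ["Great quality, great price!", "Quality is great", "Fine overall", None]
print(word_frequency(comments, top_n=2))
# [('great', 3), ('quality', 2)]
```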
Word Cloud
```python
analyzer.word_cloud('comments', 'wordcloud.png')
```
Theme Extraction
```python
themes = analyzer.extract_themes('feedback', n_themes=5)
# ['product quality', 'customer service', 'pricing',
#  'delivery speed', 'user experience']
```
Satisfaction Metrics
Net Promoter Score (NPS)
```python
nps = analyzer.nps_score('recommendation')  # 0-10 scale
# {
#     'promoters': 65%,    # 9-10
#     'passives': 25%,     # 7-8
#     'detractors': 10%,   # 0-6
#     'nps': 55
# }
```
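NPS is the percentage of promoters minus the percentage of detractors, so the example above gives 65 - 10 = 55. A minimal sketch of that calculation; `nps_score` here is a hypothetical standalone function, and the sample ratings are made up to reproduce the 65/25/10 split:

```python
def nps_score(ratings):
    """Net Promoter Score from 0-10 ratings; None values are ignored.

    Promoters rate 9-10, passives 7-8, detractors 0-6, and
    NPS = % promoters - % detractors (range -100 to 100).
    """
    valid = [r for r in ratings if r is not None]
    n = len(valid)
    promoters = 100 * sum(r >= 9 for r in valid) / n
    passives = 100 * sum(7 <= r <= 8 for r in valid) / n
    detractors = 100 * sum(r <= 6 for r in valid) / n
    return {"promoters": promoters, "passives": passives,
            "detractors": detractors, "nps": round(promoters - detractors)}


ratings = [10] * 7 + [9] * 6 + [8] * 3 + [7] * 2 + [5, 2]
print(nps_score(ratings))
# {'promoters': 65.0, 'passives': 25.0, 'detractors': 10.0, 'nps': 55}
```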
Overall Satisfaction
```python
satisfaction = analyzer.satisfaction_score([
    'product_quality',
    'customer_service',
    'value_for_money',
    'ease_of_use'
])
# {
#     'overall_score': 4.3,
#     'category_scores': {...},
#     'satisfaction_rate': 86%   # % scoring 4-5
# }
```
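How the overall score and satisfaction rate are aggregated is not spelled out above. One plausible sketch, assuming `overall_score` is the mean of the per-question means and `satisfaction_rate` pools every individual rating across the questions; `satisfaction_score` here is a hypothetical standalone function:

```python
import statistics


def satisfaction_score(responses, columns):
    """Per-question means, their average, and the share of ratings >= 4.

    `responses` is a list of dicts (one per respondent) mapping question
    -> 1-5 rating, with None for skipped questions. Averaging the
    question means (rather than pooling all ratings) is an assumption.
    """
    category_scores = {}
    pooled = []
    for col in columns:
        scores = [r[col] for r in responses if r.get(col) is not None]
        category_scores[col] = statistics.mean(scores)
        pooled.extend(scores)
    return {
        "overall_score": statistics.mean(list(category_scores.values())),
        "category_scores": category_scores,
        "satisfaction_rate": 100 * sum(s >= 4 for s in pooled) / len(pooled),
    }


responses = [
    {"product_quality": 5, "customer_service": 4},
    {"product_quality": 3, "customer_service": None},
    {"product_quality": 4, "customer_service": 5},
]
print(satisfaction_score(responses, ["product_quality", "customer_service"]))
```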
Demographics Analysis
```python
demographics = analyzer.demographics_summary([
    'age_group',
    'gender',
    'location',
    'income_range'
])
# Returns frequency tables for each demographic variable
```
Response Rate Analysis
```python
response_rate = analyzer.response_rate()
# {
#     'total_respondents': 200,
#     'completion_rate': 85%,
#     'average_time': '5m 30s',
#     'dropout_points': {
#         'question_5': 8%,
#         'question_12': 5%
#     }
# }
```
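Completion and dropout figures depend on how an abandoned survey is detected. A sketch under stated assumptions: a respondent completes the survey if the last question is answered, and otherwise drops out at the question after their last answer; percentages use all respondents as the base. `response_rate` here is a hypothetical standalone function, and `average_time` is omitted because it needs start and end timestamps:

```python
def response_rate(responses, questions):
    """Completion rate and dropout points for ordered survey questions.

    `responses` is a list of dicts mapping question id -> answer, with
    missing keys or None meaning "not answered". Assumptions: answering
    the last question counts as completing; otherwise the dropout point
    is the question right after the respondent's last answer.
    """
    total = len(responses)
    completed = 0
    dropouts = {}
    for response in responses:
        answered = [i for i, q in enumerate(questions) if response.get(q) is not None]
        if answered and answered[-1] == len(questions) - 1:
            completed += 1
            continue
        dropped_at = questions[answered[-1] + 1 if answered else 0]
        dropouts[dropped_at] = dropouts.get(dropped_at, 0) + 1
    return {
        "total_respondents": total,
        "completion_rate": 100 * completed / total,
        "dropout_points": {q: 100 * n / total for q, n in dropouts.items()},
    }


questions = ["q1", "q2", "q3"]
responses = [{"q1": 1, "q2": 2, "q3": 3}, {"q1": 1}, {"q1": 4, "q2": 1, "q3": 2}, {}]
print(response_rate(responses, questions))
# {'total_respondents': 4, 'completion_rate': 50.0, 'dropout_points': {'q2': 25.0, 'q1': 25.0}}
```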
Report Generation
Comprehensive Report
```python
analyzer.generate_report('survey_report.pdf', format='pdf')
```
Report includes:
- Executive summary
- Response rate and demographics
- Question-by-question analysis
- Likert scale visualizations
- Cross-tabulations
- Sentiment analysis
- Key findings and recommendations
Custom Report Sections
```python
analyzer.set_report_sections([
    'executive_summary',
    'demographics',
    'likert_questions',
    'cross_tabs',
    'sentiment',
    'recommendations'
])
```
Advanced Features
Filter by Segment
```python
# Analyze a subset of responses
analyzer.filter('age_group', '25-34')
results = analyzer.likert_analysis('satisfaction')
analyzer.clear_filter()
```
Compare Segments
```python
comparison = analyzer.compare_segments(
    segment_col='age_group',
    metric_col='satisfaction'
)
# Shows how different segments scored the metric
```
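Segment comparison, like the `filter('age_group', '25-34')` example above, amounts to computing the metric separately for each group. A sketch with the standard library, assuming a 1-5 metric where top-2-box means scores of 4 or 5; `compare_segments` here is a hypothetical standalone function, not the library method:

```python
import statistics
from collections import defaultdict


def compare_segments(records, segment_col, metric_col):
    """Per-segment n, mean and top-2-box share of a 1-5 metric.

    `records` is a list of dicts; records with a missing metric are
    skipped. Top-2-box (scores of 4 or 5) assumes a 5-point scale.
    """
    groups = defaultdict(list)
    for record in records:
        value = record.get(metric_col)
        if value is not None:
            groups[record[segment_col]].append(value)
    return {
        segment: {
            "n": len(values),
            "mean": statistics.mean(values),
            "top_2_box": 100 * sum(v >= 4 for v in values) / len(values),
        }
        for segment, values in sorted(groups.items())
    }


records = [
    {"age_group": "18-24", "satisfaction": 5},
    {"age_group": "18-24", "satisfaction": 3},
    {"age_group": "25-34", "satisfaction": 4},
    {"age_group": "25-34", "satisfaction": None},
]
# All segments at once...
print(compare_segments(records, "age_group", "satisfaction"))
# ...or a single segment, analogous to filter('age_group', '25-34')
subset = [r for r in records if r["age_group"] == "25-34"]
print(compare_segments(subset, "age_group", "satisfaction"))
```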
Trend Analysis
For longitudinal surveys:
```python
trends = analyzer.trend_analysis(
    metric='satisfaction',
    time_col='survey_date',
    period='month'
)
analyzer.plot_trends(trends, 'satisfaction_trend.png')
```
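A monthly trend groups responses by calendar month and averages the metric; in pandas this is roughly `df.groupby(df['survey_date'].dt.to_period('M'))['satisfaction'].mean()`. A stdlib sketch, assuming `survey_date` values are `datetime.date` objects; `monthly_trend` is a hypothetical helper, not the library's `trend_analysis`:

```python
import statistics
from collections import defaultdict
from datetime import date


def monthly_trend(records, metric, time_col):
    """Mean of `metric` per calendar month, keyed "YYYY-MM" in date order.

    Records missing either the metric or the date are skipped.
    """
    buckets = defaultdict(list)
    for record in records:
        value, when = record.get(metric), record.get(time_col)
        if value is None or when is None:
            continue
        buckets[f"{when.year:04d}-{when.month:02d}"].append(value)
    # "YYYY-MM" strings sort chronologically
    return {month: statistics.mean(values) for month, values in sorted(buckets.items())}


records = [
    {"survey_date": date(2024, 1, 5), "satisfaction": 4},
    {"survey_date": date(2024, 1, 20), "satisfaction": 2},
    {"survey_date": date(2024, 2, 3), "satisfaction": 5},
    {"survey_date": date(2024, 2, 9), "satisfaction": None},
]
print(monthly_trend(records, "satisfaction", "survey_date"))
```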
Dependencies
- pandas>=2.0.0
- numpy>=1.24.0
- scipy>=1.10.0
- textblob>=0.17.0
- matplotlib>=3.7.0
- seaborn>=0.12.0
- wordcloud>=1.9.0
- reportlab>=4.0.0