---
name: github-repo-analysis
description: Analyze GitHub repositories to extract insights about commit frequency, outstanding contributors, release timeline, and project health metrics. Use when users request repository analysis, commit history investigation, contributor identification, release tracking, or development activity assessment for any GitHub project.
---
# GitHub Repository Analysis
This skill guides analysis of GitHub repositories to extract meaningful insights about development activity, contributor patterns, and release cycles.
## Core Analysis Capabilities
### Commit Frequency Analysis
- Extract commit history via the GitHub API or git commands
- Calculate commit frequency by time period (daily, weekly, monthly); see the sketch after this list
- Identify patterns in development activity
- Detect periods of high/low activity
- Generate time-series visualizations of commit trends
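A minimal sketch of the time-period bucketing, assuming commit objects shaped like the REST API payload used in Approach 1 below (`commits_per_week` is a hypothetical helper name):

```python
from collections import Counter
from datetime import datetime


def commits_per_week(commits):
    """Bucket GitHub REST API commit objects by ISO year/week."""
    weeks = Counter()
    for commit in commits:
        # REST API commit dates look like '2024-05-17T09:30:00Z'
        date_str = commit['commit']['author']['date']
        dt = datetime.fromisoformat(date_str.replace('Z', '+00:00'))
        iso = dt.isocalendar()
        weeks[f'{iso.year}-W{iso.week:02d}'] += 1
    return dict(sorted(weeks.items()))
```

The same loop with `dt.strftime('%Y-%m')` as the key yields monthly counts.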
### Outstanding Contributors Analysis
- Identify top contributors by commit count
- Calculate contribution distribution (percentage per contributor); see the sketch after this list
- Analyze commit patterns by contributor
- Track first-time vs. recurring contributors
- Generate contributor leaderboards
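The distribution and recurrence metrics reduce to simple arithmetic over (name, count) pairs; both helpers below are hypothetical names and assume the output of a per-author counter such as `analyze_contributors()` in Approach 1:

```python
def contribution_distribution(contributor_counts):
    """Turn sorted (name, commits) pairs into percentage shares."""
    total = sum(count for _, count in contributor_counts) or 1
    return [
        {'name': name, 'commits': count, 'share': round(100 * count / total, 1)}
        for name, count in contributor_counts
    ]


def one_time_contributors(contributor_counts):
    """Names of contributors with exactly one commit."""
    return [name for name, count in contributor_counts if count == 1]
```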
### Release Timeline Analysis
- Extract release/tag history
- Calculate time between releases; see the sketch after this list
- Identify release patterns and cycles
- Track version numbering schemes
- Map releases to major commit periods
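A sketch of the interval calculation, assuming release objects from the REST `/releases` endpoint with `tag_name` and `published_at` fields (`release_intervals` is a hypothetical helper):

```python
from datetime import datetime


def release_intervals(releases):
    """Days between consecutive releases, oldest to newest."""
    dated = [
        (r['tag_name'], datetime.fromisoformat(r['published_at'].replace('Z', '+00:00')))
        for r in releases if r.get('published_at')  # drafts have no publish date
    ]
    dated.sort(key=lambda item: item[1])
    return [
        (newer[0], (newer[1] - older[1]).days)
        for older, newer in zip(dated, dated[1:])
    ]
```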
## Implementation Approaches
### Approach 1: GitHub API (Recommended for Public Repos)
Use GitHub's REST or GraphQL API for efficient data retrieval:
```python
import requests


def analyze_commits(owner, repo, token=None):
    """Fetch the full commit history for a repository, 100 commits per page."""
    headers = {'Authorization': f'token {token}'} if token else {}
    url = f'https://api.github.com/repos/{owner}/{repo}/commits'
    all_commits = []
    page = 1
    while True:
        response = requests.get(url, headers=headers,
                                params={'page': page, 'per_page': 100})
        commits = response.json()
        if not commits:
            break
        all_commits.extend(commits)
        page += 1
    return all_commits


def analyze_contributors(commits):
    """Count commits per author and return (name, count) pairs, busiest first."""
    contributor_stats = {}
    for commit in commits:
        author = commit['commit']['author']['name']
        contributor_stats[author] = contributor_stats.get(author, 0) + 1
    return sorted(contributor_stats.items(), key=lambda x: x[1], reverse=True)


def analyze_releases(owner, repo, token=None):
    """Fetch published releases for a repository."""
    headers = {'Authorization': f'token {token}'} if token else {}
    url = f'https://api.github.com/repos/{owner}/{repo}/releases'
    response = requests.get(url, headers=headers)
    return response.json()
```
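A minimal driver tying the three functions above together (the owner/repo values are placeholders):

```python
# Assumes analyze_commits, analyze_contributors, and analyze_releases from above
if __name__ == '__main__':
    commits = analyze_commits('octocat', 'Hello-World')
    releases = analyze_releases('octocat', 'Hello-World')

    print(f'Total commits fetched: {len(commits)}')
    for name, count in analyze_contributors(commits)[:5]:
        print(f'  {name}: {count} commits')
    print(f'Releases: {len(releases)}')
```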
Benefits:
- No repository cloning needed
- Efficient pagination
- Access to additional metadata
- Generous rate limits with a token: 60 requests/hour unauthenticated, 5,000/hour authenticated
### Approach 2: Local Git Repository
Use git commands for detailed analysis when the repository is already cloned:
```bash
# Get commit history with timestamps
git log --pretty=format:"%h|%an|%ae|%ad|%s" --date=iso > commits.txt

# Count commits by author
git shortlog -sn --all

# List all tags/releases, newest first
git tag -l --sort=-version:refname

# Commits per week (Sunday-based week number)
git log --pretty=format:"%ad" --date=format:"%Y-%U" | sort | uniq -c

# Commits per month
git log --pretty=format:"%ad" --date=format:"%Y-%m" | sort | uniq -c
```
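A sketch for turning the `commits.txt` file produced by the first command into per-month counts (helper names are hypothetical):

```python
from collections import Counter


def parse_git_log(path='commits.txt'):
    """Parse 'hash|author|email|date|subject' lines written by the git log command above."""
    records = []
    with open(path, encoding='utf-8') as fh:
        for line in fh:
            parts = line.rstrip('\n').split('|', 4)
            if len(parts) == 5:
                sha, author, email, date, subject = parts
                records.append({'sha': sha, 'author': author, 'email': email,
                                'date': date, 'subject': subject})
    return records


def commits_per_month(records):
    """Count commits per YYYY-MM using the ISO date column."""
    return Counter(rec['date'][:7] for rec in records)
```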
### Approach 3: Hybrid Approach
Combine both methods for comprehensive analysis (see the sketch after this list):
- Use API for releases and high-level stats
- Clone repository for detailed commit analysis
- Use git commands for advanced filtering
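One possible hybrid pass, assuming the repository has already been cloned to `clone_path` (the function name and return shape are illustrative, not a fixed interface):

```python
import subprocess

import requests


def hybrid_snapshot(owner, repo, clone_path):
    """Releases via the API, per-author commit counts via local git."""
    releases = requests.get(
        f'https://api.github.com/repos/{owner}/{repo}/releases', timeout=30
    ).json()

    # Passing an explicit rev (HEAD) keeps shortlog from trying to read stdin
    shortlog = subprocess.run(
        ['git', '-C', clone_path, 'shortlog', '-sn', 'HEAD'],
        capture_output=True, text=True, check=True
    ).stdout

    return {'release_count': len(releases), 'shortlog': shortlog}
```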
## Analysis Workflow

1. **Repository Identification**
   - Parse GitHub URL or accept owner/repo parameters
   - Validate repository exists and is accessible
2. **Data Collection**
   - Fetch commit history (API or git log)
   - Retrieve release/tag information
   - Collect contributor metadata
3. **Data Processing**
   - Parse timestamps and author information
   - Group commits by time periods
   - Calculate statistics and metrics
4. **Insight Generation**
   - Identify top contributors with percentages
   - Calculate commit frequency trends
   - Map release timeline with intervals
   - Detect anomalies or interesting patterns
5. **Visualization & Reporting**
   - Create charts/graphs for trends
   - Generate summary statistics
   - Present findings in structured format
## Key Metrics to Calculate

### Commit Metrics
- Total commits
- Commits per day/week/month
- Average commits per active period
- Longest streak of daily commits (see the sketch after this list)
- Periods of inactivity
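The streak metric reduces to scanning a sorted set of commit dates; a sketch assuming `commit_dates` is an iterable of `datetime.date` objects:

```python
from datetime import timedelta


def longest_daily_streak(commit_dates):
    """Length of the longest run of consecutive calendar days with commits."""
    days = sorted(set(commit_dates))
    best = current = 1 if days else 0
    for prev, nxt in zip(days, days[1:]):
        current = current + 1 if nxt - prev == timedelta(days=1) else 1
        best = max(best, current)
    return best
```

Gaps larger than one day in the same sorted list mark the periods of inactivity.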
### Contributor Metrics
- Total unique contributors
- Top N contributors (typically top 5-10)
- Contribution percentage per contributor
- One-time vs. recurring contributors
- New contributors over time
### Release Metrics
- Total releases
- Time between releases (min, max, average)
- Release frequency trend
- Semantic versioning patterns
- Pre-release vs. stable releases (see the sketch after this list)
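The pre-release split can lean on the `prerelease` flag that the REST `/releases` endpoint returns (the helper name is hypothetical); combined with `release_intervals()` sketched earlier, min/max/average gaps follow directly:

```python
def split_releases(releases):
    """Separate pre-releases from stable releases using the API's 'prerelease' flag."""
    stable = [r['tag_name'] for r in releases if not r.get('prerelease')]
    pre = [r['tag_name'] for r in releases if r.get('prerelease')]
    return {'stable': stable, 'prerelease': pre}
```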
## Output Formats

### Summary Report
```text
Repository: owner/repo
Analysis Period: YYYY-MM-DD to YYYY-MM-DD

Commit Activity:
- Total Commits: N
- Active Contributors: N
- Average Commits/Week: N
- Most Active Period: YYYY-MM

Top Contributors:
1. Name (N commits, X%)
2. Name (N commits, X%)
...

Recent Releases:
- v1.2.3 (YYYY-MM-DD) - N days since previous
- v1.2.2 (YYYY-MM-DD) - N days since previous
...
```
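One way to render the summary above from a dictionary of precomputed metrics; every key name here is a hypothetical choice, not a required schema:

```python
def render_summary(stats):
    """Format a subset of the summary report from precomputed metrics."""
    lines = [
        f"Repository: {stats['repo']}",
        f"Analysis Period: {stats['start']} to {stats['end']}",
        '',
        'Commit Activity:',
        f"- Total Commits: {stats['total_commits']}",
        f"- Active Contributors: {stats['contributor_count']}",
        f"- Average Commits/Week: {stats['avg_commits_per_week']:.1f}",
        f"- Most Active Period: {stats['most_active_month']}",
        '',
        'Top Contributors:',
    ]
    lines += [
        f"{i}. {name} ({count} commits, {share:.0f}%)"
        for i, (name, count, share) in enumerate(stats['top_contributors'], 1)
    ]
    return '\n'.join(lines)
```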
### Detailed Analytics
- Time-series data in CSV/JSON format
- Visualization-ready datasets
- Contributor breakdown by time period
- Release calendar with annotations
## Common Patterns & Tips
- **Handle rate limiting**: Always check the API rate limit headers and implement exponential backoff (see the sketch after this list)
- **Large repositories**: For repos with 10k+ commits, consider:
  - Analyzing recent history only (e.g., the last 12 months)
  - Sampling commits rather than processing all of them
  - Using shallow clones for git-based analysis
- **Privacy considerations**: The GitHub API exposes public data only; private repos require authentication
- **Timezone handling**: Normalize all timestamps to UTC for consistent analysis
- **Bot commits**: Filter out automated commits (dependabot, renovate) when analyzing human contributors
- **Email normalization**: The same contributor may use different email addresses; consider consolidating identities
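A sketch of the backoff pattern using GitHub's rate limit headers (`X-RateLimit-Remaining`, `X-RateLimit-Reset`); treat it as a starting point rather than production-ready code:

```python
import time

import requests


def get_with_backoff(url, headers=None, params=None, max_retries=5):
    """GET with a wait-and-retry loop when the rate limit is exhausted."""
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers or {}, params=params or {})
        remaining = response.headers.get('X-RateLimit-Remaining')
        if response.status_code == 403 and remaining == '0':
            # Sleep until the advertised reset time, or back off exponentially
            reset = response.headers.get('X-RateLimit-Reset')
            wait = max(int(reset) - int(time.time()), 0) + 1 if reset else 2 ** attempt
            time.sleep(wait)
            continue
        return response
    raise RuntimeError(f'Rate limited after {max_retries} attempts: {url}')
```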
## Error Handling
- **Repository not found**: Verify the owner/repo spelling (see the sketch after this list)
- **Rate limit exceeded**: Implement retry logic or use authentication
- **Empty history**: Check whether the repository has been initialized
- **API changes**: GitHub API versioning may affect endpoints
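These failure modes can be surfaced explicitly; a sketch that maps common status codes to clearer errors (the helper name is hypothetical):

```python
import requests


def fetch_json(url, headers=None):
    """Fetch a GitHub API URL and raise descriptive errors for common failures."""
    response = requests.get(url, headers=headers or {})
    if response.status_code == 404:
        raise ValueError(f'Repository not found: {url}')
    if response.status_code == 403 and response.headers.get('X-RateLimit-Remaining') == '0':
        raise RuntimeError('Rate limit exceeded; retry later or authenticate')
    response.raise_for_status()
    data = response.json()
    if not data:
        raise ValueError('Empty response: the repository may have no history yet')
    return data
```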