| name | career-growth |
| description | Portfolio building, technical interviews, job search strategies, and continuous learning |
| sasmp_version | 1.3.0 |
| bonded_agent | 01-data-engineer |
| bond_type | SUPPORT_BOND |
| skill_version | 2.0.0 |
| last_updated | 2025-01 |
| complexity | foundational |
| estimated_mastery_hours | 40 |
| prerequisites | |
| unlocks | |
Career Growth
Professional development strategies for data engineering career advancement.
Quick Start
# Data Engineer Portfolio Checklist
## Required Projects (Pick 3-5)
- [ ] End-to-end ETL pipeline (Airflow + dbt)
- [ ] Real-time streaming project (Kafka/Spark Streaming)
- [ ] Data warehouse design (Snowflake/BigQuery)
- [ ] ML pipeline with MLOps (MLflow)
- [ ] API for data access (FastAPI)
## Documentation Template
Each project should include:
1. Problem statement
2. Architecture diagram
3. Tech stack justification
4. Challenges & solutions
5. Results/metrics
6. GitHub link with clean code
Core Concepts
1. Technical Interview Preparation
# Common coding patterns for data engineering interviews
# 1. SQL Window Functions
"""
Write a query to find the running total of sales by month,
and the percentage change from the previous month.
"""
sql = """
SELECT
    month,
    sales,
    SUM(sales) OVER (ORDER BY month) AS running_total,
    100.0 * (sales - LAG(sales) OVER (ORDER BY month))
        / NULLIF(LAG(sales) OVER (ORDER BY month), 0) AS pct_change
FROM monthly_sales
ORDER BY month;
"""
# 2. Data Processing - Find duplicates
def find_duplicates(data: list[dict], key: str) -> list[dict]:
    """Find duplicate records based on a key."""
    seen = {}
    duplicates = []
    for record in data:
        k = record[key]
        if k in seen:
            duplicates.append(record)
        else:
            seen[k] = record
    return duplicates
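# Example usage (the records below are illustrative):
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "a@example.com"},  # duplicate email
]
assert find_duplicates(records, "email") == [{"id": 3, "email": "a@example.com"}]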
# 3. Implement rate limiter
from collections import defaultdict
import time
class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = window_seconds
        self.requests = defaultdict(list)

    def is_allowed(self, user_id: str) -> bool:
        now = time.time()
        # Remove requests that have fallen out of the sliding window
        self.requests[user_id] = [
            t for t in self.requests[user_id]
            if now - t < self.window
        ]
        if len(self.requests[user_id]) < self.max_requests:
            self.requests[user_id].append(now)
            return True
        return False
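# Example usage: at most 3 requests per user in a 60-second window (values illustrative).
limiter = RateLimiter(max_requests=3, window_seconds=60)
print([limiter.is_allowed("user-123") for _ in range(5)])  # [True, True, True, False, False]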
# 4. Design question: Data pipeline for e-commerce
"""
Requirements:
- Process 1M orders/day
- Real-time dashboard updates
- Historical analytics
Architecture:
1. Ingestion: Kafka for real-time events
2. Processing: Spark Streaming for aggregations
3. Storage: Delta Lake for ACID, Snowflake for analytics
4. Serving: Redis for real-time metrics, API for dashboards
"""
2. Resume Optimization
## Data Engineer Resume Template
### Summary
Data Engineer with X years of experience building scalable data pipelines
processing Y TB/day. Expert in [Spark/Airflow/dbt]. Reduced pipeline
latency by Z% at [Company].
### Experience Format (STAR Method)
**Senior Data Engineer** | Company | 2022-Present
- **Situation**: Legacy ETL system processing 500GB daily with 4-hour latency
- **Task**: Redesign for real-time analytics
- **Action**: Built Spark Streaming pipeline with Delta Lake, implemented
incremental processing
- **Result**: Reduced latency to 5 minutes, cut infrastructure costs by 40%
### Skills Section
**Languages**: Python, SQL, Scala
**Frameworks**: Spark, Airflow, dbt, Kafka
**Databases**: PostgreSQL, Snowflake, MongoDB, Redis
**Cloud**: AWS (Glue, EMR, S3), GCP (BigQuery, Dataflow)
**Tools**: Docker, Kubernetes, Terraform, Git
### Quantify Everything
- "Built data pipeline" → "Built pipeline processing 2TB/day with 99.9% uptime"
- "Improved performance" → "Reduced query time from 30min to 30sec (60x improvement)"
3. Interview Questions to Ask
## Questions for Data Engineering Interviews
### About the Team
- What does a typical data pipeline look like here?
- How do you handle data quality issues?
- What's the tech stack? Any planned migrations?
### About the Role
- What would success look like in 6 months?
- What's the biggest data challenge the team faces?
- How do data engineers collaborate with data scientists?
### About Engineering Practices
- How do you handle schema changes in production?
- What's your approach to testing data pipelines?
- How do you manage technical debt?
### Red Flags to Watch For
- "We don't have time for testing"
- "One person handles all the data infrastructure"
- "We're still on [very outdated technology]"
- Vague answers about on-call and incident response
4. Learning Path by Experience Level
## Career Progression
### Junior (0-2 years)
Focus Areas:
- SQL proficiency (complex queries, optimization)
- Python for data processing
- One cloud platform deeply (AWS/GCP)
- Git and basic CI/CD
- Understanding ETL patterns
### Mid-Level (2-5 years)
Focus Areas:
- Distributed systems (Spark)
- Data modeling (dimensional, Data Vault)
- Orchestration (Airflow)
- Infrastructure as Code
- Data quality frameworks
### Senior (5+ years)
Focus Areas:
- System design and architecture
- Cost optimization at scale
- Team leadership and mentoring
- Cross-functional collaboration
- Vendor evaluation and selection
### Staff/Principal (8+ years)
Focus Areas:
- Organization-wide data strategy
- Building data platforms
- Technical roadmap ownership
- Industry thought leadership
Resources
Learning Platforms
Interview Prep
Community
Books
- "Fundamentals of Data Engineering" - Reis & Housley
- "Designing Data-Intensive Applications" - Kleppmann
- "The Data Warehouse Toolkit" - Kimball
Best Practices
# ✅ DO:
- Build public projects on GitHub
- Write technical blog posts
- Contribute to open source
- Network at meetups/conferences
- Keep skills current (follow trends)
# ❌ DON'T:
- Apply without tailoring resume
- Neglect soft skills
- Stop learning after getting hired
- Ignore feedback from interviews
- Burn bridges when leaving jobs
Skill Certification Checklist:
- [ ] Have 3+ portfolio projects on GitHub
- [ ] Can explain system design decisions
- [ ] Can solve SQL problems efficiently
- [ ] Have updated LinkedIn and resume
- [ ] Active in data engineering community