name	project-architecture-patterns
description	Architecture patterns for the rRNA-Phylo project, including FastAPI backend design, service organization, async task processing with Celery, Pydantic schemas, testing with pytest, Biopython integration, and API design conventions. Covers project structure, dependency injection, error handling, and configuration management.

Project Architecture Patterns

Purpose

Establish consistent architecture patterns for the rRNA-Phylo project, covering backend design, service organization, API conventions, and testing strategies.

When to Use

This skill activates when:

Setting up new services or modules
Designing API endpoints
Implementing background tasks
Working with configuration or dependencies
Writing tests
Integrating external tools (HMMER, BLAST, alignment tools)

Tech Stack

Core Backend

FastAPI: Modern async web framework
Pydantic: Data validation and settings
SQLAlchemy: ORM for database operations
Celery: Distributed task queue for long-running jobs
Redis: Message broker and cache
PostgreSQL: Primary database (SQLite for dev)

Scientific Computing

Biopython: Sequence parsing and analysis
NumPy/Pandas: Numerical computing and data manipulation
scikit-learn: ML baseline models
PyTorch: Deep learning (Phase 2+)

Testing & Quality

pytest: Testing framework
pytest-asyncio: Async test support
pytest-cov: Coverage reporting
httpx: Async HTTP client for API tests
ruff: Fast linter
black: Code formatter

Project Structure

backend/
├── app/
│   ├── __init__.py
│   ├── main.py                    # FastAPI application
│   ├── config.py                  # Settings (via Pydantic)
│   │
│   ├── models/                    # SQLAlchemy ORM models
│   │   ├── __init__.py
│   │   ├── base.py               # Base model class
│   │   ├── job.py                # Job model
│   │   ├── sequence.py           # Sequence model
│   │   └── tree.py               # Phylogenetic tree model
│   │
│   ├── schemas/                   # Pydantic schemas (API contracts)
│   │   ├── __init__.py
│   │   ├── common.py             # Shared schemas
│   │   ├── rrna.py               # rRNA detection schemas
│   │   ├── phylo.py              # Phylogenetics schemas
│   │   └── job.py                # Job status schemas
│   │
│   ├── api/                       # API routes
│   │   ├── __init__.py
│   │   ├── deps.py               # Dependency injection
│   │   └── v1/
│   │       ├── __init__.py
│   │       ├── router.py         # Main router
│   │       ├── rrna.py           # rRNA endpoints
│   │       ├── phylo.py          # Phylogenetics endpoints
│   │       └── jobs.py           # Job management
│   │
│   ├── services/                  # Business logic layer
│   │   ├── rrna/                 # rRNA detection service
│   │   ├── phylo/                # Phylogenetics service
│   │   └── sequences/            # Sequence processing
│   │
│   ├── workers/                   # Celery tasks
│   │   ├── __init__.py
│   │   ├── celery_app.py         # Celery config
│   │   └── tasks.py              # Task definitions
│   │
│   ├── core/                      # Core utilities
│   │   ├── __init__.py
│   │   ├── errors.py             # Custom exceptions
│   │   └── logging.py            # Logging setup
│   │
│   └── db/                        # Database utilities
│       ├── __init__.py
│       ├── session.py            # DB session management
│       └── migrations/           # Alembic migrations
│
├── tests/
│   ├── conftest.py               # Pytest fixtures
│   ├── unit/                     # Unit tests
│   ├── integration/              # Integration tests
│   └── fixtures/                 # Test data
│
├── requirements.txt
├── pyproject.toml
└── Dockerfile

Core Patterns

1. Configuration Management

Pattern: Use Pydantic Settings for type-safe configuration.

# app/config.py
from pydantic_settings import BaseSettings
from functools import lru_cache

class Settings(BaseSettings):
    """Application settings."""

    # API Settings
    api_v1_prefix: str = "/api/v1"
    project_name: str = "rRNA-Phylo"
    debug: bool = False

    # Database
    database_url: str = "sqlite:///./rrna_phylo.db"

    # Celery
    celery_broker_url: str = "redis://localhost:6379/0"
    celery_result_backend: str = "redis://localhost:6379/0"

    # External Tools
    hmmer_path: str = "/usr/local/bin/hmmsearch"
    blast_path: str = "/usr/local/bin/blastn"

    # Data Paths
    hmm_profiles_dir: str = "./data/hmm_profiles"
    blast_db_dir: str = "./data/blast_dbs"

    # ML Models
    ml_models_dir: str = "./models"

    # Pydantic v2 style config (replaces the deprecated inner `class Config`)
    model_config = {"env_file": ".env", "case_sensitive": False}

@lru_cache()
def get_settings() -> Settings:
    """Get cached settings instance."""
    return Settings()

Usage in dependencies:

# app/api/deps.py
from fastapi import Depends
from app.config import Settings, get_settings

async def get_config() -> Settings:
    return get_settings()
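
One consequence of caching `get_settings()` with `lru_cache` is that environment changes made after the first call are not picked up until the cache is cleared, which tends to matter in tests. The sketch below shows that behaviour with a standard-library stand-in for the Pydantic `Settings` class. `DemoSettings`, `get_demo_settings`, and the `DEBUG` variable are illustrative names, not project code.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class DemoSettings:
    """Stand-in for the Pydantic Settings class (illustrative only)."""
    debug: bool


@lru_cache
def get_demo_settings() -> DemoSettings:
    # Read the environment once; later calls reuse the cached instance
    return DemoSettings(debug=os.environ.get("DEBUG", "false").lower() == "true")


os.environ["DEBUG"] = "false"
get_demo_settings.cache_clear()      # start from a known state
first = get_demo_settings()

os.environ["DEBUG"] = "true"
stale = get_demo_settings()          # still the cached instance

get_demo_settings.cache_clear()      # what a test fixture would call
fresh = get_demo_settings()
```

The same applies to the real `get_settings`: a test that changes environment variables would likely need to call `get_settings.cache_clear()` before and after.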

2. Service Layer Pattern

Pattern: Encapsulate business logic in service classes.

# app/services/rrna/detector.py
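from typing import List, Sequence
from app.schemas.rrna import RRNADetectionResult
from app.config import Settings
from app.services.rrna.hmm import HMMDetector
# Assumed module path; PatternDetector is not shown in this document
from app.services.rrna.pattern import PatternDetector

class RRNADetectorService:
    """Service for detecting rRNA in sequences."""

    def __init__(self, settings: Settings):
        self.settings = settings
        # Pass the configured hmmsearch binary instead of relying on PATH
        self.hmm_detector = HMMDetector(settings.hmm_profiles_dir, settings.hmmer_path)
        self.pattern_detector = PatternDetector()

    async def detect(
        self,
        sequence: str,
        methods: Sequence[str] = ("hmm", "pattern"),  # tuple avoids a shared mutable default
        min_confidence: float = 0.8
    ) -> List[RRNADetectionResult]: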
        """
        Detect rRNA in sequence using multiple methods.

        Args:
            sequence: DNA/RNA sequence string
            methods: Detection methods to use
            min_confidence: Minimum confidence threshold

        Returns:
            List of detection results
        """
        results = []

        if "hmm" in methods:
            hmm_results = await self.hmm_detector.detect(sequence)
            results.extend(hmm_results)

        if "pattern" in methods:
            pattern_results = await self.pattern_detector.detect(sequence)
            results.extend(pattern_results)

        # Filter by confidence
        results = [r for r in results if r.confidence >= min_confidence]

        # Deduplicate and merge
        return self._merge_results(results)

    def _merge_results(
        self,
        results: List[RRNADetectionResult]
    ) -> List[RRNADetectionResult]:
        """Merge overlapping detections."""
        # Placeholder: fail loudly instead of silently returning None
        raise NotImplementedError
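
The body of `_merge_results` is left open above. As a minimal sketch of one possible policy, the function below keeps the most confident hit within each cluster of overlapping hits of the same rRNA type. The `Detection` dataclass is a stand-in for `RRNADetectionResult`, and both the half-open `[start, end)` coordinates and the "highest confidence wins" rule are assumptions, not the project's documented behaviour.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Stand-in for RRNADetectionResult (only the fields needed here)."""
    rrna_type: str
    start: int
    end: int
    confidence: float


def merge_overlapping(results: List[Detection]) -> List[Detection]:
    """Keep the most confident hit per cluster of overlapping same-type hits."""
    merged: List[Detection] = []
    cluster_end = -1  # right edge of the cluster currently being built
    for hit in sorted(results, key=lambda r: (r.rrna_type, r.start)):
        same_cluster = (
            bool(merged)
            and merged[-1].rrna_type == hit.rrna_type
            and hit.start < cluster_end  # assumes half-open [start, end)
        )
        if same_cluster:
            cluster_end = max(cluster_end, hit.end)
            if hit.confidence > merged[-1].confidence:
                merged[-1] = hit
        else:
            merged.append(hit)
            cluster_end = hit.end
    return merged


hits = [
    Detection("16S", 0, 1500, 0.90),
    Detection("16S", 10, 1542, 0.95),
    Detection("23S", 2000, 4900, 0.85),
]
merged = merge_overlapping(hits)
```

Here the two overlapping 16S hits collapse into the one with confidence 0.95, and the 23S hit is kept as is.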

Usage in API:

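# app/api/v1/rrna.py
from fastapi import APIRouter, Depends
from app.schemas.rrna import RRNADetectionRequest
from app.services.rrna.detector import RRNADetectorService
from app.api.deps import get_rrna_detector

router = APIRouter()

@router.post("/detect")
async def detect_rrna(
    request: RRNADetectionRequest,  # JSON body; a bare `sequence: str` would be a query parameter
    detector: RRNADetectorService = Depends(get_rrna_detector)
):
    """Detect rRNA in a sequence submitted as a JSON body."""
    results = await detector.detect(
        request.sequence,
        methods=[m.value for m in request.methods],
        min_confidence=request.min_confidence,
    )
    return {"results": results}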

3. Pydantic Schemas (API Contracts)

Pattern: Define clear input/output schemas with validation.

# app/schemas/rrna.py
from pydantic import BaseModel, Field, field_validator
from typing import List, Optional
from enum import Enum

class RRNAType(str, Enum):
    """Supported rRNA types."""
    SSU_16S = "16S"
    SSU_18S = "18S"
    LSU_23S = "23S"
    LSU_28S = "28S"
    FIVE_S = "5S"
    FIVE_EIGHT_S = "5.8S"

class DetectionMethod(str, Enum):
    """Detection methods."""
    HMM = "hmm"
    BLAST = "blast"
    PATTERN = "pattern"
    ML = "ml"

class RRNADetectionRequest(BaseModel):
    """Request schema for rRNA detection."""

    sequence: str = Field(..., min_length=100, max_length=100000)
    methods: List[DetectionMethod] = Field(
        default=[DetectionMethod.HMM, DetectionMethod.PATTERN]
    )
    min_confidence: float = Field(default=0.8, ge=0.0, le=1.0)

    # Pydantic v2 validator (the v1 `validator` decorator is deprecated)
    @field_validator("sequence")
    @classmethod
    def validate_sequence(cls, v: str) -> str:
        """Ensure sequence contains only valid nucleotides."""
        valid_chars = set("ACGTUNacgtun-")
        if not set(v).issubset(valid_chars):
            raise ValueError("Sequence contains invalid characters")
        return v.upper()

class ConservedRegion(BaseModel):
    """Conserved region within detected rRNA."""
    name: str
    start: int
    end: int
    sequence: str
    confidence: float

class RRNADetectionResult(BaseModel):
    """Result schema for rRNA detection."""

    rrna_type: RRNAType
    start: int
    end: int
    length: int
    confidence: float
    method: DetectionMethod
    score: float

    # Quality metrics
    completeness: float = Field(..., ge=0.0, le=1.0)
    quality: str = Field(..., pattern="^(high|medium|low|very_low)$")

    # Optional details
    conserved_regions: Optional[List[ConservedRegion]] = None
    secondary_structure: Optional[str] = None

    # Pydantic v2 style config (replaces the deprecated inner `class Config`)
    model_config = {
        "json_schema_extra": {
            "example": {
                "rrna_type": "16S",
                "start": 0,
                "end": 1542,
                "length": 1542,
                "confidence": 0.95,
                "method": "hmm",
                "score": 1250.5,
                "completeness": 0.98,
                "quality": "high"
            }
        }
    }

4. Dependency Injection

Pattern: Use FastAPI's dependency injection for services.

# app/api/deps.py
from typing import Generator
from fastapi import Depends
from sqlalchemy.orm import Session

from app.db.session import SessionLocal
from app.config import Settings, get_settings
from app.services.rrna.detector import RRNADetectorService

# Database dependency
def get_db() -> Generator[Session, None, None]:
    """Get database session."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

# Service dependencies
def get_rrna_detector(
    settings: Settings = Depends(get_settings)
) -> RRNADetectorService:
    """Get rRNA detector service."""
    return RRNADetectorService(settings)

def get_phylo_service(
    settings: Settings = Depends(get_settings),
    db: Session = Depends(get_db)
):
    """Get phylogenetic analysis service."""
    from app.services.phylo.tree_builder import PhyloService
    return PhyloService(settings, db)

5. Error Handling

Pattern: Use custom exceptions and FastAPI exception handlers.

# app/core/errors.py
class RRNAPhyloException(Exception):
    """Base exception for rRNA-Phylo."""
    pass

class SequenceValidationError(RRNAPhyloException):
    """Raised when sequence validation fails."""
    pass

class DetectionError(RRNAPhyloException):
    """Raised when rRNA detection fails."""
    pass

class AlignmentError(RRNAPhyloException):
    """Raised when sequence alignment fails."""
    pass

class TreeBuildingError(RRNAPhyloException):
    """Raised when tree building fails."""
    pass

# app/main.py
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
from app.core.errors import RRNAPhyloException

app = FastAPI()

@app.exception_handler(RRNAPhyloException)
async def rrna_phylo_exception_handler(
    request: Request,
    exc: RRNAPhyloException
):
    """Handle custom exceptions."""
    return JSONResponse(
        status_code=400,
        content={
            "error": exc.__class__.__name__,
            "message": str(exc)
        }
    )
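
The handler above returns status 400 for every `RRNAPhyloException`. If different failure types should map to different status codes, one option is a lookup table keyed by exception class. The following is a sketch under that assumption: the exception classes mirror `app/core/errors.py`, while `STATUS_BY_EXCEPTION`, `to_http_error`, and the specific status codes are hypothetical choices.

```python
from typing import Dict, Tuple, Type


class RRNAPhyloException(Exception):
    """Base exception (mirrors app/core/errors.py)."""


class SequenceValidationError(RRNAPhyloException):
    """Raised when sequence validation fails."""


class DetectionError(RRNAPhyloException):
    """Raised when rRNA detection fails."""


# Assumed mapping; the status codes are illustrative choices
STATUS_BY_EXCEPTION: Dict[Type[RRNAPhyloException], int] = {
    SequenceValidationError: 422,
    DetectionError: 500,
}


def to_http_error(exc: RRNAPhyloException) -> Tuple[int, dict]:
    """Return (status_code, body) for an application exception."""
    # Walk the MRO so subclasses inherit their parent's status code
    for cls in type(exc).__mro__:
        if cls in STATUS_BY_EXCEPTION:
            status = STATUS_BY_EXCEPTION[cls]
            break
    else:
        status = 400  # same default as the handler above
    return status, {"error": type(exc).__name__, "message": str(exc)}


status, body = to_http_error(SequenceValidationError("bad"))
```

Inside the FastAPI handler, the returned pair would feed `JSONResponse(status_code=status, content=body)`.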

6. Async Task Processing (Celery)

Pattern: Use Celery for long-running tasks.

# app/workers/celery_app.py
from celery import Celery
from app.config import get_settings

settings = get_settings()

celery_app = Celery(
    "rrna_phylo",
    broker=settings.celery_broker_url,
    backend=settings.celery_result_backend
)

celery_app.conf.update(
    task_serializer="json",
    accept_content=["json"],
    result_serializer="json",
    timezone="UTC",
    enable_utc=True,
    task_track_started=True,
    task_time_limit=3600,  # 1 hour max
)

# app/workers/tasks.py
from typing import List

from app.config import get_settings
from app.db.session import SessionLocal
from app.workers.celery_app import celery_app
from app.services.phylo.tree_builder import PhyloService

@celery_app.task(bind=True)
def build_phylogenetic_tree(
    self,
    sequences: List[dict],
    method: str,
    parameters: dict
):
    """
    Build phylogenetic tree (long-running task).

    Args:
        self: Task instance (for progress updates)
        sequences: List of sequences
        method: Tree building method
        parameters: Method parameters
    """
    # PhyloService takes settings and a DB session, as in app/api/deps.py
    db = SessionLocal()
    try:
        # Update progress
        self.update_state(state="PROGRESS", meta={"stage": "alignment"})

        service = PhyloService(get_settings(), db)

        # Align sequences
        alignment = service.align_sequences(sequences)

        self.update_state(state="PROGRESS", meta={"stage": "tree_building"})

        # Build tree
        tree = service.build_tree(alignment, method, parameters)

        return {"tree": tree, "status": "completed"}

    finally:
        # Exceptions propagate unchanged so Celery records FAILURE itself
        db.close()

API integration:

# app/api/v1/phylo.py
from fastapi import APIRouter
from app.schemas.phylo import TreeBuildRequest  # assumed location of the request schema
from app.workers.tasks import build_phylogenetic_tree

router = APIRouter()

@router.post("/tree")
async def create_tree(request: TreeBuildRequest):
    """Submit tree building job."""

    # Submit Celery task
    task = build_phylogenetic_tree.delay(
        sequences=request.sequences,
        method=request.method,
        parameters=request.parameters
    )

    return {
        "job_id": task.id,
        "status": "submitted"
    }

@router.get("/tree/{job_id}")
async def get_tree_status(job_id: str):
    """Get tree building job status."""
    task = build_phylogenetic_tree.AsyncResult(job_id)

    return {
        "job_id": job_id,
        "status": task.state,
        # On failure task.result holds the exception, which is not JSON-serializable
        "result": task.result if task.successful() else None
    }
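
The status endpoint above exposes raw Celery state names. A thin translation layer can keep the public API stable and surface the `stage` metadata that the task reports. This is a sketch only: the Celery state names are the built-in ones plus the custom `PROGRESS` state used by the task, while `API_STATUS_BY_STATE`, `job_status_payload`, and the public status names are assumptions.

```python
from typing import Any

# Celery's built-in states plus the custom "PROGRESS" state set by the task
API_STATUS_BY_STATE = {
    "PENDING": "queued",
    "STARTED": "running",
    "PROGRESS": "running",
    "RETRY": "running",
    "SUCCESS": "completed",
    "FAILURE": "failed",
    "REVOKED": "cancelled",
}


def job_status_payload(job_id: str, state: str, info: Any = None) -> dict:
    """Build a JSON-safe status payload from a Celery state and its info."""
    payload = {
        "job_id": job_id,
        "status": API_STATUS_BY_STATE.get(state, "unknown"),
    }
    if state == "PROGRESS" and isinstance(info, dict):
        payload["stage"] = info.get("stage")   # e.g. "alignment"
    elif state == "SUCCESS":
        payload["result"] = info
    elif state == "FAILURE":
        payload["error"] = str(info)           # info is the raised exception
    return payload


progress = job_status_payload("abc", "PROGRESS", {"stage": "alignment"})
failed = job_status_payload("abc", "FAILURE", ValueError("boom"))
```

In the endpoint this would be called as `job_status_payload(job_id, task.state, task.info)`. Note that Celery also reports `PENDING` for job IDs it has never seen, so "queued" cannot be distinguished from "unknown ID" without a separate job record.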

Testing Patterns

1. Test Structure

# tests/conftest.py
import pytest
from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from app.main import app
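from app.api.deps import get_db      # get_db is defined in app/api/deps.py
from app.models.base import Base     # assumes the declarative Base lives in models/base.py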

# Test database
SQLALCHEMY_TEST_URL = "sqlite:///./test.db"

@pytest.fixture(scope="session")
def test_engine():
    """Create test database engine."""
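    # SQLite connections are thread-bound by default; TestClient may use other threads
    engine = create_engine(
        SQLALCHEMY_TEST_URL, connect_args={"check_same_thread": False}
    )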
    Base.metadata.create_all(bind=engine)
    yield engine
    Base.metadata.drop_all(bind=engine)

@pytest.fixture(scope="function")
def test_db(test_engine):
    """Create test database session."""
    TestSessionLocal = sessionmaker(bind=test_engine)
    db = TestSessionLocal()
    try:
        yield db
    finally:
        db.close()

@pytest.fixture
def client(test_db):
    """Create test client."""
    def override_get_db():
        yield test_db

    app.dependency_overrides[get_db] = override_get_db
    yield TestClient(app)
    app.dependency_overrides.clear()

@pytest.fixture
def sample_16s_sequence():
    """Sample 16S rRNA sequence for testing."""
    return "AGAGTTTGATCCTGGCTCAG..."  # Truncated

2. Unit Test Example

# tests/unit/test_rrna_detector.py
import pytest
from app.config import get_settings
from app.services.rrna.detector import RRNADetectorService

@pytest.mark.asyncio
async def test_detect_16s_rrna(sample_16s_sequence):
    """Test 16S rRNA detection."""
    # get_settings is a plain function, not a pytest fixture
    detector = RRNADetectorService(get_settings())

    results = await detector.detect(sample_16s_sequence)

    assert len(results) > 0
    assert results[0].rrna_type == "16S"
    assert results[0].confidence >= 0.8

3. Integration Test Example

# tests/integration/test_api.py
def test_detect_rrna_endpoint(client, sample_16s_sequence):
    """Test rRNA detection API endpoint."""
    response = client.post(
        "/api/v1/rrna/detect",
        json={
            "sequence": sample_16s_sequence,
            "methods": ["hmm", "pattern"],
            "min_confidence": 0.8
        }
    )

    assert response.status_code == 200
    data = response.json()
    assert "results" in data
    assert len(data["results"]) > 0

API Design Conventions

1. RESTful Endpoints

# rRNA Detection
POST   /api/v1/rrna/detect           # Detect rRNA
GET    /api/v1/rrna/types             # List supported types

# Phylogenetics
POST   /api/v1/phylo/align            # Align sequences
POST   /api/v1/phylo/tree             # Build tree
POST   /api/v1/phylo/bootstrap        # Bootstrap analysis
POST   /api/v1/phylo/consensus        # Consensus tree

# Jobs
GET    /api/v1/jobs                   # List jobs
GET    /api/v1/jobs/{id}              # Get job status
DELETE /api/v1/jobs/{id}              # Cancel job

# Health
GET    /health                        # Health check
GET    /metrics                       # Prometheus metrics

2. Response Format

# Success response
{
    "success": true,
    "data": { ... },
    "metadata": {
        "timestamp": "2025-11-20T12:00:00Z",
        "version": "1.0.0"
    }
}

# Error response
{
    "success": false,
    "error": {
        "code": "VALIDATION_ERROR",
        "message": "Invalid sequence format",
        "details": { ... }
    }
}
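
To keep every endpoint on the envelope shown above, the two shapes can be built by small helpers. The following is a minimal sketch. `success_response`, `error_response`, and the `API_VERSION` constant are hypothetical names, and defaulting `details` to an empty object is an assumption.

```python
from datetime import datetime, timezone
from typing import Any, Optional

API_VERSION = "1.0.0"  # assumed constant; would normally come from settings


def success_response(data: Any) -> dict:
    """Wrap a payload in the success envelope."""
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "success": True,
        "data": data,
        "metadata": {"timestamp": timestamp, "version": API_VERSION},
    }


def error_response(code: str, message: str, details: Optional[dict] = None) -> dict:
    """Wrap an error in the error envelope."""
    return {
        "success": False,
        "error": {"code": code, "message": message, "details": details or {}},
    }


ok = success_response({"results": []})
err = error_response("VALIDATION_ERROR", "Invalid sequence format")
```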

Best Practices

✅ DO

Use type hints everywhere
Write docstrings for public APIs
Use Pydantic for validation
Implement proper logging
Write tests for new features
Use async/await for I/O operations
Separate concerns (routes, services, models)
Use dependency injection
Handle errors gracefully
Document API with examples

❌ DON'T

Mix business logic with API routes
Use synchronous I/O in async functions
Hardcode configuration values
Skip input validation
Ignore exceptions
Write god classes
Couple services tightly
Skip tests
Use global state
Block the event loop

External Tool Integration Pattern

# app/services/rrna/hmm.py
import asyncio
import subprocess
import tempfile
from pathlib import Path

from app.core.errors import DetectionError

class HMMDetector:
    """HMM-based rRNA detection using HMMER."""

    def __init__(self, profiles_dir: str, hmmsearch_path: str = "hmmsearch"):
        self.profiles_dir = Path(profiles_dir)
        self.hmmsearch_path = hmmsearch_path

    async def detect(self, sequence: str, rrna_type: str) -> dict:
        """Run HMMER search."""

        # Write the query to a temp FASTA file that hmmsearch can read
        with tempfile.NamedTemporaryFile(mode='w', suffix='.fasta') as seq_file:
            seq_file.write(f">query\n{sequence}\n")
            seq_file.flush()

            profile = self.profiles_dir / f"{rrna_type}.hmm"

            # Run HMMER in a worker thread so the event loop is not blocked
            try:
                result = await asyncio.to_thread(
                    subprocess.run,
                    [
                        self.hmmsearch_path,
                        "-o", "/dev/null",  # discard the main report so stdout holds only the table
                        "--tblout", "/dev/stdout",
                        str(profile),
                        seq_file.name
                    ],
                    capture_output=True,
                    text=True,
                    timeout=300
                )
            except subprocess.TimeoutExpired as exc:
                raise DetectionError("HMMER timed out after 300 s") from exc

            if result.returncode != 0:
                raise DetectionError(f"HMMER failed: {result.stderr}")

            # Parse output
            return self._parse_hmmer_output(result.stdout)
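
`_parse_hmmer_output` is not shown above. As a rough sketch of what it might do, the function below reads the whitespace-separated per-target table that `hmmsearch --tblout` writes, skipping `#` comment lines and taking the target name, query name, and full-sequence E-value, score, and bias columns. The function name `parse_tblout`, the returned dictionary keys, and the list return type are assumptions. The sample row is synthetic, not real HMMER output.

```python
from typing import Dict, List


def parse_tblout(text: str) -> List[Dict[str, object]]:
    """Parse the per-target table written by `hmmsearch --tblout`."""
    hits: List[Dict[str, object]] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # header, footer, or blank line
        fields = line.split()
        if len(fields) < 18:
            continue  # not a complete table row
        hits.append({
            "target": fields[0],
            "query": fields[2],
            "evalue": float(fields[4]),   # full-sequence E-value
            "score": float(fields[5]),    # full-sequence bit score
            "bias": float(fields[6]),
        })
    return hits


# Synthetic example row (made-up values, not real HMMER output)
sample = (
    "# target name  accession  query name  accession  E-value  score  bias ...\n"
    "query  -  16S_rRNA  -  1.2e-150  500.3  0.1  1.2e-150  500.3  0.1"
    "  1.0  1  0  0  1  1  1  1  -\n"
)
hits = parse_tblout(sample)
```

Converting these rows into `RRNADetectionResult` objects (coordinates, confidence, quality) would still need project-specific logic that this sketch does not attempt.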

Related Skills: rRNA-prediction-patterns, ml-integration-patterns

Line Count: < 500 lines ✅
