name	data-layer-architecture
description	Bronze Layer（LLM抽出ログ層）とGold Layer（確定データ層）の2層アーキテクチャ設計。LLM抽出結果の履歴管理と人間修正の保護を実現。抽出処理の実装、ExtractionLogの使用、is_manually_verifiedフラグの扱いに関するガイダンスを提供。

Data Layer Architecture（Bronze Layer / Gold Layer）

Purpose

LLM抽出結果と確定データを分離する2層アーキテクチャの設計ガイド。AIの抽出履歴を保持しながら、人間の修正を保護する。

When to Activate

このスキルは以下の場合にアクティベートされます：

LLM抽出処理を新規実装する時
ExtractionLogエンティティを使用する時
is_manually_verifiedフラグを扱う時
抽出結果からGoldエンティティを更新する時
Bronze/Gold Layerの設計について質問された時

Architecture Overview

┌─────────────────────────────────────────────────────────────┐
│                    Bronze Layer（抽出ログ層）                 │
│                                                              │
│  - LLM抽出結果を追記専用（Immutable）で保存                   │
│  - 精度分析・トレーサビリティのための履歴                      │
│  - テーブル: extraction_logs                                 │
└─────────────────────────────────────────────────────────────┘
                              │
                              │ is_manually_verified = false の場合のみ反映
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Gold Layer（確定データ層）                  │
│                                                              │
│  - アプリケーションが参照する唯一の正解データ                   │
│  - 人間の修正が優先される                                     │
│  - テーブル: statements, politicians, speakers, etc.         │
└─────────────────────────────────────────────────────────────┘

Design Principles

1. Bronze Layer（抽出ログ層）

目的: LLM抽出結果の履歴保持、精度分析
特性: 追記専用（Immutable）、削除・更新なし
用途:
- AIモデル改善の検証データ
- 抽出精度の時系列分析
- デバッグ・トラブルシューティング

2. Gold Layer（確定データ層）

目的: ユーザーに提供する正解データ
特性: 人間の修正が最優先
用途:
- Streamlit UIでの表示
- API経由でのデータ提供
- レポート・分析の基礎データ

Data Flow

LLM抽出実行
    │
    ▼
┌────────────────────────────┐
│ ExtractionLog に必ず保存    │  ← Bronze Layer（常に履歴として残る）
│ （Immutable）               │
└────────────────────────────┘
    │
    ▼
┌────────────────────────────┐
│ is_manually_verified?      │
└────────────────────────────┘
    │                │
    │ false          │ true
    │（未検証）       │（検証済み）
    ▼                ▼
┌──────────┐    ┌──────────────────┐
│ Gold更新  │    │ Gold更新しない    │
│          │    │（人間の修正を保護）│
└──────────┘    └──────────────────┘

Target Entities

抽出オブジェクト（Bronze）	→	確定オブジェクト（Gold）
StatementExtraction	→	Statement（発言）
PoliticianExtraction	→	Politician（政治家）
SpeakerExtraction	→	Speaker（話者）
ConferenceMemberExtraction	→	ConferenceMember（会議体メンバー）
ParliamentaryGroupMemberExtraction	→	ParliamentaryGroupMember（議員団メンバー）

Key Components

ExtractionLog Entity

class EntityType(Enum):
    STATEMENT = "statement"
    POLITICIAN = "politician"
    SPEAKER = "speaker"
    CONFERENCE_MEMBER = "conference_member"
    PARLIAMENTARY_GROUP_MEMBER = "parliamentary_group_member"

@dataclass
class ExtractionLog:
    id: UUID
    entity_type: EntityType          # どのエンティティの抽出か
    entity_id: UUID                  # 対象GoldエンティティのID
    pipeline_version: str            # "gemini-2.0-flash-v1" など
    extracted_data: dict             # AIが出した生データ（JSON）
    confidence_score: Optional[float]
    extraction_metadata: dict        # モデル名、トークン数等
    created_at: datetime             # Immutable

Gold Entity Common Fields

# 全Goldエンティティ（Statement, Politician等）に追加
is_manually_verified: bool = False      # 人間が検証済みか
latest_extraction_log_id: Optional[UUID] # 最新の抽出ログへの参照

Behavior Rules

状態	AI再抽出時の動作	理由
`is_manually_verified = false`	Gold Layerを更新	最新AIの精度向上を反映
`is_manually_verified = true`	Gold Layerは更新しない	人間の判断を最優先
（両方）	Bronze Layerには常に保存	履歴・分析用

Implementation Guide

Adding New Extraction Process

UpdateEntityFromExtractionUseCase を継承して実装
抽出結果は必ず ExtractionLog に保存
is_manually_verified フラグをチェックしてから Gold を更新

class UpdateStatementFromExtractionUseCase(UpdateEntityFromExtractionUseCase):
    async def execute(
        self,
        entity_id: UUID,
        extraction_result: StatementExtractionResult,
        pipeline_version: str
    ) -> UpdateEntityResult:
        # 1. Bronze Layer: 抽出ログを必ず保存
        log = ExtractionLog(
            entity_type=EntityType.STATEMENT,
            entity_id=entity_id,
            pipeline_version=pipeline_version,
            extracted_data=extraction_result.to_dict(),
            ...
        )
        log_id = await self._extraction_log_repo.save(log)

        # 2. Gold Layer: 人間修正済みならスキップ
        entity = await self._statement_repo.find_by_id(entity_id)
        if entity.is_manually_verified:
            return UpdateEntityResult(updated=False, reason="manually_verified")

        # 3. Gold Layer: 未検証なら更新
        entity.update_from_extraction(extraction_result)
        entity.latest_extraction_log_id = log_id
        await self._statement_repo.save(entity)

        return UpdateEntityResult(updated=True)

Handling Human Modifications

Gold エンティティを直接更新
is_manually_verified = true をセット
以降のAI再抽出では上書きされない

async def mark_as_verified(entity_id: UUID) -> None:
    entity = await repo.find_by_id(entity_id)
    entity.is_manually_verified = True
    await repo.save(entity)

Quick Checklist

新しい抽出処理を実装する際のチェックリスト：

ExtractionLog にログを保存しているか
is_manually_verified をチェックしているか
latest_extraction_log_id を更新しているか
pipeline_version を適切に設定しているか
エラー時もログが保存されるか
単体テストを書いたか

Related Issues

Product Goal: #813
Implementation PBIs: #861〜#872

References

ARCHITECTURE.md: System architecture overview
DOMAIN_MODEL.md: Domain entities
DATABASE_SCHEMA.md: Database structure

data-layer-architecture

Install Skill

SKILL.md

Data Layer Architecture（Bronze Layer / Gold Layer）

Purpose

When to Activate

Architecture Overview

Design Principles

1. Bronze Layer（抽出ログ層）

2. Gold Layer（確定データ層）

Data Flow

Target Entities

Key Components

ExtractionLog Entity

Gold Entity Common Fields

Behavior Rules

Implementation Guide

Adding New Extraction Process

Handling Human Modifications

Quick Checklist

Related Issues

References