---
name: adding-api-sources
description: Use when implementing a new data source adapter for metapyle, before writing any source code
---
# Adding API Sources to Metapyle

## Overview
Add new financial data source adapters following TDD and established patterns. Each source provides `fetch()` and `get_metadata()` methods with lazy imports for optional dependencies.

**Core principle:** Use the brainstorming skill first for design decisions, then implement following established patterns.

## Workflow
1. **Design** - Use the `brainstorming` skill to decide data model mapping
2. **Plan** - Use the `writing-plans` skill for the implementation plan
3. **Implement** - Follow TDD with subagents (see Quick Reference)
## Design Questions (Brainstorming Phase)
Before coding, answer these questions using the brainstorming skill:
| Question | Why It Matters |
|---|---|
| What maps to `symbol`? | Primary identifier (ticker, bbid, series ID) |
| What maps to `field`? | Secondary identifier if needed (PX_LAST, dataset::column) |
| Need `params` field? | Extra filters (tenor, location, deltaStrike) |
| Authentication model? | External (user calls auth) or internal (credentials passed) |
| Batch strategy? | Single call for all symbols, or group by some key? |
| Column naming? | Symbol only, or `symbol::field` for uniqueness? |
| Metadata available? | What can `get_metadata()` return? |
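The answers translate directly into how callers build `FetchRequest` objects. A brief sketch, using illustrative symbols and parameter keys (none of these are tied to a real API):

```python
from metapyle.sources.base import FetchRequest

# Symbol-only source: the symbol is the full identifier, no field needed
rate_req = FetchRequest(symbol="SOFR")

# Symbol + field source: field disambiguates which value to pull
price_req = FetchRequest(symbol="AAPL", field="PX_LAST")

# Symbol + field + params source: params carries extra filters
# ("tenor" and "deltaStrike" are illustrative keys only)
vol_req = FetchRequest(
    symbol="EURUSD",
    field="IMPLIED_VOL",
    params={"tenor": "1M", "deltaStrike": "25D"},
)
```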
## Quick Reference
| Step | Files | Key Actions |
|---|---|---|
| 1. Branch | — | `git checkout -b feature/<source>-source` |
| 2. Skeleton | `sources/<source>.py` | Lazy import + class with `NotImplementedError` |
| 3. Export | `sources/__init__.py` | Add import + `__all__` |
| 4. Tests | `tests/unit/test_sources_<source>.py` | Mock-based tests (RED) |
| 5. Implement | `sources/<source>.py` | `fetch()` then `get_metadata()` (GREEN) |
| 6. Config | `pyproject.toml` | Optional dep + mypy ignore |
| 7. Verify | — | `pytest`, `mypy`, `ruff` |
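Step 2 can start from a stub like the following minimal sketch. All names are placeholders, and the `get_metadata()` signature shown is an assumption; match whatever `BaseSource` actually declares:

```python
# sources/<source>.py -- step 2 skeleton, everything raises NotImplementedError
from collections.abc import Sequence
from typing import Any

import pandas as pd

from metapyle.sources.base import BaseSource, FetchRequest, register_source


@register_source("mysource")  # illustrative source name
class MySourceSource(BaseSource):
    def fetch(self, requests: Sequence[FetchRequest], start: str, end: str) -> pd.DataFrame:
        raise NotImplementedError

    def get_metadata(self, requests: Sequence[FetchRequest]) -> dict[str, Any]:
        # Signature assumed here -- align it with the base class
        raise NotImplementedError
```

Step 3 is then a one-line import plus an `__all__` entry in `sources/__init__.py`.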
## Batch Fetch API
Sources receive batched requests via `Sequence[FetchRequest]`:
```python
from collections.abc import Sequence

import pandas as pd

from metapyle.sources.base import BaseSource, FetchRequest, make_column_name, register_source


@register_source("<source>")
class <Source>Source(BaseSource):
    def fetch(
        self,
        requests: Sequence[FetchRequest],
        start: str,
        end: str,
    ) -> pd.DataFrame:
        """
        Parameters
        ----------
        requests : Sequence[FetchRequest]
            Each has: symbol, field (optional), path (optional), params (optional)
        start, end : str
            ISO dates (YYYY-MM-DD)

        Returns
        -------
        pd.DataFrame
            DatetimeIndex, columns named via make_column_name(symbol, field)
        """
        if not requests:
            return pd.DataFrame()
        # ... implementation
```
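From the caller's side the contract looks like the sketch below; the symbols are illustrative and `source` stands for any registered source instance:

```python
import pandas as pd

from metapyle.sources.base import FetchRequest

requests = [
    FetchRequest(symbol="AAPL", field="PX_LAST"),
    FetchRequest(symbol="MSFT", field="PX_LAST"),
]
df = source.fetch(requests, "2024-01-01", "2024-12-31")

assert isinstance(df.index, pd.DatetimeIndex)
assert "AAPL::PX_LAST" in df.columns and "MSFT::PX_LAST" in df.columns
```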
## FetchRequest Fields
```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True, slots=True, kw_only=True)
class FetchRequest:
    symbol: str                            # Required - primary identifier
    field: str | None = None               # Optional - e.g., "PX_LAST", "dataset::col"
    path: str | None = None                # Optional - for localfile source
    params: dict[str, Any] | None = None   # Optional - extra filters
```
## Column Naming
Always use `make_column_name()` for output columns:
```python
from metapyle.sources.base import make_column_name

# In fetch(), rename columns:
for req in requests:
    col_name = make_column_name(req.symbol, req.field)  # "AAPL::PX_LAST" or "AAPL"
    result[col_name] = data[req.symbol]
```
## Batch Grouping Pattern
When the API requires grouping (e.g., by dataset):
```python
def fetch(self, requests: Sequence[FetchRequest], start: str, end: str) -> pd.DataFrame:
    # Group by some key (dataset_id, field type, etc.)
    groups: dict[str, list[FetchRequest]] = {}
    for req in requests:
        key = extract_key(req.field)  # Your grouping logic
        groups.setdefault(key, []).append(req)

    # Fetch each group (potentially in parallel)
    result_dfs: list[pd.DataFrame] = []
    for key, group_requests in groups.items():
        symbols = [req.symbol for req in group_requests]
        df = api.batch_fetch(key, symbols, start, end)
        result_dfs.append(df)

    # Merge results
    result = result_dfs[0]
    for df in result_dfs[1:]:
        result = result.join(df, how="outer")
    return result
```
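What `extract_key()` does is source-specific. For a `field` of the form `dataset::column` (as mentioned above), a hypothetical helper could simply split on the separator:

```python
def extract_key(field: str | None) -> str:
    """Hypothetical grouping helper: use the dataset part of a "dataset::column" field."""
    if field is None:
        raise ValueError("this source expects field to look like 'dataset::column'")
    dataset, _, _column = field.partition("::")
    return dataset
```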
## Lazy Import Pattern
```python
from typing import Any

_LIB_AVAILABLE: bool | None = None
_lib_modules: dict[str, Any] = {}


def _get_lib() -> dict[str, Any]:
    """Lazy import of library modules."""
    global _LIB_AVAILABLE, _lib_modules
    if _LIB_AVAILABLE is None:
        try:
            from library import Module1, Module2

            _lib_modules = {"Module1": Module1, "Module2": Module2}
            _LIB_AVAILABLE = True
        except Exception:  # ImportError, plus anything the library raises at import time
            _lib_modules = {}
            _LIB_AVAILABLE = False
    return _lib_modules
```
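Inside `fetch()`, the cached result is checked before any API work. How a missing optional dependency should be signalled is a project choice, so the `ImportError` in this sketch is an assumption:

```python
def fetch(self, requests: Sequence[FetchRequest], start: str, end: str) -> pd.DataFrame:
    if not requests:
        return pd.DataFrame()

    lib = _get_lib()
    if not lib:
        # Assumed behavior: fail loudly when the optional dependency is absent
        raise ImportError("<library> is not installed; install the <source> extra")

    api = lib["Module1"]  # whichever entry point your library exposes
    ...
```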
## Exception Handling
```python
try:
    data = api.fetch(symbols, start, end)
except (FetchError, NoDataError):
    raise  # Re-raise our exceptions as-is
except Exception as e:
    logger.error("fetch_failed: symbols=%s, error=%s", symbols, str(e))
    raise FetchError(f"API error: {e}") from e

if data.empty:
    raise NoDataError(f"No data returned for {symbols}")
```
## Test Pattern
```python
from unittest.mock import MagicMock, patch

import pandas as pd

from metapyle.sources.base import FetchRequest


class TestSourceFetch:
    def test_single_request(self) -> None:
        # mock_data: a small pd.DataFrame with a DatetimeIndex, built in the test module
        with patch("metapyle.sources.<source>._get_lib") as mock_get:
            mock_lib = {"API": MagicMock()}
            mock_lib["API"].fetch.return_value = mock_data
            mock_get.return_value = mock_lib

            source = <Source>Source()
            requests = [FetchRequest(symbol="SYM", field="FIELD")]
            df = source.fetch(requests, "2024-01-01", "2024-12-31")

            assert "SYM::FIELD" in df.columns
            assert isinstance(df.index, pd.DatetimeIndex)
```
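The first RED test in the TDD Order below (library not installed) can follow the same style. Forcing the import to fail via `builtins.__import__` is one way to simulate the missing dependency; this is an assumption, not the only option:

```python
import builtins
from unittest.mock import patch

from metapyle.sources import <source> as source_mod


def test_get_lib_when_library_missing() -> None:
    # Reset the module-level cache so _get_lib() re-attempts the import
    source_mod._LIB_AVAILABLE = None
    source_mod._lib_modules = {}

    real_import = builtins.__import__

    def fail_import(name, *args, **kwargs):
        if name == "<library>":
            raise ImportError(name)
        return real_import(name, *args, **kwargs)

    with patch("builtins.__import__", side_effect=fail_import):
        assert source_mod._get_lib() == {}
        assert source_mod._LIB_AVAILABLE is False
```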
## pyproject.toml
```toml
[project.optional-dependencies]
<source> = ["<library>"]

[[tool.mypy.overrides]]
module = ["<library>", "<library>.*"]
ignore_missing_imports = true
```
## Common Mistakes
| Mistake | Fix |
|---|---|
| Wrong `fetch()` signature | Must be `fetch(requests: Sequence[FetchRequest], start, end)` |
| Import at module level | Use lazy import pattern with `_get_lib()` |
| Manual column naming | Use `make_column_name(symbol, field)` |
| f-strings in logging | Use `logger.debug("msg: %s", var)` |
| Missing empty request check | Return `pd.DataFrame()` if `not requests` |
| Catching exceptions silently | Re-raise `FetchError`/`NoDataError`, wrap others |
## TDD Order
1. RED: Write test for `_get_lib()` (library not installed)
2. GREEN: Implement lazy import
3. RED: Write test for single request fetch
4. GREEN: Implement basic fetch
5. RED: Write test for batch fetch
6. GREEN: Implement batch handling
7. RED: Write error handling tests
8. GREEN: Implement error handling
9. VERIFY: Run full test suite, ruff, mypy