| name | instance-resource-design |
| description | Guide for designing Instance resources in OptAIC. Use when creating DatasetInstance, SignalInstance, ExperimentInstance, ModelInstance, PortfolioOptimizerInstance, or BacktestInstance. Covers definition references, config patterns, composition, flow execution pairing, and scheduling. |
Instance Resource Design Patterns
Guide for designing Instance resources that configure and execute Definition plugins.
When to Use
Apply when:
- Creating configured dataset/signal/model instances
- Designing composition patterns (Pipeline + Store + Accessor)
- Implementing scheduling and freshness tracking
- Pairing Flow Execution Resources with Instances
- Building special cases like BacktestInstance (no definition)
Core Concept: Configured Usage
Instances reference Definitions and provide runtime configuration:
Instance = Configured Usage
├── definition_resource_id # Which Definition to use
├── definition_version_id # Pinned version (optional)
├── config_json # Runtime configuration
├── schedule_json # Cron/refresh schedule
├── upstream_refs # Connected upstream resources
└── flow_execution_handles # Prefect deployments, MLflow experiments
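As a minimal sketch, the fields above could be modelled as a dataclass. The class name `InstanceRecord` and the `is_pinned` helper are hypothetical and only illustrate the shape; the actual OptAIC models may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class InstanceRecord:
    """Hypothetical in-memory shape of an Instance; not the real OptAIC model."""

    definition_resource_id: str                     # which Definition to use
    definition_version_id: Optional[str] = None     # pinned version (optional)
    config_json: dict[str, Any] = field(default_factory=dict)
    schedule_json: Optional[dict[str, Any]] = None  # cron/refresh schedule
    upstream_refs: list[dict[str, str]] = field(default_factory=list)
    flow_execution_handles: dict[str, str] = field(default_factory=dict)

    def is_pinned(self) -> bool:
        """True when the Instance is pinned to a specific Definition version."""
        return self.definition_version_id is not None


instance = InstanceRecord(
    definition_resource_id="def-123",
    config_json={"lookback_days": 252},
)
print(instance.is_pinned())  # False: no definition_version_id was given
```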
Instance ↔ Flow Pairing
Critical Concept: When an Instance is created, Flow Execution Resources are also created.
Flow Execution Resources are static Prefect deployments (or equivalent orchestration handles) that are:
- Created when Instance is created
- Paired 1:1 or 1:N with Instance (some Instances have multiple flows)
- Stored as handles in the Instance extension table
- The "execution capability", as distinct from Runs, which are the "execution activities"
DatasetInstance creation:
├── Create Resource record
├── Create extension table record
├── Create Prefect deployment for refresh flow
└── Store deployment_id in instance.prefect_deployment_id
See references/flow-pairing.md.
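The four creation steps above can be sketched as follows. This is an illustration under assumptions: `create_deployment` is a hypothetical callable standing in for whatever registers a deployment with the orchestrator, and plain dicts stand in for the Resource and extension tables.

```python
import uuid
from typing import Callable


def create_dataset_instance(
    name: str,
    create_deployment: Callable[[str], str],
    resources: dict,
    extensions: dict,
) -> str:
    """Create a DatasetInstance together with its paired refresh-flow handle.

    `create_deployment` (hypothetical) registers a deployment and returns its ID.
    `resources` and `extensions` are dicts standing in for database tables.
    """
    resource_id = str(uuid.uuid4())
    # 1. Create the Resource record
    resources[resource_id] = {"type": "DatasetInstance", "name": name}
    # 2. Create the extension table record (handle not yet known)
    extensions[resource_id] = {"prefect_deployment_id": None}
    # 3. Create the deployment for the refresh flow
    deployment_id = create_deployment(f"{name}-refresh")
    # 4. Store the deployment ID on the Instance
    extensions[resource_id]["prefect_deployment_id"] = deployment_id
    return resource_id


resources, extensions = {}, {}
rid = create_dataset_instance(
    "daily_prices", lambda flow_name: f"dep-{flow_name}", resources, extensions
)
```

Passing the deployment creator in as a callable keeps the sketch independent of any particular orchestration client and makes the pairing easy to test with a stub.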
Instance Types
| Type | Parent | Definition Ref | Flow Count | Notes |
|---|---|---|---|---|
| DatasetInstance | Project | PipelineDef + StoreDef + AccessorDef | 1 | refresh_flow |
| SignalInstance | Project | Inherits from DatasetInstance | 1 | Promoted dataset |
| ExperimentInstance | Project | OpDef/OpMacroDef | 1 | preview_flow |
| ModelInstance | Project | MLModuleDef | 3 | train/infer/monitor |
| PortfolioOptimizerInstance | Project | PortfolioOptimizerDef | 1 | optimize_flow |
| BacktestInstance | Project | None | 1 | Fixed procedure |
Multi-Flow Instances
Some Instance types have multiple Flow Execution Resources:
ModelInstance:
├── training_flow → TrainingRun activities
├── inference_flow → InferenceRun activities
└── monitoring_flow → MonitoringRun activities
Instance Extension Table:
├── prefect_training_deployment_id
├── prefect_inference_deployment_id
├── prefect_monitoring_deployment_id
├── mlflow_experiment_id (training tracking)
├── mlflow_registered_model_name (after promotion)
└── evidently_project_id (monitoring dashboard)
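One way to keep the three deployment handles together and detect an incompletely paired ModelInstance is sketched below. The `ModelInstanceHandles` class and `missing_flows` helper are hypothetical names; only the field names and Run types come from the layout above.

```python
from dataclasses import dataclass
from typing import Optional

# Flow name -> Run activity type, as listed for ModelInstance above.
FLOW_TO_RUN_TYPE = {
    "training": "TrainingRun",
    "inference": "InferenceRun",
    "monitoring": "MonitoringRun",
}


@dataclass
class ModelInstanceHandles:
    """Hypothetical holder for the Prefect handles of a ModelInstance."""

    prefect_training_deployment_id: Optional[str] = None
    prefect_inference_deployment_id: Optional[str] = None
    prefect_monitoring_deployment_id: Optional[str] = None

    def missing_flows(self) -> list[str]:
        """Return the flow names whose deployment ID has not been stored yet."""
        return [
            flow
            for flow in FLOW_TO_RUN_TYPE
            if getattr(self, f"prefect_{flow}_deployment_id") is None
        ]


handles = ModelInstanceHandles(prefect_training_deployment_id="dep-train-1")
print(handles.missing_flows())  # ['inference', 'monitoring']
```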
Lineage is Flow-to-Flow
Dependencies track flow statuses, not instance relationships:
DatasetInstance.refresh_flow
↓ depends on
UpstreamDataset.refresh_flow status = READY
Lineage checking uses check_upstream_freshness() to verify all upstream
flow statuses before executing a downstream flow.
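This guide does not show the signature of check_upstream_freshness(), so the version below is an assumed one for illustration: it takes a mapping of upstream flow names to statuses and returns the flows that are not READY.

```python
def check_upstream_freshness(upstream_flow_statuses: dict[str, str]) -> list[str]:
    """Return the upstream flows that block execution (assumed signature).

    An empty list means every upstream flow is READY.
    """
    return [
        flow_name
        for flow_name, status in upstream_flow_statuses.items()
        if status != "READY"
    ]


def can_execute(upstream_flow_statuses: dict[str, str]) -> bool:
    """A downstream flow may run only when no upstream flow is blocking."""
    return not check_upstream_freshness(upstream_flow_statuses)


upstream = {
    "prices.refresh_flow": "READY",
    "fundamentals.refresh_flow": "STALE",
}
print(check_upstream_freshness(upstream))  # ['fundamentals.refresh_flow']
```

Returning the blocking flows rather than a bare boolean makes it possible to report why a downstream flow was held back.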
Status Aggregation
Instance status aggregates from its Flow(s):
# Single-flow Instance (DatasetInstance)
instance.status = flow.status

# Multi-flow Instance (ModelInstance)
instance.status = aggregate([
    training_flow.status,
    inference_flow.status,
    monitoring_flow.status,
])
# Uses min-severity: READY only if ALL flows are READY
Definition specifies the status_aggregation_contract:
{
  "status_aggregation_contract": {
    "aggregation_method": "min_severity",
    "status_priority": ["ERROR", "STALE", "RUNNING", "READY"]
  }
}
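A minimal sketch of the aggregation, assuming that "min_severity" means the status appearing earliest in status_priority wins, which matches the rule that an Instance is READY only if all of its flows are READY. The function name `aggregate_status` is hypothetical.

```python
DEFAULT_PRIORITY = ["ERROR", "STALE", "RUNNING", "READY"]


def aggregate_status(
    flow_statuses: list[str],
    status_priority: list[str] = DEFAULT_PRIORITY,
) -> str:
    """Return the flow status that comes first in `status_priority`.

    Assumed interpretation of min_severity: any ERROR makes the Instance
    ERROR, otherwise any STALE makes it STALE, and so on down the list.
    """
    if not flow_statuses:
        raise ValueError("an Instance needs at least one flow status")
    rank = {status: index for index, status in enumerate(status_priority)}
    # A status missing from the contract raises KeyError instead of being ignored.
    return min(flow_statuses, key=lambda status: rank[status])


print(aggregate_status(["READY", "STALE", "RUNNING"]))  # STALE
```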
Composition Pattern
DatasetInstance composes multiple definitions:
DatasetInstance
├── pipeline_instance_id → PipelineInstance → PipelineDef
├── store_instance_id → StoreInstance → StoreDef
└── accessor_instance_id → AccessorInstance → AccessorDef
See references/composition.md.
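The chain from a DatasetInstance through its component instances to their Definitions can be followed with a simple lookup. The sketch below uses plain dicts and made-up IDs in place of real records, and the `resolve_definitions` helper is hypothetical.

```python
def resolve_definitions(dataset_instance: dict, instances: dict) -> dict:
    """Map each composition role to the Definition its component instance uses.

    `instances` is a dict of component instance ID -> record; each record is
    assumed to carry a `definition_resource_id`.
    """
    resolved = {}
    for role in ("pipeline", "store", "accessor"):
        component_id = dataset_instance[f"{role}_instance_id"]
        resolved[role] = instances[component_id]["definition_resource_id"]
    return resolved


instances = {
    "pi-1": {"definition_resource_id": "pipeline-def-1"},
    "si-1": {"definition_resource_id": "store-def-1"},
    "ai-1": {"definition_resource_id": "accessor-def-1"},
}
dataset_instance = {
    "pipeline_instance_id": "pi-1",
    "store_instance_id": "si-1",
    "accessor_instance_id": "ai-1",
}
definitions = resolve_definitions(dataset_instance, instances)
```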
Config Structure
instance_metadata = {
    "definition_resource_id": "uuid",
    "definition_version_id": "uuid (optional)",
    "config_json": {
        "symbols": ["AAPL", "MSFT", "GOOGL"],
        "start_date": "2020-01-01",
        "lookback_days": 252
    },
    "schedule_json": {
        "type": "cron",
        "expression": "0 6 * * 1-5",
        "timezone": "America/New_York"
    },
    "upstream_refs": [
        {"resource_id": "uuid", "role": "input"},
        {"resource_id": "uuid", "role": "covariance"}
    ]
}
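Before persisting an Instance, the envelope of this structure can be sanity-checked. The sketch below is an assumption about what such a check might cover (a required Definition reference, a five-field cron expression, well-formed upstream refs); it does not validate config_json, which is governed by the Definition's parameters_schema.

```python
def validate_instance_metadata(metadata: dict) -> list[str]:
    """Return a list of problems found in the metadata envelope (hypothetical)."""
    errors = []
    if not metadata.get("definition_resource_id"):
        errors.append("definition_resource_id is required")

    schedule = metadata.get("schedule_json")
    if schedule and schedule.get("type") == "cron":
        # Standard cron expressions have five whitespace-separated fields.
        if len(schedule.get("expression", "").split()) != 5:
            errors.append("cron expression must have 5 fields")

    for index, ref in enumerate(metadata.get("upstream_refs", [])):
        if "resource_id" not in ref or "role" not in ref:
            errors.append(f"upstream_refs[{index}] needs resource_id and role")
    return errors


example = {
    "definition_resource_id": "uuid",
    "config_json": {"lookback_days": 252},
    "schedule_json": {"type": "cron", "expression": "0 6 * * 1-5"},
    "upstream_refs": [{"resource_id": "uuid", "role": "input"}],
}
problems = validate_instance_metadata(example)
```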
Special Case: BacktestInstance
BacktestInstance has no Definition; the backtest procedure is fixed:
backtest_instance = {
    "type": "BacktestInstance",
    "name": "Q1_2024_Backtest",
    "metadata_json": {
        # No definition_resource_id
        "assets_json": {
            "universe": ["SPY", "QQQ", "IWM"],
            "benchmark": "SPY"
        },
        "signals_json": {
            "primary": "uuid-of-signal-instance",
            "secondary": ["uuid-1", "uuid-2"]
        },
        "date_range_json": {
            "start": "2024-01-01",
            "end": "2024-03-31"
        },
        "config_json": {
            "rebalance_frequency": "daily",
            "transaction_costs": 0.001,
            "slippage_model": "linear"
        }
    }
}
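Because a BacktestInstance carries no Definition, a creation-time check might look like the sketch below. The helper name and the specific rules (no definition_resource_id, an ISO date range whose start is not after its end) are assumptions drawn from the example above, not a documented contract.

```python
from datetime import date


def validate_backtest_metadata(metadata_json: dict) -> list[str]:
    """Return a list of problems in a BacktestInstance's metadata (hypothetical)."""
    errors = []
    if "definition_resource_id" in metadata_json:
        errors.append("BacktestInstance must not reference a Definition")

    date_range = metadata_json.get("date_range_json", {})
    try:
        start = date.fromisoformat(date_range["start"])
        end = date.fromisoformat(date_range["end"])
    except (KeyError, TypeError, ValueError):
        errors.append("date_range_json needs ISO 'start' and 'end' dates")
    else:
        if start > end:
            errors.append("date_range_json.start must not be after end")
    return errors


q1_metadata = {
    "assets_json": {"universe": ["SPY", "QQQ", "IWM"], "benchmark": "SPY"},
    "date_range_json": {"start": "2024-01-01", "end": "2024-03-31"},
}
issues = validate_backtest_metadata(q1_metadata)
```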
Implementation Checklist
- Reference parent Definition via definition_resource_id
- Pin version if reproducibility needed (definition_version_id)
- Design config_json matching Definition's parameters_schema
- Track upstream_refs for lineage
- Add freshness tracking fields if scheduled
- Create extension table in libs/db/models/
- Create Flow Execution Resources on Instance creation
- Create Prefect deployment(s) for each flow type
- Store deployment IDs in extension table
- Register with external systems (MLflow, EvidentlyAI)
- Implement status aggregation if multi-flow Instance
- Set up real-time subscriptions via Centrifugo
Reference Files
- Composition - Dataset composition pattern
- Examples - Complete Instance examples
- Scheduling - Schedule configuration
- Flow Pairing - Flow Execution Resource pairing