| name | cube-semantic-layer |
| description | Research-driven Cube semantic layer development. Injects research steps for data modeling, dbt integration, metrics, dimensions, REST API, GraphQL API, and SQL API (Postgres wire protocol). Use when building semantic layer, consumption APIs, metrics layer, Cube integration tests, or working with Cube MEASURE() syntax and psycopg2 connections. |
Cube Semantic Layer Development (Research-Driven)
Philosophy
This skill does NOT prescribe specific semantic models or API patterns. Instead, it guides you to:
- Research the current Cube version and capabilities
- Discover existing Cube data models and dbt integrations
- Validate your implementations against Cube documentation
- Verify integration with dbt models and consumption applications
Pre-Implementation Research Protocol
Step 1: Verify Runtime Environment
ALWAYS run this first:
```bash
# Check if Cube is installed
cube --version 2>/dev/null || echo "Cube CLI not found"

# Check if the Cube API is reachable locally
curl -s http://localhost:4000/readyz || echo "Cube API not reachable"
```
Critical Questions to Answer:
- Is Cube running locally or remotely?
- What version is available?
- What database connection is configured?
- Is dbt integration enabled?
Step 2: Research SDK State (if unfamiliar)
When to research: If you encounter unfamiliar Cube features or need to validate patterns
Research queries (use WebSearch):
- "Cube [feature] documentation 2025" (e.g., "Cube dbt integration 2025")
- "Cube REST API query format 2025"
- "Cube data modeling metrics dimensions 2025"
Official documentation: https://cube.dev/docs
Key documentation sections:
- Data Modeling: https://cube.dev/docs/product/data-modeling
- dbt Integration: https://cube.dev/docs/product/data-modeling/recipes/dbt
- REST API: https://cube.dev/docs/product/apis-integrations/rest-api
- cube_dbt package: https://cube.dev/docs/product/data-modeling/reference/cube_dbt
Step 3: Discover Existing Patterns
BEFORE creating new Cube models, search for existing implementations:
# Find Cube model files
find . -name "*.yml" -path "*/model/*" -o -name "*.yaml" -path "*/model/*"
find . -name "*.js" -path "*/model/*"
# Find dbt integration usage
rg "cube_dbt|from_dbt" --type yaml --type py
# Find REST API usage
rg "cube.*api|/cubejs-api" --type py --type js
Key questions:
- What Cube models (cubes) already exist?
- Is dbt integration configured? (`cube_dbt` package usage)
- What metrics and dimensions are defined?
- What API endpoints are exposed?
Step 4: Validate Against Architecture
Check architecture docs for integration requirements:
- Read `/docs/` for consumption layer requirements
- Understand dbt model → Cube cube mapping
- Verify API requirements for downstream applications
- Check governance metadata propagation
Implementation Guidance (Not Prescriptive)
Cube Data Model Concepts
Core concepts: Cubes, measures, dimensions, joins, pre-aggregations
Entity hierarchy:
```
Cube Project
├── Cubes (semantic models on top of tables/views)
│   ├── Measures (metrics: SUM, COUNT, AVG, etc.)
│   ├── Dimensions (attributes for grouping/filtering)
│   ├── Joins (relationships between cubes)
│   └── Pre-aggregations (materialized rollups)
└── Views (organize cubes for consumers)
```
Research questions:
- What cubes should be created? (one per dbt model? logical grouping?)
- What measures are needed? (KPIs, aggregations)
- What dimensions? (time, categorical, hierarchical)
- What joins between cubes?
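For orientation, here is a minimal sketch of the last layer of that hierarchy: a view that curates members from a hypothetical `orders` cube for consumers (all names are illustrative, not prescriptive):

```yaml
views:
  - name: orders_overview
    cubes:
      - join_path: orders
        includes:
          - count
          - status
```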
dbt Integration (cube_dbt package)
Core concept: Generate Cube models from dbt manifest.json
Research questions:
- Should Cube models be generated from dbt? (vs hand-written)
- Where is dbt manifest.json located?
- What dbt models should be exposed in Cube?
- How should dbt model metadata be enriched in Cube?
SDK features to research:
- `cube_dbt` package: dbt integration utilities
- `cube_dbt.load_dbt_project()`: Load dbt metadata
- `cube_dbt.dbt_model_to_cube()`: Convert dbt model to Cube cube
- manifest.json parsing: Extract dbt model metadata
- dbt meta tags: Propagate to Cube dimensions/measures
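As a hedged sketch only: loading dbt metadata usually starts from `manifest.json`. The exact loading API has varied across `cube_dbt` versions (verify the names above and below against the reference), and the manifest path is an assumption about your project layout:

```python
from cube_dbt import Dbt

# Load dbt metadata from the manifest (path is an assumption -- point this at
# wherever `dbt compile` / `dbt build` writes artifacts in your repository)
dbt = Dbt.from_file("./dbt/target/manifest.json")

# Inspect which dbt models are candidates for exposure as Cube cubes
for model in dbt.models:
    print(model.name)
```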
Cubes (Semantic Models)
Core concept: Cubes define the semantic layer on top of physical tables
Research questions:
- What is the SQL definition? (table, view, subquery)
- What measures should be exposed?
- What dimensions?
- What joins to other cubes?
- What pre-aggregations for performance?
SDK features to research:
- Cube definition: YAML or JavaScript
- `sql` property: Define the data source (table, view, or SQL)
- Measures: `type` (sum, count, avg, min, max, count_distinct, etc.)
- Dimensions: `type` (string, number, time, boolean, geo)
- Joins: `relationship` (one_to_one, one_to_many, many_to_one)
- Pre-aggregations: Materialized rollups for performance
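Tying these properties together, a minimal YAML cube might be sketched as follows (table and member names are illustrative, not taken from any real project):

```yaml
cubes:
  - name: orders
    sql_table: public.orders  # or use `sql:` with an arbitrary SELECT

    measures:
      - name: count
        type: count
      - name: total_amount
        sql: amount
        type: sum

    dimensions:
      - name: status
        sql: status
        type: string
      - name: created_at
        sql: created_at
        type: time
```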
Measures (Metrics)
Core concept: Measures are aggregatable metrics
Research questions:
- What KPIs need to be calculated?
- What aggregation type? (sum, count, average, distinct count)
- Are there calculated measures? (ratios, percentages)
- What filters apply to measures?
SDK features to research:
- Measure types: `sum`, `count`, `avg`, `min`, `max`, `count_distinct`, `number`
- Calculated measures: `sql` property with expressions
- Filters: `filters` property on measures
- Rollup measures: Aggregating pre-aggregations
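For instance, a filtered measure and a calculated ratio measure might be sketched like this (member names are assumptions; the `{CUBE.member}` references follow Cube's member-reference syntax):

```yaml
measures:
  - name: completed_count
    type: count
    filters:
      - sql: "{CUBE}.status = 'completed'"
  - name: avg_order_value
    type: number
    sql: "{CUBE.total_amount} / NULLIF({CUBE.count}, 0)"
```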
Dimensions (Attributes)
Core concept: Dimensions are attributes for grouping and filtering
Research questions:
- What categorical dimensions? (status, category, region)
- What time dimensions? (date, timestamp, granularity)
- What hierarchical dimensions? (geography, org structure)
- Should dimensions be hidden or exposed?
SDK features to research:
- Dimension types: `string`, `number`, `time`, `boolean`, `geo`
- Time dimensions: `granularities` (day, week, month, year)
- Subquery dimensions: Complex SQL expressions
- Case expressions: Conditional logic in dimensions
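As an illustration, a time dimension plus a conditional (`case`) dimension could be sketched as follows (names and the threshold are placeholders):

```yaml
dimensions:
  - name: created_at
    sql: created_at
    type: time
  - name: order_size
    type: string
    case:
      when:
        - sql: "{CUBE}.amount >= 100"
          label: large
      else:
        label: small
```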
Joins (Relationships)
Core concept: Joins define relationships between cubes
Research questions:
- What relationships exist? (user → orders, product → sales)
- What join type? (one-to-one, one-to-many, many-to-one)
- What join SQL?
- Are joins bidirectional?
SDK features to research:
- Join definition: `relationship` and `sql` properties
- Join types: `one_to_one`, `one_to_many`, `many_to_one`
- Join SQL: Foreign key relationships
- Many-to-many: Through tables
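A sketch of a many-to-one join from an assumed `orders` cube to an assumed `users` cube (column names are placeholders for your actual foreign keys):

```yaml
joins:
  - name: users
    relationship: many_to_one
    sql: "{CUBE}.user_id = {users}.id"
```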
Pre-aggregations (Performance)
Core concept: Pre-aggregations are materialized rollups for query performance
Research questions:
- What queries are slow?
- What aggregations are common? (daily rollups, by category)
- What materialization schedule?
- What storage for pre-aggregations?
SDK features to research:
- Pre-aggregation definition: Specify measures, dimensions, granularity
- Refresh strategies: `every`, `update_window`
- Partitioning: Time-based partitions
- External storage: Store in data warehouse
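A daily rollup might be sketched like this (member names are assumptions; the refresh cadence and partitioning are placeholders to tune per workload):

```yaml
pre_aggregations:
  - name: orders_by_status_daily
    measures:
      - CUBE.count
    dimensions:
      - CUBE.status
    time_dimension: CUBE.created_at
    granularity: day
    partition_granularity: month
    refresh_key:
      every: "1 hour"
```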
REST API (Consumption)
Core concept: Cube exposes REST API for data queries
Research questions:
- What API endpoints are needed?
- What authentication?
- What query patterns? (measures + dimensions, filters, time ranges)
- How should API be consumed? (Python, JavaScript, BI tools)
SDK features to research:
- Query format: JSON with `measures`, `dimensions`, `filters`, `timeDimensions`
- Authentication: JWT tokens, API keys
- Response format: `data` (result set) + `annotation` (column metadata)
- Pagination: `limit`, `offset`
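A quick smoke test of the query format with curl might look like this (cube and member names are placeholders; the token must come from your own auth setup):

```bash
curl -s -G "http://localhost:4000/cubejs-api/v1/load" \
  -H "Authorization: Bearer <token>" \
  --data-urlencode 'query={"measures":["orders.count"],"dimensions":["orders.status"],"limit":100}'
```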
GraphQL API (Consumption)
Core concept: Cube exposes GraphQL API for flexible querying
Research questions:
- What queries are supported?
- What mutations? (none - read-only)
- What authentication?
- How to query multiple cubes?
SDK features to research:
- Endpoint: `/cubejs-api/graphql` (NOT `/graphql`)
- Query format: a `cube` query selecting measures and dimensions as fields, with filters via `where`
- Authentication: Same as the REST API (JWT, API keys)
- Introspection: Schema discovery via the `__schema` query
Example GraphQL query (member names are illustrative; in Cube's GraphQL API, measures and dimensions are selected as fields under the cube's name, and filters are passed via `where`):

```graphql
query {
  cube(where: { orders: { status: { equals: "completed" } } }) {
    orders {
      count
      totalAmount
      status
    }
  }
}
```
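The same endpoint can be hit with any HTTP client; a hedged Python sketch (token and member names are placeholders):

```python
import requests

GRAPHQL_QUERY = """
query {
  cube {
    orders {
      count
      status
    }
  }
}
"""

response = requests.post(
    "http://localhost:4000/cubejs-api/graphql",
    json={"query": GRAPHQL_QUERY},
    headers={"Authorization": "Bearer <token>"},
)
print(response.json())
```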
SQL API (Postgres Wire Protocol)
Core concept: Cube exposes SQL API via Postgres wire protocol for BI tool integration
Research questions:
- What port is SQL API on? (default: 15432)
- What authentication? (user/password)
- What SQL dialect? (Postgres-compatible with Cube extensions)
- What BI tools are supported? (Tableau, Looker, Metabase, psycopg2)
SDK features to research:
- Port: `CUBEJS_PG_SQL_PORT` (default 15432)
- Authentication: `CUBEJS_SQL_USER`, `CUBEJS_SQL_PASSWORD`
- Connection: Standard Postgres clients (psycopg2, JDBC, etc.)
- Query syntax: Use the `MEASURE()` function for aggregates
- Read-only: INSERT/UPDATE/DELETE not supported
CRITICAL - Cube SQL syntax differs from standard SQL:
```sql
-- Cube SQL API syntax (NOT standard Postgres)
SELECT
  MEASURE(count) AS order_count,
  MEASURE(total_amount) AS total
FROM Orders
WHERE status = 'completed';

-- Dimensions are referenced directly (no quotes)
SELECT status, MEASURE(count) FROM Orders GROUP BY 1;

-- Time dimensions with granularity
SELECT
  DATE_TRUNC('month', created_at) AS month,
  MEASURE(count)
FROM Orders
GROUP BY 1;
```
Python psycopg2 example:
```python
import psycopg2

# Credentials must match CUBEJS_SQL_USER / CUBEJS_SQL_PASSWORD in the Cube config
conn = psycopg2.connect(
    host="localhost",
    port=15432,
    user="cube",
    password="cube_password",
    database="cube",
)

cursor = conn.cursor()
cursor.execute("SELECT MEASURE(count) FROM Orders")
results = cursor.fetchall()
```
Docker configuration:
```yaml
environment:
  CUBEJS_PG_SQL_PORT: "15432"
  CUBEJS_SQL_USER: ${CUBEJS_SQL_USER:-cube}
  CUBEJS_SQL_PASSWORD: ${CUBEJS_SQL_PASSWORD:-cube_password}
```
Validation Workflow
Before Implementation
- ✅ Verified Cube availability (local or remote)
- ✅ Searched for existing Cube models and dbt integration
- ✅ Read architecture docs for consumption layer requirements
- ✅ Identified dbt models to expose in semantic layer
- ✅ Researched unfamiliar Cube features
During Implementation
- ✅ Using `cube_dbt` for dbt integration (if applicable)
- ✅ Proper cube definitions with measures and dimensions
- ✅ Joins correctly defined between cubes
- ✅ Pre-aggregations for performance optimization
- ✅ Metadata from dbt propagated to Cube
- ✅ Governance metadata (classifications) exposed
After Implementation
- ✅ Verify Cube models load without errors
- ✅ Test REST API queries (`/cubejs-api/v1/load`)
- ✅ Test GraphQL API queries (`/cubejs-api/graphql`)
- ✅ Test SQL API queries (port 15432, `MEASURE()` syntax)
- ✅ Verify measures calculate correctly
- ✅ Test dimension filtering and grouping
- ✅ Check pre-aggregation materialization
- ✅ Validate integration with downstream apps (BI tools, dashboards)
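To automate the checks above, an integration-test sketch with pytest and httpx might look like this (base URL, token env vars, and member names are assumptions about your environment):

```python
import os

import httpx
import pytest

BASE_URL = os.getenv("CUBE_API_URL", "http://localhost:4000/cubejs-api/v1")
TOKEN = os.getenv("CUBE_API_TOKEN", "")


@pytest.fixture
def client():
    with httpx.Client(headers={"Authorization": f"Bearer {TOKEN}"}) as c:
        yield c


def test_rest_load_returns_rows(client):
    # Member names are placeholders -- swap in members from your own model.
    # Note: Cube may answer "Continue wait" while a query is queued; a real
    # test suite should retry on that response.
    query = {"measures": ["orders.count"], "dimensions": ["orders.status"]}
    response = client.post(f"{BASE_URL}/load", json={"query": query})
    response.raise_for_status()
    body = response.json()
    assert "data" in body          # result set
    assert "annotation" in body    # column metadata
```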
Context Injection (For Future Claude Instances)
When this skill is invoked, you should:
1. Verify runtime state (don't assume):
   - `cube --version`
   - `curl -s http://localhost:4000/readyz`
2. Discover existing patterns (don't invent):
   - `find . -name "*.yml" -path "*/model/*"`
   - `rg "cube_dbt" --type yaml`
3. Research when uncertain (don't guess):
   - Use WebSearch for "Cube [feature] documentation 2025"
   - Check official docs: https://cube.dev/docs
4. Validate against architecture (don't assume requirements):
   - Read relevant architecture docs in `/docs/`
   - Understand dbt model mapping to Cube cubes
   - Check consumption API requirements
5. Check dbt integration (if applicable):
   - Verify dbt manifest.json location
   - Check `cube_dbt` package usage
   - Understand dbt meta tag propagation
Quick Reference: Common Research Queries
Use these WebSearch queries when encountering specific needs:
- dbt integration: "Cube dbt integration cube_dbt package 2025"
- Data modeling: "Cube data modeling cubes measures dimensions 2025"
- Measures: "Cube measures metrics types documentation 2025"
- Dimensions: "Cube dimensions time dimensions 2025"
- Joins: "Cube joins relationships between cubes 2025"
- Pre-aggregations: "Cube pre-aggregations materialization 2025"
- REST API: "Cube REST API query format 2025"
- GraphQL API: "Cube GraphQL API query format endpoint 2025"
- SQL API: "Cube SQL API Postgres wire protocol MEASURE syntax 2025"
- SQL API psycopg2: "Cube SQL API Python psycopg2 connection 2025"
- Python client: "Cube Python REST API client 2025"
- Integration tests: "Cube integration testing pytest httpx 2025"
Integration Points to Research
dbt → Cube Integration
Key question: How are dbt models exposed in Cube semantic layer?
Research areas:
- `cube_dbt` package installation and configuration
- manifest.json loading (`load_dbt_project()`)
- dbt model → Cube cube conversion
- dbt meta tags → Cube dimensions/measures metadata
- Selective model exposure (which dbt models to include)
Cube → Consumption Applications
Key question: How do applications query the Cube semantic layer?
Research areas:
- REST API query format (JSON)
- SQL API for BI tools (Tableau, Looker, Metabase)
- GraphQL API for frontend apps
- JavaScript SDK for web applications
- Authentication and access control
Governance Metadata Propagation
Key question: How should governance metadata flow to Cube?
Research areas:
- dbt meta tags → Cube dimension metadata
- Classification exposure in Cube API
- Access control (row-level, column-level)
- Audit logging for API queries
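One pattern worth researching here: carry a classification from a dbt column's `meta` into the matching Cube dimension's `meta`, which Cube exposes through its metadata APIs. A hedged sketch (field names and the manual duplication below are assumptions; cube_dbt may automate the propagation):

```yaml
# dbt schema.yml -- tag a column with governance metadata (illustrative)
models:
  - name: orders
    columns:
      - name: email
        meta:
          classification: pii

# Cube model -- surface the same classification on the dimension
cubes:
  - name: orders
    sql_table: public.orders
    dimensions:
      - name: email
        sql: email
        type: string
        meta:
          classification: pii
```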
Cube Development Workflow
Local Development
```bash
# Install the Cube CLI
npm install -g cubejs-cli

# Create a new Cube project (if needed)
cubejs create my-cube-project -d postgres

# Start the Cube development server
npm run dev

# Access the Cube Playground
open http://localhost:4000
```
Using dbt Integration
```yaml
# Requires the cube_dbt package installed in the Cube project
# In model/schema.yml
cubes:
  - name: my_cube
    public: true
    sql: >
      SELECT * FROM {{ dbt.ref('my_dbt_model') }}
    measures:
      - name: count
        type: count
    dimensions:
      - name: id
        sql: id
        type: number
```
REST API Query
```python
import requests

# Query the Cube REST API
response = requests.post(
    "http://localhost:4000/cubejs-api/v1/load",
    json={
        "query": {
            "measures": ["orders.count", "orders.total_amount"],
            "dimensions": ["orders.status"],
            "timeDimensions": [{
                "dimension": "orders.created_at",
                "granularity": "day",
                "dateRange": "last 30 days",
            }],
        }
    },
    headers={"Authorization": "Bearer <token>"},
)

data = response.json()
print(data["data"])        # Result set
print(data["annotation"])  # Column metadata
```
References
- Cube Documentation: https://cube.dev/docs
- dbt Integration recipe: https://cube.dev/docs/product/data-modeling/recipes/dbt
- cube_dbt package reference: https://cube.dev/docs/product/data-modeling/reference/cube_dbt
- REST API reference: https://cube.dev/docs/product/apis-integrations/rest-api
- Cube GitHub repository: https://github.com/cube-js/cube
- cube_dbt GitHub repository: https://github.com/cube-js/cube_dbt
Remember: This skill provides research guidance, NOT prescriptive semantic models. Always:
- Verify Cube availability and version
- Discover existing Cube models and dbt integration patterns
- Research Cube capabilities when needed (use WebSearch liberally)
- Validate against actual consumption API requirements
- Test REST API queries and measure calculations before considering complete
- Understand dbt → Cube integration for metadata propagation