Expert guidance for building production-ready tool-calling agents with Databricks Mosaic AI Agent Framework. Use when users need to create agents that orchestrate multiple data sources or APIs, implement LangChain-based agentic workflows, design Foundation Model-powered tool selection, optimize agent prompts and decision-making, or architect multi-tool agent systems. Covers agent architecture patterns, tool design best practices, Foundation Model integration, and common pitfalls.

Install Skill

  1. Download the skill
  2. Enable skills in Claude: open claude.ai/settings/capabilities and find the "Skills" section
  3. Upload to Claude: click "Upload skill" and select the downloaded ZIP file

Note: Please verify the skill by reviewing its instructions before using it.

SKILL.md

name: mosaic-ai-agent
description: Expert guidance for building production-ready tool-calling agents with Databricks Mosaic AI Agent Framework. Use when users need to create agents that orchestrate multiple data sources or APIs, implement LangChain-based agentic workflows, design Foundation Model-powered tool selection, optimize agent prompts and decision-making, or architect multi-tool agent systems. Covers agent architecture patterns, tool design best practices, Foundation Model integration, and common pitfalls.

Mosaic AI Agent Builder

Build production-ready tool-calling agents that intelligently orchestrate data sources and APIs using Databricks Foundation Models and LangChain.

Core Concepts

What is a Tool-Calling Agent?

A tool-calling agent uses an LLM to:

  1. Understand user intent
  2. Decide which tool(s) to call
  3. Execute selected tools
  4. Synthesize results into a response

Key advantage: Dynamic routing - the LLM adapts to each query instead of following rigid logic.
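
The snippet below is a minimal sketch of that four-step loop, driving ChatDatabricks tool calling directly; the AgentExecutor examples later in this skill wrap the same loop for you. The stub tool, endpoint name, and question are illustrative assumptions.

from langchain.tools import tool
from langchain_community.chat_models import ChatDatabricks
from langchain_core.messages import HumanMessage, ToolMessage

@tool
def query_customer_behavior(question: str) -> str:
    """Query customer behavior analytics (stub for illustration)."""
    return "Products X and Y are trending this month."

llm = ChatDatabricks(endpoint="databricks-meta-llama-3-1-70b-instruct")
llm_with_tools = llm.bind_tools([query_customer_behavior])

messages = [HumanMessage("What products are trending?")]  # 1. Understand user intent
while True:
    ai_msg = llm_with_tools.invoke(messages)              # 2. Decide which tool(s) to call
    messages.append(ai_msg)
    if not ai_msg.tool_calls:
        print(ai_msg.content)                             # 4. Synthesize results into a response
        break
    for call in ai_msg.tool_calls:                        # 3. Execute selected tools
        output = query_customer_behavior.invoke(call["args"])
        messages.append(ToolMessage(content=output, tool_call_id=call["id"]))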

When to Use Agents vs. Direct LLM Calls

Use agents when:

  • Query complexity varies (some need 1 tool, others need 3+)
  • Tool selection depends on nuanced intent
  • You need multi-step reasoning
  • Tools can be composed in different ways

Use direct LLM calls when:

  • Single, predictable tool usage
  • Deterministic routing logic
  • Low latency is critical
  • Cost optimization is paramount
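
For the direct-call case, a minimal sketch looks like this: routing is hard-coded, so there is no agent loop and no tool-selection overhead. The fetch_inventory_summary helper is a hypothetical stand-in for your data access.

from langchain_community.chat_models import ChatDatabricks

llm = ChatDatabricks(endpoint="databricks-meta-llama-3-1-70b-instruct")

def fetch_inventory_summary() -> str:
    # Hypothetical stand-in for a warehouse query
    return "SKU-100: 1,200 units (60-day supply); SKU-200: 300 units (12-day supply)"

def inventory_report(question: str) -> str:
    # Deterministic routing: always one data fetch, then one LLM call
    data = fetch_inventory_summary()
    prompt = f"Answer using only this inventory data:\n{data}\n\nQuestion: {question}"
    return llm.invoke(prompt).content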

Problem-Solution Patterns

Problem 1: Agent Calls Wrong Tools

Symptoms:

  • Agent uses web search instead of internal data source
  • Calls inventory tool for customer behavior questions
  • Skips relevant tools entirely

Root causes:

  • Vague tool descriptions
  • Overlapping tool responsibilities
  • Insufficient examples in docstrings

Solution:

# ❌ BAD - Vague description
@tool
def query_database(question: str) -> str:
    """Query the database"""
    pass

# ✅ GOOD - Specific with examples
@tool
def query_customer_behavior(question: str) -> str:
    """
    Query customer behavior analytics for purchase patterns and preferences.
    
    Use this tool when users ask about:
    - Product trends: "What products are trending?"
    - Shopping channels: "Which channels do customers prefer?"
    - Customer segments: "Which segments respond to promotions?"
    - Purchase patterns: "When do customers typically buy?"
    
    Do NOT use for:
    - Inventory levels (use query_inventory instead)
    - External market data (use web_search instead)
    """
    pass

Best practices:

  • Include 3-5 concrete example questions
  • Explicitly list what NOT to use the tool for
  • Use domain-specific terminology
  • Keep descriptions under 150 words

Problem 2: Agent Gets Stuck in Loops

Symptoms:

  • Calls same tool repeatedly with identical queries
  • Exceeds max iterations
  • Never reaches a final answer

Root causes:

  • Tool returns errors without guidance
  • Ambiguous tool outputs
  • Missing synthesis instructions

Solution:

# Configure executor with proper limits
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=5,  # Prevent infinite loops
    handle_parsing_errors=True,  # Gracefully handle errors
    early_stopping_method="force"  # Return a stopped response once max_iterations is hit
)

# Ensure tools return actionable results
@tool
def query_data(question: str) -> str:
    """Query the analytics data source and return an actionable summary."""
    try:
        result = fetch_data(question)
        if not result:
            return "No data found. Try rephrasing or use a different time range."
        return result
    except Exception as e:
        return f"Query failed: {str(e)}. Consider checking data availability."

Problem 3: Poor Multi-Tool Synthesis

Symptoms:

  • Agent lists tool outputs separately without analysis
  • Contradictory information not resolved
  • Missing insights from combining data

Root causes:

  • Weak system prompt
  • LLM not instructed to synthesize
  • Temperature too low

Solution:

system_prompt = """You are a data analysis assistant with access to multiple tools.

CRITICAL: When you call multiple tools, you MUST:
1. Identify connections and patterns across tool results
2. Resolve any contradictions with reasoning
3. Provide unified insights, not separate summaries
4. Highlight actionable recommendations

Example of good synthesis:
"Based on customer behavior data (Tool 1), Products X and Y are trending.
However, inventory analysis (Tool 2) shows 60-day supply of both—well above
the 30-day target. This indicates overstock risk despite high demand.
Recommendation: Launch promotions to clear inventory while demand is strong."

Example of bad synthesis:
"Tool 1 says products are trending. Tool 2 says inventory is high."
"""

# Use appropriate temperature
llm = ChatDatabricks(
    endpoint="databricks-meta-llama-3-1-70b-instruct",
    temperature=0.3,  # Balance creativity and consistency
    max_tokens=2000
)

Problem 4: Slow Agent Response Times

Symptoms:

  • Queries take >30 seconds
  • Users abandon before completion
  • High costs from unnecessary tool calls

Root causes:

  • Sequential tool execution
  • Redundant tool calls
  • No caching

Solutions:

Strategy 1: Implement caching

from functools import lru_cache

@lru_cache(maxsize=100)
def query_customer_behavior_cached(question: str) -> str:
    """Cached version of customer behavior queries"""
    # @tool-decorated functions are BaseTool objects, so call them via .invoke()
    return query_customer_behavior.invoke({"question": question})

Strategy 2: Use streaming for better UX

# Return intermediate results to user
for step in agent_executor.stream({"input": query}):
    if "actions" in step:
        for action in step["actions"]:
            print(f"Calling tool: {action.tool}...")

Strategy 3: Optimize tool implementation

# Ensure tools don't do unnecessary work
@tool
def query_inventory(question: str) -> str:
    """Query inventory levels and return a concise summary."""
    # Add query caching at data source level
    # Use efficient query patterns
    # Return concise summaries, not raw data
    pass

Agent Architecture Patterns

Pattern 1: Single-Domain Agent

Use case: All tools access same domain (e.g., only internal DBs)

tools = [
    query_sales_db,
    query_inventory_db,
    query_customer_db
]

system_prompt = """You are an internal data analyst.
All tools access company databases. Choose tools based on data domain."""

Pros: Simple, fast tool selection
Cons: Can't incorporate external data

Pattern 2: Multi-Domain Agent

Use case: Mix of internal and external sources

tools = [
    query_internal_data,  # Genie rooms
    search_web,           # External API
    query_company_docs    # Document search
]

system_prompt = """You are an analyst with internal and external data access.

Prioritization:
1. Check internal tools first for company-specific data
2. Use external tools for market trends, events, competitor info
3. Combine sources when appropriate"""

Pros: Comprehensive answers
Cons: More complex tool selection

Pattern 3: Specialized Sub-Agents

Use case: Complex domains with distinct sub-workflows

# Main coordinator agent
coordinator_tools = [
    delegate_to_analyst_agent,
    delegate_to_forecasting_agent,
    delegate_to_reporting_agent
]

# Each sub-agent has its own tools and expertise
# Use only when orchestration complexity justifies it
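
One way to implement a delegate tool is to wrap a sub-agent's executor. The sketch below assumes an analyst sub-agent built from its own llm, analyst_tools, and analyst_prompt (names chosen here for illustration).

from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain.tools import tool

# The sub-agent is built the same way as the single agents above,
# with its own tools and system prompt.
analyst_agent = create_tool_calling_agent(llm, analyst_tools, analyst_prompt)
analyst_executor = AgentExecutor(agent=analyst_agent, tools=analyst_tools)

@tool
def delegate_to_analyst_agent(request: str) -> str:
    """
    Delegate analytical questions (trends, segments, purchase patterns)
    to the analyst sub-agent. Pass the user's request verbatim.
    """
    return analyst_executor.invoke({"input": request})["output"]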

Foundation Model Selection

Model Comparison for Agents

| Model | Best For | Tradeoffs |
| --- | --- | --- |
| Llama 3.1 70B | Balanced performance, cost | Good tool selection, moderate speed |
| Llama 3.1 405B | Complex reasoning, multiple tools | Slower, more expensive |
| DBRX Instruct | Fast responses, simple routing | Less sophisticated reasoning |
| Claude Sonnet | Excellent tool use, synthesis | Higher cost, external API |

Configuration Guidelines

# For simple agents (2-3 tools, clear boundaries)
llm = ChatDatabricks(
    endpoint="databricks-dbrx-instruct",
    temperature=0.1,
    max_tokens=1500
)

# For complex agents (5+ tools, nuanced decisions)
llm = ChatDatabricks(
    endpoint="databricks-meta-llama-3-1-70b-instruct",
    temperature=0.2,
    max_tokens=2500
)

Tool Design Best Practices

Principle 1: Single Responsibility

Each tool should do ONE thing well.

# ❌ BAD - Tool does too much
@tool
def query_all_data(question: str) -> str:
    """Query any data source based on the question"""
    pass

# ✅ GOOD - Focused tools
@tool
def query_customer_behavior(question: str) -> str:
    """Query customer behavior data"""
    pass

@tool
def query_inventory_status(question: str) -> str:
    """Query inventory levels"""
    pass

Principle 2: Clear Inputs/Outputs

Make tool interfaces obvious.

# ❌ BAD - Ambiguous signature
@tool
def get_data(input: str) -> str:
    pass

# ✅ GOOD - Clear semantics
@tool
def query_sales_by_region(
    region: str,
    start_date: str,
    end_date: str
) -> str:
    """
    Args:
        region: Geographic region (e.g., "North America", "EMEA")
        start_date: ISO format (e.g., "2024-01-01")
        end_date: ISO format (e.g., "2024-12-31")
    
    Returns:
        Sales summary with total revenue and top products
    """
    pass

Principle 3: Error Handling

Tools should fail gracefully.

@tool
def query_external_api(query: str) -> str:
    try:
        response = call_api(query)
        if not response:
            return "No results found. Try a different query."
        return response
    except TimeoutError:
        return "API timeout. The service may be temporarily unavailable."
    except Exception as e:
        return f"Error: {str(e)}. Please try again or contact support."

Prompt Engineering for Agents

System Prompt Structure

system_prompt = """
[Role Definition]
You are a [specific role] with access to [tools description].

[Capabilities]
Your tools allow you to:
- [Capability 1]
- [Capability 2]

[Decision Guidelines]
When selecting tools:
1. [Guideline 1]
2. [Guideline 2]

[Synthesis Instructions]
When combining tool results:
- [Instruction 1]
- [Instruction 2]

[Output Format]
Always provide:
- [Element 1]
- [Element 2]
"""

Few-Shot Examples in Prompts

system_prompt = """You are an analyst with customer and inventory tools.

Example 1:
User: "What products are trending?"
Reasoning: Customer behavior question → use query_customer_behavior
Action: Call query_customer_behavior("trending products")

Example 2:
User: "Trending products at risk of overstock?"
Reasoning: Needs both demand and supply data
Action: Call query_customer_behavior + query_inventory_status, then synthesize

Use this reasoning pattern for all queries."""

Testing & Iteration

Test Cases for Agent Validation

test_cases = [
    # Single tool - unambiguous
    {
        "query": "What products are trending?",
        "expected_tools": ["query_customer_behavior"],
        "expected_not_called": ["query_inventory", "web_search"]
    },
    # Multi-tool - requires synthesis
    {
        "query": "Trending products at risk of overstock?",
        "expected_tools": ["query_customer_behavior", "query_inventory"],
        "tool_order": "any"
    },
    # Edge case - ambiguous query
    {
        "query": "Tell me about products",
        "expected_behavior": "ask_clarification"
    }
]
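
A sketch of how these cases could be checked automatically, assuming the executor is built with return_intermediate_steps=True as shown in the next section:

def tools_called(result) -> set:
    """Collect the names of the tools the agent actually invoked."""
    return {action.tool for action, _ in result["intermediate_steps"]}

for case in test_cases:
    result = agent_executor.invoke({"input": case["query"]})
    called = tools_called(result)
    if "expected_tools" in case:
        assert set(case["expected_tools"]) <= called, f"Missing tools for: {case['query']}"
    if "expected_not_called" in case:
        assert not called & set(case["expected_not_called"]), f"Unexpected tool for: {case['query']}"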

Debugging with Verbose Mode

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,  # Shows LLM reasoning
    return_intermediate_steps=True
)

result = agent_executor.invoke({"input": "your query"})

# Inspect tool calls
for step in result['intermediate_steps']:
    tool_name = step[0].tool
    tool_input = step[0].tool_input
    tool_output = step[1]
    print(f"Tool: {tool_name}\nInput: {tool_input}\nOutput: {tool_output}\n")

Common Pitfalls

Pitfall 1: Over-Engineering

Mistake: Creating 20+ micro-tools
Fix: Start with 3-5 tools, split only when tool descriptions exceed 200 words

Pitfall 2: Under-Specifying Tools

Mistake: Assuming LLM "knows" when to use tools
Fix: Explicit examples and counter-examples in docstrings

Pitfall 3: Ignoring Latency

Mistake: Not optimizing for response time
Fix: Profile tool execution, implement caching, consider async patterns
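
A minimal timing wrapper is one way to start profiling tool execution (a sketch; fetch_inventory is a hypothetical data-access helper):

import time
from functools import wraps
from langchain.tools import tool

def timed(fn):
    """Print how long each call takes; a profiling sketch, not production telemetry."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
    return wrapper

@tool
@timed
def query_inventory_status(question: str) -> str:
    """Query inventory levels."""
    return fetch_inventory(question)  # hypothetical data-access helper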

Pitfall 4: No Evaluation

Mistake: Deploying without systematic testing
Fix: Create test suite with expected tool selections (see agent-mlops skill)

Integration with Databricks

Using Databricks Foundation Models

from langchain_community.chat_models import ChatDatabricks

llm = ChatDatabricks(
    endpoint="databricks-meta-llama-3-1-70b-instruct",
    temperature=0.1,
    max_tokens=2000
)

Common Endpoints

  • databricks-meta-llama-3-1-70b-instruct - Recommended for most agents
  • databricks-meta-llama-3-1-405b-instruct - Complex reasoning
  • databricks-dbrx-instruct - Fast, simple routing

Quick Reference

Minimum viable agent:

from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_community.chat_models import ChatDatabricks
from langchain_core.prompts import ChatPromptTemplate
from langchain.tools import tool

@tool
def my_tool(query: str) -> str:
    """Clear description with examples"""
    return "result"

llm = ChatDatabricks(endpoint="databricks-meta-llama-3-1-70b-instruct")

# Tool-calling agents need a prompt with an agent_scratchpad placeholder
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, [my_tool], prompt)
executor = AgentExecutor(agent=agent, tools=[my_tool])
result = executor.invoke({"input": "user query"})

Related Skills

  • genie-integration: Integrate Genie rooms as agent tools
  • agent-mlops: Deploy and monitor agents in production