| description | Build and deploy production-ready generative AI agents using Vertex AI, Gemini models, and Google Cloud infrastructure with RAG, function calling, and multi-modal capabilities |
| name | vertex-agent-builder |
| allowed-tools | Read, Write, Edit, Grep, Bash |
| license | MIT |
| author | Jeremy Longshore <jeremy@intentsolutions.io> |
| version | 1.0.0 |
Vertex AI Agent Builder Skill
Overview
This skill provides production-ready scaffolding and deployment for generative AI agents using Google Cloud's Vertex AI platform. Built on actual Google Cloud source code from GoogleCloudPlatform/generative-ai and agent-starter-pack repositories, it offers:
- Gemini Model Integration (1.5 Pro, 1.5 Flash, experimental models)
- Multi-modal Capabilities (text, image, video, audio processing)
- RAG Implementation (Retrieval Augmented Generation with Vector Search)
- Function Calling (Tool use with Gemini models)
- Production Deployment (Cloud Run, Vertex AI Endpoints)
- Evaluation Framework (AutoSxS, ROUGE, custom metrics)
- Observability (Cloud Logging, Monitoring, Trace)
Installation
# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
gcloud init
# Install Python SDK
pip install google-cloud-aiplatform>=1.38.0
pip install vertexai>=1.46.0
pip install google-generativeai>=0.3.2
# Clone source repositories
git clone https://github.com/GoogleCloudPlatform/generative-ai.git
git clone https://github.com/GoogleCloudPlatform/agent-starter-pack.git
git clone https://github.com/GoogleCloudPlatform/vertex-ai-samples.git
# Install this plugin
/plugin install jeremy-vertex-ai@jeremylongshore
Quick Start (5 Minutes)
Create Your First Vertex AI Agent
import vertexai
from vertexai.generative_models import GenerativeModel, ChatSession
from vertexai.preview.generative_models import grounding
# Initialize Vertex AI
vertexai.init(project="your-project-id", location="us-central1")
# Create Gemini model
model = GenerativeModel(
"gemini-1.5-pro-002",
system_instruction="""You are a helpful AI assistant that can:
- Search the web for information
- Analyze documents and images
- Execute Python code
- Call external APIs
"""
)
# Start chat session
chat = model.start_chat()
# Send message with grounding
response = chat.send_message(
"What are the latest developments in quantum computing?",
generation_config={
"temperature": 0.3,
"max_output_tokens": 2048,
"top_p": 0.95,
},
safety_settings={
"HARM_CATEGORY_HATE_SPEECH": "BLOCK_MEDIUM_AND_ABOVE",
"HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_MEDIUM_AND_ABOVE",
},
tools=[grounding.Tool(google_search_retrieval=grounding.GoogleSearchRetrieval())]
)
print(response.text)
Agent Patterns
1. Production Agent with Agent Builder
Based on GoogleCloudPlatform/agent-starter-pack:
from vertexai.preview import agents
from vertexai.preview.generative_models import Tool, FunctionDeclaration
import functions_framework
class ProductionAgent:
"""
Production-ready agent using Vertex AI Agent Builder
"""
def __init__(self, project_id: str, location: str = "us-central1"):
self.project_id = project_id
self.location = location
# Initialize agent
self.agent = agents.Agent.create(
project=project_id,
location=location,
display_name="production-agent",
description="Production-ready agent with tools and RAG",
model="gemini-1.5-pro-002",
tools=self._create_tools(),
system_instruction=self._get_system_instruction()
)
def _create_tools(self):
"""Create function calling tools"""
return [
Tool(function_declarations=[
FunctionDeclaration(
name="search_knowledge_base",
description="Search internal knowledge base",
parameters={
"type": "object",
"properties": {
"query": {"type": "string"},
"top_k": {"type": "integer", "default": 5}
}
}
),
FunctionDeclaration(
name="execute_sql",
description="Execute SQL query on BigQuery",
parameters={
"type": "object",
"properties": {
"query": {"type": "string"},
"dataset": {"type": "string"}
}
}
),
FunctionDeclaration(
name="send_email",
description="Send email notification",
parameters={
"type": "object",
"properties": {
"to": {"type": "string"},
"subject": {"type": "string"},
"body": {"type": "string"}
}
}
)
])
]
def _get_system_instruction(self):
"""Get system instruction for agent"""
return """You are a production AI agent that helps users with:
1. Information retrieval from knowledge bases
2. Data analysis using BigQuery
3. Automated communications
Always verify user intent before executing critical operations.
Provide clear explanations of what you're doing and why.
"""
async def process_request(self, user_input: str):
"""Process user request"""
session = self.agent.start_session()
response = await session.send_message(user_input)
# Handle function calls
if response.function_calls:
for func_call in response.function_calls:
result = await self._execute_function(
func_call.name,
func_call.args
)
# Send function result back to model
response = await session.send_message(
content=None,
function_responses=[{
"name": func_call.name,
"response": result
}]
)
return response.text
async def _execute_function(self, name: str, args: dict):
"""Execute function call"""
if name == "search_knowledge_base":
return await self._search_kb(args["query"], args.get("top_k", 5))
elif name == "execute_sql":
return await self._execute_bigquery(args["query"], args["dataset"])
elif name == "send_email":
return await self._send_email(args["to"], args["subject"], args["body"])
# Deploy as Cloud Function
@functions_framework.http
def agent_endpoint(request):
"""HTTP Cloud Function for agent"""
agent = ProductionAgent(
project_id="your-project",
location="us-central1"
)
request_json = request.get_json()
response = agent.process_request(request_json["message"])
return {"response": response}
2. RAG-Enhanced Agent
Based on GoogleCloudPlatform/generative-ai/gemini/use-cases/retrieval-augmented-generation:
from vertexai.language_models import TextEmbeddingModel
from vertexai.preview import rag
from google.cloud import aiplatform
class RAGAgent:
"""
Agent with Retrieval Augmented Generation using Vertex AI Search
"""
def __init__(self, project_id: str, corpus_name: str):
self.project_id = project_id
# Create RAG corpus
self.corpus = rag.create_corpus(
display_name=corpus_name,
description="Knowledge base for RAG"
)
# Initialize embedding model
self.embedding_model = TextEmbeddingModel.from_pretrained(
"text-embedding-004"
)
# Initialize Gemini model
self.model = GenerativeModel("gemini-1.5-pro-002")
async def import_documents(self, gcs_uris: list):
"""Import documents into RAG corpus"""
import_job = rag.import_files(
corpus_name=self.corpus.name,
gcs_uris=gcs_uris,
chunk_size=512,
chunk_overlap=100
)
# Wait for import to complete
import_job.wait()
return import_job
async def query_with_rag(self, query: str, similarity_top_k: int = 5):
"""Query with RAG retrieval"""
# Retrieve relevant chunks
retrieval_response = rag.retrieval_query(
rag_resources=[
rag.RagResource(
rag_corpus=self.corpus.name,
similarity_top_k=similarity_top_k
)
],
query=query
)
# Build context from retrieved chunks
context = "\n\n".join([
f"Source: {chunk.source}\nContent: {chunk.text}"
for chunk in retrieval_response.contexts
])
# Generate response with context
prompt = f"""Based on the following context, answer the question.
Context:
{context}
Question: {query}
Answer:"""
response = self.model.generate_content(
prompt,
generation_config={
"temperature": 0.2,
"max_output_tokens": 1024,
}
)
return {
"answer": response.text,
"sources": [chunk.source for chunk in retrieval_response.contexts],
"confidence": retrieval_response.attribution_score
}
3. Multi-Modal Agent
Based on GoogleCloudPlatform/generative-ai/gemini/multimodality:
from vertexai.generative_models import GenerativeModel, Part
import vertexai.preview.generative_models as generative_models
class MultiModalAgent:
"""
Agent that processes text, images, video, and audio
"""
def __init__(self):
self.model = GenerativeModel("gemini-1.5-flash-002")
async def analyze_image(self, image_path: str, prompt: str):
"""Analyze image with text prompt"""
image = Part.from_uri(
uri=f"gs://your-bucket/{image_path}",
mime_type="image/jpeg"
)
response = self.model.generate_content([prompt, image])
return response.text
async def analyze_video(self, video_path: str, prompt: str):
"""Analyze video content"""
video = Part.from_uri(
uri=f"gs://your-bucket/{video_path}",
mime_type="video/mp4"
)
response = self.model.generate_content(
[prompt, video],
generation_config={
"temperature": 0.4,
"max_output_tokens": 2048,
}
)
return response.text
async def analyze_document(self, pdf_path: str):
"""Extract and analyze PDF document"""
document = Part.from_uri(
uri=f"gs://your-bucket/{pdf_path}",
mime_type="application/pdf"
)
prompt = """Extract all key information from this document:
1. Main topics covered
2. Key findings or conclusions
3. Important data or statistics
4. Actionable recommendations
"""
response = self.model.generate_content([prompt, document])
return response.text
async def generate_code(self, description: str, language: str = "python"):
"""Generate code from natural language"""
prompt = f"""Generate {language} code for the following requirement:
{description}
Requirements:
- Include proper error handling
- Add comprehensive comments
- Follow best practices for {language}
- Make it production-ready
"""
response = self.model.generate_content(
prompt,
generation_config={
"temperature": 0.2,
"max_output_tokens": 4096,
}
)
return response.text
Deployment Patterns
1. Cloud Run Deployment
# cloudbuild.yaml
steps:
# Build container
- name: 'gcr.io/cloud-builders/docker'
args: ['build', '-t', 'gcr.io/$PROJECT_ID/vertex-agent:$COMMIT_SHA', '.']
# Push to Container Registry
- name: 'gcr.io/cloud-builders/docker'
args: ['push', 'gcr.io/$PROJECT_ID/vertex-agent:$COMMIT_SHA']
# Deploy to Cloud Run
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
entrypoint: gcloud
args:
- 'run'
- 'deploy'
- 'vertex-agent'
- '--image'
- 'gcr.io/$PROJECT_ID/vertex-agent:$COMMIT_SHA'
- '--region'
- 'us-central1'
- '--platform'
- 'managed'
- '--allow-unauthenticated'
- '--set-env-vars'
- 'GCP_PROJECT=$PROJECT_ID'
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Run as non-root user
RUN useradd -m -u 1000 agent && chown -R agent:agent /app
USER agent
EXPOSE 8080
CMD ["gunicorn", "--bind", ":8080", "--workers", "1", "--threads", "8", "--timeout", "0", "main:app"]
2. Vertex AI Endpoint Deployment
from google.cloud import aiplatform
def deploy_to_vertex_endpoint():
"""Deploy model to Vertex AI Endpoint"""
aiplatform.init(project="your-project", location="us-central1")
# Upload model
model = aiplatform.Model.upload(
display_name="vertex-agent-model",
artifact_uri="gs://your-bucket/model",
serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest"
)
# Create endpoint
endpoint = aiplatform.Endpoint.create(
display_name="vertex-agent-endpoint",
)
# Deploy model to endpoint
endpoint.deploy(
model=model,
deployed_model_display_name="vertex-agent-v1",
machine_type="n1-standard-4",
min_replica_count=1,
max_replica_count=5,
accelerator_type="NVIDIA_TESLA_T4",
accelerator_count=1,
)
return endpoint
Evaluation Framework
Based on GoogleCloudPlatform/generative-ai/gemini/evaluation:
from vertexai.preview.evaluation import EvalTask, MetricPromptTemplate
import pandas as pd
class AgentEvaluator:
"""
Evaluate agent performance using Vertex AI evaluation framework
"""
def __init__(self):
self.eval_dataset = self._load_eval_dataset()
def _load_eval_dataset(self):
"""Load evaluation dataset"""
return pd.DataFrame([
{
"prompt": "What is the capital of France?",
"reference": "The capital of France is Paris.",
"context": "France is a country in Western Europe."
},
# Add more test cases
])
async def evaluate_accuracy(self, agent):
"""Evaluate agent accuracy"""
eval_task = EvalTask(
dataset=self.eval_dataset,
metrics=[
"rouge_l_sum",
"bleu",
"exact_match",
MetricPromptTemplate(
metric="custom_factuality",
metric_prompt_template="""Rate the factual accuracy of the response.
Reference: {reference}
Response: {response}
Score (0-1):"""
)
]
)
# Run evaluation
results = []
for _, row in self.eval_dataset.iterrows():
response = await agent.process_request(row["prompt"])
results.append({
"prompt": row["prompt"],
"reference": row["reference"],
"response": response
})
eval_result = eval_task.evaluate(
model=agent.model,
response_column="response"
)
return eval_result.summary_metrics
Monitoring & Observability
from google.cloud import monitoring_v3
from google.cloud import logging
import time
class AgentMonitor:
"""
Monitor agent performance and usage
"""
def __init__(self, project_id: str):
self.project_id = project_id
self.monitoring_client = monitoring_v3.MetricServiceClient()
self.logging_client = logging.Client()
self.logger = self.logging_client.logger("vertex-agent")
def log_request(self, request_id: str, user_input: str, response: str, latency: float):
"""Log agent request"""
self.logger.log_struct({
"request_id": request_id,
"timestamp": time.time(),
"user_input": user_input,
"response": response[:500], # Truncate long responses
"latency_ms": latency * 1000,
"model": "gemini-1.5-pro-002",
"success": True
})
def create_custom_metric(self, metric_name: str, value: float):
"""Create custom metric in Cloud Monitoring"""
project_name = f"projects/{self.project_id}"
series = monitoring_v3.TimeSeries()
series.metric.type = f"custom.googleapis.com/agent/{metric_name}"
now = time.time()
seconds = int(now)
nanos = int((now - seconds) * 10 ** 9)
interval = monitoring_v3.TimeInterval(
{"end_time": {"seconds": seconds, "nanos": nanos}}
)
point = monitoring_v3.Point({
"interval": interval,
"value": {"double_value": value}
})
series.points = [point]
self.monitoring_client.create_time_series(
name=project_name,
time_series=[series]
)
def create_dashboard(self):
"""Create monitoring dashboard"""
# Dashboard configuration
dashboard_config = {
"displayName": "Vertex Agent Dashboard",
"widgets": [
{
"title": "Request Latency",
"xyChart": {
"dataSets": [{
"timeSeriesQuery": {
"timeSeriesFilter": {
"filter": 'metric.type="custom.googleapis.com/agent/latency"'
}
}
}]
}
},
{
"title": "Error Rate",
"xyChart": {
"dataSets": [{
"timeSeriesQuery": {
"timeSeriesFilter": {
"filter": 'metric.type="custom.googleapis.com/agent/errors"'
}
}
}]
}
}
]
}
return dashboard_config
Cost Optimization
class CostOptimizedAgent:
"""
Cost-optimized agent configuration
"""
def __init__(self):
# Model selection based on task complexity
self.models = {
"simple": "gemini-1.5-flash-002", # Fastest and cheapest
"standard": "gemini-1.5-pro-002", # Balanced
"complex": "gemini-1.5-pro-002", # Most capable
}
# Caching for repeated queries
self.response_cache = {}
def select_model(self, query: str) -> str:
"""Select appropriate model based on query complexity"""
# Simple heuristic based on query length and keywords
complexity_score = len(query) / 100
if "analyze" in query or "explain" in query:
complexity_score += 0.3
if "code" in query or "implement" in query:
complexity_score += 0.4
if complexity_score < 0.3:
return self.models["simple"]
elif complexity_score < 0.7:
return self.models["standard"]
else:
return self.models["complex"]
async def process_with_caching(self, query: str):
"""Process query with response caching"""
# Check cache first
cache_key = hashlib.md5(query.encode()).hexdigest()
if cache_key in self.response_cache:
return self.response_cache[cache_key]
# Select optimal model
model_name = self.select_model(query)
model = GenerativeModel(model_name)
# Process query
response = model.generate_content(
query,
generation_config={
"temperature": 0.1, # Lower temperature for consistency
"max_output_tokens": 1024, # Limit output size
}
)
# Cache response
self.response_cache[cache_key] = response.text
return response.text
Integration with Google Cloud Services
BigQuery Integration
from google.cloud import bigquery
class BigQueryAgent:
"""Agent with BigQuery data access"""
def __init__(self, project_id: str):
self.bq_client = bigquery.Client(project=project_id)
self.model = GenerativeModel("gemini-1.5-pro-002")
async def nl2sql(self, natural_language_query: str, dataset: str):
"""Convert natural language to SQL"""
# Get table schemas
tables = self.bq_client.list_tables(dataset)
schema_info = []
for table in tables:
table_ref = self.bq_client.get_table(table.reference)
schema_info.append(f"Table: {table.table_id}\nSchema: {table_ref.schema}")
prompt = f"""Convert this natural language query to SQL:
Query: {natural_language_query}
Available tables and schemas:
{chr(10).join(schema_info)}
Return only the SQL query, no explanation.
"""
response = self.model.generate_content(prompt)
sql_query = response.text.strip()
# Execute query
query_job = self.bq_client.query(sql_query)
results = query_job.result()
return {
"sql": sql_query,
"results": [dict(row) for row in results],
"total_rows": results.total_rows
}
Cloud Storage Integration
from google.cloud import storage
class StorageAgent:
"""Agent with Cloud Storage access"""
def __init__(self, bucket_name: str):
self.storage_client = storage.Client()
self.bucket = self.storage_client.bucket(bucket_name)
async def process_documents(self, prefix: str):
"""Process all documents in a GCS prefix"""
blobs = self.bucket.list_blobs(prefix=prefix)
results = []
for blob in blobs:
if blob.name.endswith(('.pdf', '.txt', '.docx')):
# Download and process
content = blob.download_as_text()
analysis = await self.analyze_document(content)
results.append({
"file": blob.name,
"analysis": analysis
})
return results
Best Practices
1. Security
from google.cloud import secretmanager
class SecureAgent:
"""Agent with security best practices"""
def __init__(self, project_id: str):
self.secret_client = secretmanager.SecretManagerServiceClient()
self.project_id = project_id
def get_secret(self, secret_id: str) -> str:
"""Get secret from Secret Manager"""
name = f"projects/{self.project_id}/secrets/{secret_id}/versions/latest"
response = self.secret_client.access_secret_version(request={"name": name})
return response.payload.data.decode("UTF-8")
def sanitize_input(self, user_input: str) -> str:
"""Sanitize user input"""
# Remove potential injection attempts
sanitized = user_input.replace("```", "")
sanitized = sanitized.replace("<script>", "")
# Add more sanitization as needed
return sanitized
2. Error Handling
from tenacity import retry, stop_after_attempt, wait_exponential
import logging
class ResilientAgent:
"""Agent with robust error handling"""
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=4, max=60)
)
async def process_with_retry(self, request):
"""Process with automatic retry"""
try:
response = await self._process(request)
return response
except Exception as e:
logging.error(f"Processing failed: {e}")
raise
Examples Repository
Complete working examples from Google Cloud repositories:
- Text Generation -
/gemini/function-calling/ - RAG Implementation -
/gemini/use-cases/retrieval-augmented-generation/ - Multi-modal Processing -
/gemini/multimodality/ - Agent Builder -
/agent-builder/ - Production Templates - From
agent-starter-pack
Resources
- Vertex AI Documentation
- Gemini API Reference
- Agent Builder Guide
- GitHub: GoogleCloudPlatform/generative-ai
- GitHub: GoogleCloudPlatform/agent-starter-pack
Version: 1.0.0 Last Updated: October 2025 Author: Jeremy Longshore Based on: Official Google Cloud repositories License: MIT