---
name: ruby-integration
description: This skill is for writing integrations to the Ruby SDK. Claude acts as the engineer implementing LLM provider or agentic framework integrations. Use when adding support for OpenAI-like providers, Anthropic-like providers, or agent frameworks. Covers TDD workflow, comprehensive testing (streaming/non-streaming/tokens/multimodal), defensive coding, MCP validation, and StandardRB compliance.
---
# Writing Ruby SDK Integrations

This skill is for writing integrations. Claude acts as the Braintrust engineer implementing new integrations for the Ruby SDK.
## Reference Integrations

Study existing integrations as examples:
- OpenAI: `lib/braintrust/trace/contrib/openai.rb` (tests: `test/braintrust/trace/openai_test.rb`, example: `examples/openai.rb`)
- Anthropic: `lib/braintrust/trace/contrib/anthropic.rb` (tests: `test/braintrust/trace/anthropic_test.rb`, example: `examples/anthropic.rb`)
Important Notes:
- Examine the library thoroughly - Study the library's documentation and source code to identify ALL critical methods that call LLMs/AI services. Plan to trace every method that makes API calls, not just the obvious ones.
- Some integrations (e.g. ruby-llm) support multiple providers (e.g. OpenAI and Anthropic). Test all supported providers.
## Core Pattern: Module Prepending

```ruby
# frozen_string_literal: true

require "json"

module Braintrust
  module Trace
    module YourProvider
      def self.wrap(client = nil, tracer_provider: nil)
        tracer_provider ||= ::OpenTelemetry.tracer_provider
        # Idempotent wrapping: check if already wrapped
        return client if client&.instance_variable_get(:@braintrust_wrapped)

        # Inside the wrapper, `self` is the client object, so capture this module
        # to reach the helpers defined below
        integration = self

        wrapper = Module.new do
          define_method(:your_api_method) do |**params|
            tracer = tracer_provider.tracer("braintrust")
            # IMPORTANT: Start the span FIRST (before metadata extraction) for accurate timing
            tracer.in_span("your_provider.operation") do |span|
              # 1. Capture input (extract_input/extract_output are provider-specific helpers, omitted here)
              integration.set_json_attr(span, "braintrust.input_json", integration.extract_input(params))
              # 2. Set metadata (provider, model, endpoint, all params)
              integration.set_json_attr(span, "braintrust.metadata", {
                "provider" => "your_provider",
                "endpoint" => "/v1/endpoint",
                "model" => params[:model]
              }.compact)
              # 3. Call original (explicit args: define_method does not support implicit-argument super)
              response = super(**params)
              # 4. Capture output
              integration.set_json_attr(span, "braintrust.output_json", integration.extract_output(response))
              # 5. Capture metrics (normalized tokens)
              integration.set_json_attr(span, "braintrust.metrics", integration.parse_usage_tokens(response.usage))
              response
            end
          end
        end

        if client.nil?
          # Class wrapping: wrap() with no args patches the resource class globally
          # (::YourProvider::YourApi is a placeholder for the library's class)
          ::YourProvider::YourApi.prepend(wrapper)
        else
          # Instance wrapping: only this client's resource object is patched
          client.your_api.singleton_class.prepend(wrapper)
          client.instance_variable_set(:@braintrust_wrapped, true)
        end
        client
      end

      # Helpers stay public because the wrapper calls them from outside this module
      # (`private` would not apply to `def self.` methods anyway).

      def self.set_json_attr(span, key, value)
        span.set_attribute(key, JSON.generate(value)) if value
      rescue => e
        warn "Failed to serialize #{key}: #{e.message}"
      end

      # Simplified; prefer the shared TokenParser described under Token Normalization
      def self.parse_usage_tokens(usage)
        return {} unless usage
        usage = usage.to_h
        {
          "prompt_tokens" => usage[:input_tokens] || usage[:prompt_tokens],
          "completion_tokens" => usage[:output_tokens] || usage[:completion_tokens],
          "tokens" => usage[:total_tokens]
        }.compact
      end
    end
  end
end
```

## Code Organization
- Break large methods (>50 lines) into focused helpers
- Separate streaming/non-streaming into distinct handler methods (e.g., `handle_streaming_request`, `handle_non_streaming_request`)
- Extract metadata/input/output capture into helper methods (e.g., `extract_metadata`, `build_input_messages`, `capture_output`)
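To see why prepending works and why the `@braintrust_wrapped` flag matters, here is a self-contained sketch that needs neither OpenTelemetry nor a provider gem. `FakeApi`, `FakeClient`, `Recorder` and `wrap_client` are hypothetical stand-ins: the prepended module runs before the original method, records what a span would capture, and `super` reaches the original implementation unchanged.

```ruby
# Hypothetical stand-ins for a provider client; not part of any real gem.
class FakeApi
  def create(**params)
    {id: "resp_1", model: params[:model], usage: {prompt_tokens: 3, completion_tokens: 2}}
  end
end

class FakeClient
  attr_reader :api

  def initialize
    @api = FakeApi.new
  end
end

# Collects what a span would record, in place of OpenTelemetry.
Recorder = []

def wrap_client(client)
  # Idempotency: a second wrap must not prepend the module again
  return client if client.instance_variable_get(:@braintrust_wrapped)

  wrapper = Module.new do
    define_method(:create) do |**params|
      Recorder << {input: params}
      response = super(**params) # reaches FakeApi#create unchanged
      Recorder.last[:output] = response
      response
    end
  end

  # Instance wrapping: only this client's api object is affected
  client.api.singleton_class.prepend(wrapper)
  client.instance_variable_set(:@braintrust_wrapped, true)
  client
end

client = wrap_client(wrap_client(FakeClient.new)) # wrapping twice is a no-op
result = client.api.create(model: "demo-model")
unwrapped = FakeClient.new.api.create(model: "demo-model")
```

If the second `wrap_client` call prepended another module, a single `create` call would be recorded twice; the flag check keeps it at one, and the wrapped result matches the unwrapped one.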
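## Streaming Pattern

```ruby
# Defined inside the same `Module.new do ... end` wrapper as the non-streaming
# method, so `tracer_provider` and `integration` are in scope.
# extract_metadata and aggregate_chunks are provider-specific helpers (not shown).
define_method(:stream) do |**params|
  tracer = tracer_provider.tracer("braintrust")
  aggregated_chunks = []
  finished = false
  # start_span (not in_span): the span must stay open until the caller finishes iterating
  span = tracer.start_span("your_provider.operation.stream")
  integration.set_json_attr(span, "braintrust.input_json", integration.extract_input(params))
  integration.set_json_attr(span, "braintrust.metadata", integration.extract_metadata(params))

  stream = begin
    super(**params)
  rescue => e
    span.record_exception(e)
    span.status = ::OpenTelemetry::Trace::Status.error("Error: #{e.message}")
    span.finish
    raise
  end

  original_each = stream.method(:each)
  stream.define_singleton_method(:each) do |&block|
    original_each.call do |chunk|
      aggregated_chunks << chunk
      block&.call(chunk)
    end
  rescue => e
    span.record_exception(e)
    span.status = ::OpenTelemetry::Trace::Status.error("Streaming error: #{e.message}")
    raise
  ensure
    # CRITICAL: Always finish the span, even if the stream was only partially
    # consumed (e.g. the caller used `break`). The flag stops a second `each`
    # call from finishing it twice. A stream that is never iterated leaves the span open.
    unless finished
      finished = true
      unless aggregated_chunks.empty?
        aggregated = integration.aggregate_chunks(aggregated_chunks)
        integration.set_json_attr(span, "braintrust.output_json", aggregated)
        integration.set_json_attr(span, "braintrust.metrics", integration.parse_usage_tokens(aggregated[:usage]))
      end
      span.finish
    end
  end

  stream
end
```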
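This sketch isolates the `each` override so you can watch the `ensure` clause fire when the caller stops early. `FakeStream`, `FakeSpan` and `trace_stream` are hypothetical; a real integration would finish an OpenTelemetry span and aggregate chunks with a provider-specific helper instead of counting them.

```ruby
# Hypothetical stream object standing in for a provider's streaming response.
class FakeStream
  include Enumerable

  def initialize(chunks)
    @chunks = chunks
  end

  def each(&block)
    @chunks.each(&block)
  end
end

# Minimal stand-in for a span: stores attributes and counts finish calls.
FakeSpan = Struct.new(:attributes, :finish_count) do
  def finish
    self.finish_count += 1
  end
end

def trace_stream(stream, span)
  aggregated_chunks = []
  finished = false
  original_each = stream.method(:each)

  stream.define_singleton_method(:each) do |&block|
    original_each.call do |chunk|
      aggregated_chunks << chunk
      block&.call(chunk)
    end
  ensure
    # Runs on normal completion, on exceptions, and when the caller breaks early
    unless finished
      finished = true
      span.attributes["chunks_seen"] = aggregated_chunks.size
      span.finish
    end
  end
  stream
end

span = FakeSpan.new({}, 0)
stream = trace_stream(FakeStream.new(%w[Hel lo wor ld]), span)
first_two = []
stream.each do |chunk|
  first_two << chunk
  break if first_two.size == 2 # early termination
end
```

Because `Enumerable` methods such as `to_a` dispatch to `each`, they also go through the wrapper, and the flag keeps the span from being finished a second time.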
## Examples

Write two examples:
- Customer example (`examples/your_provider.rb`): Concise example demonstrating setup and basic usage
- Internal example (`examples/internal/your_provider.rb`): Comprehensive example using every library feature

Follow existing example patterns:
- Nest all API calls under a manual root span (see `examples/openai.rb`):
  ```ruby
  tracer = OpenTelemetry.tracer_provider.tracer("your-provider-example")
  root_span = nil
  response = tracer.in_span("examples/your_provider.rb") do |span|
    root_span = span
    client.your_api.call(...) # Automatically traced, nested under root_span
  end
  ```
- Use consistent nomenclature for spans and projects
- Print the permalink at the end:
  ```ruby
  puts Braintrust::Trace.permalink(root_span)
  ```
## Required Components

Do these in order:
1. Appraisals FIRST: Add to the `Appraisals` file (latest + 2 recent + uninstalled), then run `bundle exec appraisal generate`
2. Tests: `test/braintrust/trace/your_provider_test.rb`
3. Integration: `lib/braintrust/trace/contrib/your_provider.rb`
4. VCR cassettes: `test/fixtures/vcr_cassettes/your_provider/` (record as you write tests)
5. Auto-load: Add to `lib/braintrust/trace.rb` with `begin/rescue LoadError`
6. Example: `examples/your_provider.rb`
7. Example: `examples/internal/your_provider.rb` (comprehensive internal example)
8. Env var: Add to `.env.example` if needed
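The auto-load step might look like the following sketch. The gem name `your_provider`, the contrib path and the `YOUR_PROVIDER_AVAILABLE` flag are hypothetical placeholders; the flag only makes the outcome visible and is not something the SDK defines.

```ruby
# Sketch of a guarded require in lib/braintrust/trace.rb (names are placeholders).
YOUR_PROVIDER_AVAILABLE =
  begin
    require "your_provider"
    require_relative "trace/contrib/your_provider"
    true
  rescue LoadError
    # Gem not installed (e.g. the "uninstalled" appraisal): skip this integration
    # instead of breaking `require "braintrust"`.
    false
  end
```

Note that this also swallows a `LoadError` raised by the contrib file itself, so a typo in its path fails silently; the uninstalled appraisal scenario is what exercises the `false` branch.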
## Test Coverage (LLM Providers)
- ✅ Non-streaming requests (basic + attributes + metrics)
- ✅ Streaming requests (full consumption)
- ✅ Early stream termination (partial consumption)
- ✅ Error handling (exception recording)
- ✅ All critical features - Test ALL provider capabilities:
  - Tool/function calling (if supported)
  - Images/vision (if supported)
  - System messages (if supported)
  - Multiple messages/chat history (if supported)
  - Any other provider-specific features
- ✅ Token usage edge cases (cached, reasoning tokens)
- ✅ Multiple APIs (if provider has multiple endpoints)
- ✅ Unchanged behavior - Verify the wrapper does not change the behavior of the wrapped library (return values, raised errors)
- ✅ LLM wrapper libraries - If tracing a library that wraps LLM providers (e.g., ruby_llm→OpenAI), verify traces match the underlying provider exactly (tools format, token format, output structure). Compare side-by-side with `BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG=1`.
## Appraisal Configuration (Set up FIRST)

CRITICAL: Configure appraisal at the START, before writing tests. Test the latest version, 2 recent versions, and the uninstalled case.

Step 1 - Add to the `Appraisals` file:
```ruby
# Appraisals file - ADD THIS FIRST
appraise "your_provider-latest" do
  gem "your_provider", ">= 2.0"
end

appraise "your_provider-1.5" do
  gem "your_provider", "~> 1.5.0"
end

appraise "your_provider-1.0" do
  gem "your_provider", "~> 1.0.0"
end

appraise "your_provider-uninstalled" do
  remove_gem "your_provider"
end
```

Step 2 - Generate gemfiles:
```bash
bundle exec appraisal generate
```

Step 3 - Use appraisal for ALL test runs:
```bash
bundle exec appraisal rake test  # Run all scenarios (use this in TDD cycle)
```

Determine versions: check the gem's release history, prioritize versions that changed the API, and include versions customers are likely to run.
## Testing Tools & Validation

Use multiple testing approaches to validate your integration:

### 1. Unit Tests (Primary)
- Location: `test/braintrust/trace/your_provider_test.rb`
- Purpose: Test all code paths, edge cases, and error handling
- Run: `bundle exec appraisal rake test`
- Coverage: Track with `bundle exec rake coverage` (>90% line, >80% branch)

### 2. Console Log Inspection
- Purpose: Quickly verify trace structure during development
- Usage: `BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG=true bundle exec ruby examples/your_provider.rb`
- Verify: Check span hierarchy, attributes, and parent/child relationships

### 3. Braintrust MCP Server (Integration Testing)
- Purpose: Query and inspect traces in the Braintrust platform
- Setup: Should be auto-configured in the Docker environment
- Commands:
  ```
  # List recent traces
  mcp__braintrust__list_recent_objects(object_type: "project_logs", limit: 10)
  # Inspect specific span
  mcp__braintrust__resolve_object(object_type: "project_logs", object_id: "span_id")
  # BTQL query
  mcp__braintrust__btql_query(query: "SELECT * FROM project_logs WHERE metadata.provider = 'your_provider'")
  ```
- Verify attributes: `input`, `output`, `metadata`, `metrics`, `span_attributes.braintrust.parent`, `span_attributes.braintrust.org`

### 4. Examples (Manual Testing)
- Customer example: `bundle exec ruby examples/your_provider.rb`
- Internal example: `bundle exec ruby examples/internal/your_provider.rb`
- Purpose: End-to-end validation of real API calls
## Testing Workflow
- TDD cycle: Write unit test → implement → run `bundle exec appraisal rake test`
- Console log: Use `BRAINTRUST_ENABLE_TRACE_CONSOLE_LOG=true` to debug span structure
- MCP validation: Query traces with the Braintrust MCP server
- Examples: Run examples to verify end-to-end behavior
## TDD Workflow (CRITICAL)

After EVERY major change, run the test → lint → fix → commit cycle:
- Create a todo list at the start
- Write one failing test
- Implement minimal code to pass
- Run tests with appraisal: `bundle exec appraisal rake test`
- Lint: `bundle exec rake lint` (fix with `bundle exec rake lint:fix`)
- Verify with MCP tools
- Refactor if needed
- Repeat the cycle for: basic → attributes → streaming → errors → tokens → multimodal
## Defensive Coding
- ✅ Nil checks (`return {} unless usage`)
- ✅ Safe navigation and defaults (`block&.call(chunk)`, `params[:model] || "unknown"`)
- ✅ Compact hashes (`.compact`)
- ✅ Error handling (`begin/rescue/ensure`)
- ✅ JSON safety (`rescue` in `set_json_attr`)
- ✅ Graceful gem loading (`rescue LoadError`)
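The checklist can be seen working together in a small sketch. `record_json` and `build_metadata` are hypothetical helpers, and a plain Hash stands in for span attributes so the code runs without OpenTelemetry; the `rescue` mirrors the one in `set_json_attr` above.

```ruby
require "json"

# Hypothetical helper: a Hash stands in for span attributes.
def record_json(attributes, key, value)
  # Nil check: skip absent values instead of storing "null"
  return unless value
  attributes[key] = JSON.generate(value)
rescue => e
  # JSON safety: a serialization failure must never break the user's API call
  warn "Failed to serialize #{key}: #{e.message}"
end

def build_metadata(params)
  {
    "provider" => "your_provider",
    "model" => params[:model] || "unknown", # default instead of nil
    "temperature" => params[:temperature] # may be nil; dropped by compact
  }.compact
end

attrs = {}
record_json(attrs, "braintrust.metadata", build_metadata({}))
record_json(attrs, "braintrust.output_json", nil) # skipped
record_json(attrs, "braintrust.metrics", {"score" => Float::NAN}) # warns instead of raising
```

`JSON.generate` rejects `NaN` by default, which makes it a convenient way to exercise the rescue path in a unit test.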
## StandardRB & CI

Lint after every change (part of TDD cycle):
```bash
bundle exec rake lint      # Check StandardRB
bundle exec rake lint:fix  # Auto-fix
```

Coverage target (check periodically):
```bash
bundle exec rake coverage  # >90% line, >80% branch
```

CI requirements: StandardRB + tests on Ruby 3.2/3.3/3.4 + Ubuntu/macOS + all appraisal scenarios
## Token Normalization

Use the shared `TokenParser.parse_usage_tokens(usage)` in `lib/braintrust/trace/token_parser.rb` to normalize tokens:
- `prompt_tokens` (input)
- `completion_tokens` (output)
- `tokens` (total, includes `cache_creation_tokens`)
- `prompt_cached_tokens` (if cached)
- `prompt_cache_creation_tokens` (if cache created)
- `completion_reasoning_tokens` (if reasoning)
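As an illustration of the target keys, here is a hypothetical `normalize_usage` that maps two common usage shapes onto them. The shared `TokenParser` is the source of truth and may differ. In particular, this sketch assumes Anthropic-style `input_tokens` exclude cache tokens and adds them back into `prompt_tokens`, so the total includes cache creation. It also assumes symbol-keyed hashes.

```ruby
# Hypothetical normalizer; check token_parser.rb for the SDK's actual rules.
def normalize_usage(usage)
  return {} unless usage
  u = usage.to_h

  cache_read = u[:cache_read_input_tokens]         # Anthropic-style field
  cache_creation = u[:cache_creation_input_tokens] # Anthropic-style field
  prompt = u[:prompt_tokens] || u[:input_tokens]
  # ASSUMPTION: Anthropic-style input_tokens exclude cache tokens, so add them back
  prompt += cache_read.to_i + cache_creation.to_i if prompt && u.key?(:input_tokens)
  completion = u[:completion_tokens] || u[:output_tokens]

  {
    "prompt_tokens" => prompt,
    "completion_tokens" => completion,
    "tokens" => u[:total_tokens] || ((prompt || 0) + (completion || 0)),
    "prompt_cached_tokens" => cache_read || u.dig(:prompt_tokens_details, :cached_tokens),
    "prompt_cache_creation_tokens" => cache_creation,
    "completion_reasoning_tokens" => u.dig(:completion_tokens_details, :reasoning_tokens)
  }.compact
end

anthropic = normalize_usage(
  {input_tokens: 10, output_tokens: 5, cache_read_input_tokens: 100, cache_creation_input_tokens: 20}
)
openai = normalize_usage(
  {prompt_tokens: 50, completion_tokens: 30, total_tokens: 80,
   prompt_tokens_details: {cached_tokens: 10}, completion_tokens_details: {reasoning_tokens: 12}}
)
```

These two inputs are the kind of cached/reasoning edge cases the Test Coverage list asks you to assert on.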
## VCR Cassettes
```bash
VCR_MODE=all bundle exec rake test           # Re-record all
VCR_MODE=new_episodes bundle exec rake test  # Record new only
VCR_OFF=true bundle exec rake test           # Skip VCR
```
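One way these variables could map onto VCR's record modes (`:once`, `:new_episodes`, `:all`, `:none`) is sketched below. `vcr_settings` is hypothetical, and so is the `:once` default; `test/test_helper.rb` is authoritative and may wire this differently.

```ruby
# Hypothetical mapping from the env vars above to VCR settings.
def vcr_settings(env = ENV)
  return {enabled: false} if env["VCR_OFF"] == "true"

  # ASSUMPTION: default to :once (replay an existing cassette, record only when none exists)
  mode = (env["VCR_MODE"] || "once").to_sym
  allowed = %i[once new_episodes all none]
  raise ArgumentError, "unknown VCR_MODE: #{mode}" unless allowed.include?(mode)

  {enabled: true, record: mode}
end

# In a test helper the result could feed VCR, e.g.:
#   settings = vcr_settings
#   VCR.configure { |c| c.default_cassette_options = {record: settings[:record]} } if settings[:enabled]
#   VCR.turn_off!(ignore_cassettes: true) unless settings[:enabled]
```

Validating the mode up front turns a typo such as `VCR_MODE=al` into an immediate error instead of a silent fallback.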
## Reference Files
- Integrations: `lib/braintrust/trace/contrib/{openai,anthropic}.rb`
- Tests: `test/braintrust/trace/{openai,anthropic}_test.rb`
- Test helpers: `test/test_helper.rb`
- Examples: `examples/{openai,anthropic}.rb`
- Config: `Rakefile`, `Appraisals`, `.github/workflows/ci.yml`