Claude Code Plugins

Community-maintained marketplace

Feedback

vm-infrastructure-ops

@terrylica/gapless-network-data
0
0

Troubleshoot and manage GCP e2-micro VM running eth-realtime-collector. Use when VM service is down, systemd failures occur, real-time data stream stops, or VM network issues arise. Keywords systemd, journalctl, eth-collector, gcloud compute.

Install Skill

1Download skill
2Enable skills in Claude

Open claude.ai/settings/capabilities and find the "Skills" section

3Upload to Claude

Click "Upload skill" and select the downloaded ZIP file

Note: Please verify skill by going through its instructions before using it.

SKILL.md

name vm-infrastructure-ops
description Troubleshoot and manage GCP e2-micro VM running eth-realtime-collector. Use when VM service is down, systemd failures occur, real-time data stream stops, or VM network issues arise. Keywords systemd, journalctl, eth-collector, gcloud compute.

VM Infrastructure Operations

Version: 1.0.0 Last Updated: 2025-11-13 Purpose: Troubleshoot and manage GCP e2-micro VM running eth-realtime-collector

When to Use

Use this skill when:

  • VM service down, "eth-collector" systemd failures
  • Real-time data stream stopped (ClickHouse not receiving blocks)
  • VM network issues, DNS resolution failures
  • Need to check service status, view logs, or restart services
  • Keywords: systemd, journalctl, eth-collector, gcloud compute

Prerequisites

  • GCP project access: eonlabs-ethereum-bq
  • VM instance: eth-realtime-collector in zone us-east1-b
  • gcloud CLI configured with appropriate credentials

Workflows

1. Check Service Status

Check if eth-collector systemd service is running:

gcloud compute ssh eth-realtime-collector --zone=us-east1-b \
  --command='sudo systemctl status eth-collector'

Expected Output (healthy):

● eth-collector.service - Ethereum Real-Time Collector
   Loaded: loaded (/etc/systemd/system/eth-collector.service; enabled)
   Active: active (running) since ...

Alternative (use provided script):

.claude/skills/vm-infrastructure-ops/scripts/check_vm_status.sh

2. View Logs (Live Tail)

Stream real-time logs from the collector service:

gcloud compute ssh eth-realtime-collector --zone=us-east1-b \
  --command='sudo journalctl -u eth-collector -f'

What to Look For:

  • "Block inserted" messages every ~12 seconds (healthy)
  • gRPC errors, DNS resolution failures (unhealthy)
  • "Connection refused" or "Metadata server unreachable" (network issues)

3. View Recent Logs (Last 100 Lines)

gcloud compute ssh eth-realtime-collector --zone=us-east1-b \
  --command='sudo journalctl -u eth-collector -n 100'

4. Restart Service

Restart the collector service after configuration changes or to recover from errors:

gcloud compute ssh eth-realtime-collector --zone=us-east1-b \
  --command='sudo systemctl restart eth-collector'

Alternative (use provided script with pre-checks):

.claude/skills/vm-infrastructure-ops/scripts/restart_collector.sh

When to Use:

  • After deploying code updates
  • Recovering from gRPC metadata validation errors
  • After Secret Manager credential updates

5. VM Hard Reset

Hard reset the VM instance (use as last resort):

gcloud compute instances reset eth-realtime-collector --zone=us-east1-b

When to Use:

  • VM network connectivity completely lost
  • DNS resolution failures
  • Metadata server unreachable
  • Service restart doesn't resolve issues

Warning: This forcefully restarts the VM. All in-memory state is lost.

6. Verify Data Flow

After restarting services, verify data is flowing to ClickHouse:

cd 
doppler run --project aws-credentials --config prd -- python3 -c "
import clickhouse_connect
import os
client = clickhouse_connect.get_client(
    host=os.environ['CLICKHOUSE_HOST'],
    port=8443,
    username='default',
    password=os.environ['CLICKHOUSE_PASSWORD'],
    secure=True
)
result = client.query('SELECT MAX(timestamp), MAX(number) FROM ethereum_mainnet.blocks FINAL')
print(f'Latest block: {result.result_rows[0][1]:,} at {result.result_rows[0][0]}')
"

Expected Output (healthy):

Latest block: 23,800,000+ at <within last 60 seconds>

Common Failure Modes

See VM Failure Modes for detailed troubleshooting guide.

Quick Reference:

Symptom Likely Cause Solution
Service status: failed gRPC metadata error Check logs, restart with .strip() fix
No blocks for >5 minutes Network connectivity Check network, reset VM if needed
DNS resolution errors Metadata server unreachable VM hard reset
"Connection refused" Service not running Restart service

Systemd Commands

See Systemd Commands Reference for complete systemd operations.

Quick Reference:

# Status
sudo systemctl status eth-collector

# Start
sudo systemctl start eth-collector

# Stop
sudo systemctl stop eth-collector

# Restart
sudo systemctl restart eth-collector

# Enable (start on boot)
sudo systemctl enable eth-collector

# Disable (don't start on boot)
sudo systemctl disable eth-collector

# View service logs
sudo journalctl -u eth-collector

# Follow logs live
sudo journalctl -u eth-collector -f

Operational History

Infrastructure Recovery (2025-11-10 07:00 UTC):

  • VM network failure detected (DNS resolution failed, metadata server unreachable)
  • Recovery: VM reset restored network connectivity
  • eth-collector service restarted with .strip() fix (gRPC metadata validation resolved)
  • Real-time data flow confirmed: blocks streaming every ~12 seconds
  • Database verified: 23.8M blocks (2015-2025), latest block within seconds

Maintainability SLO Achievement: Critical infrastructure failure (VM network down) resolved in <30 minutes (VM reset + service restart + verification).

Related Documentation

Scripts

References