name	vm-infrastructure-ops
description	Troubleshoot and manage GCP e2-micro VM running eth-realtime-collector. Use when VM service is down, systemd failures occur, real-time data stream stops, or VM network issues arise. Keywords systemd, journalctl, eth-collector, gcloud compute.

VM Infrastructure Operations

Version: 1.0.0
Last Updated: 2025-11-13
Purpose: Troubleshoot and manage GCP e2-micro VM running eth-realtime-collector

When to Use

Use this skill when:

VM service down, "eth-collector" systemd failures
Real-time data stream stopped (ClickHouse not receiving blocks)
VM network issues, DNS resolution failures
Need to check service status, view logs, or restart services
Keywords: systemd, journalctl, eth-collector, gcloud compute

Prerequisites

GCP project access: eonlabs-ethereum-bq
VM instance: eth-realtime-collector in zone us-east1-b
gcloud CLI configured with appropriate credentials

Workflows

1. Check Service Status

Check whether the eth-collector systemd service is running:

gcloud compute ssh eth-realtime-collector --zone=us-east1-b \
  --command='sudo systemctl status eth-collector'

Expected Output (healthy):

● eth-collector.service - Ethereum Real-Time Collector
   Loaded: loaded (/etc/systemd/system/eth-collector.service; enabled)
   Active: active (running) since ...

Alternative (use provided script):

.claude/skills/vm-infrastructure-ops/scripts/check_vm_status.sh
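
For scripted checks, the status text can be parsed instead of read by eye. The following is a minimal sketch under stated assumptions, not the contents of check_vm_status.sh (which are not shown in this document). The function names `parse_active_state` and `is_healthy` are hypothetical, and the sample text is the healthy output shown earlier in this section.

```python
import re


def parse_active_state(status_output: str):
    """Return (state, sub_state) from the 'Active:' line of `systemctl status`
    output, or None if no such line is present."""
    match = re.search(
        r"^\s*Active:\s+(\S+)\s+\(([^)]+)\)", status_output, re.MULTILINE
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


def is_healthy(status_output: str) -> bool:
    """Treat only 'active (running)' as healthy."""
    return parse_active_state(status_output) == ("active", "running")


# Sample text copied from the expected healthy output.
sample = """\
● eth-collector.service - Ethereum Real-Time Collector
   Loaded: loaded (/etc/systemd/system/eth-collector.service; enabled)
   Active: active (running) since ...
"""

print(is_healthy(sample))  # True
```

Any other state, such as `failed` or `inactive (dead)`, makes `is_healthy` return False, which is the cue to move on to the log workflows.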

2. View Logs (Live Tail)

Stream real-time logs from the collector service:

gcloud compute ssh eth-realtime-collector --zone=us-east1-b \
  --command='sudo journalctl -u eth-collector -f'

What to Look For:

"Block inserted" messages every ~12 seconds (healthy)
gRPC errors, DNS resolution failures (unhealthy)
"Connection refused" or "Metadata server unreachable" (network issues)

3. View Recent Logs (Last 100 Lines)

gcloud compute ssh eth-realtime-collector --zone=us-east1-b \
  --command='sudo journalctl -u eth-collector -n 100'
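
Output from either log command can be triaged with the markers listed under What to Look For. The sketch below is an illustration only: the exact log line format of eth-collector is not documented here, so the matching is plain substring search, and the names `classify_log_line` and `summarize` are hypothetical.

```python
from collections import Counter

# Markers taken from the "What to Look For" list; matching rules are assumptions.
HEALTHY_MARKER = "Block inserted"
NETWORK_MARKERS = ("Connection refused", "Metadata server unreachable")
ERROR_MARKERS = ("grpc", "dns")  # compared against the lowercased line


def classify_log_line(line: str) -> str:
    """Return 'network', 'unhealthy', 'healthy', or 'other' for one log line."""
    if any(marker in line for marker in NETWORK_MARKERS):
        return "network"
    lowered = line.lower()
    if any(marker in lowered for marker in ERROR_MARKERS):
        return "unhealthy"
    if HEALTHY_MARKER in line:
        return "healthy"
    return "other"


def summarize(lines) -> Counter:
    """Count how many lines fall into each category."""
    return Counter(classify_log_line(line) for line in lines)
```

A summary with no `healthy` entries across the last 100 lines suggests the stream has stalled, even if no explicit error appears.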

4. Restart Service

Restart the collector service after configuration changes or to recover from errors:

gcloud compute ssh eth-realtime-collector --zone=us-east1-b \
  --command='sudo systemctl restart eth-collector'

Alternative (use provided script with pre-checks):

.claude/skills/vm-infrastructure-ops/scripts/restart_collector.sh
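
When the restart is driven from Python rather than the shell, the same gcloud invocation can be built as an argument list. This is a sketch, not a reproduction of restart_collector.sh, whose pre-checks are not described in this document. The helper names `gcloud_ssh_argv` and `restart_and_verify` are hypothetical; the instance, zone, and service names come from the Prerequisites section. The follow-up check uses `systemctl is-active`, which prints `active` for a running unit.

```python
import subprocess
from typing import List

INSTANCE = "eth-realtime-collector"
ZONE = "us-east1-b"
SERVICE = "eth-collector"


def gcloud_ssh_argv(remote_command: str) -> List[str]:
    """Build the argv for running one command on the VM over gcloud SSH."""
    return [
        "gcloud", "compute", "ssh", INSTANCE,
        f"--zone={ZONE}",
        f"--command={remote_command}",
    ]


def restart_and_verify() -> bool:
    """Restart the service, then report whether systemd considers it active.

    Requires a configured gcloud CLI; nothing runs until this is called.
    """
    subprocess.run(gcloud_ssh_argv(f"sudo systemctl restart {SERVICE}"), check=True)
    result = subprocess.run(
        gcloud_ssh_argv(f"systemctl is-active {SERVICE}"),
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() == "active"
```

Passing the command as a list avoids a second layer of local shell quoting around the remote command string.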

When to Use:

After deploying code updates
Recovering from gRPC metadata validation errors
After Secret Manager credential updates
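
This document refers to a `.strip()` fix for gRPC metadata validation errors but does not say which value is stripped. One plausible reading, stated here purely as an assumption, is that a credential read from a secret store carried a trailing newline. gRPC ASCII metadata values are limited to printable ASCII (0x20 to 0x7E), so a newline makes the value invalid. The sketch below shows that check with a hypothetical token; `is_valid_metadata_value` is an illustrative helper, not part of any gRPC library.

```python
def is_valid_metadata_value(value: str) -> bool:
    """True if value is non-empty and contains only printable ASCII (0x20 to 0x7E)."""
    return bool(value) and all(0x20 <= ord(ch) <= 0x7E for ch in value)


# Hypothetical credential as it might be read from a secret store,
# with a trailing newline left by the tool that wrote it.
raw_token = "example-token\n"
clean_token = raw_token.strip()

print(is_valid_metadata_value(raw_token))    # False
print(is_valid_metadata_value(clean_token))  # True
```

If that reading is right, a credential update that reintroduces trailing whitespace would bring the error back, which is why a restart after Secret Manager updates is worth following with a log check.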

5. VM Hard Reset

Hard reset the VM instance (use as a last resort):

gcloud compute instances reset eth-realtime-collector --zone=us-east1-b

When to Use:

VM network connectivity completely lost
DNS resolution failures
Metadata server unreachable
Service restart doesn't resolve issues

Warning: This forcibly resets the VM without a clean shutdown of the guest OS. All in-memory state is lost.
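
After a reset the VM needs time to boot before SSH and the service respond, so a single immediate check can give a false negative. This document does not prescribe a wait strategy; the sketch below is one simple option, a bounded polling loop. The name `wait_until` and the default attempt count and delay are assumptions chosen for illustration.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    attempts: int = 10,
    delay_seconds: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times, sleeping between failed tries.

    Returns True as soon as check() succeeds, False if every attempt fails.
    The `sleep` parameter is injectable so the loop can be tested without waiting.
    """
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

In practice `check` would run `systemctl is-active eth-collector` on the VM over `gcloud compute ssh` and compare the output to `active`.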

6. Verify Data Flow

After restarting services, verify data is flowing to ClickHouse:

cd 
doppler run --project aws-credentials --config prd -- python3 -c "
import clickhouse_connect
import os
client = clickhouse_connect.get_client(
    host=os.environ['CLICKHOUSE_HOST'],
    port=8443,
    username='default',
    password=os.environ['CLICKHOUSE_PASSWORD'],
    secure=True
)
result = client.query('SELECT MAX(timestamp), MAX(number) FROM ethereum_mainnet.blocks FINAL')
print(f'Latest block: {result.result_rows[0][1]:,} at {result.result_rows[0][0]}')
"

Expected Output (healthy):

Latest block: 23,800,000+ at <within last 60 seconds>
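
The "within last 60 seconds" criterion can be checked programmatically once the query result is in hand. This is a minimal sketch: the helper names are hypothetical, the 60-second threshold comes from the expected output above, and treating a timezone-naive timestamp as UTC is an assumption about how the driver returns the column.

```python
from datetime import datetime, timezone
from typing import Optional


def block_lag_seconds(latest_timestamp: datetime, now: datetime) -> float:
    """Seconds between the newest block timestamp and `now`."""
    if latest_timestamp.tzinfo is None:
        # Assumption: naive timestamps from the database are in UTC.
        latest_timestamp = latest_timestamp.replace(tzinfo=timezone.utc)
    return (now - latest_timestamp).total_seconds()


def is_fresh(
    latest_timestamp: datetime,
    now: Optional[datetime] = None,
    max_lag_seconds: float = 60.0,
) -> bool:
    """True if the newest block is no older than max_lag_seconds."""
    if now is None:
        now = datetime.now(timezone.utc)
    return block_lag_seconds(latest_timestamp, now) <= max_lag_seconds
```

With the query above, `is_fresh(result.result_rows[0][0])` would return True while blocks are arriving roughly every 12 seconds.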

Common Failure Modes

See VM Failure Modes (`vm-failure-modes.md`) for a detailed troubleshooting guide.

Quick Reference:

Symptom	Likely Cause	Solution
Service status: `failed`	gRPC metadata error	Check logs, restart with `.strip()` fix
No blocks for >5 minutes	Network connectivity	Check network, reset VM if needed
DNS resolution errors	Metadata server unreachable	VM hard reset
"Connection refused"	Service not running	Restart service

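The quick reference table can also be expressed as a small decision function for use in automated checks. This is a sketch only: the name `triage`, its parameters, and the order in which the rules are tried are assumptions, while the symptoms and suggested actions are taken from the table.

```python
def triage(service_state: str, minutes_since_last_block: float, log_text: str = "") -> str:
    """Map observed symptoms to the action suggested by the quick reference table.

    Rule order is an assumption: network-level symptoms are checked first
    because a service restart cannot fix them.
    """
    lowered = log_text.lower()
    if "dns resolution" in lowered or "metadata server unreachable" in lowered:
        return "VM hard reset"
    if "connection refused" in lowered:
        return "Restart service"
    if service_state == "failed":
        return "Check logs, restart service"
    if minutes_since_last_block > 5:
        return "Check network, reset VM if needed"
    return "No action"
```

For example, `triage("active", 12.0)` returns "Check network, reset VM if needed", matching the "No blocks for >5 minutes" row.
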
Systemd Commands

See Systemd Commands Reference (`systemd-commands.md`) for the complete set of systemd operations.

Quick Reference:

# Status
sudo systemctl status eth-collector

# Start
sudo systemctl start eth-collector

# Stop
sudo systemctl stop eth-collector

# Restart
sudo systemctl restart eth-collector

# Enable (start on boot)
sudo systemctl enable eth-collector

# Disable (don't start on boot)
sudo systemctl disable eth-collector

# View service logs
sudo journalctl -u eth-collector

# Follow logs live
sudo journalctl -u eth-collector -f

Operational History

Infrastructure Recovery (2025-11-10 07:00 UTC):

VM network failure detected (DNS resolution failed, metadata server unreachable)
Recovery: VM reset restored network connectivity
eth-collector service restarted with .strip() fix (gRPC metadata validation resolved)
Real-time data flow confirmed: blocks streaming every ~12 seconds
Database verified: 23.8M blocks (2015-2025), latest block within seconds

Maintainability SLO Achievement: Critical infrastructure failure (VM network down) resolved in <30 minutes (VM reset + service restart + verification).

Scripts

`check_vm_status.sh` - Automated status check via gcloud
`restart_collector.sh` - Safe restart with pre-checks

References

`vm-failure-modes.md` - Common failure scenarios and solutions
`systemd-commands.md` - Complete systemd operations reference
