| name | atft-autonomy |
| description | Coordinate Claude Code skills with OpenAI Codex autonomous workflows for end-to-end ATFT-GAT-FAN maintenance. |
| proactive | true |
ATFT Autonomy Skill
Mission
- Seamlessly orchestrate Claude and Codex agents to keep the ATFT-GAT-FAN stack production-ready.
- Decide when to run deep Codex analyses versus fast Claude remediations.
- Maintain shared context (AGENTS.md, health snapshots, run logs) across agents.
Engagement Signals
- Requests to “run full autonomous maintenance”, “coordinate Claude and Codex”, or “schedule daily self-healing”.
- Situations where an issue spans multiple domains (dataset + training + quality).
- Need to generate or refresh `.mcp.json`, `AGENTS.md`, or other shared configs.
Preflight Checklist
- Confirm tool availability:
  - `command -v codex` (expect npm package `@openai/codex`).
  - `command -v ./tools/claude-code.sh`.
- Verify configuration files:
  - `.mcp.json` exists with filesystem/git servers (`tools/codex.sh` recreates it if missing).
  - `AGENTS.md` is up to date with the latest guidelines.
- Run snapshot diagnostics:
  - `tools/project-health-check.sh --summary` (logs to `_logs/health-checks/`).
  - `nvidia-smi` and `df -h` to ensure resource headroom before long autonomous jobs.
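The tool and file checks above can be collected into one script. This is a minimal sketch, not part of the repository: the helper names `require_tool`, `require_file`, and `preflight` are hypothetical, and "non-empty" is an assumed stand-in for a fuller validation of `.mcp.json` and `AGENTS.md`.

```shell
#!/usr/bin/env bash
# Preflight sketch. All function names here are illustrative assumptions.

# Succeeds if the tool resolves via `command -v` (a PATH name or an explicit path).
require_tool() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing tool: $1" >&2
    return 1
  fi
}

# Succeeds if the file exists and is non-empty (content is not validated).
require_file() {
  if [ ! -s "$1" ]; then
    echo "missing or empty file: $1" >&2
    return 1
  fi
}

# Runs every check so that all problems are reported in a single pass.
preflight() {
  local failures=0
  require_tool codex                  || failures=$((failures + 1))
  require_tool ./tools/claude-code.sh || failures=$((failures + 1))
  require_file .mcp.json              || failures=$((failures + 1))
  require_file AGENTS.md              || failures=$((failures + 1))
  [ "$failures" -eq 0 ]
}
```

Calling `preflight` from the repository root before a long session returns non-zero if anything is missing, so it can gate the playbooks with `preflight && ...`.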
Core Playbooks
1. Full Daily Autonomy Loop
- `./tools/claude-code.sh --no-check "Run proactive maintenance checklist"`: quick fixes + TODO updates.
- `./tools/codex.sh --max --exec "Perform deep optimization sweep across dataset, training, and research modules"`: longer Codex session with MCP.
- Append combined findings to `docs/ops/autonomy_log.md` (capture commands run, deltas, next steps).
- `git status --short` → review diffs, ensure no unintended drift.
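One way to chain these steps is a small wrapper that records each command and its exit status in the log. This is a sketch under assumptions: `log_step`, `run_step`, and `daily_loop` are hypothetical names, and the one-line entry layout is invented for illustration. Deltas and next steps still need to be written by hand.

```shell
#!/usr/bin/env bash
# Daily loop sketch. Function names and the log entry layout are assumptions.

LOG_FILE="${LOG_FILE:-docs/ops/autonomy_log.md}"

# Append one entry: UTC timestamp, agent label, exit status, and the command run.
log_step() {
  local agent="$1" status="$2"
  shift 2
  mkdir -p "$(dirname "$LOG_FILE")"
  printf -- '- %s [%s] exit=%s: %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$agent" "$status" "$*" >> "$LOG_FILE"
}

# Run a command, log it under the given agent label, and return its exit status.
run_step() {
  local agent="$1"
  shift
  local status=0
  "$@" || status=$?
  log_step "$agent" "$status" "$@"
  return "$status"
}

# The Codex sweep still runs if the Claude step fails; both outcomes are logged.
daily_loop() {
  run_step claude ./tools/claude-code.sh --no-check "Run proactive maintenance checklist"
  run_step codex ./tools/codex.sh --max --exec \
    "Perform deep optimization sweep across dataset, training, and research modules"
  git status --short
}
```

Running `daily_loop` from the repository root leaves two entries in `docs/ops/autonomy_log.md` and prints any working-tree changes for review.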
2. Incident Response (multi-domain failure)
- Launch Claude for rapid triage: `./tools/claude-code.sh "Investigate failed training and prepare briefing for Codex"`.
- Pass Claude findings to Codex: `./tools/codex.sh --exec "Read docs/ops/incident_brief.md and propose remedial plan"`.
- Execute agreed actions (dataset rebuild, training rerun, quality checks).
- Update `docs/ops/incident_log.md` with both agents’ actions and resolutions.
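The handoff between the two agents depends on the briefing file actually being there. The sketch below refuses to start Codex without it. It assumes that Claude writes its briefing to `docs/ops/incident_brief.md`, which the Codex prompt implies but the playbook does not state. The function name `handoff_to_codex` and the `CODEX_CMD` override are hypothetical.

```shell
#!/usr/bin/env bash
# Incident handoff sketch. handoff_to_codex and CODEX_CMD are illustrative names.

# Start the Codex step only when a non-empty briefing exists.
handoff_to_codex() {
  local brief="${1:-docs/ops/incident_brief.md}"
  if [ ! -s "$brief" ]; then
    echo "no briefing at $brief; rerun the Claude triage step first" >&2
    return 1
  fi
  # CODEX_CMD lets a dry run substitute another command such as `echo`.
  "${CODEX_CMD:-./tools/codex.sh}" --exec "Read $brief and propose remedial plan"
}
```

A dry run with `CODEX_CMD=echo handoff_to_codex` prints the arguments Codex would receive without starting a session.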
3. Scheduled Autonomous Optimization
- Add cron entry (example): `0 2 * * * cd /workspace/gogooku3 && ./tools/codex.sh --max >> _logs/daily-optimization.log 2>&1`.
- Pair with weekly Claude sweep: `0 7 * * MON cd /workspace/gogooku3 && ./tools/claude-code.sh --no-check >> _logs/weekly-claude.log 2>&1`.
- Summarize improvements weekly by aggregating `_logs/codex-sessions/*.log` and `_logs/claude-code/*.log`; record highlights in `docs/ops/weekly_autonomy_report.md`.
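The weekly aggregation step could begin with a listing of the session logs touched in the last seven days. This is a rough sketch: `recent_logs`, `weekly_summary`, and the `DAYS` variable are hypothetical, and a line count is only a crude proxy for what each session did. The highlights themselves still need a human or agent summary.

```shell
#!/usr/bin/env bash
# Weekly aggregation sketch. Function names and the DAYS variable are assumptions.

# List *.log files under the given directories modified within the last
# DAYS days (default 7). Directories that do not exist are skipped.
recent_logs() {
  local days="${DAYS:-7}"
  local dir
  for dir in "$@"; do
    [ -d "$dir" ] && find "$dir" -type f -name '*.log' -mtime "-$days"
  done | sort
}

# Print one Markdown bullet per recent log with its line count.
weekly_summary() {
  local log
  recent_logs "$@" | while IFS= read -r log; do
    printf -- '- %s (%s lines)\n' "$log" "$(wc -l < "$log" | tr -d ' ')"
  done
}
```

For example, `weekly_summary _logs/codex-sessions _logs/claude-code >> docs/ops/weekly_autonomy_report.md` appends the listing to the weekly report.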
Shared Context Management
- Ensure both agents read `CLAUDE.md` and `AGENTS.md` before editing high-risk files.
- Rotate `docs/SKILLS_GUIDE.md` and `claude/skills/` when procedures change.
- Store cross-agent decisions in `EXPERIMENT_STATUS.md` or `docs/ops/autonomy_log.md`.
Failure Handling
- Codex CLI missing → `npm install -g @openai/codex`.
- MCP server errors → regenerate `.mcp.json` via `./tools/codex.sh` and restart run.
- Conflicting edits → consolidate drafts under `docs/ops/autonomy_pending/` and resolve manually.
- Resource saturation → stagger Claude/Codex runs, cap concurrency in cron.
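Capping concurrency can be as simple as a shared lock that each scheduled job must acquire before it starts. The sketch below uses an atomic `mkdir` as the lock. The function name `with_lock` is hypothetical, and exit code 75 is borrowed from the `EX_TEMPFAIL` convention as an assumed "try again later" signal.

```shell
#!/usr/bin/env bash
# Concurrency cap sketch. with_lock is an illustrative helper, not a repo tool.

# Run a command only if no other job holds the lock directory.
# mkdir either creates the directory or fails, so only one caller wins.
with_lock() {
  local lock_dir="$1"
  shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "another run holds $lock_dir; skipping" >&2
    return 75
  fi
  local status=0
  "$@" || status=$?
  rmdir "$lock_dir"
  return "$status"
}
```

A job would then be started as, for example, `with_lock _logs/.autonomy.lock ./tools/codex.sh --max`, where the lock path is an assumed location. If a job is killed before it reaches `rmdir`, the stale lock directory has to be removed by hand.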
Handoff
- After combined sessions, notify stakeholders by updating `docs/ops/weekly_autonomy_report.md`.
- Attach session logs `_logs/codex-sessions/*.log` and `_logs/claude-code/*.log` for audit.
- Re-sync `~/.claude/skills/` so Claude picks up any procedural updates discovered during Codex runs.
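The re-sync step might look like the sketch below. It assumes that the repository's `claude/skills/` directory is the source of truth, which this skill does not state explicitly. The function name `sync_skills` is hypothetical.

```shell
#!/usr/bin/env bash
# Skill re-sync sketch. The source directory is an assumption, not a documented path.

# Copy the contents of the source skills directory into the destination.
# Files already in the destination that no longer exist in the source are kept.
sync_skills() {
  local src="${1:-claude/skills}"
  local dest="${2:-$HOME/.claude/skills}"
  if [ ! -d "$src" ]; then
    echo "no skills directory at $src" >&2
    return 1
  fi
  mkdir -p "$dest"
  cp -R "$src/." "$dest/"
}
```

Because `cp -R` never deletes, a skill that was renamed or removed in the repository stays in `~/.claude/skills/` until it is cleaned up manually.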