| name | long-task-harness |
| description | Maintains continuity across long-running tasks that span multiple agent sessions. Use when starting or resuming a complex project that spans multiple sessions, or for tasks with many discrete features requiring iterative development. |
Long Task Harness
Structured workflows for maintaining continuity across agent sessions. Addresses the "shift change" problem where context is lost between sessions.
First-Time Setup
On first invocation, check if .long-task-harness/long-task-progress.md exists in the project.
If it doesn't exist: Initialize
python3 <SKILL_PATH>/scripts/init_harness.py
This creates a .long-task-harness/ directory containing:
- long-task-progress.md - Session history and notes
- features.json - Feature tracking with pass/fail status
- init.sh - Environment setup script (optional)
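As a rough illustration of how pass/fail tracking in features.json could be consumed, here is a minimal sketch. The schema (a top-level "features" list with "id", "name", and "passes" keys) and the function name incomplete_features are assumptions made for this example; the file actually written by init_harness.py may be structured differently.

```python
import json

# Hypothetical schema: the real features.json created by init_harness.py
# may use different keys. The feature names are made up for illustration.
EXAMPLE_FEATURES = """
{
  "features": [
    {"id": "F1", "name": "User login", "passes": true},
    {"id": "F2", "name": "Password reset", "passes": false}
  ]
}
"""


def incomplete_features(raw_json: str) -> list:
    """Return the ids of features that are not yet marked as passing."""
    data = json.loads(raw_json)
    return [f["id"] for f in data["features"] if not f.get("passes", False)]


print(incomplete_features(EXAMPLE_FEATURES))  # → ['F2']
```

In practice, read_features.py is the supported way to list incomplete features; the sketch only shows the kind of data it would work with.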
Then: Ensure persistent invocation
Check if AGENTS.md (for Codex/Droid/Cursor/Pi) or CLAUDE.md (for Claude Code) contains the harness snippet.
If not, prompt the user:
"I notice this project doesn't have long-task-harness configured for automatic invocation. Would you like me to add the following to [AGENTS.md / CLAUDE.md]?
## Multi-Session Development
This project uses long-task-harness for session continuity. At session start or after context reset, invoke the skill at: <SKILL_PATH>
Where would you like me to add this?"
Use the actual path where you loaded this skill from for <SKILL_PATH>.
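The check for an existing snippet could look roughly like the sketch below. It assumes the snippet can be recognized by the string "long-task-harness" appearing in AGENTS.md or CLAUDE.md at the project root; both the marker and the function name has_harness_snippet are illustrative choices, not part of the skill's scripts.

```python
from pathlib import Path

# Assumption: the presence of the skill name is enough to detect the snippet.
MARKER = "long-task-harness"
INSTRUCTION_FILES = ("AGENTS.md", "CLAUDE.md")


def has_harness_snippet(project_root: Path) -> bool:
    """Return True if any agent instruction file mentions the harness."""
    for name in INSTRUCTION_FILES:
        path = project_root / name
        if path.is_file() and MARKER in path.read_text(encoding="utf-8"):
            return True
    return False
```

If this returns False, that is the cue to prompt the user as described above rather than editing either file silently.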
Optional: Claude Code Hooks
For Claude Code users who want automatic enforcement:
"Would you like to install Claude Code hooks? These will:
- Remind to invoke this skill on session start
- Warn before git commits if .long-task-harness/long-task-progress.md not staged"
If yes: python3 <SKILL_PATH>/scripts/claude_code_install_hooks.py
Note: These hooks are for Claude Code only. Other agents should use AGENTS.md instructions.
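To make the commit warning concrete, the following sketch shows one way such a check might work. It is not the code in claude_code_precommit_check.py; the function names are hypothetical, and the only external call is the standard git command `git diff --cached --name-only`, which lists staged paths.

```python
import subprocess
from typing import Optional

PROGRESS_FILE = ".long-task-harness/long-task-progress.md"


def staged_files() -> list:
    """List paths staged for the next commit (must run inside a git repo)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def progress_warning(staged: list) -> Optional[str]:
    """Return a warning if files are staged but the progress file is not."""
    if not staged or PROGRESS_FILE in staged:
        return None
    return f"Warning: {PROGRESS_FILE} is not staged."
```

A hook built this way would call progress_warning(staged_files()) and print the result if it is not None. Keeping the decision separate from the git call makes the rule easy to test without a repository.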
Session Startup Protocol
At the start of each session:
python3 <SKILL_PATH>/scripts/read_progress.py # Last 3 sessions
python3 <SKILL_PATH>/scripts/read_features.py # Incomplete features
git log --oneline -10
Then continue from "Next Steps" in the latest session entry.
During Work
- Work on one feature at a time
- Commit frequently with descriptive messages
- Update .long-task-harness/features.json when features pass tests
- Update .long-task-harness/long-task-progress.md before ending session
Session Entry Format
### Session N | YYYY-MM-DD | Commits: abc123..def456
#### Goal
[One-liner]
#### Accomplished
- [x] Task done
- [ ] Task carried forward
#### Decisions
- **[D1]** Decision made - reasoning
#### Surprises
- **[S1]** Expected X but found Y - implication
#### Next Steps
1. Priority task
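Because every entry follows the same layout, the "Next Steps" of the latest session can be pulled out mechanically. The sketch below assumes entries match the format above exactly (a "### Session N | date | Commits: range" header and "####" subsections); latest_next_steps and the sample text are hypothetical and do not reflect how read_progress.py is implemented.

```python
import re

HEADER = re.compile(
    r"^### Session (?P<number>\d+) \| (?P<date>\d{4}-\d{2}-\d{2}) \| "
    r"Commits: (?P<commits>\S+)$"
)

# Made-up sample entries, for illustration only.
SAMPLE = """\
### Session 1 | 2025-01-10 | Commits: abc123..def456
#### Goal
Set up project
#### Next Steps
1. Add login

### Session 2 | 2025-01-11 | Commits: def456..0a1b2c
#### Goal
Add login
#### Next Steps
1. Add password reset
2. Document Docker requirement
"""


def latest_next_steps(progress_text: str) -> tuple:
    """Return (session number, next-step lines) for the highest session."""
    sessions = {}
    current = None
    in_next_steps = False
    for line in progress_text.splitlines():
        match = HEADER.match(line.strip())
        if match:
            current = int(match["number"])
            sessions[current] = []
            in_next_steps = False
        elif line.startswith("#### "):
            # Any other subsection heading ends the Next Steps block.
            in_next_steps = line.strip() == "#### Next Steps"
        elif in_next_steps and current is not None and line.strip():
            sessions[current].append(line.strip())
    if not sessions:
        raise ValueError("no session entries found")
    latest = max(sessions)
    return latest, sessions[latest]


print(latest_next_steps(SAMPLE))
```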
Why Log Surprises?
Surprises indicate model uncertainty and contain information-dense context. If something surprised you, it could trip up the next session (or a different agent). Examples:
- [S1] Expected auth.py to handle OAuth, but it only does API keys. OAuth is in oauth_provider.py.
- [S2] Test suite requires Docker running - not documented in README.
- [S3] Config file is gitignored but required - must copy from config.example.yaml.
This section is optional but valuable for complex or unfamiliar codebases.
Before Ending Session
- Update .long-task-harness/long-task-progress.md with session notes
- Commit all changes including progress docs
- Verify tests pass
Critical Rules
- Never edit tests to make them pass - fix implementation
- Never mark features passing without testing
- Always update progress docs before ending
- Commit frequently
Scripts
| Script | Purpose |
|---|---|
| init_harness.py | Initialize project with tracking files in .long-task-harness/ |
| claude_code_install_hooks.py | Install/uninstall Claude Code hooks |
| read_progress.py | Read sessions (--list, --session N, -n 5) |
| read_features.py | Read features (--feature ID, --json) |
| session_metadata.py | Generate git metadata for session entries |
| claude_code_precommit_check.py | Pre-commit hook (warns if progress not staged) |
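As an illustration of what "git metadata for session entries" might amount to, the sketch below builds the header line used in the Session Entry Format from a session number, a date, and a commit range. The function session_header is hypothetical; how session_metadata.py actually gathers or prints this information is not specified here.

```python
from datetime import date


def session_header(number: int, day: date, first_commit: str, last_commit: str) -> str:
    """Format a session header line in the Session Entry Format."""
    return (
        f"### Session {number} | {day.isoformat()} | "
        f"Commits: {first_commit}..{last_commit}"
    )


# Made-up values, for illustration only.
print(session_header(3, date(2025, 1, 12), "abc123", "def456"))
# → ### Session 3 | 2025-01-12 | Commits: abc123..def456
```

The commit hashes would normally come from git (for example, `git rev-parse --short HEAD` for the current commit) rather than being typed by hand.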
History Research (10+ Sessions)
For long projects, use subagents as scouts to find relevant history:
Research the history of [feature/file] in this project.
Return POINTERS (session numbers, file paths, decision refs) - not summaries.
Then read only the specific sessions identified.
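The "pointers, not summaries" idea can be sketched as a simple scan over the progress file. This is only an approximation of what a scout might return: it assumes entries follow the Session Entry Format, with decisions and surprises tagged as **[D1]** or **[S1]**, and find_pointers plus the sample text are hypothetical.

```python
import re
from typing import Optional

SESSION = re.compile(r"^### Session (\d+)\b")
REF = re.compile(r"\*\*\[([DS]\d+)\]\*\*")

# Made-up sample history, for illustration only.
SAMPLE_HISTORY = """\
### Session 4 | 2025-02-01 | Commits: 111aaa..222bbb
#### Decisions
- **[D1]** Keep OAuth in oauth_provider.py - auth.py stays API-key only
### Session 7 | 2025-02-09 | Commits: 333ccc..444ddd
#### Surprises
- **[S1]** Expected auth.py to handle OAuth, but it only does API keys
#### Next Steps
1. Add OAuth refresh tokens
"""


def find_pointers(progress_text: str, term: str) -> list:
    """Return (session number, decision/surprise ref or None) per matching line."""
    pointers = []
    session: Optional[int] = None
    for line in progress_text.splitlines():
        header = SESSION.match(line.strip())
        if header:
            session = int(header.group(1))
            continue
        if session is None or term.lower() not in line.lower():
            continue
        ref = REF.search(line)
        pointers.append((session, ref.group(1) if ref else None))
    return pointers


print(find_pointers(SAMPLE_HISTORY, "oauth"))
# → [(4, 'D1'), (7, 'S1'), (7, None)]
```

With pointers like these, only sessions 4 and 7 need to be read in full, for example via read_progress.py --session N.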