Reviewed by Codex

PR #1466 board brief

Storage V2 lands with hashed identity, migration tooling, and broader stop recovery.

Replaces hash-based storage with projects/{projectId}, introduces deterministic hashed project IDs, removes the storageKey system, adds crash-safe migration with rollback, and makes ao stop/ao start recoverable.

Base: upstream/main | Head: storage-redesign | Diff: +7,276 / -2,423

1. Start and Stop

In the corrected diff, ao stop becomes broader in scope and ao start gains a prompt to restore previously stopped sessions.

ao start

Startup now offers to restore stopped sessions

Before, upstream/main
  • Started dashboard, lifecycle worker, and ensured the orchestrator.
  • No last-stop.json recovery prompt after startup.
  • Already-running flow offered open, new orchestrator, restart, or quit.
After, PR head
  • Still starts dashboard, lifecycle worker, and orchestrator.
  • Reads last-stop.json including otherProjects field.
  • Prompts to restore ALL sessions — current project and cross-project.
  • Loads global config for cross-project session manager access.
  • Skips restoring the orchestrator when already restored by ensureOrchestrator().
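The restore prompt boils down to turning last-stop.json into a list of sessions to bring back, minus anything startup has already revived. A minimal sketch, assuming a hypothetical layout for LastStopState: only the type name and the otherProjects field come from this brief, while stoppedAt, projectId, sessionIds, and planRestore are illustrative names.

```typescript
// Hypothetical last-stop.json shape. Only the otherProjects field name comes
// from the PR description; the other fields are assumed for illustration.
interface LastStopState {
  stoppedAt: string;
  projectId: string;
  sessionIds: string[];
  otherProjects?: Record<string, string[]>;
}

interface RestoreTarget {
  projectId: string;
  sessionId: string;
}

// Flattens the current project and all cross-project entries into one list,
// skipping sessions that are already running again (for example the
// orchestrator that ensureOrchestrator() brought back during startup).
function planRestore(state: LastStopState, alreadyRunning: ReadonlySet<string>): RestoreTarget[] {
  const targets: RestoreTarget[] = [];
  const add = (projectId: string, ids: string[]): void => {
    for (const sessionId of ids) {
      if (!alreadyRunning.has(sessionId)) targets.push({ projectId, sessionId });
    }
  };
  add(state.projectId, state.sessionIds);
  for (const [projectId, ids] of Object.entries(state.otherProjects ?? {})) {
    add(projectId, ids);
  }
  return targets;
}

const lastStop: LastStopState = JSON.parse(`{
  "stoppedAt": "2026-04-22T08:37:32Z",
  "projectId": "agent-orchestrator_a1b2c3d4e5",
  "sessionIds": ["ao-orchestrator", "ao-79"],
  "otherProjects": { "donna_0f1e2d3c4b": ["dn-3"] }
}`);

// ao-orchestrator is skipped because startup already restored it.
console.log(planRestore(lastStop, new Set<string>(["ao-orchestrator"])));
```

Passing the already-running set in explicitly keeps the planning step pure, so the interactive prompt can show exactly what will be restored before anything is spawned.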
ao stop

Stop now tears down all sessions across all projects

Before, upstream/main
  • Located the most recently active orchestrator session.
  • Killed only that orchestrator session.
  • Stopped dashboard and unregistered running.json.
  • Only saw sessions from the local config (1 project).
After, PR head
  • Loads global config to see ALL registered projects.
  • ao stop (no args): kills all sessions across all projects, stops parent process and dashboard.
  • ao stop <project>: kills only that project's sessions, parent process and dashboard stay alive.
  • Records killed IDs to last-stop.json with otherProjects field for cross-project restore.
  • Groups killed sessions by project in output display.
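The two stop modes differ only in which sessions are selected and whether the parent process and dashboard go down with them. A sketch of that scoping plus the grouping used for output and for the last-stop.json record, assuming hypothetical helpers (planStop, groupByProject, buildLastStop) and the same assumed record layout as the restore side:

```typescript
interface SessionRef {
  projectId: string;
  sessionId: string;
}

interface StopPlan {
  kill: SessionRef[];
  stopParentAndDashboard: boolean;
}

// ao stop with no argument tears everything down; ao stop <project> only
// kills that project's sessions and leaves the parent process and dashboard up.
function planStop(all: SessionRef[], project?: string): StopPlan {
  if (project === undefined) return { kill: [...all], stopParentAndDashboard: true };
  return { kill: all.filter((s) => s.projectId === project), stopParentAndDashboard: false };
}

// Groups session IDs by project, used both for CLI output and for the record.
function groupByProject(sessions: SessionRef[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const { projectId, sessionId } of sessions) {
    const ids = groups.get(projectId) ?? [];
    ids.push(sessionId);
    groups.set(projectId, ids);
  }
  return groups;
}

// Builds a last-stop.json record: the current project's IDs at the top level,
// every other project under otherProjects (field layout assumed).
function buildLastStop(currentProjectId: string, killed: SessionRef[]) {
  const groups = groupByProject(killed);
  const sessionIds = groups.get(currentProjectId) ?? [];
  groups.delete(currentProjectId);
  return {
    stoppedAt: new Date().toISOString(),
    projectId: currentProjectId,
    sessionIds,
    otherProjects: Object.fromEntries(groups),
  };
}

const running: SessionRef[] = [
  { projectId: "agent-orchestrator", sessionId: "ao-79" },
  { projectId: "donna", sessionId: "dn-3" },
  { projectId: "agent-orchestrator", sessionId: "ao-orchestrator" },
];
const plan = planStop(running);
console.log(plan.stopParentAndDashboard, groupByProject(plan.kill));
console.log(JSON.stringify(buildLastStop("agent-orchestrator", plan.kill), null, 2));
```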
Ctrl+C

Ctrl+C performs full graceful shutdown

Before, upstream/main
  • Stopped lifecycle workers and exited immediately.
  • Left sessions alive in tmux as orphans.
  • No last-stop.json recorded.
  • running.json left stale (auto-pruned on next read).
After, PR head
  • Mirrors ao stop: kills all sessions, writes last-stop.json, unregisters.
  • 10s hard timeout ensures exit even if cleanup hangs.
  • Next ao start can restore sessions that were killed.
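The 10s hard timeout is a race between the cleanup work and a deadline. A minimal sketch of that pattern, assuming a hypothetical shutdown() callback that does the real work (kill sessions, write last-stop.json, unregister); the exit codes chosen here are also assumptions.

```typescript
type Outcome = "done" | "failed" | "timeout";

// Runs cleanup but never waits longer than `ms`. A rejected (or synchronously
// throwing) cleanup is reported as "failed" so the caller can still exit.
async function withHardTimeout(cleanup: () => Promise<void>, ms: number): Promise<Outcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<Outcome>((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  const work = Promise.resolve()
    .then(cleanup)
    .then(
      (): Outcome => "done",
      (): Outcome => "failed",
    );
  try {
    return await Promise.race([work, deadline]);
  } finally {
    clearTimeout(timer); // do not keep the event loop alive once the race settles
  }
}

// Hypothetical wiring for Ctrl+C: shutdown() would mirror ao stop.
function installSigintHandler(shutdown: () => Promise<void>, ms = 10_000): void {
  process.once("SIGINT", () => {
    void withHardTimeout(shutdown, ms).then((outcome) => {
      if (outcome !== "done") console.error(`shutdown ${outcome}; exiting anyway`);
      process.exit(outcome === "done" ? 0 : 1);
    });
  });
}
```

Because the deadline always resolves, a hung tmux call or lock inside shutdown() can delay exit by at most the timeout.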

2. Kill and Restore Buttons

The web endpoints still call the same session-manager methods, but the PR changes metadata lifecycle semantics underneath those actions.

Dashboard action behavior

Kill button

Before, upstream/main
  • Calls POST /api/sessions/:id/kill.
  • Kill updated the lifecycle and archived metadata by deleting the active metadata file with archiving enabled.
After, PR head
  • Same endpoint, but killed sessions remain represented as metadata with a terminated lifecycle.
  • Idempotency checks look for a terminated lifecycle state instead of archive presence.
Implication: killed sessions are easier for restore/status flows to find, with no archive lookup.

Restore button

Before, upstream/main
  • Restore checked active metadata first, then archived metadata.
  • Archived restore rehydrated the metadata and set status to spawning.
After, PR head
  • Restore finds active metadata only, validates restorability, recreates the workspace if needed, relaunches the runtime, resets the lifecycle to working, and clears terminal PR state.
Implication: restore becomes lifecycle-reset based and is tied to the active V2 metadata record.

Done bar restore

Before, upstream/main
  • Done cards suppressed Restore for merged cards using canRestore && !isMerged.
After, PR head
  • Restore is shown whenever the status is not in NON_RESTORABLE_STATUSES, so more done-card states expose recovery.
Implication: more completed or terminal-looking sessions can be resumed from the UI.
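The new kill/restore semantics can be modelled as pure transitions on one metadata record. A sketch under stated assumptions: "terminated" and "working" come from this brief, but the reason value "killed" and the contents of NON_RESTORABLE_STATUSES are placeholders, and workspace/runtime steps are left out.

```typescript
// Hypothetical lifecycle model; reason values and the deny-list contents are
// placeholders, not the PR's actual definitions.
type LifecycleState = "spawning" | "working" | "terminated";

interface SessionRecord {
  id: string;
  lifecycle: { state: LifecycleState; reason?: string };
  prState?: string;
}

// Kill: no archive move. The record stays in sessions/ with a terminal state,
// and killing an already-terminated record is a no-op (idempotent).
function kill(session: SessionRecord): SessionRecord {
  if (session.lifecycle.state === "terminated") return session;
  return { ...session, lifecycle: { state: "terminated", reason: "killed" } };
}

// Restore: validate, then reset the lifecycle and clear terminal PR state.
// Workspace recreation and runtime relaunch are omitted here.
function restore(session: SessionRecord): SessionRecord {
  if (session.lifecycle.state !== "terminated") {
    throw new Error(`${session.id} is not restorable from ${session.lifecycle.state}`);
  }
  return { ...session, lifecycle: { state: "working" }, prState: undefined };
}

// Done-bar visibility before and after the PR.
const NON_RESTORABLE_STATUSES = new Set<string>(["spawning", "working"]); // placeholder contents
const showRestoreBefore = (canRestore: boolean, isMerged: boolean): boolean => canRestore && !isMerged;
const showRestoreAfter = (status: string): boolean => !NON_RESTORABLE_STATUSES.has(status);

const killed = kill({ id: "ao-79", lifecycle: { state: "working" }, prState: "merged" });
console.log(killed.lifecycle, kill(killed) === killed, restore(killed).lifecycle);
console.log(showRestoreBefore(true, true), showRestoreAfter("merged"));
```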

3. Hashed Project Identity

Project IDs become deterministic, hash-suffixed identifiers instead of bare basenames, eliminating collisions across checkouts of the same repo.

New system

generateExternalId(path, originUrl?)

  • Format: {sanitized_basename}_{SHA256(path+originUrl)[0:10]}
  • Example: agent-orchestrator_a1b2c3d4e5
  • Deterministic — same path + origin always produces the same ID.
  • Collision throws instead of silently degrading.
  • Basename capped at 30 chars; parsed via id.match(/^(.+)_([0-9a-f]{10})$/).

Replaces bare basenames (e.g. agent-orchestrator) and the 12-char SHA-256 storage key from upstream/main. Same-name repos no longer need manual -1 / -2 suffixes, and re-cloning the same repo no longer produces a different storage key.
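The ID format above can be sketched with Node's built-in SHA-256. This is an illustration, not the PR's generateExternalId: the sanitization rule is an assumption, and registerProject is a hypothetical helper showing the "collision throws" behavior.

```typescript
import { createHash } from "node:crypto";
import { basename } from "node:path";

// Sketch of {sanitized_basename}_{SHA256(path+originUrl)[0:10]}. The exact
// sanitization rule (lowercase, runs of other characters become "-") is an
// assumption; the 30-char cap and 10-hex suffix follow the description above.
function generateExternalId(projectPath: string, originUrl?: string): string {
  const base =
    basename(projectPath)
      .toLowerCase()
      .replace(/[^a-z0-9-]+/g, "-")
      .replace(/^-+|-+$/g, "")
      .slice(0, 30) || "project";
  const hash = createHash("sha256")
    .update(projectPath + (originUrl ?? ""))
    .digest("hex")
    .slice(0, 10);
  return `${base}_${hash}`;
}

// Splits an ID with the documented pattern.
function parseExternalId(id: string): { base: string; hash: string } | null {
  const m = id.match(/^(.+)_([0-9a-f]{10})$/);
  return m ? { base: m[1], hash: m[2] } : null;
}

// Hypothetical registry keyed by ID: a different path mapping to an existing
// ID throws instead of silently sharing storage.
function registerProject(registry: Map<string, string>, projectPath: string, originUrl?: string): string {
  const id = generateExternalId(projectPath, originUrl);
  const owner = registry.get(id);
  if (owner !== undefined && owner !== projectPath) {
    throw new Error(`project ID collision: ${id} already belongs to ${owner}`);
  }
  registry.set(id, projectPath);
  return id;
}

const registry = new Map<string, string>();
console.log(registerProject(registry, "/home/me/src/agent-orchestrator"));
console.log(registerProject(registry, "/home/me/forks/agent-orchestrator"));
```

Two checkouts with the same basename get the same readable prefix but different suffixes, which is what removes the need for manual -1 / -2 suffixes.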

Removed

storageKey system fully removed

  • ensureProjectStorageIdentity
  • relinkProjectInGlobalConfig
  • deriveProjectStorageIdentity
  • findStorageKeyOwner, StorageKeyCollisionError
  • storageKey kept as optional schema field until ao migrate-storage strips it.
  • registerProjectInGlobalConfig now returns the effective hashed project ID.

4. migrate-storage

In the corrected diff, migrate-storage is introduced, not removed.

New command

Purpose-built migration from legacy hash storage to V2 project storage

  1. Inventory: Scans ~/.agent-orchestrator for legacy hash-based directories.
  2. Guard: Detects active tmux sessions and blocks migration unless --force is used.
  3. Convert: Moves sessions/worktrees into projects/{projectId} and converts key/value metadata to JSON.
  4. Rollback: Supports --rollback by restoring *.migrated directories and repairing git worktree references.
Before, upstream/main
  • No registered ao migrate-storage command.
  • Runtime used storage-key directories and flat key/value metadata.
  • Project relinking handled storage-key changes.
After, PR head
  • CLI registers ao migrate-storage.
  • Options: --dry-run, --force, --rollback.
  • New core module: packages/core/src/migration/storage-v2.ts.
  • Extensive test coverage added in migration-storage-v2.test.ts.
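The Convert step's metadata conversion can be sketched as a line parser. This is a hypothetical converter, not the code in storage-v2.ts: it assumes one key=value pair per line, split on the first "=", and it does not attempt the structured lifecycle/runtime objects that V2 metadata uses.

```typescript
// Hypothetical converter. Assumes one key=value pair per line, split on the
// first "=" so values may contain "="; blank lines and lines starting with "#"
// are skipped. Structuring lifecycle/runtime data into objects is not attempted.
function parseLegacyMetadata(text: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key: not a key=value line
    out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}

// Produces the body of sessions/{sessionId}.json.
function legacyToJson(sessionId: string, text: string): string {
  return JSON.stringify({ id: sessionId, ...parseLegacyMetadata(text) }, null, 2) + "\n";
}

console.log(legacyToJson("ao-79", "status=working\nbranch=feat/storage-v2\n"));
```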
Safety hardening

Migration crash safety and concurrency fixes

Five post-review fixes applied to make the migration safe for production data:

  1. Atomic writes: All session JSON writes use atomicWriteFileSync, so a crash mid-write no longer truncates files.
  2. Config lock: stripStorageKeysFromConfig is wrapped in withFileLockSync, so it no longer races with a concurrent ao start.
  3. macOS collision: Case-insensitive projectId collision detection prevents silent data merges on HFS+/APFS.
  4. Partial failure: Stray worktree moves skip projects that failed to migrate.
  5. Worktree repair: Rollback calls repairGitWorktrees.
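Two of these fixes rest on well-known techniques: write-to-temp-then-rename for crash safety, and case-folded comparison for case-insensitive volumes. A sketch of both, assuming nothing about the PR's actual atomicWriteFileSync body (only the name comes from the brief) and using toLowerCase() as an approximation of the filesystem's case folding. It does not fsync the parent directory after the rename.

```typescript
import {
  closeSync, fsyncSync, mkdtempSync, openSync, readFileSync,
  readdirSync, renameSync, rmSync, writeFileSync,
} from "node:fs";
import { tmpdir } from "node:os";
import { basename, dirname, join } from "node:path";

// Write-temp-then-rename. On POSIX filesystems rename() replaces the target
// atomically when both paths are on the same filesystem, so a crash leaves
// either the old file or the new one, never a truncated mix.
function atomicWriteFileSync(target: string, data: string): void {
  const tmp = join(dirname(target), `.${basename(target)}.${process.pid}.tmp`);
  let renamed = false;
  try {
    const fd = openSync(tmp, "w");
    try {
      writeFileSync(fd, data);
      fsyncSync(fd); // make the bytes durable before they become visible
    } finally {
      closeSync(fd);
    }
    renameSync(tmp, target);
    renamed = true;
  } finally {
    if (!renamed) rmSync(tmp, { force: true }); // never leave a partial temp file
  }
}

// Flags project IDs that differ only by case. On a case-insensitive volume
// (the default for APFS and HFS+) they would map to one directory and merge.
function findCaseInsensitiveCollisions(projectIds: string[]): string[][] {
  const byFolded = new Map<string, string[]>();
  for (const id of projectIds) {
    const key = id.toLowerCase();
    byFolded.set(key, [...(byFolded.get(key) ?? []), id]);
  }
  return Array.from(byFolded.values()).filter((ids) => ids.length > 1);
}

const dir = mkdtempSync(join(tmpdir(), "ao-atomic-"));
const file = join(dir, "ao-79.json");
atomicWriteFileSync(file, JSON.stringify({ id: "ao-79" }) + "\n");
console.log(readFileSync(file, "utf8").trim(), readdirSync(dir));
console.log(findCaseInsensitiveCollisions(["Donna", "donna", "agent-orchestrator"]));
```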

5. Storage Layout: Before vs Now

This is the architectural center of the PR: AO moves from storage-key/hash buckets to stable project directories with JSON session metadata.

Before, upstream/main

Storage key was the filesystem identity

~/.agent-orchestrator/
├── 7dc54da05c9e-agent-orchestrator/
│   ├── sessions/
│   │   ├── ao-79              # key=value metadata
│   │   ├── ao-orchestrator    # key=value metadata
│   │   └── archive/
│   │       └── ao-79_20260422T083732Z
│   └── worktrees/
│       └── ao-79/
├── eca743472f76-donna/
│   ├── sessions/
│   └── worktrees/
└── config.yaml
    projects.*.storageKey: 7dc54da05c9e
After, PR head

Project ID is the filesystem identity

~/.agent-orchestrator/
├── projects/
│   ├── agent-orchestrator/
│   │   ├── sessions/
│   │   │   ├── ao-79.json
│   │   │   ├── ao-orchestrator.json
│   │   │   └── .agent-report-audit/
│   │   │       └── ao-79.ndjson
│   │   ├── worktrees/
│   │   │   └── ao-79/
│   │   └── worker-prompt-ao-79.md
│   └── donna/
│       ├── sessions/
│       └── worktrees/
├── 7dc54da05c9e-agent-orchestrator.migrated/
└── config.yaml
    projects.*.projectId: agent-orchestrator

What moved where

Projects
  • Before: Project data was spread across hash/storage-key directories such as ~/.agent-orchestrator/7dc54da05c9e-agent-orchestrator.
  • Now: Project data is grouped under ~/.agent-orchestrator/projects/{projectId}, for example projects/agent-orchestrator.
Sessions
  • Before: Session files lived in {storageKey}/sessions/{sessionId} without a JSON extension.
  • Now: Session files live in projects/{projectId}/sessions/{sessionId}.json.
Metadata
  • Before: Metadata was flat key=value text. Status, runtime handle, lifecycle, and report fields were split across scalar keys.
  • Now: Metadata is canonical JSON. Lifecycle, runtime handle, dashboard, agent report, and report watcher data are structured objects.
Archived sessions
  • Before: Deleted/killed records moved into sessions/archive/{sessionId}_{timestamp}.
  • Now: Terminated sessions remain visible as JSON metadata in sessions/. Migration flattens legacy archive files into terminated records.
Worktrees
  • Before: Worktrees lived under {storageKey}/worktrees/{sessionId}, tied to the derived storage key.
  • Now: Worktrees live under projects/{projectId}/worktrees/{sessionId}, tied to project identity.
Config
  • Before: Global config carried storageKey per project to locate the hash bucket.
  • Now: Runtime lookup is project-ID based. On successful migration, storageKey is stripped and old hash dirs are renamed to *.migrated.

  • Local legacy evidence: many {12-hex}-{name}.migrated directories remain under ~/.agent-orchestrator.
  • Local V2 evidence: ~/.agent-orchestrator/projects/agent-orchestrator/sessions/*.json exists.
  • Project worktrees: projects/agent-orchestrator/worktrees/ao-84 shows V2 worktree placement.
  • Report audit: sessions/.agent-report-audit/*.ndjson stores report history beside session metadata.
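The V2 layout reduces to a handful of path helpers keyed by project ID. A sketch: getProjectWorktreesDir is named in this brief, while the other helper names, the optional root parameter, and the archive-name pattern (inferred from the ao-79_20260422T083732Z example in the layout tree) are assumptions.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

const AO_ROOT = join(homedir(), ".agent-orchestrator");

function getProjectDir(projectId: string, root = AO_ROOT): string {
  return join(root, "projects", projectId);
}
function getProjectSessionsDir(projectId: string, root = AO_ROOT): string {
  return join(getProjectDir(projectId, root), "sessions");
}
function getProjectWorktreesDir(projectId: string, root = AO_ROOT): string {
  return join(getProjectDir(projectId, root), "worktrees");
}
function getSessionFile(projectId: string, sessionId: string, root = AO_ROOT): string {
  return join(getProjectSessionsDir(projectId, root), `${sessionId}.json`);
}

// Legacy archive entries look like ao-79_20260422T083732Z. Flattening keeps
// the session ID so the record can be rewritten as sessions/{sessionId}.json
// with a terminated lifecycle.
function sessionIdFromArchiveName(name: string): string | null {
  const m = name.match(/^(.+)_(\d{8}T\d{6}Z)$/);
  return m ? m[1] : null;
}

console.log(getSessionFile("agent-orchestrator", "ao-79"));
console.log(getProjectWorktreesDir("agent-orchestrator"));
console.log(sessionIdFromArchiveName("ao-79_20260422T083732Z"));
```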

6. UX Changes

The PR exposes the storage-redesign workflow through CLI, dashboard, and project settings.

Start/stop

Stop becomes recoverable

ao stop records stopped sessions, and ao start can restore them interactively on the next launch.

Project identity

Storage key fades from UI

The PR removes the storage key from some project settings surfaces, since storage moves to projects/{projectId}.

Add project

ID collision UX returns

Add-project conflict handling moves back toward project-ID suggestions instead of storage-key reuse confirmation.

Status CLI

Report history added

ao status --reports <value> can show agent report audit history, where <value> is full or a numeric limit.
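Since the audit log is NDJSON (one JSON object per line), the option reduces to "all lines" or "the last N lines". A sketch with hypothetical helper names, assuming entries are appended oldest first:

```typescript
// Hypothetical parser for the --reports value.
type ReportsOption = { kind: "full" } | { kind: "limit"; n: number };

function parseReportsOption(value: string): ReportsOption {
  if (value === "full") return { kind: "full" };
  if (/^\d+$/.test(value) && Number(value) > 0) return { kind: "limit", n: Number(value) };
  throw new Error(`--reports expects "full" or a positive integer, got "${value}"`);
}

// Reads an .ndjson audit log (oldest entry first is assumed) and applies the option.
function selectReports(ndjson: string, opt: ReportsOption): unknown[] {
  const entries = ndjson
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as unknown);
  return opt.kind === "full" ? entries : entries.slice(-opt.n);
}

const auditLog = '{"seq":1}\n{"seq":2}\n{"seq":3}\n';
console.log(selectReports(auditLog, parseReportsOption("2")));
```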

Project CLI

Relink command removed

The storage-key relink command disappears because V2 storage no longer depends on repo-origin derived storage keys.

Metadata

JSON becomes canonical

Session files become .json, with lifecycle stored as an object and status derived from lifecycle where possible.

Bug fix

Orchestrator tmux double-prefix

getOrchestratorSessionId() already returns {prefix}-orchestrator. The spawn path was prepending the prefix again, producing ao-ao-orchestrator. Fixed to use sessionId directly as the tmux name.
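The double prefix is easy to reproduce in isolation. In this sketch only getOrchestratorSessionId comes from the brief; tmuxNameBefore and tmuxNameAfter are hypothetical stand-ins for the spawn path before and after the fix.

```typescript
// getOrchestratorSessionId() already includes the prefix.
function getOrchestratorSessionId(prefix: string): string {
  return `${prefix}-orchestrator`;
}

// Before the fix (shape assumed): the spawn path prefixed the ID a second time.
function tmuxNameBefore(prefix: string, sessionId: string): string {
  return `${prefix}-${sessionId}`;
}

// After the fix: the session ID is used directly as the tmux session name.
function tmuxNameAfter(sessionId: string): string {
  return sessionId;
}

const orchestratorId = getOrchestratorSessionId("ao");
console.log(tmuxNameBefore("ao", orchestratorId)); // ao-ao-orchestrator
console.log(tmuxNameAfter(orchestratorId)); // ao-orchestrator
```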

Worktree routing

Plugin routes to V2 layout

Added worktreeDir to WorkspaceCreateConfig. Session manager passes getProjectWorktreesDir(projectId) at all 3 spawn/restore sites, routing worktrees to projects/{id}/worktrees/ instead of ~/.worktrees/.
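On the plugin side, the change amounts to preferring a directory handed in by the session manager. A sketch: WorkspaceCreateConfig, worktreeDir, and getProjectWorktreesDir are named in the brief, but the other fields, resolveWorktreePath, and the fallback sub-layout under ~/.worktrees are assumptions.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

interface WorkspaceCreateConfig {
  projectId: string;
  sessionId: string;
  worktreeDir?: string; // set by the session manager at every spawn/restore site
}

// V2 location: ~/.agent-orchestrator/projects/{projectId}/worktrees
function getProjectWorktreesDir(projectId: string): string {
  return join(homedir(), ".agent-orchestrator", "projects", projectId, "worktrees");
}

// Prefer the directory passed in; fall back to the old ~/.worktrees root
// (sub-layout assumed) only when none is supplied.
function resolveWorktreePath(cfg: WorkspaceCreateConfig): string {
  const base = cfg.worktreeDir ?? join(homedir(), ".worktrees", cfg.projectId);
  return join(base, cfg.sessionId);
}

const cfg: WorkspaceCreateConfig = {
  projectId: "agent-orchestrator",
  sessionId: "ao-84",
  worktreeDir: getProjectWorktreesDir("agent-orchestrator"),
};
console.log(resolveWorktreePath(cfg));
```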

Archiving removed

No more archive/ directory

Killed/terminated sessions stay as JSON metadata in sessions/ with a terminal lifecycle state. The archive/ directory concept, deleteMetadata archiving, and archive path functions are fully removed.

7. Behavioral Bug Fixes

Cross-cutting fixes discovered during storage redesign testing — runtime reconciliation, dashboard scoping, config resolution, and tab completions.

Bug fix

Stale runtime reconciliation

Sessions with dead tmux runtimes showed as "active" indefinitely. sm.list() now detects dead runtimes during enrichment and persists a runtime_lost reason to disk. deriveLegacyStatus() maps runtime_lost to killed.
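The reconciliation step can be sketched as a map over listed sessions with an injected liveness probe; a real probe might run `tmux has-session -t <name>`. The shapes below are hypothetical: only deriveLegacyStatus, the runtime_lost reason, and its mapping to killed come from this brief, and the "terminated" label for other reasons is assumed.

```typescript
interface Lifecycle {
  state: "working" | "terminated";
  reason?: string;
}

interface Session {
  id: string;
  runtimeHandle: string;
  lifecycle: Lifecycle;
}

function deriveLegacyStatus(lc: Lifecycle): string {
  if (lc.state !== "terminated") return "active";
  // runtime_lost surfaces as killed; other terminal reasons use an assumed label.
  return lc.reason === "runtime_lost" ? "killed" : "terminated";
}

// Runs during list() enrichment: a working session whose runtime is gone is
// marked runtime_lost and written back, so the fix survives the next read.
function reconcile(
  sessions: Session[],
  isRuntimeAlive: (handle: string) => boolean,
  persist: (s: Session) => void,
): Session[] {
  return sessions.map((s) => {
    if (s.lifecycle.state !== "working" || isRuntimeAlive(s.runtimeHandle)) return s;
    const updated: Session = { ...s, lifecycle: { state: "terminated", reason: "runtime_lost" } };
    persist(updated);
    return updated;
  });
}

const listed = reconcile(
  [
    { id: "ao-79", runtimeHandle: "ao-79", lifecycle: { state: "working" } },
    { id: "ao-84", runtimeHandle: "ao-84", lifecycle: { state: "working" } },
  ],
  (handle) => handle === "ao-84", // stand-in for a tmux liveness probe
  (s) => console.log(`persist ${s.id}: ${s.lifecycle.reason}`),
);
console.log(listed.map((s) => `${s.id}=${deriveLegacyStatus(s.lifecycle)}`));
```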

Bug fix

Dashboard sidebar shows all projects

Sidebar only showed sessions for the currently-viewed project. useSessionEvents in Dashboard.tsx is now called without a project filter. Kanban filters client-side via projectSessions memo. Sidebar always sees every project's sessions.
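The split is simply "fetch everything, filter late". A sketch of the Kanban-side computation as a plain function; in a React component it would sit inside useMemo keyed on the session list and project ID. The SessionSummary shape and the undefined-means-all behavior are assumptions.

```typescript
interface SessionSummary {
  id: string;
  projectId: string;
}

// The sidebar gets the unfiltered list; the Kanban board narrows it to the
// viewed project (no project selected is assumed to mean "show all").
function projectSessions(all: SessionSummary[], projectId?: string): SessionSummary[] {
  return projectId === undefined ? all : all.filter((s) => s.projectId === projectId);
}

const allSessions: SessionSummary[] = [
  { id: "ao-79", projectId: "agent-orchestrator" },
  { id: "ao-84", projectId: "agent-orchestrator" },
  { id: "dn-3", projectId: "donna" },
];
const sidebar = allSessions; // every project
const kanban = projectSessions(allSessions, "agent-orchestrator"); // viewed project only
console.log(sidebar.length, kanban.length);
```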

Bug fix

Tab completions show all projects

listProjects() only called loadConfig(), which found the local config (1 project). Now also reads global config via loadGlobalConfig() and merges both project lists.
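The merged lookup is a set union over project names from both configs. A sketch with a simplified, assumed config shape; the loadConfig()/loadGlobalConfig() calls are represented by their already-loaded results, and either may be missing.

```typescript
// Simplified config shape; real configs carry more fields per project.
interface ProjectsConfig {
  projects: Record<string, { path: string }>;
}

// Names from the local config plus the global config, deduplicated and
// sorted for stable completion output.
function listProjects(local: ProjectsConfig | null, globalCfg: ProjectsConfig | null): string[] {
  const names = new Set<string>([
    ...Object.keys(local?.projects ?? {}),
    ...Object.keys(globalCfg?.projects ?? {}),
  ]);
  return Array.from(names).sort();
}

console.log(
  listProjects(
    { projects: { "agent-orchestrator": { path: "/src/agent-orchestrator" } } },
    { projects: { "agent-orchestrator": { path: "/src/agent-orchestrator" }, donna: { path: "/src/donna" } } },
  ),
);
```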

Bug fix

Config resolution falls back to global

ao stop donna failed with "project not found" when cwd config didn't contain it. CLI now falls back to global config at ~/.agent-orchestrator/config.yaml for both ao stop and ao start.
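The fallback order can be sketched as "cwd config first, then global". YAML loading is omitted and each config is reduced to a name-to-settings map (shape assumed); own-property checks avoid matching inherited keys such as "toString".

```typescript
type ProjectMap<T> = Record<string, T>;

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function resolveProject<T>(
  name: string,
  local: ProjectMap<T> | undefined,
  globalCfg: ProjectMap<T> | undefined,
): { source: "local" | "global"; project: T } {
  if (local && hasOwn(local, name)) return { source: "local", project: local[name] };
  if (globalCfg && hasOwn(globalCfg, name)) return { source: "global", project: globalCfg[name] };
  throw new Error(`project not found: ${name}`);
}

// Running `ao stop donna` from a checkout whose local config only lists ao:
const localProjects = { ao: { path: "/src/agent-orchestrator" } };
const globalProjects = { ao: { path: "/src/agent-orchestrator" }, donna: { path: "/src/donna" } };
console.log(resolveProject("donna", localProjects, globalProjects).source); // global
```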

8. Documentation Updates

Architecture docs updated to reflect CLI behavior, lifecycle, and dashboard changes.

CLAUDE.md

Session lifecycle, storage, CLI behavior

Added canonical lifecycle states/reasons, stale runtime reconciliation, LastStopState + running.json to storage section, config resolution note, new "CLI Behavior" section (ao start/stop/Ctrl+C semantics, dashboard sidebar behavior), and key files (lifecycle-state.ts, running-state.ts, start.ts).

AGENTS.md + copilot-instructions.md

Key files and common mistakes

Added lifecycle-state.ts, start.ts, running-state.ts to key files. Added common mistakes: runtime_lost without deriveLegacyStatus, sidebar project scoping, ao stop parent process kill scoping.

9. LOC Change Overview

Fetched comparison: git diff --shortstat upstream/main..HEAD.

[Chart: LOC change by area]
Overall summary

This PR ships the storage redesign, cross-project CLI, and behavioral hardening.

The PR moves AO from storage-key-based flat metadata to project-scoped JSON storage with deterministic hashed project IDs. It removes the entire storageKey system, removes archiving (terminated sessions stay visible), adds a crash-safe migration command with rollback, and makes stop/start fully recoverable with cross-project awareness. It also fixes stale runtime detection, dashboard sidebar scoping, Ctrl+C shutdown, tab completions, and config resolution across all CLI commands.

  • Strategic direction: Adopts V2 storage (projects/{projectId}, JSON metadata, lifecycle-centered status) and removes the entire storageKey system.
  • Identity model: Hashed project IDs ({basename}_{hash}) replace bare basenames. Deterministic, collision-safe, no manual suffixes.
  • Cross-project CLI: ao stop sees all projects via global config and records cross-project sessions. ao stop <project> is surgical. ao start restores all sessions, including cross-project ones. Ctrl+C mirrors ao stop.
  • Runtime reconciliation: Dead tmux sessions are detected during sm.list() enrichment and persisted as runtime_lost, which maps to killed. No more stale "active" sessions.
  • Dashboard: Sidebar always shows all projects' sessions. Kanban filters client-side. Tab completions merge local + global config.
  • Migration safety: Atomic writes, file-locked config updates, macOS case-insensitive collision detection, git worktree repair in rollback, partial-failure isolation.
  • Documentation: CLAUDE.md, AGENTS.md, copilot-instructions.md, and DESIGN.md updated with CLI behavior, lifecycle states, and architectural invariants.