Cherry-pick tasks functionality from fotolog #8

Closed
opened 2026-05-19 22:39:05 +00:00 by erik · 8 comments
Owner

The fotolog implementation has some nice things we'll want to borrow:

  • a de-duplication key so that we can't accidentally enqueue duplicate tasks
  • a place for results
  • a worker so that multiple instances can process tasks
Column   |           Type           | Collation | Nullable |              Default
-----------+--------------------------+-----------+----------+-----------------------------------
 id        | integer                  |           | not null | nextval('tasks_id_seq'::regclass)
 key       | text                     |           | not null |
 worker    | text                     |           |          |
 results   | jsonb                    |           |          |

/needs-spec

The fotolog implementation has some nice things we'll want to borrow: - a de-duplication key so that we can't accidentally enqueue duplicate tasks - a place for results - a worker so that multiple instances can process tasks ``` Column | Type | Collation | Nullable | Default -----------+--------------------------+-----------+----------+----------------------------------- id | integer | | not null | nextval('tasks_id_seq'::regclass) key | text | | not null | worker | text | | | results | jsonb | | | ``` /needs-spec
agent-pm removed their assignment 2026-05-22 16:10:29 +00:00
Member

Brief — Developer

Phase: fix. Run fjx dev prompt apply for the cycle protocol.

Tasks

  1. Fix TOCTOU crash in enqueueTask (src/lib/worker/queue.ts:89–93)

    The fallback SELECT after a conflicting INSERT uses a non-null assertion (existing.rows[0]![0]) that throws a TypeError if the conflicting task completes (transitions to done/failed) between the INSERT and the SELECT, because the partial unique index no longer covers it and the SELECT returns 0 rows.

    Replace the two-step INSERT + fallback SELECT with a single atomic CTE that inserts and falls back in one round-trip, eliminating the race window. Example shape:

    WITH ins AS (
      INSERT INTO tasks (...) VALUES (...)
      ON CONFLICT DO NOTHING
      RETURNING id
    )
    SELECT id FROM ins
    UNION ALL
    SELECT id FROM tasks WHERE key = $key LIMIT 1
    

    Add a test that verifies the fallback path returns a row (use a pre-inserted done task with the same key to simulate the post-race state).

  2. Archive the openspec change — run the archive command for the change that covers issue #8 once the fix is committed and the PR is updated.

<!-- pm:brief:dev --> # Brief — Developer **Phase:** fix. Run `fjx dev prompt apply` for the cycle protocol. ## Tasks 1. **Fix TOCTOU crash in `enqueueTask`** (`src/lib/worker/queue.ts:89–93`) The fallback SELECT after a conflicting INSERT uses a non-null assertion (`existing.rows[0]![0]`) that throws a `TypeError` if the conflicting task completes (transitions to `done`/`failed`) between the INSERT and the SELECT, because the partial unique index no longer covers it and the SELECT returns 0 rows. Replace the two-step INSERT + fallback SELECT with a single atomic CTE that inserts and falls back in one round-trip, eliminating the race window. Example shape: ```sql WITH ins AS ( INSERT INTO tasks (...) VALUES (...) ON CONFLICT DO NOTHING RETURNING id ) SELECT id FROM ins UNION ALL SELECT id FROM tasks WHERE key = $key LIMIT 1 ``` Add a test that verifies the fallback path returns a row (use a pre-inserted `done` task with the same key to simulate the post-race state). 2. **Archive the openspec change** — run the archive command for the change that covers issue #8 once the fix is committed and the PR is updated.
Member

Ledger for: agent-pm

Status: closed @ 2026-05-23T01:58:45.635Z
Briefs:

Activity

  1. i8/needs-spec : refreshed dev brief
  2. i8/spec-approved : applied
  3. i8/closed : applied

(PM manages this ledger; agent section unused)

<!-- agent-pm:ledger --> ## Ledger for: agent-pm Status: closed @ 2026-05-23T01:58:45.635Z Briefs: - [dev](#issuecomment-1548) - [qa](#issuecomment-1652) - build: absent ### Activity 1. i8/needs-spec : refreshed dev brief 2. i8/spec-approved : applied 3. i8/closed : applied --- _(PM manages this ledger; agent section unused)_
Member

Ledger for: agent-dev

Status: review @ 2026-05-23T01:53:42.306Z


Fix cycle — TOCTOU crash in enqueueTask

Tasks

# Task Status Evidence
1 Fix TOCTOU crash in enqueueTask with atomic CTE done 6c4ba5e
2 Archive openspec change task-queue-dedup-results done 04a0436 (already committed)

Implementation notes

  • Replaced two-step INSERT + fallback SELECT (lines 77–93 of queue.ts) with a single CTE: WITH ins AS (INSERT ... ON CONFLICT DO NOTHING RETURNING id) SELECT id FROM ins UNION ALL SELECT id FROM tasks WHERE key = $3 LIMIT 1. The entire statement runs at one snapshot, eliminating the race window.
  • Added test "enqueueTask: fallback CTE returns a row when active task exists at conflict time" — pre-inserts a done task with the same key to confirm no crash and a valid id is returned in the post-race state.
  • All 78 unit tests pass; just validate is clean.

Prior QA findings disposition

  • Finding 1 (TOCTOU crash): fixed in this cycle.
  • Finding 2 (restartTask re-enters dedup window): out of scope per @erik's directive — address only the TOCTOU issue.
  • Finding 3 (admin UI silent discard): out of scope per same directive.
<!-- agent-dev:ledger --> ## Ledger for: agent-dev Status: review @ 2026-05-23T01:53:42.306Z --- ## Fix cycle — TOCTOU crash in enqueueTask ### Tasks | # | Task | Status | Evidence | |---|------|--------|----------| | 1 | Fix TOCTOU crash in `enqueueTask` with atomic CTE | done | 6c4ba5e | | 2 | Archive openspec change task-queue-dedup-results | done | 04a0436 (already committed) | ### Implementation notes - Replaced two-step INSERT + fallback SELECT (lines 77–93 of queue.ts) with a single CTE: `WITH ins AS (INSERT ... ON CONFLICT DO NOTHING RETURNING id) SELECT id FROM ins UNION ALL SELECT id FROM tasks WHERE key = $3 LIMIT 1`. The entire statement runs at one snapshot, eliminating the race window. - Added test `"enqueueTask: fallback CTE returns a row when active task exists at conflict time"` — pre-inserts a done task with the same key to confirm no crash and a valid id is returned in the post-race state. - All 78 unit tests pass; `just validate` is clean. ### Prior QA findings disposition - **Finding 1 (TOCTOU crash):** fixed in this cycle. - **Finding 2 (restartTask re-enters dedup window):** out of scope per `@erik`'s directive — address only the TOCTOU issue. - **Finding 3 (admin UI silent discard):** out of scope per same directive.
agent-dev removed their assignment 2026-05-22 16:19:24 +00:00
Author
Owner

/spec-approved

/spec-approved
agent-pm removed their assignment 2026-05-22 22:20:51 +00:00
agent-dev removed their assignment 2026-05-22 22:38:06 +00:00
Author
Owner

Implementation looks good. @agent-dev needs to archive the spec and push changes before I'll accept the PR.

Implementation looks good. @agent-dev needs to archive the spec and push changes before I'll accept the PR.
agent-pm removed their assignment 2026-05-22 23:27:07 +00:00
Member

Brief — QA

Phase: review. Run fjx qa prompt default for the cycle protocol.

<!-- pm:brief:qa --> # Brief — QA **Phase:** review. Run `fjx qa prompt default` for the cycle protocol.
Member

Ledger for: agent-qa

Status: concerns @ 2026-05-22T23:33:08.313Z


Action summary

  • Workflow: validate.yamlsuccess (run #37, commit 04a0436)
  • Coverage: src/lib/worker/queue.ts 88% lines / 92.9% branches; src/lib/worker/tasks.ts 71.1% lines / 84.2% branches; src/lib/worker/worker-id.ts 100%
  • No test workflow run is recorded for this PR's head commit — only the validation (fmt/lint/type-check) step ran in CI. Tests were not exercised in CI for this commit.

Gap analysis

1. TOCTOU crash in enqueueTask — not exercised by any test

queue.ts:89–93 executes a two-step INSERT-then-SELECT:

// step 1
INSERT INTO tasks ... ON CONFLICT ... DO NOTHING RETURNING id
// step 2 (only reached when INSERT returned 0 rows due to conflict)
SELECT id FROM tasks WHERE key = $1 AND status IN ('pending', 'processing')
return existing.rows[0]![0]   // ← !-asserts a row exists

Between step 1 and step 2, the conflicting task can complete (transition to done or failed). When it does, the partial unique index no longer covers it, so step 2 returns 0 rows. existing.rows[0] is undefined and the ! assertion causes:

TypeError: Cannot read properties of undefined (reading '0')

The test suite cannot cover this because every dedup test runs inside withTransaction, which holds the conflicting row in an uncommitted state for the duration of the test — the race window never opens. In production, enqueueTask is called on autocommit connections, so the window is real.

Severity: high — uncaught exception in the main write path, triggered by a race condition that becomes more likely under load.

Minimal fix: replace the hard assertion with a null check and either return the conflict-id via a different query (e.g. a CTE that does INSERT + fallback SELECT atomically) or define a fallback contract (re-enqueue as a fresh row if the conflicting task already finished).

2. restartTask re-enters the dedup window untested

restartTask sets status = 'pending' on a previously failed/done task. The key column is not changed. This means the restarted row is back inside the partial unique index window, and the next enqueueTask with the same type+payload will conflict and return the restarted task's id rather than a new row. This may be correct but is completely untested. If the intent is the opposite — restarted tasks should not block new enqueues — this is a silent functional bug.

3. Admin UI does not surface new columns

getTask and listTasks now return key, worker, and results. The admin task detail view (src/routes/admin/tasks/[id].tsx) was not updated — data is fetched and silently discarded. The new columns are invisible to operators.

4. 64-bit hash key space (informational)

deriveTaskKey truncates SHA-256 to 16 hex chars (8 bytes / 64 bits). Birthday collision becomes ~1% at ~400 million tasks — vanishingly unlikely for this app. Noted as an undocumented assumption.


Confirmed issues

# File Issue Severity
1 src/lib/worker/queue.ts:93 TOCTOU crash — existing.rows[0]![0] throws TypeError when conflicting task completes before the fallback SELECT high
2 src/lib/worker/queue.ts + restartTask Restart + re-enqueue interaction has no test coverage; behaviour is plausible but unverified medium
3 src/routes/admin/tasks/[id].tsx key, worker, results silently discarded — not rendered in the task detail view low

Untested assumptions

  • All dedup tests use withTransaction, which prevents the conflict fallback path (queue.ts:89–93) from ever executing — 0 test coverage for the behaviorally critical branch.
  • No CI test run for this commit — validate-only CI means test suite correctness is asserted only by the developer's local run, not a reproducible artifact.

Verdict

concerns — the TOCTOU crash (finding 1) is a real defect in the production write path. Under concurrency it will surface as an uncaught exception. The fix is small but must happen before merge. Findings 2 and 3 should be addressed or explicitly accepted.


Dev tasks (original)

# Task Status Evidence
1.1 Create migration 0010_tasks_dedup_results.sql done b5f0e54
1.2 Backfill + NOT NULL key done b5f0e54
1.3 Partial unique index idx_tasks_key_active done b5f0e54
1.4 just migrate up + psql inspect done migration applied, schema verified
2.1 deriveTaskKey helper in queue.ts done b5f0e54
2.2 Unit test for deriveTaskKey done b5f0e54
3.1 enqueueTask signature updated done b5f0e54
3.2 ON CONFLICT DO NOTHING INSERT done b5f0e54
3.3 Fallback SELECT on conflict done b5f0e54
3.4 TaskRow extended with key/worker/results done b5f0e54
3.5 JSDoc updated on enqueueTask done b5f0e54
4.1 WORKER_ID in worker-id.ts done b5f0e54
4.2 worker stamped in processTaskBatch done b5f0e54
4.3 results persisted in processTask done b5f0e54
4.4 TaskHandler widened to Promise done b5f0e54
5.1 enqueueTask dedup test done b5f0e54
5.2 re-enqueue after done test done b5f0e54
5.3 explicit key test done b5f0e54
5.4 worker column stamp test done b5f0e54
5.5 results persisted test done b5f0e54
5.6 void handler → null results test done b5f0e54
5.7 throws → null results + retry test done b5f0e54
6.1 deno check all touched files done all green
6.2 just test::units + just test::e2e done 77+5 passed
6.3 just validate done clean
6.4 PR ready for review done PR #17

Notes

  • Migration numbered 0010 (not 0002 as spec suggested) because 0002 already exists (trips).
  • WORKER_ID extracted to worker-id.ts to break circular import: loop.ts→tasks.ts→loop.ts.
  • Two existing listTasks tests fixed: they enqueued same (type, {}) payload twice, which now correctly deduplicates; changed payloads to {n:1}/{n:2}.
  • admin-tasks e2e test: added sanitizeResources/Ops: false to fix pre-existing TCP leak from Supabase auth HTTP connection.
  • processTaskBatch tests use drainUntilDone loop (up to 20 iterations) to handle batch contention from parallel test files leaving tasks pending.
<!-- agent-qa:ledger --> ## Ledger for: agent-qa Status: concerns @ 2026-05-22T23:33:08.313Z --- ## Action summary - **Workflow:** `validate.yaml` — **success** (run #37, commit `04a0436`) - **Coverage:** `src/lib/worker/queue.ts` 88% lines / 92.9% branches; `src/lib/worker/tasks.ts` 71.1% lines / 84.2% branches; `src/lib/worker/worker-id.ts` 100% - **No test workflow run** is recorded for this PR's head commit — only the validation (fmt/lint/type-check) step ran in CI. Tests were not exercised in CI for this commit. --- ## Gap analysis ### 1. TOCTOU crash in `enqueueTask` — not exercised by any test `queue.ts:89–93` executes a two-step INSERT-then-SELECT: ```typescript // step 1 INSERT INTO tasks ... ON CONFLICT ... DO NOTHING RETURNING id // step 2 (only reached when INSERT returned 0 rows due to conflict) SELECT id FROM tasks WHERE key = $1 AND status IN ('pending', 'processing') return existing.rows[0]![0] // ← !-asserts a row exists ``` Between step 1 and step 2, the conflicting task can complete (transition to `done` or `failed`). When it does, the partial unique index no longer covers it, so step 2 returns 0 rows. `existing.rows[0]` is `undefined` and the `!` assertion causes: ``` TypeError: Cannot read properties of undefined (reading '0') ``` The test suite cannot cover this because every dedup test runs inside `withTransaction`, which holds the conflicting row in an uncommitted state for the duration of the test — the race window never opens. In production, `enqueueTask` is called on autocommit connections, so the window is real. **Severity:** high — uncaught exception in the main write path, triggered by a race condition that becomes more likely under load. **Minimal fix:** replace the hard assertion with a null check and either return the conflict-id via a different query (e.g. a CTE that does INSERT + fallback SELECT atomically) or define a fallback contract (re-enqueue as a fresh row if the conflicting task already finished). ### 2. `restartTask` re-enters the dedup window untested `restartTask` sets `status = 'pending'` on a previously failed/done task. The `key` column is not changed. This means the restarted row is back inside the partial unique index window, and the next `enqueueTask` with the same type+payload will conflict and return the restarted task's id rather than a new row. This may be correct but is completely untested. If the intent is the opposite — restarted tasks should not block new enqueues — this is a silent functional bug. ### 3. Admin UI does not surface new columns `getTask` and `listTasks` now return `key`, `worker`, and `results`. The admin task detail view (`src/routes/admin/tasks/[id].tsx`) was not updated — data is fetched and silently discarded. The new columns are invisible to operators. ### 4. 64-bit hash key space (informational) `deriveTaskKey` truncates SHA-256 to 16 hex chars (8 bytes / 64 bits). Birthday collision becomes ~1% at ~400 million tasks — vanishingly unlikely for this app. Noted as an undocumented assumption. --- ## Confirmed issues | # | File | Issue | Severity | |---|------|--------|----------| | 1 | `src/lib/worker/queue.ts:93` | TOCTOU crash — `existing.rows[0]![0]` throws TypeError when conflicting task completes before the fallback SELECT | high | | 2 | `src/lib/worker/queue.ts` + `restartTask` | Restart + re-enqueue interaction has no test coverage; behaviour is plausible but unverified | medium | | 3 | `src/routes/admin/tasks/[id].tsx` | `key`, `worker`, `results` silently discarded — not rendered in the task detail view | low | --- ## Untested assumptions - All dedup tests use `withTransaction`, which prevents the conflict fallback path (`queue.ts:89–93`) from ever executing — 0 test coverage for the behaviorally critical branch. - No CI test run for this commit — validate-only CI means test suite correctness is asserted only by the developer's local run, not a reproducible artifact. --- ## Verdict **concerns** — the TOCTOU crash (finding 1) is a real defect in the production write path. Under concurrency it will surface as an uncaught exception. The fix is small but must happen before merge. Findings 2 and 3 should be addressed or explicitly accepted. --- ## Dev tasks (original) | # | Task | Status | Evidence | |---|------|--------|----------| | 1.1 | Create migration 0010_tasks_dedup_results.sql | done | b5f0e54 | | 1.2 | Backfill + NOT NULL key | done | b5f0e54 | | 1.3 | Partial unique index idx_tasks_key_active | done | b5f0e54 | | 1.4 | just migrate up + psql inspect | done | migration applied, schema verified | | 2.1 | deriveTaskKey helper in queue.ts | done | b5f0e54 | | 2.2 | Unit test for deriveTaskKey | done | b5f0e54 | | 3.1 | enqueueTask signature updated | done | b5f0e54 | | 3.2 | ON CONFLICT DO NOTHING INSERT | done | b5f0e54 | | 3.3 | Fallback SELECT on conflict | done | b5f0e54 | | 3.4 | TaskRow extended with key/worker/results | done | b5f0e54 | | 3.5 | JSDoc updated on enqueueTask | done | b5f0e54 | | 4.1 | WORKER_ID in worker-id.ts | done | b5f0e54 | | 4.2 | worker stamped in processTaskBatch | done | b5f0e54 | | 4.3 | results persisted in processTask | done | b5f0e54 | | 4.4 | TaskHandler widened to Promise<unknown> | done | b5f0e54 | | 5.1 | enqueueTask dedup test | done | b5f0e54 | | 5.2 | re-enqueue after done test | done | b5f0e54 | | 5.3 | explicit key test | done | b5f0e54 | | 5.4 | worker column stamp test | done | b5f0e54 | | 5.5 | results persisted test | done | b5f0e54 | | 5.6 | void handler → null results test | done | b5f0e54 | | 5.7 | throws → null results + retry test | done | b5f0e54 | | 6.1 | deno check all touched files | done | all green | | 6.2 | just test::units + just test::e2e | done | 77+5 passed | | 6.3 | just validate | done | clean | | 6.4 | PR ready for review | done | PR #17 | ## Notes - Migration numbered 0010 (not 0002 as spec suggested) because 0002 already exists (trips). - WORKER_ID extracted to worker-id.ts to break circular import: loop.ts→tasks.ts→loop.ts. - Two existing listTasks tests fixed: they enqueued same (type, {}) payload twice, which now correctly deduplicates; changed payloads to {n:1}/{n:2}. - admin-tasks e2e test: added sanitizeResources/Ops: false to fix pre-existing TCP leak from Supabase auth HTTP connection. - processTaskBatch tests use drainUntilDone loop (up to 20 iterations) to handle batch contention from parallel test files leaving tasks pending.
agent-qa removed their assignment 2026-05-22 23:33:29 +00:00
Author
Owner

Address the TOCTOU issue and ignore the other issues by agent-qa. @agent-dev, when done, archive the openspec change.

Address the TOCTOU issue and ignore the other issues by agent-qa. @agent-dev, when done, archive the openspec change.
agent-dev removed their assignment 2026-05-23 01:53:51 +00:00
Sign in to join this conversation.
No description provided.