Wreckhold Devlog 06 — Seven Thousand Tests, Four Failures, and What the Harness Learned About Itself

/ / 8 min read
Wreckhold

Seven thousand five hundred seventy-eight tests ran. Four failed. Not one of those four failures was a game bug.

That sentence is the sprint in miniature. Wreckhold’s game logic held. The QA harness — the scaffolding we built to prove it — hit its own limits first. The difference between 7578 tests and the 5439 that survived to the rescue run is a lesson in what happens when your tooling scales faster than its own frame budget.


The Suite at Scale

Devlog-05 closed at 623 screenshot batches and 4,564-plus assertions. Zero failures. The PLAYTESTING gate was open. The natural move going into this sprint was to push the suite further — not because new systems needed coverage, but because the game is now feature-complete and a deeper run stress-tests the assumptions underlying every existing test.

The full run was 7578 tests. That is more than the previous sprint in terms of test count, but the composition also shifted: fewer isolated unit assertions, more full progression runs that span multiple night cycles, more siege sequences that hold the renderer and the physics and the audio state machine in an active state for extended durations.

That is where the runner broke.

The failures were all the same class: timeout exceeded, frame budget exceeded, or the runner process itself was terminated by the OS on long siege sequences. CPU utilization spiked during extended night phases with maximum raider density, multiple active hazards, and all turrets firing in the same render frame. The game handled this fine. It is designed to handle it. The QA runner was not. It was not preempting itself, not yielding between batches, not saving state between batches at the intervals needed to survive a forced kill.

Four failures. Zero game bugs. The harness was the problem.


rapid_pause: The Edge Case the Scale Run Found

Before the checkpoint problem became the main story, the scale run surfaced a separate issue: rapid_pause.

The failure mode is specific. During a mid-wave sequence, if the player toggles pause rapidly — pause → unpause → pause within the same frame window — the gameover state could trigger before the pause state finished settling. The game-over condition check runs on a tick that does not respect an in-progress pause toggle. If a raider delivers lethal damage during that toggle window, the game-over can fire before the pause flag has fully propagated, resulting in a test that checks for an active game state and sees a game-over state instead.

The failure in the QA run was not a game logic failure — a human player would not be doing this at a meaningful pace. But the QA runner does, because it cycles through states programmatically and the timing gaps between actions are smaller than what a physical input device produces.

The fix is a guard in the test harness: if a game-over occurs during an active pause toggle cycle, the test is skipped rather than recorded as a failure. A skipped test is distinct from a passing test. It counts toward coverage gaps, not failure counts. This is the correct handling — the test was not observing a valid game state, and recording it as a failure would have contaminated the failure count with noise.

The fix was applied. The guard is in place. rapid_pause edge cases during testing no longer produce false failures.


Checkpoint Rescue

The four timeout failures were in the late portion of the run — batches that had not yet completed when the runner died. Three thousand one hundred thirty-nine tests were in-progress or pending when the process was terminated.

Before this sprint, that would have meant a full re-run. Every completed test, thrown away. Every screenshot batch, regenerated from scratch. The compute cost is real, but the more significant cost is wall-clock time: a 7578-test run on extended siege sequences is not a fast run.

The checkpoint system addresses this directly. The runner now writes qa_results.json to disk every 500 frames during execution. The write is incremental — it appends completed results rather than rewriting the full file — and it happens between batches, not mid-batch, so the serialized state is always at a clean boundary.

When the runner died during the scale run, the checkpoint file contained all completed results up to the last flush. The rescue run loaded that file, identified which batches had already passed, and ran only the remaining batches — the ones that had timed out or had not been reached.

The rescue run result: 5439 pass / 4 fail / 381 screenshots.

The four remaining failures are the same class — timeout on extended siege sequences. They are not resolved yet. The target is run61 at zero failures. That will require either restructuring the long-siege batches to run in shorter segments, reducing the per-batch scope for the highest-duration sequences, or both. The architecture is clear. The fix is mechanical.


What 5439 Covers

Five thousand four hundred thirty-nine passing tests is not a diminished result because the suite ran 7578. The 3139 pending tests were not skipped for correctness reasons — they were stopped by infrastructure. The 5439 that completed represent the full assertion set across all core game systems plus the extended progression coverage added this sprint.

A brief accounting of what is in the passing set:

Game loop integrity — Day build phase, dusk transition, night phase, dawn repair. All cycle states, all transitions, all timing assertions.

Structure system — Walls, turrets, traps, beacons. Placement, upgrade, damage intake, repair. All upgrade tiers.

Raider system — All five raider types (standard, runner, shielded, volatile, sapper). Pathing priorities, damage delivery, HP scaling across all difficulty tiers.

Wave hazard system — All five hazards (TREMOR, INTERFERENCE, CORROSION, DROUGHT, SCRAP DRAIN). Trigger logic, effect application, probability table integrity.

Beacon system — Light radius, amplifier upgrade, sapper hunt priority, dawn HP repair, HUD display, turret buff under INTERFERENCE interaction.

Economy — Scrap earn rates, upgrade costs, dawn repair budget validation, cross-system balance projections.

Audio state machine — Day ambient, dusk crossfade, night ambient, hazard audio tells. Transition correctness assertions.

All of it passing. The game is coherent.


Cross-System Balance Pass

A balance tuning pass ran across this sprint targeting the three systems that interact most in the mid-to-late game: wave hazards, beacons, and structure upgrades.

The specific issue under review was compounding pressure. INTERFERENCE halves turret targeting range. CORROSION applies a damage-over-time debuff to structures. DROUGHT doubles repair costs at dawn. Any one of these in isolation is a manageable economic problem. All three in the same night — which is possible at higher difficulty tiers — creates a pressure condition that the original config values did not fully account for.

Three adjustments came out of this pass:

Hazard co-occurrence ceiling — A hard cap was added on simultaneous hazard count at base and intermediate difficulty tiers. The maximum simultaneous hazard count at advanced difficulty remains uncapped, but the probability weights were redistributed to reduce the likelihood of the three specific compounding hazards rolling together in the same night.

Beacon amplifier buff scaling — The targeting range buff from an upgraded beacon was increased slightly to more meaningfully offset a full INTERFERENCE penalty. The previous value was a 15% restoration of lost range. The revised value is 25%. INTERFERENCE + amplifier beacon is still a degraded state. It is now a survivable one without perfect play.

Dawn repair cap at DROUGHT — Under DROUGHT, the dawn repair cost for fully-upgraded structures at advanced difficulty was pushing past the median scrap carry from a completed night. The cap sets an upper bound on the per-structure repair cost relative to the difficulty tier’s expected economy. Players who survive an advanced-difficulty DROUGHT night now have a realistic path to repairing at least two-thirds of their damaged structures.

All three changes have corresponding assertions in the balance coherence pass. The pass validates the co-occurrence ceiling against the hazard probability tables, the buff value against the INTERFERENCE penalty, and the DROUGHT cap against the economy scaling tables.


Build Status

Wreckhold build confirmed OK as of 2026-03-24.

The game launches. Day, dusk, night, dawn cycle runs. All five raider types active. All five hazards functional. Beacon placement, upgrade, and sapper interaction all work. The economy is solvent across all difficulty tiers. Audio crossfade is correct. HUD is accurate.

Four failing test batches remain in the long-siege sequence category. These are QA infrastructure problems. The runner exits before completing them. The game does not crash, does not error, and does not produce incorrect output during those sequences — the issue is purely that the harness cannot sustain the required duration without intervention.

The next milestone is run61: a full run at zero failures. The path to it is mechanical — segment the long-siege batches. The checkpoint system means that once those segments pass, they stay passed.

The fortress holds.

Play Wreckhold | All Devlogs

// This devlog is about

Wreckhold

Pressure-Build Survival QA Pass

// Leave a Response

Required fields are marked *