Wages and Mages Devlog 06 — Twenty-Three Thousand Tests and the One That Requires Human Hands
Devlog 05 ended with the itch_ready badge. 6,808 tests. 1,002 screenshots. Zero failures. A game promoted to the launch queue through automated evidence — every narrator line verified, every defense grid stress-tested, every save file round-tripped. The factory’s coldest audit had found one bug and fixed it in the same wave.
The factory looked at that badge and kept going.
Between devlog 05 and now, the QA harness ran ten more expansion waves — waves 8 through 17 — growing the test suite 3.4x to 23,339 tests and the screenshot corpus to 2,064 captures. Zero failures across all ten waves. One infrastructure crisis that nearly killed the pipeline. Two hero screenshots documenting zones the store page couldn’t show. And a milestone transition that puts the game in a new category entirely: waiting for human hands.
This is the devlog about what happens after a game is ready but before it ships.
Waves 9–11: Broadening the Map (6,808 → 9,563 tests)
Wave 8 had established the itch_ready baseline. Waves 9 through 11 pushed past the systems that earlier waves had already validated and into the zones and mechanics that existed at the edges.
Wave 9 opened three fronts the factory had only lightly touched. The Ice Kingdom got its first deep pass — boss fight mechanics, the ice mech trade system, shop item validation, and map data integrity for the Glacial Court. North Pass traversal received 9 NPC interaction tests, sky gradient zone verification, and 3 dedicated area screenshots. The Mirror World got its first mechanical audit: inverted palette logic, phase-gated portal access, and shrine reward validation. The wave also closed 4 pre-existing test failures that had been silently tolerated — audio wiring mismatches and API access patterns written against stale interfaces.
Wave 10 deepened crafting coverage across all 33 recipes and 4 quality tiers, swept phase 1/2/3 state transitions, and ran data integrity checks across the full system graph. Wave 11 pushed hardest — 1,234 new tests in a single wave, the largest expansion since Wave 7’s state gap closure sprint. The test count hit 9,563. The screenshot count hit 1,511.
And then the pipeline broke.
The Segfault at Screenshot 1,511
Love2D’s canvas capture system has a resource ceiling. The QA harness discovered it the hard way.
Real canvas captures (the kind where Love2D renders a frame, reads the pixel buffer, and writes a PNG) consume GPU memory and driver resources that accumulate across a session. Below a thousand captures, the cost is invisible. Above a thousand, the Love2D process begins competing with itself for graphics resources, and somewhere past that point it segfaults mid-run.
Wave 11 hit the wall. The QA harness was running 9,563 tests with real screenshot captures enabled. At capture 1,511, the Love2D process crashed. The test results were incomplete — 7,446 of 9,563 tests had executed before the segfault killed the session. The remaining 2,117 tests never ran. For the first time in the entire QA campaign, the factory had a wave that could not complete.
The fix was architectural rather than cosmetic. The screenshot system now tracks its real-capture count and automatically degrades to synthetic screenshots after 800 real captures — well below the ~1,000 threshold where resource exhaustion begins. Synthetic captures validate layout, element positioning, and state correctness without invoking the GPU capture pipeline. They are lighter, faster, and effectively unlimited.
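The degrade-after-800 rule can be sketched as a small counter that decides which capture path to take. This is an illustration only, written in Python rather than the harness's own code: `ScreenshotBudget` and its `capture` method are hypothetical names, and the returned labels stand in for the real GPU capture and the synthetic layout check. Only the 800 limit comes from the fix described above.

```python
from dataclasses import dataclass

REAL_CAPTURE_LIMIT = 800  # from the devlog: degrade after 800 real captures


@dataclass
class ScreenshotBudget:
    """Counts real captures and falls back to synthetic ones past the limit."""
    limit: int = REAL_CAPTURE_LIMIT
    real_count: int = 0
    synthetic_count: int = 0

    def capture(self, name: str) -> str:
        # Hypothetical stand-ins: a real harness would invoke the GPU capture
        # path or the layout-only validator here instead of returning a label.
        if self.real_count < self.limit:
            self.real_count += 1
            return f"real:{name}"
        self.synthetic_count += 1
        return f"synthetic:{name}"


budget = ScreenshotBudget()
kinds = [budget.capture(f"shot_{i}") for i in range(1511)]
print(budget.real_count, budget.synthetic_count)  # 800 711
```

Under the assumption that all 1,511 captures in a run pass through a single budget, the first 800 stay real and the remaining 711 degrade to synthetic, which keeps the session well clear of the ceiling.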
After the fix shipped, Wave 11 re-ran to completion. 9,563 pass. 0 fail. 1,511 screenshots. The pipeline was stable — and the factory could push test volume as high as it needed without worrying about the tools falling over mid-run.
Waves 12–14: The Acceleration (9,563 → 16,040 tests)
With the segfault fix in place, the factory stopped being careful and started being thorough. The next three waves added 6,477 tests, nearly as many as the entire suite contained when devlog 05 ended.
Wave 12 pushed past 10,000 tests with comprehensive data validation across every game system. Items got a full audit: weapons, armor, and accessories tested across all tiers with field validation, deduplication checks, and helper function verification. The enemy roster was validated end to end — base enemies, dungeon creatures, elite generation, chaos infusion, and level scaling. Wave defense got stress-tested for nights 1 through 10. Recipe-item cross-references were verified for all 4 crafting tiers. Structure upgrade chains were validated for walls, turrets, buildings, and traps. Save/load round-trips covered all three campaign phases. Even the seal weapon system — 4 phases, scabbard charge mechanics, 12 consumable items — got its own test block.
Wave 13 ran 3,012 new tests, the largest single-wave expansion of the campaign up to that point. The focus shifted from data validation to behavioral edge cases. Economy deep paths covered negative balance prevention, difficulty-scaled loot tables, buy/sell flows, and reward formula validation. The dual narrator system received its deepest audit yet: complete line pool verification for both W.A.G.E-9999 and M.A.G.E-0001, trigger system validation, affinity tracking, and quip content completeness. Grid TD wave survival testing extended into procedural night generation for nights 11 through 50.
Wave 14 introduced a new data module — factions_data.lua — with 5 factions, 6 reputation thresholds, rewards, penalties, reputation events, and cross-faction dynamics. The wave also deep-validated all 32 crafting recipes across smithy, alchemy, and enchanting paths, tested night raid mechanics at every difficulty tier (Easy/Normal/Insane), and ran save/load round-trips with complex state including double-cycle serialization.
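A double-cycle round-trip check is easy to picture in miniature: save, load, save again, and require that nothing drifts between the first and second pass. The sketch below assumes a JSON-like save format purely for illustration; the game's actual serializer is not shown in this devlog, and `save`, `load`, `double_cycle_ok`, and the sample state fields are all hypothetical.

```python
import json


def save(state: dict) -> str:
    # sort_keys makes the serialized form deterministic, so two saves of
    # equal state produce identical strings.
    return json.dumps(state, sort_keys=True)


def load(blob: str) -> dict:
    return json.loads(blob)


def double_cycle_ok(state: dict) -> bool:
    """Save -> load -> save -> load, and require both passes to agree."""
    first = save(state)
    second = save(load(first))
    return first == second and load(second) == state


# Hypothetical complex state; field names are placeholders, not game data.
sample_state = {
    "phase": 2,
    "gold": 1250,
    "factions": {"faction_a": 40, "faction_b": -15},
    "inventory": [{"id": "iron_sword", "tier": 2}],
}
print(double_cycle_ok(sample_state))  # True
```

The second cycle is what catches serializers that are lossy in ways a single pass hides, such as a field that loads with a default and then saves differently.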
Tests: 9,563 → 16,040 | Screenshots: 1,511 → 1,606
Waves 15–17: The Adversarial Gauntlet (16,040 → 23,339 tests)
The final three waves stopped asking “does this work?” and started asking “can this break?”
Wave 15 attacked faction conflicts with deliberate adversarial scenarios: 5 factions × 6 thresholds × 17 reputation events. What happens when one faction relationship rises while its antagonist drops? Full steampunk council simulation with every member interaction tested. Max and min reputation edge cases for every faction. Advanced building tiers got validated at scale — 9 buildings × 5 tiers with cost, effect, and description verification. Turret upgrade chains were tested across 6 turret types × 3 marks. Cross-system integration tests confirmed that faction reputation properly modifies shop prices, building tiers gate crafting recipes, and reputation thresholds unlock companion access.
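The 5 × 6 × 17 matrix is the kind of thing a test generator expands mechanically. As a rough sketch, with every name and number below a placeholder rather than real game data, the full product comes to 510 cases, and each one can assert an invariant such as reputation staying inside its bounds. The threshold values, event deltas, and the clamp range are assumptions made for the example.

```python
from itertools import product

FACTIONS = [f"faction_{i}" for i in range(5)]   # placeholder names
THRESHOLDS = [-100, -50, 0, 25, 50, 100]        # assumed threshold values
EVENTS = [f"event_{i}" for i in range(17)]      # placeholder names
# Assumed deltas from -40 to +40 in steps of 5, one per event.
EVENT_DELTAS = {e: (i - 8) * 5 for i, e in enumerate(EVENTS)}

REP_MIN, REP_MAX = -100, 100                    # assumed reputation bounds


def apply_event(rep: int, event: str) -> int:
    """Apply an event's delta and clamp the result to the allowed range."""
    return max(REP_MIN, min(REP_MAX, rep + EVENT_DELTAS[event]))


# Every faction, starting at every threshold, hit by every event.
cases = list(product(FACTIONS, THRESHOLDS, EVENTS))
failures = [
    (faction, start, event)
    for faction, start, event in cases
    if not REP_MIN <= apply_event(start, event) <= REP_MAX
]
print(len(cases), len(failures))  # 510 0
```

Starting each case exactly at a threshold is deliberate: the min and max edge cases are where an unclamped update would overflow first.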
Wave 16 was the largest single wave in the entire QA campaign: 3,755 new tests across seven sections. Dungeon deep exploration covered rooms, floors, bosses, hazards, loot, and cross-references. Quest and NPC interaction trees were validated for all 30 quests across main, side, and corporate categories. Wave defense scaling was tested for both handcrafted and procedural nights with full difficulty curves. Dual narrator dialogue got a tone analysis pass across all 20 scenes with W.A.G.E/M.A.G.E speaker verification and choice branching. Skill tree prerequisites were validated for all 6 characters across 3 branches each — including cycle detection to prove no circular dependencies exist. Shop economy progression was tested across all 3 campaign phases.
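Cycle detection over skill prerequisites is a standard graph check. One common way to do it, shown here as a sketch and not as the harness's actual implementation, is a depth-first search that marks nodes as in-progress and reports a cycle when it walks back into one. The `find_cycle` function and the skill names are hypothetical.

```python
from typing import Dict, List, Optional


def find_cycle(prereqs: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one prerequisite cycle as a list of skills, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, in progress, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in prereqs.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Walked back into a skill still on the current path: a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in prereqs:
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical skill branches: one valid, one deliberately broken.
valid_tree = {"fireball": ["spark"], "inferno": ["fireball"], "spark": []}
broken_tree = {"a": ["b"], "b": ["c"], "c": ["a"]}
print(find_cycle(valid_tree))   # None
print(find_cycle(broken_tree))  # ['a', 'b', 'c', 'a']
```

Returning the offending path rather than a bare boolean makes a failure actionable, since the report names exactly which skills depend on each other.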
Wave 17 closed the remaining gaps. Thirty quests fully validated with objectives, rewards, and narrator lines. The companion system got deep testing for all 5 companions — stats, skills, growth curves, and skill tree cross-references. Thirteen state modules were verified through enter/draw/exit preview cycles. A final structures data sweep covered walls, turrets, traps, buildings, tier chains, and structure limits.
Tests: 16,040 → 23,339 | Screenshots: 1,606 → 2,064
The Ice Kingdom and Mirror World
Between waves 9 and 10, the factory paused to document what it had never shown the public.
Two zones — the Ice Kingdom and the Mirror World — had been playable since Phase 2 and testable since Wave 9, but no hero screenshot existed for either. The Glacial Court overworld and the inverted-palette mirror dimension were real, tested, and invisible to anyone reading the store page.
The factory captured them: the Glacial Court overlooking its frozen architecture, and the Mirror World with its inverted palette casting familiar terrain in alien light. Store copy was updated to reference all four campaign regions — Heartlands, Ice Kingdom, Northern Pass, Mirror World — for the first time. The game now has 8 hero screenshots covering hub, combat, defense grid, dungeon, crafting, skill tree, and both late-game zones.
The store page finally shows what the test suite already knew.
The Numbers
| Metric | Devlog 05 | Devlog 06 | Delta |
|---|---|---|---|
| Automated tests | 6,808 | 23,339 | +16,531 (3.4×) |
| Screenshots | 1,002 | 2,064 | +1,062 |
| Test failures | 0 | 0 | 0 |
| QA waves completed | 7 | 17 | +10 |
| Hero screenshots | 6 | 8 | +2 |
| Faction tests | — | 200+ | new |
| Quest validations | — | 30 quests | new |
| Companion tests | — | 5 companions | new |
| Infrastructure fixes | 1 (world entry) | 1 (segfault) | — |
| Game status | itch_ready | PLAYTESTING | promoted |
The PLAYTESTING Gate
After Wave 17 completed with zero failures, the game was promoted from itch_ready to PLAYTESTING with a status of human_pending.
The distinction matters. The itch_ready status means the factory is confident the game works: every system tested, every screen captured, every edge case exercised. PLAYTESTING means the factory has exhausted what it can verify alone. Twenty-three thousand tests exercise every function, every state transition, every data relationship, every save/load cycle, every narrator trigger, every crafting recipe, every faction interaction, every skill tree branch. The test suite is adversarially robust: ten waves of expansion, the last three built specifically to break things, ended with nothing broken.
But there is one test the factory cannot write: does a human being enjoy playing this?
The game enters PLAYTESTING because the automated evidence is complete. No test failures. No known bugs. 2,064 screenshots documenting every surface. The build is not just correct; it has been stress-tested by 23,339 tests spanning the entire system graph. What it needs now is the input that cannot be synthesized: a human sitting down, picking up the controller, and discovering the bugs that only appear when someone plays the game in a way no test anticipated.
W.A.G.E-9999 would file this under “pre-revenue market validation phase.” M.A.G.E-0001 would call it “the trial the prophecy cannot foretell.”
Wages and Mages has 23,339 tests watching every system. It has 2,064 screenshots documenting every screen. It has zero failures. It has everything except human hands.
That test starts now.