Sol’s Souls Devlog 05 — Twenty-Seven Thousand Tests, Six Silent Failures, and a Store Page That Knows What It Is

Devlog 04 ended at 4,952 tests with zero failures and a confident sign-off: the build survives probing. Then sprint 12 started. The runner was rewritten, coverage expanded into territory it had never reached before, and at 27,402 tests the system found six failures that had been silently passing since the inter-settlement tension system shipped. They were not crashes. They were not regressions. They were turn-resolution edge cases that produced correct-looking output from incorrect intermediate state — the kind of defect that waits quietly in a passing test suite until the suite is large enough to catch the pattern.

This is the devlog about a 1,008-line runner rewrite, what six silent failures looked like from the inside, and a store page that finally stopped hedging about what it is.

Sprint 12: The Runner Rewrite

The QA runner after devlog 04 was functional but structurally conservative. It executed tests in a fixed sequence, maintained a single screenshot buffer, and reported results without modeling the relationships between test groups. That architecture scaled to 4,952 tests without incident. It would not have scaled to 27,000 without producing noise — tests that technically passed because they checked the right outputs but exercised the wrong execution paths.

Sprint 12 is a 1,008-line rewrite of the runner core. The changes are not cosmetic.

Modular test group registration — Tests are now registered in named groups with declared dependencies, execution order constraints, and shared setup/teardown hooks. A group can declare that it requires the turn-resolution harness to have completed at least one full cycle before its tests run. A group can declare that it must not run concurrently with the faction AI group because both write to the same faction state table. The runner enforces these constraints. The previous runner relied on the developer placing tests in the right position in the file — a convention that holds until someone adds a test in a hurry.
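A rough sketch of how declared dependencies and concurrency exclusions could be turned into an execution plan. This is Python for illustration only; the runner's real language and API are not shown in this devlog, and the names `Group`, `plan_batches`, and the group names are hypothetical.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Group:
    name: str
    requires: set = field(default_factory=set)        # groups that must finish first
    exclusive_with: set = field(default_factory=set)  # groups that must not share a batch


def plan_batches(groups):
    """Return a list of batches; groups inside one batch may run concurrently."""
    by_name = {g.name: g for g in groups}

    def clash(a, b):
        # Exclusion is treated as symmetric even if only one side declares it.
        return b in by_name[a].exclusive_with or a in by_name[b].exclusive_with

    sorter = TopologicalSorter({g.name: g.requires for g in groups})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches, pending = [], []
    while sorter.is_active():
        pending = sorted(pending + list(sorter.get_ready()))
        batch, deferred = [], []
        for name in pending:
            conflict = any(clash(name, other) for other in batch)
            (deferred if conflict else batch).append(name)
        batches.append(batch)
        sorter.done(*batch)   # unlocks groups that depend on this batch
        pending = deferred    # excluded groups wait for the next batch
    return batches


# Hypothetical registration: "tension" stands in for a group that shares
# the faction state table with the faction AI group.
groups = [
    Group("turn_resolution"),
    Group("faction_ai", requires={"turn_resolution"}),
    Group("tension", requires={"turn_resolution"}, exclusive_with={"faction_ai"}),
    Group("milestones", requires={"turn_resolution"}),
]
batches = plan_batches(groups)
```

The point of the design is that ordering lives in the declarations, so a test added in a hurry lands in the right place regardless of where it sits in the file.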

Coverage mapping — The runner now tracks which game systems each test exercises, building a coverage map that persists across runs. After each sprint, the coverage map shows which systems have test density, which are thin, and which have never been explicitly targeted. The pre-sprint map showed three systems with fewer than twenty tests each: turn-resolution edge cases, milestone triggers, and the broadcast scheduler. Sprint 12 targeted all three.
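A minimal sketch of the coverage-map idea, assuming each test carries a set of system tags. The tag names, the `build_coverage_map` and `thin_systems` helpers, and the test IDs are hypothetical; only the threshold of twenty comes from the paragraph above. Persistence across runs is left out.

```python
from collections import Counter

THIN_THRESHOLD = 20  # "fewer than twenty tests"


def build_coverage_map(test_tags):
    """Count how many tests exercise each game system."""
    coverage = Counter()
    for systems in test_tags.values():
        coverage.update(systems)
    return coverage


def thin_systems(coverage, known_systems, threshold=THIN_THRESHOLD):
    """Systems below the threshold, including ones no test has ever touched."""
    return sorted(s for s in known_systems if coverage[s] < threshold)


# Hypothetical tags: 25 resource tests, 3 milestone tests, no broadcast tests.
test_tags = {f"resource_{i}": {"resource_collection"} for i in range(25)}
test_tags.update({f"milestone_{i}": {"milestone_triggers"} for i in range(3)})

known = {"resource_collection", "milestone_triggers", "broadcast_scheduler"}
coverage = build_coverage_map(test_tags)
thin = thin_systems(coverage, known)
```

Checking against a list of known systems, rather than only the systems that appear in the map, is what surfaces the ones that have never been targeted at all.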

Failure classification — When a test fails, the new runner classifies the failure before reporting it. A failure that produces a Lua error is classified differently from a failure that produces correct output from incorrect state. The distinction matters for triage: a Lua error is a code defect, deterministic and reproducible. An incorrect-state failure may be timing-dependent, may be order-dependent, or may be caused by a test assumption that is only valid under specific initialization conditions. The six failures found in this sprint were all classified as incorrect-state failures before the investigation began — which told the team where to look before they looked.
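The classification step might look something like the sketch below. The `Outcome` fields and the third `WRONG_OUTPUT` category are assumptions added to make the example complete; the devlog only names the Lua-error and incorrect-state classes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureKind(Enum):
    LUA_ERROR = "lua_error"              # code defect, deterministic
    INCORRECT_STATE = "incorrect_state"  # right output, wrong intermediate state
    WRONG_OUTPUT = "wrong_output"        # assumed extra category, not from the devlog


@dataclass
class Outcome:
    lua_error: Optional[str]  # error message if the game raised one
    output_ok: bool           # final output matched expectations
    state_ok: bool            # intermediate state matched expectations


def classify(outcome):
    """Return the failure class, or None when the test passed."""
    if outcome.lua_error is not None:
        return FailureKind.LUA_ERROR
    if not outcome.output_ok:
        return FailureKind.WRONG_OUTPUT
    if not outcome.state_ok:
        return FailureKind.INCORRECT_STATE
    return None


silent = classify(Outcome(lua_error=None, output_ok=True, state_ok=False))
```

A test that only asserts on output would report the `silent` case as a pass, which is why the intermediate-state check has to be part of the outcome record.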

Screenshot pipeline hardening — The screenshot count at devlog 04 was 987. The count at the end of sprint 12 is 1,299. The gap is not random — it corresponds to the new systems covered. More importantly, the screenshot pipeline now includes a deduplication check: screenshots that match a baseline within a configurable pixel-distance threshold are flagged rather than stored. This prevents the screenshot count from inflating with functional duplicates when the same screen is exercised by twenty tests in slightly different game states.
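One way a pixel-distance deduplication check could work, sketched on raw byte buffers. The distance metric (mean absolute per-byte difference), the default threshold, and the function names are assumptions; the devlog does not say which metric or value the pipeline uses.

```python
def pixel_distance(a, b):
    """Mean absolute per-byte difference between two same-sized pixel buffers."""
    if len(a) != len(b):
        raise ValueError("screenshots must have identical dimensions")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def triage_screenshot(candidate, baselines, threshold=2.0):
    """Flag a near-duplicate of any baseline; otherwise mark it for storage."""
    for name, baseline in baselines.items():
        same_size = len(baseline) == len(candidate)
        if same_size and pixel_distance(candidate, baseline) <= threshold:
            return ("flagged", name)
    return ("stored", None)


# Tiny hypothetical buffers: 4 RGB pixels each.
baselines = {"settlement_overview": bytes([10] * 12)}
near_duplicate = bytes([11] * 12)   # distance 1.0 from the baseline
new_screen = bytes([200] * 12)      # distance 190.0 from the baseline

near_result = triage_screenshot(near_duplicate, baselines)
new_result = triage_screenshot(new_screen, baselines)
```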

Where the New Tests Went

The coverage map drove the test expansion. Three systems were targeted explicitly.

Turn resolution — Turn resolution is the most complex pipeline in Sol’s Souls. A single turn involves: resource collection, environmental event checks, faction pressure calculations, civic event rolls, inter-settlement tension evaluations, milestone condition checks, threat spawning, and broadcast schedule updates. Each step reads from and writes to shared state. The previous suite tested turn resolution at the top level — does a turn complete, does the state advance, does the save round-trip. It did not test the ordering constraints between steps.

Sprint 12 added 8,400 turn-resolution tests covering step ordering, state isolation between steps (one step’s output should not bleed into another step’s input except through defined channels), and edge cases at the boundary of each phase. A turn where a faction pressure event and a tension event both fire on the same turn. A turn where a milestone triggers during the threat-spawning phase. A turn where a broadcast is scheduled while the resource collection phase has not yet committed.

These are not exotic scenarios. They are normal multi-settlement mid-campaign turns. They had never been explicitly tested.
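The state-isolation checks described above can be sketched as a diff between the state before and after a single step, compared against the keys that step is allowed to write. The step functions, state keys, and `check_isolation` helper are hypothetical stand-ins, not the game's real pipeline.

```python
import copy

_MISSING = object()


def changed_keys(before, after):
    """Top-level keys whose values differ, including added or removed keys."""
    keys = before.keys() | after.keys()
    return {k for k in keys if before.get(k, _MISSING) != after.get(k, _MISSING)}


def check_isolation(step, declared_writes, state):
    """Run one step on a copy and return keys written outside its declared channels."""
    before = copy.deepcopy(state)
    after = copy.deepcopy(state)
    step(after)
    return changed_keys(before, after) - set(declared_writes)


def collect_resources(state):
    state["resources"]["water"] += 5


def leaky_civic_events(state):
    state["civic_log"].append("festival")
    state["strain"]["morale_gap"] = 0.4  # bleeds into state owned by another step


state = {"resources": {"water": 10}, "civic_log": [], "strain": {"morale_gap": 0.1}}
clean = check_isolation(collect_resources, {"resources"}, state)
leaks = check_isolation(leaky_civic_events, {"civic_log"}, state)
```

Because each step runs on a deep copy, the check can be repeated for every step at every phase boundary without the tests contaminating one another.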

Faction AI — The faction AI system decides each faction’s response to player decisions across turns: whether to increase pressure, whether to escalate or de-escalate a grievance, whether to interpret a decree as appeasing or provocative. The AI uses a weight table that updates based on the player’s history. Previous tests validated the weight table’s structure. Sprint 12 added tests that drive the faction AI through full decision cycles across multiple turns and verify that the output — the faction’s behavior — matches the expected response to known input sequences.

4,200 new faction AI tests. The coverage map went from “faction structure validated” to “faction behavior verified.”
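The difference between validating structure and verifying behavior can be shown with a toy model. Everything here is invented for illustration: the two-entry weight table, the update rule, and the decree labels are not the game's actual faction AI.

```python
def step(weights, decree):
    """Update the weight table for one turn, then return the chosen response."""
    delta = {"appeasing": -1.0, "provocative": 1.0}[decree]
    weights["escalate"] += delta
    weights["de_escalate"] -= delta
    # Ties resolve to the alphabetically first response, so results are deterministic.
    return max(sorted(weights), key=weights.get)


def run_cycle(decrees):
    """Drive a fresh faction through a known sequence of player decrees."""
    weights = {"escalate": 0.0, "de_escalate": 0.0}
    return [step(weights, d) for d in decrees]


history = ["provocative", "provocative", "appeasing", "appeasing", "appeasing"]
behavior = run_cycle(history)
expected = ["escalate", "escalate", "escalate", "de_escalate", "de_escalate"]
```

A structural test would only confirm that `weights` has the right keys. The behavioral test compares `behavior` against `expected` turn by turn, which is where a wrong update rule would show up.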

Milestone triggers and threat data — Milestones are the campaign’s progress gates: founding a second settlement, reaching Sovereignty Stage 3, surviving a raid with fewer than five units. Threats are the mid-to-late-game pressure system: entity raids, environmental crises, Earth audit events. Both systems had thin test coverage — milestone condition evaluation was tested but milestone effect application was not. Threat data was validated structurally but threat behavior was not exercised under load.

9,650 new tests across milestones and threats. The milestone system had one structural defect: a milestone that triggered on “third settlement founded” evaluated the condition against the total settlements ever founded, not the total currently active. This meant a player who founded, lost, and re-founded a settlement could trigger the milestone on the second founding rather than the third. The test caught it. The fix was six lines. The defect had existed since the milestone system was written.
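The shape of that defect, reduced to a sketch. The `Colony` class and both condition functions are hypothetical, and the exact accounting of starting settlements in the real game is not stated here; the example only shows how a lifetime counter and an active counter diverge after a loss.

```python
from dataclasses import dataclass


@dataclass
class Colony:
    founded_total: int = 0  # every founding ever, including settlements later lost
    active: int = 0         # settlements that currently exist

    def found(self):
        self.founded_total += 1
        self.active += 1

    def lose(self):
        self.active -= 1


def third_settlement_buggy(colony):
    return colony.founded_total >= 3  # counts lost settlements too


def third_settlement_fixed(colony):
    return colony.active >= 3


colony = Colony()
colony.found()   # first settlement
colony.found()   # second settlement
colony.lose()    # second settlement lost
colony.found()   # re-founded: third founding, but only two active

fired_early = third_settlement_buggy(colony)
fired_correctly = third_settlement_fixed(colony)
```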

The Six Silent Failures

At 27,402 tests, the runner reported six failures. All six were classified as incorrect-state failures. All six were in the turn-resolution group. All six produced the same symptom: a turn that completed with correct final state but with incorrect intermediate state in the inter-settlement tension evaluation step.

The inter-settlement tension system — the system introduced in devlog 03 that makes settlements argue with each other — evaluates strain metrics between pairs of settlements at a specific point in the turn resolution pipeline: after civic events, before faction pressure. The evaluation reads strain data that was calculated by the resource collection phase.

The defect was a read-before-commit race condition in the tension evaluator. When two settlements both had pending tension events on the same turn, the evaluator for the second settlement read the first settlement’s strain data before the first settlement’s civic event phase had committed its state update. The tension event fired correctly. The strain calculation was correct. But the strain data the second evaluator read was one phase stale.

The output — the tension event’s choice set and text — was correct because the stale strain data was within the threshold for the same event template. The intermediate state was wrong because the evaluator had consumed data it was not supposed to have access to yet.

Six tests caught this because they specifically constructed scenarios where two tension events would fire on the same turn and then validated not just the output but the order of reads and commits in the pipeline. They were the first tests to do that. Commit 23897e6 fixed the race condition by adding an explicit flush-and-wait barrier between the civic event commit phase and the tension evaluation phase. All six failures became passes. The final state is 27,402 pass, 0 fail.

The six failures were not new bugs. The race condition was present when the tension system shipped. It just required a test that modeled the execution pipeline rather than the output to find it.
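A test that models the pipeline rather than the output can be sketched as an assertion over a recorded trace of reads and commits. The `Event` record, phase names, and `run_with_barrier` are hypothetical, and this is not the code from commit 23897e6; it only illustrates the ordering property those six tests checked.

```python
from typing import NamedTuple


class Event(NamedTuple):
    op: str          # "commit" or "read"
    phase: str       # hypothetical phase name: "civic" or "tension"
    settlement: str  # settlement whose strain data is touched


def stale_reads(trace):
    """Tension-phase reads that happened before that settlement's civic commit."""
    committed = set()
    stale = []
    for event in trace:
        if event.op == "commit" and event.phase == "civic":
            committed.add(event.settlement)
        elif event.op == "read" and event.phase == "tension":
            if event.settlement not in committed:
                stale.append(event)
    return stale


def run_with_barrier(settlements):
    """Commit every civic update first, then let the tension phase read."""
    commits = [Event("commit", "civic", s) for s in settlements]
    reads = [Event("read", "tension", s) for s in settlements]
    return commits + reads


# Interleaved order: A's strain data is read before A's civic commit lands.
racy_trace = [
    Event("commit", "civic", "B"),
    Event("read", "tension", "A"),
    Event("commit", "civic", "A"),
    Event("read", "tension", "B"),
]
racy_stale = stale_reads(racy_trace)
fixed_stale = stale_reads(run_with_barrier(["A", "B"]))
```

Both traces can end in the same final state, which is why an output-only assertion passes on either. Only the trace check distinguishes them.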

The Inter-Settlement Tension System Is Now Fully Covered

This is worth stating explicitly because devlog 03 described the tension system as a feature and devlog 04 described it as something the QA suite was starting to probe. Sprint 12 closes that loop.

The tension system now has end-to-end QA coverage: data integrity (all six event templates, cooldown logic, suppression thresholds), condition evaluation (all five strain dimensions: resource disparity, greenery gap, faction divergence, morale gap, friction), text generation (choice text, outcome text, ENN broadcast text), turn-pipeline integration (position in the resolution order, read/commit sequencing, the race-condition fix), save/load round-trips (mid-tension-event, post-tension-resolution, suppression state), and multi-settlement interaction (two simultaneous events, tension events that fire while a milestone is pending, tension events concurrent with a raid).

The system that introduced secession motions and lawn envy as measurable political forces is now as well-tested as the resource collection pipeline. It took two sprints after feature completion to get there.

ITCH_STORE.md: Five Selling Points and a Game That Knows What It Is

The store page for Sol’s Souls has existed as a draft since devlog 03. The draft had a working title, a genre label, and a list of features. It did not have a pitch. There is a difference between “turn-based colony-defense” as a genre descriptor and the sentence that makes a person on itch.io stop scrolling.

Sprint 12 included a focused pass on ITCH_STORE.md. The output is five selling points, a block of ENN headlines for the store description, and metadata for the itch.io listing.

The five selling points:

Green Mars or Bust — Your legitimacy is measured in lawn quality. The Earth Bureau of Interplanetary Aesthetics has a quarterly review. This is not optional and it is not a joke.

Your Settlements Have Opinions About Each Other — Resource disparity, ideological divergence, and lawn gaps generate inter-colony political events. You can manage one colony perfectly and still face a secession motion because the other colony noticed.

Eight Factions, All Wrong — The Terran Welfare Coalition wants full Earth integration. The Marsian Independence Front wants full independence. The Bureau of Operational Compliance wants neither and wants you to file a form about it. Every faction is coherent. None of them are easy.

Turn Resolution Is a Pipeline, Not a Button — Civic events, faction pressure, tension evaluations, milestone checks, and threat spawning happen in a defined order each turn. Understanding the order is strategy.

The Combat Report Is Not the Summary — It Is Evidence — Post-raid reports break down unit performance, threat behavior, and resource delta. The game treats its own output as data.

The ENN headline block, the Earth News Network ticker that runs during broadcasts, now has fifteen store-facing hooks, including: “COLONY LEGITIMACY AT 62%: LAWNS CITED AS CONTRIBUTING FACTOR.” “SECESSION MOTION FILED: BOTH SETTLEMENTS DESCRIBE THE OTHER AS ‘THE PROBLEM.’” “FACTION PRESSURE AT RECORD HIGH: ANALYST SAYS THIS IS FINE.”

These are not marketing copy in the traditional sense. They are the game’s voice applied to the task of explaining itself. The store page now reads like the game’s own content rather than a description of it from outside.

Current State of the Build

The test suite is at 27,402 pass, 0 fail, 1,299 screenshots. The runner rewrite added 1,008 lines and removed the structural constraints that would have made scaling beyond 10,000 tests unreliable. The six silent failures in turn resolution are fixed. The inter-settlement tension system is fully covered. The milestone and threat systems have been validated under load. The store page has a pitch.

The remaining gate before itch.io submission is human playtesting. The automated suite can verify that the game handles 27,000 test scenarios correctly. It cannot verify that the game is enjoyable, that the pacing of the tension events is satisfying, or that the ENN ticker is funny on turn forty-seven when everything is going wrong.

That test cannot be automated. It also cannot be skipped.

The lawns are rated Adequate. The factions are dissatisfied. The tests are passing. The build is ready for someone who is not its developer.

Sol’s Souls is in the playtesting phase. Follow x00f.com/games/sols-souls/ for updates.

