Sol’s Souls Devlog 04 — Five Thousand Tests, a Thousand Screenshots, and the Screens That Lied
Devlog 03 ended at 3,111 tests and zero failures. The build was mechanically complete, itch-readiness-certified, and described — accurately — as ready for the one test suite that cannot be automated. Then the factory ran the automated suite harder, and the suite found things.
Not crashes. Not corruption. Gaps. Places where the test harness exercised the system but did not probe it. Faction dashboards that rendered correctly under normal conditions but had never been stress-tested with adversarial faction states. Broadcast screens that played their animations flawlessly in isolation but had never been validated while other systems wrote to the same state. A QA runner that exited cleanly after each pass but did not persist long enough to catch the defects that only surface on the second cycle.
This is the devlog about what happens after a game passes its tests and you ask whether the tests were hard enough.
The QA Runner Gets a Spine
The QA system in devlog 03 ran as a single-pass process. Start the suite, execute every test, report the results, exit. That architecture was fine when the suite was growing — add tests, run, check. But a single-pass runner has a blind spot: it never sees the state that accumulates between runs.
Three changes to the runner fixed this.
Persistent QA runner — The test harness now runs as a persistent process that survives across test cycles instead of exiting after each pass. This means the runner holds its own state between cycles: screenshot buffers, screen transition counts, timing data. A transient runner sees each test in isolation. A persistent runner sees patterns across tests — screens that take longer on the second pass, screenshots that shift between cycles, state that drifts when it should not.
Intermediate flush and rebuild — Between test cycles, the runner now flushes accumulated state and rebuilds from a clean checkpoint. This is the opposite of persistence — tear everything down, reconstruct, and verify the reconstruction matches expectations. The flush cycle caught three screen-state inconsistencies that survived thousands of individual test runs because no individual test spanned the boundary between two full cycles.
Screen fixes — The persistent runner exposed rendering edge cases that the single-pass runner never encountered. Screens that initialized correctly on first load but retained stale layout data on subsequent visits. Scroll positions that reset visually but not mechanically. Panel dimensions that calculated correctly from fresh state but accumulated rounding drift across reloads. Each fix was small. The pattern they formed was not.
The runner changes added no new test cases. They changed how the existing tests were executed, and that change was enough to expose defects that 3,111 passing tests had been stepping over.
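The difference between a single-pass and a persistent runner can be sketched in a few lines. This is a minimal Python illustration, not the project's harness: `PersistentRunner`, `run_cycle`, `drift`, `flush_and_rebuild`, and the dictionary-shaped screen state are all hypothetical names invented for the example.

```python
from copy import deepcopy


class PersistentRunner:
    """Hypothetical runner that keeps observations across test cycles."""

    def __init__(self, build_state):
        self.build_state = build_state   # callable returning a clean checkpoint
        self.state = build_state()
        self.history = []                # one state snapshot per completed cycle

    def run_cycle(self, tests):
        """Run every test against the live state, then snapshot that state."""
        results = {name: test(self.state) for name, test in tests.items()}
        self.history.append(deepcopy(self.state))
        return results

    def drift(self):
        """Keys whose values differ between the last two cycles."""
        if len(self.history) < 2:
            return {}
        before, after = self.history[-2], self.history[-1]
        return {key: (before.get(key), after.get(key))
                for key in before.keys() | after.keys()
                if before.get(key) != after.get(key)}

    def flush_and_rebuild(self):
        """Discard live state, rebuild it, and list keys that had gone stale."""
        stale = self.state
        self.state = self.build_state()
        return sorted(key for key in stale if stale[key] != self.state.get(key))


def fresh_state():
    return {"scroll_y": 0, "panel_width": 320}


def visit_settlement_screen(state):
    # Passes in isolation, but leaves scroll state behind for the next visit.
    ok = state["panel_width"] == 320
    state["scroll_y"] += 48
    return ok


runner = PersistentRunner(fresh_state)
tests = {"settlement_screen": visit_settlement_screen}
first = runner.run_cycle(tests)
second = runner.run_cycle(tests)
print(first, second)                # the test passes in both cycles
print(runner.drift())               # {'scroll_y': (48, 96)}
print(runner.flush_and_rebuild())   # ['scroll_y']
```

The test itself never fails. Only the comparison across cycles, and the comparison against a clean rebuild, shows that something is accumulating.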
Closing the Integrity Gap: Deep Data Validation
With the persistent runner surfacing timing and state-drift issues, the next sprint focused on a category of test that devlog 03 had underinvested in: data validation.
The existing suite tested systems — does the event fire, does the decree expire, does the save round-trip. The new tests validate the data those systems consume. Every faction record, every event template, every civic definition, every governance decree — tested for structural completeness, type correctness, and cross-reference integrity.
What this caught was instructive. No bugs in the traditional sense. No crashes, no wrong outputs, no broken saves. What it caught was looseness. A faction record with a valid ID, valid name, valid traits, but a description field that was an empty string instead of nil: technically correct, visually wrong if a tooltip ever rendered it. An event template where two choices produced identical mechanical effects but different text: not a bug, but a design gap where the player sees a meaningful choice that the system treats as identical. A governance decree with an expiry turn set to zero, which the system interpreted as “never expires” because the expiry check was if turn == expiry and the turn counter never reaches zero: correct behavior, ambiguous intent.
The data validation tests formalized the difference between “this does not crash” and “this is correct.” 193 new tests. Zero failures in the final run, though it took seven data tightenings before every test passed.
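The three kinds of looseness described above map onto three small checks. The sketch below is an assumption-heavy illustration in Python: the field names (`description`, `choices`, `effects`, `expiry_turn`) and the sample records are invented for the example, and Python's `None` stands in for the game's nil.

```python
def validate_faction(faction):
    """Return looseness findings for one faction record."""
    findings = []
    for field in ("id", "name", "traits"):
        if not faction.get(field):
            findings.append(f"missing {field}")
    # An empty description is structurally valid but renders as a blank tooltip.
    if faction.get("description") == "":
        findings.append("description is an empty string; use None or real text")
    return findings


def validate_event(event):
    """Flag choices whose mechanical effects are identical."""
    findings = []
    seen = {}
    for choice in event["choices"]:
        key = tuple(sorted(choice["effects"].items()))
        if key in seen:
            findings.append(
                f"{seen[key]!r} and {choice['text']!r} have identical effects")
        else:
            seen[key] = choice["text"]
    return findings


def validate_decree(decree):
    """Treat an expiry turn of zero as ambiguous rather than silently permanent."""
    if decree.get("expiry_turn") == 0:
        return ["expiry_turn is 0; state permanence explicitly"]
    return []


# Hypothetical sample data, shaped only for this example.
faction = {"id": "f_01", "name": "Example Faction",
           "traits": ["stub"], "description": ""}
event = {"choices": [
    {"text": "Ration the water", "effects": {"morale": -1}},
    {"text": "Post guards at the well", "effects": {"morale": -1}},
]}
decree = {"id": "d_01", "expiry_turn": 0}

for finding in (validate_faction(faction)
                + validate_event(event)
                + validate_decree(decree)):
    print(finding)
```

None of these checks would ever fire on a crash. They fire on data that loads, runs, and still is not what anyone meant.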
Faction Dashboards and Broadcast Screens Under Adversarial Load
Devlog 03’s test suite exercised faction dashboards and broadcast screens in their normal states. Factions had reasonable influence levels, broadcasts had valid content, and the tests confirmed that everything rendered correctly under those conditions.
The new tests ask what happens when conditions are not reasonable.
Faction dashboard adversarial coverage — What renders when a faction’s influence drops to zero? What about negative influence from a decree that overcorrects? What happens when all four factions have identical influence scores and the “dominant faction” label has no clear winner? What renders when a settlement has no faction presence at all — a state that should be impossible but that a corrupted save could produce?
Forty-one new dashboard tests cover these states. The dashboard handled most of them gracefully — the renderer was already defensive about zero values and ties. Two edge cases required fixes: a division-by-zero in the influence percentage calculation when total faction influence across all settlements summed to exactly zero (possible only in a fresh game before the first turn resolves), and a label overflow when all four faction names rendered at full length in the compact summary view.
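A guard for the zero-total case and an explicit answer for ties might look like the following. This is a Python sketch under stated assumptions: the function names and the placeholder faction names are hypothetical, and clamping negative influence to zero before computing shares is a choice made for the example, not necessarily what the dashboard does.

```python
def influence_shares(influence):
    """Map faction name -> percentage share, guarding against a zero total."""
    # Assumption: negative influence counts as zero for display purposes.
    clamped = {name: max(value, 0) for name, value in influence.items()}
    total = sum(clamped.values())
    if total == 0:
        # Fresh game before the first turn resolves: nothing to divide by.
        return {name: 0.0 for name in clamped}
    return {name: 100.0 * value / total for name, value in clamped.items()}


def dominant_faction(influence):
    """Return the single leader, or None when there is no clear winner."""
    if not influence:
        return None   # no faction presence at all
    top = max(influence.values())
    leaders = [name for name, value in influence.items() if value == top]
    return leaders[0] if len(leaders) == 1 else None


print(influence_shares({"A": 0, "B": 0, "C": 0, "D": 0}))
print(influence_shares({"A": 30, "B": -10, "C": 10, "D": 0}))
print(dominant_faction({"A": 5, "B": 5, "C": 5, "D": 5}))   # None
print(dominant_faction({}))                                 # None
```

Returning `None` for a tie pushes the decision to the renderer, which can then show a neutral label instead of picking whichever faction happens to sort first.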
Broadcast screen stress testing — The broadcast system plays signal animations and transmission VFX while displaying scrolling ENN ticker content. Previous tests validated these elements individually. The new tests validate them concurrently while other systems write to shared state — a turn resolving while a broadcast is mid-animation, a save triggered while the ticker is mid-scroll, a screen transition requested while the signal animation is between keyframes.
Twenty-three broadcast stress tests. The broadcast system was robust — no failures. But the tests now document that robustness as verified behavior rather than assumed behavior, which is a different thing when someone modifies the animation system six months from now.
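One way to picture this kind of test is a deterministic interleaving: step the animation frame by frame and inject an outside write at every possible frame. The Python sketch below is a single-threaded stand-in for real concurrency, and every name in it (`step_animation`, `resolve_turn`, `save_game`, the state keys) is hypothetical.

```python
import itertools


def make_state():
    return {"turn": 1, "frame": 0, "ticker_offset": 0, "saved": None}


def step_animation(state):
    state["frame"] += 1
    state["ticker_offset"] += 4   # ticker scrolls four pixels per frame


def resolve_turn(state):
    state["turn"] += 1


def save_game(state):
    state["saved"] = {"turn": state["turn"]}


WRITES = {"resolve_turn": resolve_turn, "save_game": save_game}


def run_interleaving(total_frames, schedule):
    """schedule maps a frame index to the write injected before that frame."""
    state = make_state()
    for frame in range(total_frames):
        if frame in schedule:
            WRITES[schedule[frame]](state)
        step_animation(state)
    return state


def stress(total_frames=6):
    """Try every write at every frame; return the schedules that broke playback."""
    failures = []
    for frame, name in itertools.product(range(total_frames), WRITES):
        state = run_interleaving(total_frames, {frame: name})
        animation_intact = (state["frame"] == total_frames
                            and state["ticker_offset"] == 4 * total_frames)
        if not animation_intact:
            failures.append((frame, name))
    return failures


print(stress())   # [] when no injected write disturbs the animation
```

An empty failure list here is the same kind of result the stress tests produced: nothing broke, and now there is a record of exactly which interleavings were tried.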
The Screens That Lied: Visual Polish Fixes
Five visual defects were found and fixed during this sprint. None of them affected functionality. All of them affected the impression the game makes in the first thirty seconds.
HUD right-edge text clipping — The HUD displays resource counts, turn number, and settlement name along the top of the screen. On settlements with long names — and Sol’s Souls has settlements with procedurally generated names that can run fourteen characters — the rightmost text elements clipped against the screen edge. The fix was a layout constraint: rightmost elements now calculate available width from the screen edge inward rather than flowing left-to-right and hoping they fit. Two pixels of breathing room. The difference between “polished” and “did anyone actually look at this.”
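Laying elements out from the screen edge inward is a small calculation. The Python sketch below assumes measured pixel widths, an 8-pixel gap between elements, and a 640-pixel-wide screen; only the two-pixel margin comes from the fix described above, and `layout_right_anchored` is a hypothetical name.

```python
def layout_right_anchored(widths, screen_width, margin=2, gap=8):
    """Place elements right to left; return x positions in left-to-right order.

    widths lists element widths in the order they appear on screen.
    The last element always ends exactly margin pixels from the right edge.
    """
    positions = []
    x = screen_width - margin
    for width in reversed(widths):
        x -= width
        positions.append(x)
        x -= gap
    positions.reverse()
    return positions


# Resource count, turn number, and a 14-character name at an assumed 8 px per glyph.
widths = [60, 40, 14 * 8]
xs = layout_right_anchored(widths, screen_width=640)
print(xs)                        # [410, 478, 526]
print(xs[-1] + widths[-1])       # 638, two pixels short of the edge
```

If the leftmost position ever comes out negative, the row does not fit at all, which is a clearer failure signal than text silently running off the right edge.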
INSANE difficulty flavor text readability — The difficulty selection screen shows flavor text for each level. The INSANE difficulty text was rendered in a dim color that communicated “this is a warning” but failed the game’s own contrast threshold. The text described a mode where every resource depletes 40% faster and raids arrive twice as often. Players selecting that mode should be able to read the description of what they are choosing. The fix was a color bump — same hue, higher luminance. The warning tone survives. The readability does not suffer.
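A contrast check of this kind can be automated. The Python sketch below uses the WCAG relative-luminance and contrast-ratio formulas as a stand-in, since the game's own threshold and colors are not stated here; the 4.5 threshold, the sample colors, and `brighten_to_threshold` are assumptions for illustration.

```python
import colorsys


def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    hi, lo = sorted((relative_luminance(fg), relative_luminance(bg)),
                    reverse=True)
    return (hi + 0.05) / (lo + 0.05)


def brighten_to_threshold(fg, bg, threshold=4.5, step=0.02):
    """Raise lightness at constant hue until the ratio meets the threshold.

    Assumes a dark background, so that brightening the text helps.
    """
    h, l, s = colorsys.rgb_to_hls(*(v / 255.0 for v in fg))
    while l < 1.0:
        candidate = tuple(round(c * 255)
                          for c in colorsys.hls_to_rgb(h, l, s))
        if contrast_ratio(candidate, bg) >= threshold:
            return candidate
        l = min(1.0, l + step)
    return (255, 255, 255)


dim_warning = (110, 30, 30)    # hypothetical dim red
background = (10, 10, 14)      # hypothetical near-black panel
fixed = brighten_to_threshold(dim_warning, background)
print(round(contrast_ratio(dim_warning, background), 2))
print(fixed, round(contrast_ratio(fixed, background), 2))
```

Keeping hue and saturation fixed while lifting lightness is what lets the text stay recognizably a warning color while clearing the readability bar.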
Guide text full-width flow — The How to Play guide from devlog 03 rendered text in a column narrower than the available panel width. The padding was originally set to accommodate a sidebar that was removed during layout iteration. The text now flows to the full panel width, which adds roughly fifteen characters per line and eliminates three instances where a paragraph broke awkwardly across lines because the artificial constraint forced a wrap mid-clause.
Store hero lineup with combat report screenshot — The itch.io store page uses a hero image that shows the game’s core identity. The previous hero showed the campaign map — accurate but visually sparse. The new lineup places a combat report screenshot alongside the campaign view, showing both the strategic layer and the tactical resolution layer in a single frame. Not a code fix. A marketing fix.
Visual overlap and text clipping across three screens — The settlement detail screen, governance screen, and overview screen each had minor layout issues: a panel border that overlapped adjacent text by one pixel, a scrollable list where the last entry’s bottom edge clipped against the panel boundary, and a resource delta indicator that rendered behind its parent panel’s header. Three screens, three fixes, each under ten lines of code. The kind of defects that individually are invisible and collectively make a game feel unfinished.
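Overlap and clipping defects like these reduce to rectangle arithmetic, which makes them cheap to test once the layout exposes its bounding boxes. A minimal Python sketch follows; the `Rect` type and all coordinates are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    @property
    def right(self):
        return self.x + self.w

    @property
    def bottom(self):
        return self.y + self.h


def overlaps(a, b):
    """True when the two rectangles share at least one pixel."""
    return (a.x < b.right and b.x < a.right
            and a.y < b.bottom and b.y < a.bottom)


def contains(outer, inner):
    """True when inner lies entirely within outer."""
    return (outer.x <= inner.x and outer.y <= inner.y
            and inner.right <= outer.right and inner.bottom <= outer.bottom)


border = Rect(100, 0, 1, 50)             # one-pixel-wide panel border
text_before = Rect(100, 10, 40, 12)      # text starts on the border column
text_after = Rect(101, 10, 40, 12)       # nudged one pixel to the right
print(overlaps(border, text_before))     # True
print(overlaps(border, text_after))      # False

panel = Rect(0, 0, 200, 100)
last_entry = Rect(0, 90, 200, 12)        # bottom edge lands at y=102
print(contains(panel, last_entry))       # False
```

Z-order problems, such as an indicator drawn behind its parent's header, need draw-order information as well and are not covered by these two checks.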
The QA Expansion: 3,111 to 4,952
The test count at the end of devlog 03 was 3,111. The count at the end of this sprint is 4,952. The failure count remains zero. The screenshot count is 987.
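Here is what the 1,841 new tests cover. The figure beside each phase is the running suite total once that phase landed, not the size of the phase.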
Deep data validation (3,304 tests) — Structural integrity checks across all game data: faction records, event templates, civic definitions, governance decrees, unit types, research nodes, trade routes, settlement templates. Type checking, cross-reference validation, completeness verification. The tests that caught the looseness described above.
QA integrity hardening (3,715 tests) — Persistent runner stress tests, flush-and-rebuild verification, cross-cycle state comparison. The tests that validate the test harness itself — ensuring that what the runner reports as “pass” actually means the system is correct, not merely that it did not crash.
Faction dashboard and broadcast coverage (4,275 tests) — Adversarial faction states, concurrent animation stress, shared-state write conflicts during rendering. The tests described in the faction and broadcast sections above.
Extended visual coverage (4,659 tests) — Screenshot-driven validation across all seventeen screens in multiple game states: early game, mid game, late game, post-endstate. Contrast ratio verification. Layout boundary checks. Scroll position persistence. The visual tests that caught the five polish defects.
Expansion sprint finalization (4,952 tests) — Integration tests spanning multiple systems: a full campaign from founding ceremony through sovereignty victory with data validation, visual verification, and save/load integrity checks at every turn boundary. These are not unit tests. They are campaign-length acceptance tests that exercise the entire state pipeline end-to-end.
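The figures in parentheses above read most naturally as running totals, so the size of each phase is the difference from the total before it. A few lines of Python make the arithmetic explicit, using only the numbers already given:

```python
baseline = 3111   # suite size at the end of devlog 03

running_totals = [
    ("Deep data validation", 3304),
    ("QA integrity hardening", 3715),
    ("Faction dashboard and broadcast coverage", 4275),
    ("Extended visual coverage", 4659),
    ("Expansion sprint finalization", 4952),
]

deltas = []
previous = baseline
for phase, total in running_totals:
    deltas.append(total - previous)
    print(f"{phase}: +{deltas[-1]} (total {total})")
    previous = total

print(sum(deltas))   # 1841
```

The first difference, 193, matches the data validation count reported earlier, and the five differences sum to the 1,841 new tests.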
987 screenshots. Every screen, every game state, every difficulty level, every critical UI path. The screenshots are not decorative — they are the visual regression baseline. If a future commit changes a pixel, the QA system will know which pixel and on which screen.
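At its simplest, a visual regression baseline is a per-pixel comparison that reports where two captures differ. The sketch below is illustrative Python operating on nested lists of pixel values; the function names, screen names, and tiny images are hypothetical, and a real pipeline would load actual image files.

```python
def diff_pixels(baseline, candidate):
    """Return (x, y) coordinates where two equally sized images differ."""
    same_shape = (len(baseline) == len(candidate)
                  and all(len(a) == len(b)
                          for a, b in zip(baseline, candidate)))
    if not same_shape:
        raise ValueError("screenshots must have identical dimensions")
    return [(x, y)
            for y, (row_a, row_b) in enumerate(zip(baseline, candidate))
            for x, (a, b) in enumerate(zip(row_a, row_b))
            if a != b]


def regression_report(baselines, candidates):
    """Map screen name -> changed pixel coordinates, omitting clean screens."""
    report = {}
    for screen, image in baselines.items():
        changed = diff_pixels(image, candidates[screen])
        if changed:
            report[screen] = changed
    return report


baselines = {
    "overview": [[0, 0, 0], [0, 1, 0]],
    "governance": [[2, 2, 2], [2, 2, 2]],
}
candidates = {
    "overview": [[0, 0, 0], [0, 1, 9]],    # one pixel changed at x=2, y=1
    "governance": [[2, 2, 2], [2, 2, 2]],  # unchanged
}
print(regression_report(baselines, candidates))   # {'overview': [(2, 1)]}
```

An exact comparison like this only works when rendering is deterministic. Anything that varies between runs, such as animation frames or timestamps, has to be frozen or masked before the capture.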
From ‘Passes Tests’ to ‘Survives Probing’
The distinction matters and it is not semantic.
A game that passes its tests has been verified against its own expectations. The tests say “this should work” and the game says “yes, this works.” That is what devlog 03 reported. 3,111 tests. Zero failures. The game met its own standard.
A game that survives probing has been verified against expectations it did not set for itself. The persistent runner found state drift that no individual test expected to encounter. The data validation tests found looseness that no system test was designed to catch. The adversarial faction tests created states that the game’s own logic would never produce — and verified that the game handled them anyway.
4,952 tests with zero failures means the game does not just work under the conditions it was designed for. It works under conditions that were designed to break it. The faction dashboards handle impossible states. The broadcast system survives concurrent writes. The save pipeline round-trips through flush-and-rebuild cycles without drift. The HUD renders correctly with names it was not optimized for.
This is the difference between a build that works and a build that is ready to be played by someone who is not its developer. Players do not follow the happy path. They name their settlements “AAAAAAAAAAAA.” They save mid-animation. They select INSANE difficulty and then read the fine print. They check if the faction dashboard shows something coherent when their colony is collapsing.
The game that passes those tests is not the same game that passes its own.
What’s Next
The build is at 4,952 tests, 987 screenshots, and zero failures. The visual polish pass removed every pixel-level defect the expanded suite could find. The QA runner is persistent, adversarial, and self-verifying.
What remains is the push toward itch.io store submission: platform packaging, final store page assets, and the continued QA expansion that keeps the test-to-feature ratio climbing rather than plateauing.
The lawns are still green. The tests are harder. The settlements are still arguing, but now we are listening.
Sol’s Souls is in the playtesting phase. Follow x00f.com/games/sols-souls/ for updates.