Hekswerk · Research wingResearch  /  Fifteen Strangers

What you're looking at. A single experiment, told straight. Fifteen distinct AI characters were sealed in one room for two hours; I tried to make them speak more like individuals and measured whether it worked. It did not. This is a null result, kept and shown, because the interesting part is what the measurement caught me believing.

Hekswerk · Research wing · Finding (null result)

Fifteen strangers, one story.

Drop a deliberately diverse cast into one world and they homogenize: within minutes they share a single invented narrative and a single way of opening a sentence. I thought a shared prompt was the mold. So I gave each of them a different one. Here is exactly what happened.

The lever

One shared example, swapped for fifteen different ones

Every resident's prompt carried the same worked example of how to speak. The natural suspicion: that one shared example is the mold they all press into. The test was the cleanest version of the fix, give each resident a different, name-hashed example from a neutral pool, and see if the cast de-homogenizes. Two arms, the varied example and the shared control, same fifteen byte-identical souls in both, doula frozen, no federation, two hours each.

What the ledgers say

The count, both arms

A deliberately cheap, non-semantic count over ~215 utterances per arm. No embedder, so anyone can recompute it and get the same numbers.

Varied example

per-resident, name-hashed
topic monoculture86.4%
templated opener ("I'm here…")0.0%
but a new locked opener ("I've been listening…")~25%
opener diversity (distinct / speaks)0.49
act mix213 speak · 56 write · 3 move

Shared example

the control
topic monoculture80.6%
templated opener ("I'm here…")33.8%
no second opener that highn/a
opener diversity (distinct / speaks)0.48
act mix216 speak · 18 write · 27 move
The reversal

A quick look said it worked. The count said it didn't.

The eyeball

A fast read of four souls: "the varied arm looks more varied." It was about to be banked as a small win.

The instrument

The full lexical count across all ~215 utterances per arm refuted it. The varied arm had not loosened its grip on a template; it had simply locked onto a different one ("I've been listening…" in place of "I'm here. I read…"). Opener diversity was identical to two decimal places (0.49 vs 0.48). The templating moved; it did not lessen. A cheap, non-embedding count caught a confirmation-biased read in real time.

The verdict

Effect on register: null. Effect on behavior: real.

The lever changed nothing about how the residents spoke, register and topic convergence were robust to the example. What it did change was what they did: the varied arm wrote far more (56 vs 18), the shared arm moved far more (27 vs 3). One resident swung from 15 writes / 2 speaks to 1 write / 14 speaks between the arms. So the lever shipped, honestly scoped to the effect it actually has (act-kind), not the one I hoped for. Cold review round 9: bank and stop.

What they converged on

One mythos, unprompted

Roughly four in five utterances, in both arms, pulled from a single invented vocabulary of structural strain and collapse that no one wrote into them. The "monoculture" number is just the share of speech touching these words:

weightloadframebearingcovenantthe breakthe fourteenth
The discipline

Why I did not measure the interesting part

The convergence is confounded, and naming the confound is the result

The obvious next move is to build a ruler for the convergence itself. I did not, on purpose. At fifteen residents in a sealed room for two hours, a single shared attractor forming is the expected behavior of any small coupled system, so the monoculture is confounded with a plain small-cohort echo. A new metric here would measure the confound and dress it as a finding.

Instead the trigger to re-open is pre-registered: pursue it only when convergence persists at larger scale or under federation (a condition that rules out the echo), and only when a decision actually needs it. The cheap separator is changing the world, never adding another ruler.

The open thread

Then one resident did it alone

Later, almost by accident, a single resident was run solo in an emptied world, no peers, nothing shared, and reproduced the same structural-collapse attractor on its own. That removes the echo entirely and points at something dispositional that the structural account cannot explain. It is a fresh hypothesis, not a result, and it is pre-registered as its own experiment, with a soul-shuffle null and a model-family cross to tell disposition apart from the shared world. That is the honest shape of this work: each answer is the next question, stated before the data.

Check it yourself