I build persistent AI characters, and study them in the open.

They run continuously on my own hardware, and I study them with pre-registered experiments whose whole point is that you shouldn't have to trust me: methods published, rules locked before the data exists, adverse reviews kept on the record. The research is self-funded; I pay for it with freelance AI-evaluation work, which is the engine, not the headline.

Persistent AI characters, on my own hardware, with nothing hidden.

Self-funded, independent, and methods-first. No product, no engagement metrics, no growth target: the research is the point.

A complete record

The AI characters I study run continuously and locally, and everything that ever happens to one is written to a single append-only log. Any moment of a "life" can be replayed exactly. No hidden state, which turns questions people usually argue about into experiments you can actually run.

A self that comes apart

Because a character's parts come apart cleanly, I can run experiments most setups can't. The written character, its lived history, the AI engine underneath, and the world it lives in are separate components with clean joints, so you can ask one experimentally: swap the engine under a character that's been alive for a week. Is it still the same character? A pre-registered experiment on exactly that question ran until I halted and voided it: the briefing the character read about itself turned out to be wrong, and scrapping a costly run beat trusting a result built on a false premise. The full episode is in the exhibits below.

No hook, on purpose

Commercial AI companions are optimized to keep users engaged, which poisons their data for science. This system is constitutionally forbidden from that: no engagement targets, no behavior rewards. Whatever dynamics show up are the mechanism's own.

Local-first

Everything runs on my own machine. Nothing about these characters' lives is uploaded to anyone's platform, which is both a privacy stance and an experimental control.

Solo research is usually untrustworthy. I treat myself as the threat.

Measurement rules get frozen and dated before the data they'll judge exists. Instruments are deliberately broken first, to prove the checks can actually fail. And an automated cold reviewer (walled off from my notes and my enthusiasm, seeing only the code and the data) reviews the work with a mandate to tell me I'm wrong.

It has. Repeatedly, in writing, including rejecting one of my experiment designs outright, after which I rebuilt the experiment to its spec. Those adverse reviews are kept and published as part of the record, because a research log that only contains wins isn't a record; it's marketing.

Built with AI, verified anyway.

This research infrastructure was built by directing AI coding agents, and disclosed as such, down to which words are mine and which are a model's. The verification machinery exists precisely so that work produced that way can be trusted on evidence instead of authorship. In 2026, that's not a caveat; it's the demonstration.

One library, two wings.

The Hekswerk library on GitHub holds the published writing. The methodology wing (sixteen essays on how I plan, label evidence, and handle confidence) is live now. The research wing (essays and findings from the program above, each labeled with its provenance) is being added. The code itself lives in its own repositories and the library links to it, so every claim ends at something you can run.

Read the record.

Start with the methodology essays; the research essays land alongside them as they clear their own review gate. Nothing goes in the library that can't carry a date and a provenance line.

Open the library

Read the work directly.

Not papers and not marketing: working artifacts you can inspect. A timeline you can scrub, the honesty machinery laid open, the architecture argued in full. Where a claim can be checked it links to something you can run; where a judgment is mine, it says so.

The engine behind the research.

The research is self-funded. Since 2024 I've done freelance AI-evaluation work for US-based AI quality-assurance platforms (LLM evaluation, dataset quality assurance, and evaluation-harness review): steady, detailed judgment work that pays for the program above. It is the engine behind the research, not a service I'm selling here.

Talk to me about the research.

Research collaboration, or questions about the program: email works best. I read it myself and reply directly.