wassname

Michael J Clark (wassname)

I use the handle wassname. ML engineer in Perth. I work on AI alignment research, specifically steering language models without human preference labels.

I’m building tools to ask AI hard questions, and to know whether they’re lying.

selected works

AntiPaSTO: Self-Supervised Steering of Moral Reasoning

arXiv:2601.07473, Jan 2026

Gradient-based representation steering using the model’s own behavioral consistency as signal. Outperforms prompting on out-of-distribution transfer. Builds on prior representation alignment work that showed promise but had stability issues.

LLM Ethics Leaderboard

Benchmarking moral reasoning across 12+ language models.

Eliciting Suppressed Knowledge

Probing suppressed activations recovers knowledge that models possess but inhibit. ~20% AUROC improvement on TruthfulQA.

LoRA Lie Detectors

Low-rank adapters for deception detection. 87% accuracy on a synthetic deception dataset.

more on GitHub →

writing

LessWrong — technical AI safety, policy

background

Kiwi from Christchurch, now in Perth. Physics BSc, MSc in petroleum geoscience. Worked in oil & gas before switching to ML in 2016.

Day job: ML lead at Woodside Energy. Also a board member at Cytophenix (medical AI for AMR) and a partner at Three Springs Technology (ML consulting).


I want to optimize for the good ending, not the bad one.