wassname

Michael J Clark (wassname)
I use the handle wassname. ML engineer in Perth. I work on AI alignment research, specifically steering language models without human preference labels.
I’m building tools to ask AI hard questions and to tell whether it is lying.
selected works
AntiPaSTO: Self-Supervised Steering of Moral Reasoning
arXiv:2601.07473, Jan 2026
Gradient-based representation steering using the model’s own behavioral consistency as signal. Outperforms prompting on out-of-distribution transfer. Builds on prior representation alignment work that showed promise but had stability issues.
Eliciting Suppressed Knowledge
Probing suppressed activations recovers knowledge models possess but inhibit. ~20% AUROC improvement on TruthfulQA.
Low-rank adapters for deception detection. 87% accuracy on a synthetic deception dataset.
writing
LessWrong — technical AI safety, policy
- An Aphoristic Overview of Technical AI Alignment — one-sentence guide to alignment ideas
- Private Capabilities, Public Alignment — why we should open-source alignment methods
background
Kiwi from Christchurch, now in Perth. Physics BSc, MSc petroleum geoscience. Did oil & gas before switching to ML in 2016.
Day job: ML lead at Woodside Energy. Also a board member at Cytophenix (medical AI for antimicrobial resistance, AMR) and a partner at Three Springs Technology (ML consulting).
I want to optimize for the good ending, not the bad one.
