wassname

Michael J Clark (wassname)

I use the handle wassname. ML engineer in Perth. I work on AI alignment research, specifically steering language models without human preference labels.

I’m building tools to ask AI hard questions and to tell when it is lying.

selected works

AntiPaSTO: Self-Supervised Steering of Moral Reasoning

arXiv:2601.07473, Jan 2026

Gradient-based representation steering using the model’s own behavioral consistency as signal. Outperforms prompting on out-of-distribution transfer. Builds on prior representation alignment work that showed promise but had stability issues.

arXiv · code

LLM Ethics Leaderboard

Benchmarking moral reasoning across 12+ language models.

website · code

Eliciting Suppressed Knowledge

Probing suppressed activations recovers knowledge models possess but inhibit. ~20% AUROC improvement on TruthfulQA.

LoRA Lie Detectors

Low-rank adapters for deception detection. 87% accuracy on a synthetic deception dataset.

writing

LessWrong — technical AI safety, policy

An Aphoristic Overview of Technical AI Alignment — one-sentence guide to alignment ideas
Private Capabilities, Public Alignment — why we should open-source alignment methods

background

Kiwi from Christchurch, now in Perth. Physics BSc, MSc petroleum geoscience. Did oil & gas before switching to ML in 2016.

Day job: ML lead at Woodside Energy. Also board member at Cytophenix (medical AI for antimicrobial resistance) and partner at Three Springs Technology (ML consulting).

I want to optimize for the good ending, not the bad one.