Michael J Clark (wassname)

I use the handle wassname. ML engineer in Perth. I work on AI alignment research, specifically steering language models without human preference labels.

I’m building tools to ask AI hard questions and know if they’re lying. I’m also exploring unsupervised ways to make AI more moral than humans.

Open to collaboration, especially on AntiPaSTO.

selected works

AntiPaSTO: Self-Supervised Steering of Moral Reasoning

arXiv:2601.07473, Jan 2026

Gradient-based representation steering using the model’s own behavioral consistency as signal. Outperforms prompting on out-of-distribution transfer. Builds on prior representation alignment work that showed promise but had stability issues.

S-space steering for eval-awareness control

Replicated eval-awareness paper with novel singular-value-basis (S-space) steering. Hawthorne gap on Qwen3-32B reduced to almost zero (1% vs prior work’s 26%). Apart Research Control hackathon 2026.

LLM Ethics Leaderboard

Benchmarking moral reasoning across 12+ language models.

Eliciting Suppressed Knowledge

Probing suppressed activations recovers knowledge models possess but inhibit. ~20% AUROC improvement on TruthfulQA.

LoRA Lie Detectors

Low-rank adapters for deception detection. 87% accuracy on synthetic deception dataset.

more on github →

selected talks

Co-organizer of the Perth Machine Learning Group (3,400+ members).

selected writing

LessWrong — technical AI safety, policy

background

Kiwi from Christchurch, now in Perth. Physics BSc, MSc petroleum geoscience. Did oil & gas before switching to ML in 2016.

Day job: ML and modelling at Woodside Energy. Also board member at Cytophenix (medical AI for AMR) and partner at Three Springs Technology (ML consulting).


I want to optimize for the good ending, not the bad one.