research
InnerPiSSA (in progress)
gradient-based steering of hidden states. works when output-level methods fail.
eliciting suppressed knowledge
probing suppressed activations in final layers recovers knowledge models possess but inhibit. ~20% AUROC improvement on truthfulqa.
tiny recursive models for latent reasoning
frozen 4-bit llms + small trainable models for iterative refinement. adapting trm to coconut (chain of continuous thought).
unsupervised in-context learning
eliciting skills from pretrained models through mutual predictability. no human labels.
controlling positional bias
investigating confounds in moral assessment using activation steering.
llm ethics leaderboard
benchmarking alignment across major models.
activation store
efficient feature extraction during inference.
world models
rl experiments in sonic environments.
attentive neural processes
time-series and spatial modelling.
about
i work on ai alignment through mechanistic interpretability and gradient-based steering. particularly interested in methods that work when standard output-level alignment fails.
active member of the perth machine learning group. you'll often find me enthusiastically talking someone's ear off about reinforcement learning.
before ml: geophysicist and programmer in oil and gas. msc petroleum geoscience, bsc physics (university of canterbury, nz). kiwi from christchurch, now in perth.
occasionally available for ml consulting in energy/tech.