~Welcome!

to the personal website of:

Matthew Farrugia-Roberts
First year doctoral student
Department of Computer Science & Magdalen College
University of Oxford

My research aim is to understand the foundations of intelligence, learning, and computation, and to use this understanding to anticipate risks to humanity from future advanced intelligent systems.

Contact: Email matthew@far.in.net, schedule a 1-on-1, or give anonymous feedback.

This page includes my bio, publications, teaching, and affiliations.

Recent announcements (see news page for more):

Our paper “Mitigating goal misgeneralization via minimax regret” was presented at RLC 2025! Joint work with Karim Abdel Sadek and other collaborators from Krueger AI Safety Lab and Google DeepMind.
Our paper “Loss landscape degeneracy and stagewise development in transformers” will appear in TMLR! Joint work with collaborators from Timaeus and Monash University.

Recent writing (see essays page for more):

Blowing up (visualising a simple example from resolution of singularities).
Instrumental/intrinsic value ambiguity (toy model for a kind of goal misgeneralisation).
Smile! (some personal reflection).

§About me

I’m currently a PhD student at the University of Oxford, studying mesa optimisation and agent foundations under the supervision of Professor Alessandro Abate. I collaborate on applying singular learning theory to understand deep learning with Timaeus, and understanding goal misgeneralisation with Krueger AI Safety Lab. I also help run the Oxford AI Safety Initiative.

My background is in computer science, machine learning, and AI safety. I previously completed a master’s thesis on lossless compression of neural networks supervised by Daniel Murfet, and an internship at CHAI researching the foundations of reward learning with Adam Gleave and Joar Skalse. I also helped run a virtual AI safety reading group at metauni. Before that, I studied computer science and machine learning at the University of Melbourne and ETH Zürich. While studying, I also worked for several years as a tutor and lecturer at the University of Melbourne.

§Publications

Reward ambiguity and generalization in reinforcement learning:

Karim Abdel Sadek(=), MFR(=), Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger, and Michael Dennis, 2025, “Mitigating goal misgeneralization via minimax regret”. Conference paper (poster, 3.2MB) to appear at RLC 2025. Preprint on arXiv. Tweet threads on results and motivation.
Joar Skalse(=), MFR(=), Alessandro Abate, Stuart Russell, and Adam Gleave, 2023, “Invariance in policy optimisation and partial identifiability in reward learning”. Conference paper (poster) presented at ICML 2023. Preprint on arXiv.

Science of deep learning, singular learning theory, developmental interpretability:

Simon Pepin Lehalleur(=), Jesse Hoogland(=), MFR(=), Susan Wei, Alexander Gietelink Oldenziel, George Wang, Liam Carroll, and Daniel Murfet, 2025, “You are what you eat: AI alignment requires understanding how data shapes structure and generalisation,” Position paper under review. Preprint on arXiv.
Liam Carroll, Jesse Hoogland, MFR, and Daniel Murfet, 2025, “Dynamics of transient structure in in-context linear regression transformers,” Conference paper under review. Preprint on arXiv.
Jesse Hoogland(=), George Wang(=), MFR, Liam Carroll, Susan Wei, and Daniel Murfet, 2025, “Loss landscape degeneracy and stagewise development in transformers”, Journal paper to appear in TMLR. Preprint on arXiv.
George Wang(=), MFR(=), Jesse Hoogland, Liam Carroll, Susan Wei, and Daniel Murfet, 2024, “Loss landscape geometry reveals stagewise development of transformers.” Workshop paper (poster, 7.3MB) presented at HiLD: 2nd Workshop on High-dimensional Learning Dynamics, a workshop at ICML 2024. Best papers of HiLD award.

Neural network geometry:

MFR, 2024, “Proximity to losslessly compressible parameters”. Conference paper under review. Preprint on arXiv.
MFR, 2024, “Losslessly compressible neural network parameters”. Workshop paper presented at Machine Learning and Compression Workshop, a workshop at NeurIPS 2024.
MFR, 2023, “Functional equivalence and path connectivity of reducible hyperbolic tangent networks”. Conference paper (poster, 3.9MB) presented at NeurIPS 2023. Preprint on arXiv.
MFR, 2022, Structural Degeneracy in Neural Networks, Master’s thesis, School of Computing and Information Systems, the University of Melbourne. Available online.

Computer science education:

MFR, Bryn Jeffries, and Harald Søndergaard, 2022, “Teaching simple constructive proofs with Haskell programs”. Conference paper: extended abstract presented at TFPIE 2022, full paper published in EPTCS.
MFR, Bryn Jeffries, and Harald Søndergaard, 2022, “Programming to learn: Logic and computation from a programming perspective”. Conference paper presented at ACM ITiCSE 2022.

§Teaching

Here are some select teaching projects. See my teaching page for a full list.

University of Oxford:

Head TA (2025), AI Safety and Alignment.
TA (2025), Foundations of Self-Programming Agents.

Independent:

Teacher (2024–2025), “Hi, JAX!” (webpage 2024, online).

The University of Melbourne:

Guest lecturer (2024), COMP90087 The Ethics of Artificial Intelligence (recording).
Lecturer (2019), COMP90059 Introduction to Python Programming.
Head TA (2018) and coordinator (2019), COMP20007 Design of Algorithms.
Head TA (2017–2021), COMP30024 Artificial Intelligence (projects repository).
Head TA (2017–2021), COMP30026 Models of Computation (paper 1, paper 2).

§Affiliations

Current:

Doctoral student at the Department of Computer Science, University of Oxford.
Committee member, Oxford AI Safety Initiative.
Member of Magdalen College, University of Oxford.
AI researcher, AI Existential Safety Community, Future of Life Institute.
Research affiliate at Timaeus.

Past:

Research associate at Timaeus.
Research assistant (AI alignment & reward hacking) at Krueger AI Safety Lab and the Computational and Biological Learning Lab, University of Cambridge.
Research assistant (human–agent interaction) at the School of Computing and Information Systems, the University of Melbourne.
Research intern at the Centre for Human-compatible AI, University of California, Berkeley.

Any views expressed on this website are not intended to represent the views of any of my current or previous affiliated institutions.