Welcome!
to the personal website of:
Matthew Farrugia-Roberts
Research Assistant (AI Alignment and Reward Hacking)
Krueger AI Safety Lab
Nouns: Matthew, Matt, he/him, they/them (singular)—all fine.
Contact: ‘matthew’ at this domain.
Website perpetually under construction. This page includes my bio, announcements, research interests, publications, teaching, coursework, and affiliations.
About me
I am a student, researcher, and teacher from Melbourne, Australia. I’m working on understanding goal misgeneralisation at Krueger AI Safety Lab. I also collaborate on developmental interpretability research at Timaeus.
Previously, I completed a Master of Computer Science degree at the University of Melbourne, with a thesis on lossless compression of neural networks, supervised by Daniel Murfet. During the degree I completed a virtual research internship at the Center for Human-compatible AI studying reward learning theory with Adam Gleave and Joar Skalse, and I helped run a virtual AI safety reading group at metauni. I also completed an exchange semester at ETH Zürich.
Before that, I worked as a tutor and lecturer at the University of Melbourne, teaching classes on programming, algorithmics, artificial intelligence, theoretical computer science, networks, and operating systems. I had completed a Bachelor of Science (taking many of these classes, and others) shortly beforehand.
Announcements
Coming soon:
- Together with collaborators from Melbourne and Timaeus, I published a paper at the HiLD workshop, “Loss landscape geometry reveals stagewise development of transformers”. The paper received a best papers of HiLD award! Unfortunately, I’m not able to attend ICML this year, but my co-author Jesse Hoogland will be at the workshop to present the paper.
- I’m excited to announce that this October I will move to Oxford to start a DPhil in the Department of Computer Science!
Recent news:
- I wrote a critique of Mark Zuckerberg’s recent letter about open source and the future of AI.
- I’m running Hi, JAX!, a free online introductory JAX course with weekly workshops from July 11 to September 12.
- I gave a guest lecture on ethics and the future of intelligence for the subject COMP90087 The Ethics of Artificial Intelligence at the University of Melbourne. A recording is available.
Research interests
Broad research interests:
- Intelligence, learning, and computation (e.g., agent foundations, bounded/computational rationality, artificial intelligence, cognitive science)
- Technology and society (e.g., existential risks from advanced intelligent systems, political philosophy, history and future of humanity)
So far, I’m still a student of these topics, with much to learn.
While I’m establishing myself as an academic, I have focussed on some narrower topics:
- AI alignment (reward learning theory, goal misgeneralisation, developmental interpretability)
- Deep learning theory (neural network geometry, singular learning theory)
- Computer science education (discrete mathematics, theoretical computer science)
Publications by topic
((=) denotes equal contribution.)
Developmental interpretability:
- George Wang(=), Matthew Farrugia-Roberts(=), Jesse Hoogland, Liam Carroll, Susan Wei, and Daniel Murfet, 2024, “Loss landscape geometry reveals stagewise development of transformers”. Workshop paper to appear at HiLD: 2nd Workshop on High-dimensional Learning Dynamics, ICML 2024. Best papers of HiLD award.
- Jesse Hoogland(=), George Wang(=), Matthew Farrugia-Roberts, Liam Carroll, Susan Wei, and Daniel Murfet, 2024, “The developmental landscape of in-context learning”. Conference paper under review. Preprint on arXiv.
Neural network geometry:
- Matthew Farrugia-Roberts, 2024, “Proximity to losslessly compressible parameters”. Conference paper under review. Preprint on arXiv.
- Matthew Farrugia-Roberts, 2023, “Functional equivalence and path connectivity of reducible hyperbolic tangent networks”. Conference paper (poster) presented at NeurIPS 2023. Preprint on arXiv.
- Matthew Farrugia-Roberts, 2022, Structural Degeneracy in Neural Networks, Master’s thesis, School of Computing and Information Systems, the University of Melbourne. Available online.
Reward learning theory:
- Joar Skalse(=), Matthew Farrugia-Roberts(=), Alessandro Abate, Stuart Russell, and Adam Gleave, 2023, “Invariance in policy optimisation and partial identifiability in reward learning”. Conference paper (poster) presented at ICML 2023. Preprint on arXiv.
Computer science education:
- Matthew Farrugia-Roberts, Bryn Jeffries, and Harald Søndergaard, 2022, “Teaching simple constructive proofs with Haskell programs”. Conference paper: extended abstract presented at TFPIE 2022, full paper published in EPTCS.
- Matthew Farrugia-Roberts, Bryn Jeffries, and Harald Søndergaard, 2022, “Programming to learn: Logic and computation from a programming perspective”. Conference paper presented at ACM ITiCSE 2022.
See also my Google Scholar profile.
Teaching
Teaching in 2024:
- COMP90087 The Ethics of Artificial Intelligence (TA, guest lecture on AI safety)
Teaching in 2023:
- COMP90087 The Ethics of Artificial Intelligence (TA)
Teaching in 2021:
- COMP30024 Artificial Intelligence (co-Head TA)
- COMP30026 Models of Computation (co-Head TA)
- COMP90087 The Ethics of Artificial Intelligence (TA)
Full teaching history since 2016:
- COMP90087 The Ethics of Artificial Intelligence (2021, 2023–2024: TA)
- COMP30026 Models of Computation (2016: TA, 2017–2020: Head TA, 2021: co-Head TA)
- COMP30024 Artificial Intelligence (2017–2019: Head TA, 2020–2021: co-Head TA)
- COMP90059 Introduction to Python Programming (2018: Lecturer and coordinator)
- COMP20007 Design of Algorithms (2016: TA, 2017: Head TA, 2018: Coordinator)
- COMP10001 Foundations of Computing (2017: TA)
- COMP30023 Computer Systems (2017: TA)
- COMP90038 Algorithms and Complexity (2016: TA)
Coursework
Master of Computer Science, University of Melbourne, part-time 2019–2022
- Coursework in theoretical computer science and machine learning (coursework portfolio)
- Coursework average mark 98.75%
- Minor thesis project on structural degeneracy in neural networks
- Thesis mark 95.5% (top of year)
- Overall weighted average mark 96.25% (top of year)
- Dean’s Honours List (marks in the top 5% across the Faculty of Engineering and Information Technology)
Exchange semester, ETH Zürich, 2020
- Coursework in theoretical computer science, statistical learning theory, network modelling, and neuroscience (coursework portfolio)
- Grade-point average 5.92 / 6.00
Bachelor of Science, University of Melbourne, 2014–2016
- Major in Computing and Software Systems (coursework in computer science and software engineering) plus electives in physics, mathematics, and education
- Average mark 93.04%
- Dean’s Honours List (top-percentile marks in the Faculty of Science, all three years)
- AAII Prize in Computer Science (top of class in the AI subject; I achieved top-of-class marks fairly regularly, but this time it came with an industry award)
- ACS Student Award (best marks across third year computer science classes)
Affiliations
Current affiliations:
- Research assistant at Krueger AI Safety Lab.
- Research associate at Timaeus.
Past affiliations:
- Research assistant at the Computational and Biological Learning Lab, University of Cambridge.
- Visiting research associate at the Melbourne Deep Learning Group.
- Independent AI safety researcher, supported by Manifund grant “Introductory Resources for Singular Learning Theory”.
- Research assistant at the School of Computing and Information Systems, the University of Melbourne.
- Teaching assistant at the Centre for AI and Digital Ethics and the School of Computing and Information Systems, the University of Melbourne.
- Master of Computer Science student at the Melbourne Deep Learning Group and the School of Computing and Information Systems, the University of Melbourne.
- Virtual research intern at the Centre for Human-compatible AI, University of California, Berkeley.
- Virtual research intern at the (then-named) Brain, Mind & Markets Laboratory, the University of Melbourne.
- Casual tutor and lecturer at the School of Computing and Information Systems, the University of Melbourne.
Any views expressed on this website are not intended to represent the views of any of my affiliated institutions.