

Structural degeneracy in neural networks

Minor Thesis
submitted in partial fulfilment of
the requirements for the degree of

Master of Computer Science
at
The University of Melbourne

Matthew Farrugia-Roberts

Supervised by Daniel Murfet and Nic Geard

Submitted: October 2022. Minor revision: December 2022.

Thesis

Download: Full text PDF (3.2 MB).

Abstract

Neural networks learn to implement input–output functions based on data. Their ability to do so has driven applications in many task domains. However, a solid theoretical understanding of deep learning remains elusive. For example, classical theoretical frameworks fail to account for the performance of large neural networks despite their capacity to implement overly complex functions. There is a need for basic theoretical research clarifying the relationship between structure, function, and complexity in neural networks.

For almost all neural networks, the relationship between structure and function is well understood: all parameters implementing a given function are related by simple operations such as exchanging the weights of units. However, for the remaining degenerate neural networks, there are additional parameters implementing the same function. Since some of these additional parameters correspond to smaller, less complex neural networks, degenerate neural networks are positioned to play a crucial role in understanding neural network learning. However, prior work investigating neural network structures has largely emphasised the generic non-degenerate case.

In light of this situation, I present a comprehensive theoretical investigation of structural degeneracy in a simple family of neural networks (single-hidden-layer biased hyperbolic tangent networks). I develop an algorithmic framework for analysing degeneracy, including an ideal measure of degeneracy called the rank of a neural network (the minimum number of hidden units required to implement an identical function). Using this framework, I characterise the class of functionally equivalent parameters including in the degenerate case, extending prior work that has only considered the non-degenerate case. I show that in the degenerate case, the functional equivalence class is piecewise linear path connected. Moreover, I study a measure of approximate degeneracy, the parametric approximate rank, based on proximity to low-rank parameters. Drawing on computational complexity theory, I show that determining such proximity is an NP-complete problem.

The insights into structural degeneracy developed in this thesis have the potential to clarify topics of current interest in deep learning. More broadly, this thesis lays a foundation for future work developing efficient measures of degeneracy and empirically investigating the role of degeneracy in deep learning.
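
For a concrete (if informal) picture of the symmetries and degeneracies described above, here is a minimal NumPy sketch. It is illustrative only and not drawn from the thesis; the helper name and the particular parameterisation are mine. It checks that exchanging hidden units or negating a unit's weights leaves a single-hidden-layer biased tanh network's function unchanged, and that a parameter with a duplicated hidden unit implements the same function as a smaller parameter with those units merged.

import numpy as np

def tanh_network(x, W, b, a, c):
    # f(x) = sum_i a_i * tanh(W_i . x + b_i) + c  (single hidden layer, biased)
    return np.tanh(x @ W.T + b) @ a + c

rng = np.random.default_rng(0)
h, d = 4, 3                       # hidden units, input dimension
W = rng.normal(size=(h, d))       # incoming weights
b = rng.normal(size=h)            # hidden unit biases
a = rng.normal(size=h)            # outgoing weights
c = rng.normal()                  # output bias
X = rng.normal(size=(100, d))     # sample inputs for functional comparison

# Exchanging (permuting) hidden units preserves the implemented function.
perm = rng.permutation(h)
assert np.allclose(tanh_network(X, W, b, a, c),
                   tanh_network(X, W[perm], b[perm], a[perm], c))

# Negating a unit's incoming weights, bias, and outgoing weight also
# preserves the function, because tanh is odd.
s = np.array([1.0, -1.0, 1.0, -1.0])
assert np.allclose(tanh_network(X, W, b, a, c),
                   tanh_network(X, s[:, None] * W, s * b, s * a, c))

# Degeneracy: duplicating a hidden unit and splitting its outgoing weight
# gives a four-unit parameter implementing the same function as a
# three-unit parameter, so its rank is lower than its number of units.
W_dup = np.vstack([W[:3], W[2]])                           # duplicate unit 2
b_dup = np.concatenate([b[:3], [b[2]]])
a_dup = np.concatenate([a[:2], [0.5 * a[2], 0.5 * a[2]]])  # split its outgoing weight
assert np.allclose(tanh_network(X, W[:3], b[:3], a[:3], c),
                   tanh_network(X, W_dup, b_dup, a_dup, c))

In the terminology of the abstract, the final check shows the duplicated parameter has rank at most three despite having four hidden units.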

Citation

BibTeX database entry:

@mastersthesis{Farrugia-Roberts2022,
    title={Structural Degeneracy in Neural Networks},
    author={Farrugia-Roberts, Matthew},
    month=dec,
    year=2022,
    school={School of Computing and Information Systems,
        The University of Melbourne},
    url={https://far.in.net/mthesis},
}

Recorded talks

A talk at the Singular Learning Theory & Alignment Summit, 2023

A series of three ~90–120 minute improvised talks at metauni’s singular learning theory seminar, presenting some of the key results in the thesis.

An earlier talk in the same series.

TODO: Upload and link a couple of other related talks.

Colophon

Typeset with LaTeX.

Fonts include Computer Modern, BOONDOX Fraktur, and Ralph Smith’s Formal Script (\usepackage[frak=boondox,scr=rsfs]{mathalfa}).

Most figures drawn with TikZ, and some with pgfplots.

Errata

Please contact me if you find any errors.

Known errors: