# Structural degeneracy in neural networks

Minor Thesis

*submitted in partial fulfilment of the
requirements for the degree of*

Master of Computer Science

*at*

The University of Melbourne

**Matthew Farrugia-Roberts**

Supervised by **Daniel Murfet** and **Nic Geard**

Submitted: October, 2022. Minor revision: December, 2022.

## Thesis

Download: Full text PDF (3.2MB).

## Abstract

Neural networks learn to implement input–output functions based on data. Their ability to do so has driven applications in many task domains. However, a solid theoretical understanding of deep learning remains elusive. For example, classical theoretical frameworks fail to account for the performance of large neural networks despite their capacity to implement overly complex functions. There is a need for basic theoretical research clarifying the relationship between structure, function, and complexity in neural networks.

For almost all neural networks, the relationship between structure
and function is well understood: all parameters implementing a given
function are related by simple operations such as exchanging the weights
of units. However, for the remaining *degenerate* neural
networks, there are additional parameters implementing the same
function. Since some of these additional parameters correspond to
smaller, less complex neural networks, degenerate neural networks are
positioned to play a crucial role in understanding neural network
learning. However, prior work investigating neural network structures
has largely emphasised the generic non-degenerate case.
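To make the generic structure–function relationship concrete, here is a minimal numerical sketch (my own illustration, not taken from the thesis) of the two simple operations for a single-hidden-layer tanh network: exchanging hidden units, and negating a unit's weights (which preserves the function because tanh is odd). The function `net` and the particular parameter values are hypothetical examples.

```python
import numpy as np

def net(x, w, b, v, c):
    # Single-hidden-layer tanh network with scalar input and output:
    # f(x) = c + sum_i v[i] * tanh(w[i] * x + b[i])
    return c + v @ np.tanh(np.outer(w, x) + b[:, None])

rng = np.random.default_rng(0)
w, b, v = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
c = 0.5
x = np.linspace(-2.0, 2.0, 101)

# Operation 1: exchanging the weights of hidden units leaves the
# implemented function unchanged.
perm = [2, 0, 1]
assert np.allclose(net(x, w, b, v, c), net(x, w[perm], b[perm], v[perm], c))

# Operation 2: tanh is odd, so negating a unit's incoming weight, bias,
# and outgoing weight together also leaves the function unchanged:
# (-v) * tanh(-(w*x + b)) = v * tanh(w*x + b).
s = np.array([1.0, -1.0, 1.0])
assert np.allclose(net(x, w, b, v, c), net(x, s * w, s * b, s * v, c))
```

For non-degenerate parameters, compositions of these two operations account for all functionally equivalent parameters; degenerate parameters admit further equivalences.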

In light of this situation, I present a comprehensive theoretical
investigation of structural degeneracy in a simple family of neural
networks (single-hidden-layer biased hyperbolic tangent networks). I
develop an algorithmic framework for analysing degeneracy, including an
ideal measure of degeneracy called the *rank* of a neural network
(the minimum number of hidden units required to implement an identical
function). Using this framework, I characterise the class of
functionally equivalent parameters, including in the degenerate case,
extending prior work that has considered only the non-degenerate case. I
show that in the degenerate case, the functional equivalence class is
piecewise linear path connected. Moreover, I study a measure of
approximate degeneracy, the *parametric approximate rank,* based
on proximity to low-rank parameters. Drawing on computational complexity
theory, I show that determining such proximity is an NP-complete
problem.
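The notion of rank can be illustrated with a small numerical sketch (my own example, not drawn from the thesis): a degenerate parameter with a duplicated hidden unit implements the same function as a strictly smaller parameter obtained by merging the duplicates, so its rank is below its number of hidden units. The helper `net` and the specific values are hypothetical.

```python
import numpy as np

def net(x, w, b, v, c):
    # f(x) = c + sum_i v[i] * tanh(w[i] * x + b[i])
    return c + v @ np.tanh(np.outer(w, x) + b[:, None])

x = np.linspace(-3.0, 3.0, 201)

# A degenerate 3-unit parameter: units 0 and 1 have identical incoming
# weights and biases.
w = np.array([1.0, 1.0, -0.5])
b = np.array([0.2, 0.2, 1.0])
v = np.array([0.7, -0.3, 2.0])
c = 0.1

# Merging the duplicated units (summing their outgoing weights) yields a
# 2-unit parameter implementing the identical function, witnessing that
# the rank of this parameter is at most 2.
w2 = np.array([1.0, -0.5])
b2 = np.array([0.2, 1.0])
v2 = np.array([0.7 - 0.3, 2.0])

assert np.allclose(net(x, w, b, v, c), net(x, w2, b2, v2, c))
```

The parametric approximate rank relaxes this exact notion: instead of asking whether such a smaller equivalent parameter exists, it asks how close the given parameter is to one of lower rank.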

The insights into structural degeneracy developed in this thesis have the potential to clarify topics of current interest in deep learning. More broadly, this thesis lays a foundation for future work developing efficient measures of degeneracy and empirically investigating the role of degeneracy in deep learning.

## Citation

BibTeX database entry:

```
@mastersthesis{Farrugia-Roberts2022,
  title={Structural Degeneracy in Neural Networks},
  author={Farrugia-Roberts, Matthew},
  month=dec,
  year=2022,
  school={School of Computing and Information Systems,
          The University of Melbourne},
  url={https://far.in.net/mthesis},
}
```

## Recorded talks

A series of three ~90–120-minute improvised talks at metauni’s singular learning theory seminar, presenting some of the key results of the thesis, along with a related earlier talk from the same series.

- Part 1 (youtube): Overview of thesis. Three perspectives on degeneracy. Reducibility and reduction algorithms.
- Part 2 (youtube): Rank and algorithms. Parametric approximate rank. Statement of hardness theorem.
- Part 3 (youtube): Bounding parametric approximate rank (PAR). PAR is NP. Reduction from SAT to PAR (sketch half of proof).
- Part 4 (youtube): An earlier talk in the same series. Preliminary results on piecewise linear path connectivity of the functional equivalence class of reducible parameters.

TODO: Upload and link a couple of other related talks.

## Colophon

Typeset with LaTeX.

Fonts include Computer Modern, BOONDOX fraktur, and Ralph Smith
Formal Script (`\usepackage[frak=boondox,scr=rsfs]{mathalfa}`).

Most figures drawn with TikZ, and some with pgfplots.

## Errata

Please contact me if you find any errors.