# Structural degeneracy in neural networks

Minor Thesis

*submitted in partial fulfilment of the
requirements for the degree of*

Master of Computer Science

*at*

The University of Melbourne

**Matthew Farrugia-Roberts**

Supervised by **Daniel
Murfet** and **Nic Geard**

Submitted: October, 2022. Minor revision: December, 2022.

## Thesis

Download: Full text PDF (3.2MB).

## Abstract

Neural networks learn to implement input–output functions based on data. Their ability to do so has driven applications in many task domains. However, a solid theoretical understanding of deep learning remains elusive. For example, classical theoretical frameworks fail to account for the performance of large neural networks despite their capacity to implement overly complex functions. There is a need for basic theoretical research clarifying the relationship between structure, function, and complexity in neural networks.

For almost all neural networks, the relationship between structure
and function is well understood: all parameters implementing a given
function are related by simple operations such as exchanging the weights
of units. However, for the remaining *degenerate* neural
networks, there are additional parameters implementing the same
function. Since some of these additional parameters correspond to
smaller, less complex neural networks, degenerate neural networks are
positioned to play a crucial role in understanding neural network
learning. However, prior work investigating neural network structures
has largely emphasised the generic non-degenerate case.
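The simple operations mentioned above can be made concrete with a minimal numpy sketch (illustrative only, not code from the thesis). For a single-hidden-layer tanh network, permuting the hidden units, or negating all of one unit's weights (incoming, bias, and outgoing, exploiting the oddness of tanh), leaves the implemented function unchanged:

```python
import numpy as np

def tanh_net(x, a, b, v):
    # Single-hidden-layer tanh network: f(x) = sum_i v[i] * tanh(a[i]*x + b[i])
    return np.tanh(np.outer(x, a) + b) @ v

rng = np.random.default_rng(0)
a, b, v = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)
x = np.linspace(-2.0, 2.0, 50)

# Symmetry 1: exchanging the weights of hidden units (a permutation).
perm = [2, 0, 1]
assert np.allclose(tanh_net(x, a, b, v),
                   tanh_net(x, a[perm], b[perm], v[perm]))

# Symmetry 2: since tanh is odd, negating one unit's incoming weight,
# bias, and outgoing weight also preserves the function.
a2, b2, v2 = a.copy(), b.copy(), v.copy()
a2[0], b2[0], v2[0] = -a2[0], -b2[0], -v2[0]
assert np.allclose(tanh_net(x, a, b, v), tanh_net(x, a2, b2, v2))
```

For non-degenerate parameters, these two operations (and their compositions) generate the entire functional equivalence class.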

In light of this situation, I present a comprehensive theoretical
investigation of structural degeneracy in a simple family of neural
networks (single-hidden-layer biased hyperbolic tangent networks). I
develop an algorithmic framework for analysing degeneracy, including an
ideal measure of degeneracy called the *rank* of a neural network
(the minimum number of hidden units required to implement an identical
function). Using this framework, I characterise the class of
functionally equivalent parameters, including in the degenerate case,
extending prior work that has only considered the non-degenerate case. I
show that in the degenerate case, the functional equivalence class is
piecewise linear path connected. Moreover, I study a measure of
approximate degeneracy, the *parametric approximate rank,* based
on proximity to low-rank parameters. Drawing on computational complexity
theory, I show that determining such proximity is an NP-complete
problem.
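To illustrate the notion of rank with a toy example (a hypothetical sketch, not taken from the thesis): if two hidden units of a tanh network share their incoming weights and biases, they can be merged by summing their outgoing weights, yielding a strictly smaller network implementing an identical function. Such a parameter is degenerate, and its rank is below its number of hidden units:

```python
import numpy as np

def tanh_net(x, a, b, v):
    # f(x) = sum_i v[i] * tanh(a[i]*x + b[i])
    return np.tanh(np.outer(x, a) + b) @ v

x = np.linspace(-2.0, 2.0, 50)

# A 3-unit network in which units 0 and 1 duplicate each other.
a = np.array([1.0, 1.0, -0.5])
b = np.array([0.3, 0.3, 0.7])
v = np.array([0.4, 0.6, 1.2])

# Merging the duplicated units (summing their outgoing weights) gives a
# 2-unit network with exactly the same input-output function, so this
# parameter's rank is at most 2.
a_small = np.array([1.0, -0.5])
b_small = np.array([0.3, 0.7])
v_small = np.array([0.4 + 0.6, 1.2])

assert np.allclose(tanh_net(x, a, b, v),
                   tanh_net(x, a_small, b_small, v_small))
```

The rank is the minimum hidden-layer size over all such exact reductions; the parametric approximate rank relaxes this to proximity to low-rank parameters, which is where the NP-completeness result applies.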

The insights into structural degeneracy developed in this thesis have the potential to clarify topics of current interest in deep learning. More broadly, this thesis lays a foundation for future work developing efficient measures of degeneracy and empirically investigating the role of degeneracy in deep learning.

## Citation

BibTeX database entry:

```
@mastersthesis{Farrugia-Roberts2022,
  title={Structural Degeneracy in Neural Networks},
  author={Farrugia-Roberts, Matthew},
  month=dec,
  year=2022,
  school={School of Computing and Information Systems,
    The University of Melbourne},
  url={https://far.in.net/mthesis},
}
```

## Recorded talks

A talk at the Singular Learning Theory & Alignment Summit, 2023

- “Hidden unit acrobatics: An introduction to swaps, flips, and other symmetries of singular networks and their applications”. An intuitive-level overview of reducibility and main thesis results, plus important directions for future work. 45 minutes. [youtube] [interactive notebook]

A series of three ~90-120 minute improvised talks at metauni’s singular learning theory seminar, presenting some of the key results in the thesis.

- Part 1: Overview of thesis. Three perspectives on degeneracy. Reducibility and reduction algorithms. [youtube]
- Part 2: Rank and algorithms. Parametric approximate rank. Statement of hardness theorem. [youtube]
- Part 3: Bounding parametric approximate rank (PAR). PAR is in NP. Reduction from SAT to PAR (sketch of half of the proof). [youtube]

An earlier talk in the same series.

- Part 0: Preliminary results on piecewise linear path connectivity of the functional equivalence class of reducible parameters. [youtube]

TODO: Upload and link a couple of other related talks.

## Colophon

Typeset with LaTeX.

Fonts include Computer Modern, BOONDOX fraktur, and Ralph Smith Formal
Script (`\usepackage[frak=boondox,scr=rsfs]{mathalfa}`).

Most figures drawn with TikZ, and some with pgfplots.

## Errata

Please contact me if you find any errors.

Known errors:

- Page 38 (pdf page 43). After the list, “so w is reducible” should read “so w is non-minimal”. Oops!
- Page 42 (pdf page 47). The final paragraph of the proof of Theorem
3.39 is incorrect. The subparameters to which I apply Sussmann’s result
are not necessarily irreducible; they may satisfy reducibility condition
(i) (this happens when the outgoing weight vectors of some unit have
some zero components). However, the proof can be recovered, since no
unit can have an outgoing weight vector with *all* components zero, and
the required transformation can be constructed by piecing together those
from the different subnetworks. A corrected proof is provided in a
forthcoming conference paper (preprint on arXiv).
- Page 90 (pdf page 94). In step (3) of the list, “approximately zero incoming weight” should read “approximately zero outgoing weight”. Oops!
- Page 91 (pdf page 95). Algorithm 6.10, line 10. The less-than-or-equal comparison should be a greater-than comparison. Oops!
- Reference to Amari *et al.* (2017). That should actually be 2018. There are probably some other typos in the reference list, as many reference details were completed by hand.