far.in.net


What is complexity?

Oeser taught me that the ideal of beauty is simplicity…—Goethe
Beauty is in the eye of the beholder.—English proverb

What is complexity? Etymologically, the prefix “com-” traces back to Latin, meaning “with, together,” and “plex” traces back to the Proto-Indo-European root *plek-, meaning “to plait,” as in, “to entwine.” Complex entities are those that are formed by entwining multiple elements together.

A sensible idea for quantitatively formalising this notion of complexity and measuring the complexity of an entity would be to view the entity as a number of elements entwined together, and then simply count those elements.

The counting part is usually straightforward. However, it may not be obvious how to view an arbitrary entity as formed from components. Which elementary components should we allow, and how should we allow them to be combined into complex entities?

The approach taken in the field of algorithmic information theory is to use a model of computation (usually a particular Turing machine) as the basis for answering these questions. The resulting complexity measure is due to Solomonoff and was later independently described by Kolmogorov. It has become known as Kolmogorov complexity, and is derived as follows.

  1. First, we fix a particular universal Turing machine (UTM).

  2. Second, encode the entity for which we want to measure complexity as a finite string.

  3. Third, find the shortest input string that, when given to the fixed UTM, produces the string encoding of the entity as output.

  4. The length (number of symbols) of this shortest input string is the Kolmogorov complexity of the entity.

This “computational” approach to defining complexity neatly resolves the challenge of selecting elements and combinations. The input symbols of our UTM constitute a very simple set of elementary components. The individual computational operations of a Turing machine are likewise simple and small in number. Yet, assuming the Church–Turing thesis holds, these elements in combination are expressive enough to capture any natural mechanical procedure, and therefore any entity we are likely to encounter in the universe.

The task of designing an encoding for each entity in step (2) superficially resembles the challenge we started with, of dividing the entity into elementary components. The difference, however, is that we don’t propose to measure the length of the encoded string; instead, we measure the length of the input string that generates the encoded string through the UTM. This means that all we have to do is ensure that all of the relevant information about the entity has been included in the encoding. In particular, it’s alright if the encoding includes some redundancy: the complexity is affected only by a small amount, because the input can contain the information that is core to the entity together with a description of the nature of the redundancy, at relatively minor additional cost.
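The cheapness of redundancy can be seen in a small experiment of my own, using the length of a zlib-compressed stream as a rough, computable stand-in for the length of an input that regenerates a string: an encoding that stores the same content twice costs barely more than the content alone.

```python
import os
import zlib

# ~1000 bytes of (almost surely) incompressible core content, and an
# encoding of it with 100% redundancy: the same field stored twice.
core = os.urandom(1000)
redundant = core + core

len_core = len(zlib.compress(core, 9))
len_redundant = len(zlib.compress(redundant, 9))

# The redundant encoding compresses to far less than twice the core:
# the compressed stream describes "the nature of the redundancy"
# (a repeat) instead of spelling the content out a second time.
print(len_core, len_redundant)
assert len_redundant < 2 * len_core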

A more serious challenge is that finding the shortest program in step (3) is in general uncomputable. This fact is a consequence of the universality of Turing machines as a model of computation. One response is to search over a subset of programs, or over all programs but for a non-universal model of computation. If one formulates the problem right, it becomes possible to compute the length of the shortest program from the restricted search space. Of course, the result can only be an upper bound on the length of the shortest program from the full space—by compromising on universality, we allow once more the possibility that we miss some possible ways of decomposing the object whose complexity is to be measured.
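One concrete instance of such a restriction (my example, not one proposed in the text): take the search space to be “zlib streams fed to a fixed decompressor.” The resulting length is computable, but it is only an upper bound, as the comparison below suggests: the restricted space finds the structure in some strings and misses it in others.

```python
import os
import zlib

def kc_upper_bound(s: bytes) -> int:
    # A fixed decompressor plus this zlib stream is *one* input that
    # regenerates s, so (up to the constant cost of the decompressor)
    # its length upper-bounds the shortest-program length. It is only
    # an upper bound: the restricted space "zlib streams" misses most
    # possible ways of describing s.
    return len(zlib.compress(s, 9))

patterned = b"ab" * 5000        # 10,000 bytes with obvious structure
random_ish = os.urandom(10000)  # 10,000 bytes with (almost surely) none

print(kc_upper_bound(patterned))   # small: the repetition is found
print(kc_upper_bound(random_ish))  # near 10,000: no structure found
```

A string like the binary digits of π would also have small true complexity, but this restricted search would miss that structure entirely and report a bound close to the string’s length.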

A final challenge, possibly a fatal one, is that even if we could compute the Kolmogorov complexity for UTMs, we would be left with the choice of which UTM to use as the basis for our complexity measure. The famous invariance theorem says that the complexity of a string is insensitive to this choice up to an additive constant. However, this constant depends crucially on the UTMs being compared: given any finite string, one can devise a UTM that makes this string’s Kolmogorov complexity any natural number, including 0. The same is true for any finite set of strings and any corresponding set of complexities (subject only to the limitation that, for any UTM, there can be at most 2^k strings of complexity k ∈ ℕ). This all being the case, on what basis can we claim that any number is the ‘true’ Kolmogorov complexity of any string?
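The kind of construction alluded to can be sketched schematically. In the sketch below (my own, with all names invented), the identity function stands in for the fixed reference machine, where a real construction would use a UTM: the rigged machine gives one chosen string complexity 0, while lengthening every other string’s shortest program by at most one symbol, exactly the additive slack the invariance theorem permits.

```python
TARGET = "a long string that looks complex to every other machine"

def reference_machine(program: str) -> str:
    # Stand-in for the fixed reference machine. Here it is simply the
    # identity, so the shortest program for s is s itself; a real
    # construction would use a universal machine instead.
    return program

def rigged_machine(program: str) -> str:
    # The "pathological" machine: the empty program acts as a trigger
    # code for TARGET, and any program "1" + p defers to the reference
    # machine on p. So TARGET's complexity drops to 0, while every
    # other string's shortest program grows by at most one symbol.
    if program == "":
        return TARGET
    return reference_machine(program[1:])

assert rigged_machine("") == TARGET              # complexity 0 for TARGET
assert rigged_machine("1" + "hello") == "hello"  # +1 symbol for the rest
```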

The standard response to this challenge is to cry foul, and claim that the constructed UTMs to which I have alluded are “unnatural,” pathological machines designed specifically to distort complexity. They inject specialised hardware into the UTM that contains the elements of the strings whose complexities are to be artificially distorted, so that these strings can be produced in response to certain trigger codes, for example. It is claimed that the ‘true’ complexity of the string is hidden inside the states of the UTM M_x, providing a ‘shortcut’ to generating this output. If one is interested in defining the complexity of things in the real world—so the argument goes—one should just use a UTM that is obviously free of any such “shortcuts.” Several extremely small UTM designs have been proposed based on simulating tag systems (a simple Turing-equivalent model of computation). These UTMs can’t possibly be accused of containing shortcuts to any string of appreciable complexity, because they use only a handful of symbols and states, having been ruthlessly optimised for universality alone.
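For a flavour of the tag-system model that these minimal UTMs simulate, here is a sketch of mine of a 2-tag system: at each step, delete the first two symbols of the word and append the production associated with the symbol that was at the head. The particular rules below are, I believe, those studied by De Mol, under which a word a^n traces the Collatz iteration n → n/2 (n even), n → (3n+1)/2 (n odd), halting at “a” exactly when the iteration reaches 1.

```python
def run_tag(word: str, rules: dict[str, str], deletion: int = 2,
            max_steps: int = 10_000) -> str:
    # A tag system: while the word has at least `deletion` symbols,
    # delete the first `deletion` of them and append the production
    # for the symbol that was at the head.
    for _ in range(max_steps):
        if len(word) < deletion:
            break
        word = word[deletion:] + rules[word[0]]
    return word

# De Mol's Collatz-simulating 2-tag system.
rules = {"a": "bc", "b": "a", "c": "aaa"}

print(run_tag("a" * 3, rules))  # "a": the Collatz iteration from 3 reaches 1
```

Such a system is expressive enough to support universality, yet its “hardware” is just three rules over three symbols, leaving no room for hidden trigger codes.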

Of course, these machines have been optimised only relative to Turing’s intuitive notion of the elements of computation. To say that states and symbols of a UTM are what is to be counted is to make a meaningful choice about how to view entities as formed from elementary components and about how we should allow elementary components to be combined into complex entities. The pathology of the above constructions is judged only a posteriori, relative to these choices. As yet, no a priori judgement against these constructions has been offered.

All choices of UTM involve making an infinity of judgements about which strings are simpler and which are more complex. To choose any UTM is to take a meaningful stand in a particular location of the network of all computable trees of strings I have elsewhere called the Turing web. From any such location, some strings appear near and others far. I note that tag systems, on which many of the “minimal” UTMs are based, offer a perspective somewhat removed from Turing’s familiar construction. There are countless other models of computation, each emphasising one or another subset of the possible strings.

If what we really want to do is define the complexity of objects we encounter in the real world, we might just need to make choices. We could start by looking around us at the kinds of elements and computations that nature affords us.