Schwartz distributions
Sunday, August 17th, 2025
This is a technical note intended as an elementary introduction to Schwartz distributions and their calculus. Schwartz distributions are a concept from functional analysis used in singular learning theory to define the density of states, a key step in the bridge between algebraic geometry and Bayesian statistics.
I found it pretty difficult to follow Watanabe’s introduction to Schwartz distributions in section 4.1 of Algebraic Geometry and Statistical Learning Theory. The main problems I encountered were that the book is a speed-run through definitions, including little motivation or hints as to the interpretation of those definitions, and that it uses notational conventions that emphasise concision (at the expense of clarity for beginners) and connections to deeper mathematical results (which were not familiar to someone like me and do not appear to be relevant for understanding the remainder of the book). I therefore spent a long time reconstructing the motivation to make sense of the definitions, and developing a careful understanding of the notational rules, so as to understand how they can be safely broken.
After all this, I think it’s possible to understand Schwartz distributions having only a fairly modest background in differential and integral calculus and similar topics from elsewhere in the early parts of an undergraduate mathematics curriculum (e.g., linearity of a function, compactness of a set). This note is my attempt at such an introduction. I focus on the special case of generalising single-variable, real functions, in thorough detail, for simplicity (the same understanding should transfer to Schwartz distributions with multivariate inputs and complex outputs, as used in the book).
Contents:
- Basic idea
- Test functions: generalised inputs
- Schwartz distributions: generalised functions
- Regular Schwartz distributions
- Non-regular Schwartz distributions and Dirac’s delta distribution
- Distributional derivatives
- Distributional integrals
- Calculus on families of distributions
- Conclusion
Basic idea
Differential and integral calculus offer powerful techniques for analysing the properties of smooth functions. However, there are certain limits to what they can handle—we can’t differentiate a function at a discontinuity, nor define a function that represents a point mass.
Schwartz distributions are a generalisation of functions allowing differentiation and integration. They include analogues of many familiar functions from differential and integral calculus, but also some new functions previously out of the reach of differentiation and integration, and even some things that don’t correspond to functions at all.
Perhaps the most familiar way of thinking about functions is that they are objects that map elements of one set (the domain, we’ll assume $\mathbb{R}$) to elements of another (the co-domain, we’ll assume $\mathbb{R}$ again). To define a function, it suffices to specify which output corresponds to each possible input. Likewise, to define derivatives, we analyse how the mapping changes for small changes in the input.
The theory of Schwartz distributions departs from this approach to defining functions. Instead of defining a function by its response to each individual input point, we define a generalised function by its response to a generalised notion of input points, namely compact sets of points weighted by smooth functions called test functions.
We’ll see that regular functions can be recovered by defining their response to each test function as a weighted combination of the responses of the original function to each of the individual inputs in the compact set. Since this “average response” changes smoothly as we vary the test function (even if the original function has a discontinuity or a point mass), we’ll be able to differentiate and anti-differentiate these generalised objects.
Test functions: generalised inputs
Thus, before defining Schwartz distributions, we need to define their input space. Formally, a test function is any function $\varphi : \mathbb{R} \to \mathbb{R}$ satisfying the following two properties.
Smoothness: $\varphi$ is smooth, that is, it has an infinite number of continuous derivatives.
Compact support: the support of $\varphi$, namely the closure of the set $\{x \in \mathbb{R} : \varphi(x) \neq 0\}$, is compact, that is (for us), closed and bounded.
The set of all smooth, compactly-supported test functions is a subspace of the vector space of functions $\mathbb{R} \to \mathbb{R}$. Denote this space by $\mathcal{D}(\mathbb{R})$.
An illustrative example of a non-trivial test function is the unit bump function,

$$\varphi(x) = \begin{cases} \exp\left(\dfrac{-1}{1 - x^2}\right) & |x| < 1, \\ 0 & \text{otherwise.} \end{cases}$$

This is a well-known example of a smooth function and has compact support $[-1, 1]$. It’s enough to keep this example, plus shifted and scaled versions of it, in mind for the remainder of our discussion. The bump function is also a natural example of the idea of generalising an individual input (such as the origin) to a broader, weighted compact set of inputs.
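To make this concrete, here is a minimal numerical sketch of the unit bump function (the helper name `bump` is my own, not notation from the book). It is smooth on $(-1, 1)$ and exactly zero outside.

```python
import numpy as np

def bump(x):
    """Unit bump function: exp(-1 / (1 - x^2)) for |x| < 1, and 0 otherwise."""
    x = np.asarray(x, dtype=float)
    result = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    result[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return result

xs = np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 2.0])
print(bump(xs))  # zero outside (-1, 1), with peak value exp(-1) at the origin
```

Note that the piecewise definition is what makes this work: evaluating the exponential formula only on $|x| < 1$ avoids the singularity at $x = \pm 1$, where the function flattens to zero with all derivatives vanishing.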
Beyond that, conditions (1) and (2) are quite restrictive as to what qualifies as a test function. In particular, non-zero analytic functions don’t qualify, as they have unbounded support. We will see that these strong restrictions actually turn in our favour when it comes to defining Schwartz distributions and their derivatives and integrals in terms of test functions.
The last thing to discuss about the space of test functions before we can define Schwartz distributions is our test function topology. Formally, given a sequence of test functions $\varphi_1, \varphi_2, \ldots$ and a target test function $\varphi$, we say that $\varphi_n$ converges to $\varphi$ (as test functions), denoted $\varphi_n \to \varphi$, if the following two conditions are met.
The combined support of all the $\varphi_n$, $\bigcup_{n} \operatorname{supp}(\varphi_n)$, is compact.
For all indices $k = 0, 1, 2, \ldots$, $\varphi_n^{(k)}$ converges uniformly to $\varphi^{(k)}$, that is, $\|\varphi_n^{(k)} - \varphi^{(k)}\|_\infty \to 0$ as $n \to \infty$, where $\|g\|_\infty = \sup_{x \in \mathbb{R}} |g(x)|$.
This is another strong definition, and again, its strength will turn in our favour: we will require Schwartz distributions to be continuous on the space of test functions only with respect to this strong definition of convergence.
Schwartz distributions: generalised functions
Finally, we turn to defining Schwartz distributions. As mentioned, the approach will be to define these generalised functions by their response to each test function.
Without further ado, a Schwartz distribution from $\mathcal{D}(\mathbb{R})$ to $\mathbb{R}$ is a functional $T$ that maps each test function $\varphi \in \mathcal{D}(\mathbb{R})$ to some number $T(\varphi) \in \mathbb{R}$, satisfying the following two conditions.
The functional is linear, that is, for all $\varphi_1, \varphi_2 \in \mathcal{D}(\mathbb{R})$ and all $a, b \in \mathbb{R}$,

$$T(a \varphi_1 + b \varphi_2) = a\, T(\varphi_1) + b\, T(\varphi_2).$$
The functional is continuous with respect to the test function topology, that is, if $\varphi_n \to \varphi$ (as test functions) then, in the usual topology on $\mathbb{R}$, we require

$$T(\varphi_n) \to T(\varphi).$$
The set of all such Schwartz distributions is denoted $\mathcal{D}'(\mathbb{R})$. It is a subspace of the vector space of functionals from $\mathcal{D}(\mathbb{R})$ to $\mathbb{R}$. When we talk about convergence of Schwartz distributions, we usually refer to pointwise convergence (not uniform convergence, nor the stronger kind of convergence we defined for test functions). In the resulting topology, the space of Schwartz distributions is complete.
Regular Schwartz distributions
Notice that a Schwartz distribution is not a function that takes individual points as inputs. Instead, it’s a functional that takes a test function as input. However, there is an analogy between Schwartz distributions and our familiar functions (or at least sufficiently integrable functions), as follows.
Let $f : \mathbb{R} \to \mathbb{R}$ be a function. Say that $f$ is locally integrable if, for any compact set $K \subset \mathbb{R}$, the (Lebesgue) integral $\int_K |f(x)|\, dx$ is defined and finite.
Given a locally integrable function $f$, let’s define a functional $T_f$ that responds to each test function with the average response of the original function to each input, weighted by the test function. That is, put

$$T_f(\varphi) = \int_{-\infty}^{\infty} f(x)\, \varphi(x)\, dx.$$

This integral is always defined because we assumed $f$ is locally integrable and each test function is continuous with compact support. This functional satisfies the properties of a Schwartz distribution. Therefore, we have furnished a Schwartz distribution for each locally integrable function. We can think of this Schwartz distribution as analogous to the original function. (Two functions give rise to the same Schwartz distribution if and only if they are equal almost everywhere.)
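As a sanity check on this definition, the following sketch approximates $T_f(\varphi)$ by quadrature over the (compact) supports of the test functions, and confirms that the resulting functional is linear. The helper names (`regular_distribution`, `trapezoid`) are my own, not from the book.

```python
import numpy as np

def bump(x):
    """Unit bump test function: exp(-1 / (1 - x^2)) for |x| < 1, else 0."""
    x = np.asarray(x, dtype=float)
    result = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    result[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return result

def trapezoid(ys, xs):
    """Trapezoidal quadrature rule."""
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2.0)

def regular_distribution(f, lo=-1.0, hi=1.0, n=20001):
    """T_f(phi) = integral of f(x) phi(x) dx, approximated on a grid
    covering the test functions' supports (here assumed within [lo, hi])."""
    xs = np.linspace(lo, hi, n)
    return lambda phi: trapezoid(f(xs) * phi(xs), xs)

T_f = regular_distribution(lambda x: x ** 2)          # f(x) = x^2
phi1, phi2 = bump, lambda x: bump(2.0 * x)            # two test functions
combo = lambda x: 3.0 * phi1(x) - 2.0 * phi2(x)
print(T_f(combo), 3.0 * T_f(phi1) - 2.0 * T_f(phi2))  # equal: T_f is linear
```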
In fact, it is common to abuse notation by denoting the Schwartz distribution $T_f$ by the more familiar $f$ (or even $f(x)$ where $x$ is understood as a free variable).
A Schwartz distribution that can be constructed in this way for some locally integrable function is called a regular Schwartz distribution. Not all Schwartz distributions can be constructed in this way (see the next section). However, regular Schwartz distributions are dense in the space of all Schwartz distributions.
Note that all continuous functions are locally integrable, but there are also many non-continuous functions that are locally integrable. An example is the (Heaviside) step function $H : \mathbb{R} \to \mathbb{R}$ with $H(x) = 0$ for $x < 0$ and $H(x) = 1$ for $x \geq 0$. The corresponding distribution is the step distribution, $T_H$, given by

$$T_H(\varphi) = \int_{-\infty}^{\infty} H(x)\, \varphi(x)\, dx = \int_{0}^{\infty} \varphi(x)\, dx.$$

Under the theory of Schwartz distributions, we’ll be able to extend differentiation to non-continuous functions like the step function.
Non-regular Schwartz distributions and Dirac’s delta distribution
We don’t have to define Schwartz distributions by starting from a locally integrable function. We can define them directly in terms of their response to each test function.
For example, define a functional $\delta$ that responds to each test function by evaluating the test function at the origin. That is, for all $\varphi \in \mathcal{D}(\mathbb{R})$, put

$$\delta(\varphi) = \varphi(0).$$

This functional also satisfies the properties of Schwartz distributions. Therefore, $\delta \in \mathcal{D}'(\mathbb{R})$. This distribution is called (Dirac’s) delta distribution.
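Although $\delta$ itself simply evaluates its input at the origin, it helps intuition to see it as the limit of regular distributions: narrow, normalised bump kernels respond to a test function with a weighted average concentrating on $\varphi(0)$. A grid-based sketch (helper names are my own):

```python
import numpy as np

def bump(x):
    """Unit bump function: exp(-1 / (1 - x^2)) for |x| < 1, else 0."""
    x = np.asarray(x, dtype=float)
    result = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    result[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return result

def trapezoid(ys, xs):
    """Trapezoidal quadrature rule."""
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2.0)

def narrow_bump_response(phi, eps, n=20001):
    """T_{f_eps}(phi) for f_eps a bump of width eps, normalised to integrate to 1."""
    xs = np.linspace(-eps, eps, n)
    kernel = bump(xs / eps)
    kernel /= trapezoid(kernel, xs)
    return trapezoid(kernel * phi(xs), xs)

phi = np.cos  # smooth near 0 (not compactly supported, but fine for this demo)
for eps in [1.0, 0.1, 0.01]:
    print(eps, narrow_bump_response(phi, eps))  # approaches phi(0) = cos(0) = 1
```

This also previews the fact, mentioned above, that regular distributions are dense in the space of all Schwartz distributions.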
However, there exists no locally integrable function $f$ such that $T_f(\varphi) = \varphi(0)$ for all $\varphi \in \mathcal{D}(\mathbb{R})$. Therefore, $\delta$ is not a regular Schwartz distribution.
Nevertheless, it is common to abuse notation and denote by $\delta$ the non-function object that would satisfy the above equation. To further obscure things, we sometimes notationally conflate the carefully-defined formal object we have called $\delta$ with the non-function object (or even write $\delta(x)$ where $x$ is understood as a free variable).
Distributional derivatives
So far, we have defined a new class of function-like objects, the space of Schwartz distributions $\mathcal{D}'(\mathbb{R})$. Next, we want to define differential calculus in this space in a way that agrees with our familiar calculus for regular distributions arising from differentiable functions, but also extends naturally to all distributions.
Given a distribution $T \in \mathcal{D}'(\mathbb{R})$, define the distributional derivative $T'$ to be the Schwartz distribution such that for $\varphi \in \mathcal{D}(\mathbb{R})$,

$$T'(\varphi) = T(-\varphi').$$

Here, $\varphi'$ is a test function representing the derivative of the original test function $\varphi$—recall that test functions are infinitely differentiable, and note that their derivatives have compact support since outside of $\operatorname{supp}(\varphi)$, the derivative of the test function is zero.
Why this definition? Intuitively, recall that distributions encode functions by their average output weighted by each test function. Ordinary derivatives measure increases in function outputs as we increase individual inputs. The natural extension is to measure increases in the average output as we shift the entire test function in the positive direction. This corresponds to decreasing the weight of each point in proportion to the derivative of the test function. The difference in the average will be the aggregate change from each point, hence an average weighted by the negative derivative of the test function.
Formally, we can show that this definition of differentiation lines up with the usual definition for regular distributions arising from differentiable functions. Given $f$ locally integrable and differentiable, let $f'$ be its derivative (assumed locally integrable). Then we have for all $\varphi \in \mathcal{D}(\mathbb{R})$,

$$(T_f)'(\varphi) = T_f(-\varphi') = -\int_{-\infty}^{\infty} f(x)\, \varphi'(x)\, dx = \int_{-\infty}^{\infty} f'(x)\, \varphi(x)\, dx = T_{f'}(\varphi),$$

using integration by parts (the boundary term vanishes because $\varphi$ has compact support). In other words, $(T_f)' = T_{f'}$, as desired.
Moreover, notice that the definition of the distributional derivative makes no mention of any differentiable function $f$. It only relies on the differentiability of the test functions. So, we can apply this definition even for distributions arising from non-differentiable functions, or distributions without any corresponding functions at all.
For an example, recall the (non-differentiable) step function $H$, and its corresponding distribution $T_H$. Let’s compute the distributional derivative $(T_H)'$. For $\varphi \in \mathcal{D}(\mathbb{R})$,

$$(T_H)'(\varphi) = T_H(-\varphi') = -\int_{0}^{\infty} \varphi'(x)\, dx = -\big[\varphi(x)\big]_{0}^{\infty} = \varphi(0) = \delta(\varphi).$$

Interestingly, we recover the distributional definition of the (non-regular) delta distribution, $(T_H)' = \delta$. Intuitively, this is quite a fitting derivative for $H$, a function which changes not at all away from zero, and then changes infinitely rapidly at its discontinuity.
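We can check this computation numerically: integrating the negated derivative of a test function against the step function should return the test function's value at the origin. Below, the bump function's derivative is computed analytically via the chain rule; all names are my own.

```python
import numpy as np

def bump_prime(x):
    """Derivative of the unit bump: exp(-1/(1-x^2)) * (-2x / (1 - x^2)^2) for |x| < 1."""
    x = np.asarray(x, dtype=float)
    result = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    xi = x[inside]
    result[inside] = np.exp(-1.0 / (1.0 - xi ** 2)) * (-2.0 * xi / (1.0 - xi ** 2) ** 2)
    return result

def trapezoid(ys, xs):
    """Trapezoidal quadrature rule."""
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2.0)

# T_H(-phi') = -∫_0^∞ phi'(x) dx for phi the unit bump (support [-1, 1])
xs = np.linspace(0.0, 1.0, 200001)
T_H_prime = trapezoid(-bump_prime(xs), xs)
print(T_H_prime, np.exp(-1.0))  # both ≈ phi(0) = exp(-1), i.e. delta(phi)
```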
Distributional integrals
We now turn to indefinite integration, or anti-differentiation, of distributions. The definition of the distributional indefinite integral is more involved than that of the distributional derivative, so first we need to define some machinery.
First, like in ordinary calculus, note that the anti-derivative of a distribution will be unique only up to the addition of some constant of integration, defined as a Schwartz distribution $C$ with distributional derivative $C' = 0$. In our single-variable setting, such a $C$ is a regular distribution corresponding to a constant function (though this does not make it a constant functional).
Second, since we want our anti-differentiation operation to be inverse to distributional differentiation, and (recall) differentiating a distribution involved taking the derivatives of input test functions, we’re going to need to transform input test functions to their anti-derivatives. Unfortunately, not all test functions have an anti-derivative that is a test function (test functions always have anti-derivatives and these are always smooth, but there may not be one with compact support). Fortunately, a unique anti-derivative test function exists for any test function that integrates to zero. Moreover, we can transform any test function into one that satisfies this condition using the linear map

$$\varphi \mapsto \varphi - \left(\int_{-\infty}^{\infty} \varphi(x)\, dx\right) \varphi_0,$$

where $\varphi_0$ is a fixed reference test function with $\int_{-\infty}^{\infty} \varphi_0(x)\, dx = 1$ (e.g., a normalised bump function). Thus, for each test function $\varphi$, we can define a unique anti-derivative test function $\Phi$ such that

$$\Phi(x) = \int_{-\infty}^{x} \left(\varphi(t) - \left(\int_{-\infty}^{\infty} \varphi(s)\, ds\right) \varphi_0(t)\right) dt.$$

For $\varphi$ known to be a derivative of a test function $\psi$, we have $\int_{-\infty}^{\infty} \varphi(x)\, dx = 0$ and thus $\Phi = \psi$.
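This construction can be carried out numerically: project the test function onto those that integrate to zero, then take the cumulative integral. The sketch below (grid-based approximation, helper names my own) confirms that the projected function integrates to zero and that its anti-derivative vanishes at both ends of the grid, i.e., has compact support.

```python
import numpy as np

def bump(x):
    """Unit bump function: exp(-1 / (1 - x^2)) for |x| < 1, else 0."""
    x = np.asarray(x, dtype=float)
    result = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    result[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return result

def trapezoid(ys, xs):
    """Trapezoidal quadrature rule."""
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2.0)

xs = np.linspace(-3.0, 3.0, 60001)
dx = xs[1] - xs[0]
phi = bump(xs - 0.5)                # an arbitrary test function (shifted bump)
phi0 = bump(xs)
phi0 = phi0 / trapezoid(phi0, xs)   # normalised reference: integrates to 1

projected = phi - trapezoid(phi, xs) * phi0   # now integrates to zero
Phi = np.cumsum(projected) * dx               # cumulative integral from the left

print(trapezoid(projected, xs))  # ≈ 0
print(Phi[0], Phi[-1])           # both ≈ 0: the anti-derivative has compact support
```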
Finally, we can define indefinite integration of Schwartz distributions. Given a distribution $T \in \mathcal{D}'(\mathbb{R})$, a constant of integration $C$, and a reference test function $\varphi_0$, define the distributional anti-derivative of $T$ to be the Schwartz distribution $S$ such that for $\varphi \in \mathcal{D}(\mathbb{R})$,

$$S(\varphi) = T(-\Phi) + C(\varphi),$$

where $\Phi$ is the anti-derivative test function of $\varphi$ defined above.
We can verify that this operation is reversed by differentiation, that is, $S' = T$. For $\varphi \in \mathcal{D}(\mathbb{R})$,

$$S'(\varphi) = S(-\varphi') = T(-\Phi_{-\varphi'}) + C(-\varphi') = T(\varphi) + C'(\varphi) = T(\varphi),$$

where we used that $-\varphi'$ integrates to zero, so its unique anti-derivative test function is $\Phi_{-\varphi'} = -\varphi$, and that $C' = 0$.
As an example, take the one-dimensional delta distribution, with $\delta(\varphi) = \varphi(0)$. Since we previously showed that $(T_H)' = \delta$ where $T_H$ is the distribution corresponding to the step function, we expect to find that the anti-derivative $S$ of $\delta$ satisfies $S = T_H + \tilde{C}$ for some constant of integration $\tilde{C}$ with $\tilde{C}' = 0$. Indeed (taking $C = 0$ in the definition):

$$S(\varphi) = \delta(-\Phi) = -\Phi(0) = -\int_{-\infty}^{0} \varphi(t)\, dt + \left(\int_{-\infty}^{\infty} \varphi(x)\, dx\right) \int_{-\infty}^{0} \varphi_0(t)\, dt = T_H(\varphi) + (c_0 - 1) \int_{-\infty}^{\infty} \varphi(x)\, dx,$$

where $c_0 = \int_{-\infty}^{0} \varphi_0(t)\, dt$, so that $\tilde{C}$ is $(c_0 - 1)$ times the regular distribution corresponding to the constant function $1$. We can then show $\tilde{C}$ is a constant of integration:

$$\tilde{C}'(\varphi) = \tilde{C}(-\varphi') = -(c_0 - 1) \int_{-\infty}^{\infty} \varphi'(x)\, dx = 0,$$

where we used that $\int_{-\infty}^{\infty} \varphi'(x)\, dx = 0$ for any test function $\varphi$.
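Numerically, we can check this example end to end: build the anti-derivative test function of a sample $\varphi$, evaluate the anti-derivative of delta at $\varphi$ (by negating the anti-derivative test function at the origin), and compare against the step response plus a constant multiple of $\int \varphi$. A grid-based sketch with my own helper names:

```python
import numpy as np

def bump(x):
    """Unit bump function: exp(-1 / (1 - x^2)) for |x| < 1, else 0."""
    x = np.asarray(x, dtype=float)
    result = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    result[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return result

def trapezoid(ys, xs):
    """Trapezoidal quadrature rule."""
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2.0)

xs = np.linspace(-3.0, 3.0, 60001)
dx = xs[1] - xs[0]
i0 = int(np.argmin(np.abs(xs)))          # index of the grid point at x = 0
phi = bump(xs - 0.5)                     # test function straddling the origin
phi0 = bump(xs)
phi0 = phi0 / trapezoid(phi0, xs)        # normalised reference test function

total = trapezoid(phi, xs)
Phi = np.cumsum(phi - total * phi0) * dx  # anti-derivative test function

S_phi = -Phi[i0]                               # anti-derivative of delta at phi
T_H_phi = trapezoid(phi[i0:], xs[i0:])         # step distribution response
c0 = trapezoid(phi0[: i0 + 1], xs[: i0 + 1])   # reference mass left of the origin
print(S_phi, T_H_phi + (c0 - 1.0) * total)     # these agree: step plus a constant
```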
Calculus on families of distributions
In Algebraic Geometry and Statistical Learning Theory, Watanabe actually defines a different, more general kind of calculus in the space of distributions—taking derivatives and integrals along any parameterised family of Schwartz distributions. Our last step is to understand these more general definitions of differentiation and integration and how they relate to the previous definitions.
Formally, let $\{T_a\}_{a \in \mathbb{R}}$ be a parameterised family of Schwartz distributions. We assume that the family is such that the necessary derivatives and integrals cited in the below definitions exist. We can then define two new parameterised families of Schwartz distributions as follows.
Define the derivative family $\left\{\frac{\partial}{\partial a} T_a\right\}_{a \in \mathbb{R}}$ as a new parameterised family of Schwartz distributions such that for $a \in \mathbb{R}$ and $\varphi \in \mathcal{D}(\mathbb{R})$,

$$\left(\frac{\partial}{\partial a} T_a\right)(\varphi) = \frac{\partial}{\partial a} \big(T_a(\varphi)\big).$$

Here, the operator on the LHS is the distributional derivative we are defining, and the one on the RHS is the usual derivative from calculus.
Define the indefinite integral family $\left\{\int T_a\, da\right\}_{a \in \mathbb{R}}$ as a new parameterised family of Schwartz distributions such that for $a \in \mathbb{R}$ and $\varphi \in \mathcal{D}(\mathbb{R})$,

$$\left(\int T_a\, da\right)(\varphi) = \int T_a(\varphi)\, da + C(\varphi),$$

where $C$ is a free constant of integration (any distribution that does not depend on $a$). Here, the operator on the LHS is the distributional indefinite integration operator we are defining, and the operator in the middle is the usual indefinite integral from calculus. For clarity, note that the middle expression is still parameterised by $a$: for each fixed $\varphi$, it denotes an anti-derivative of the real function $a \mapsto T_a(\varphi)$.
These operations generalise the axis-based definitions from the previous sections. Given a single Schwartz distribution $T$, define a parameterised family of Schwartz distributions $\{T_a\}_{a \in \mathbb{R}}$ such that for $\varphi \in \mathcal{D}(\mathbb{R})$, $T_a(\varphi) = T(\varphi_a)$, where $\varphi_a$ with $\varphi_a(x) = \varphi(x - a)$ is used to shift each test function in the positive direction by $a$. Then, we have

$$\left(\frac{\partial}{\partial a} T_a\right)(\varphi)\bigg|_{a=0} = \frac{\partial}{\partial a} T(\varphi_a)\bigg|_{a=0} = T\!\left(\frac{\partial}{\partial a} \varphi_a \bigg|_{a=0}\right) = T(-\varphi') = T'(\varphi).$$

We then have $T_0 = T$ and $\left.\frac{\partial}{\partial a} T_a\right|_{a=0} = T'$. A similar connection holds for the integral definitions.
As an extended example, first define two families of distributions, generalising the step distribution and the delta distribution.
Given $a \in \mathbb{R}$, define a shifted step distribution $T_{H_a}$ such that, for $\varphi \in \mathcal{D}(\mathbb{R})$,

$$T_{H_a}(\varphi) = \int_{a}^{\infty} \varphi(x)\, dx.$$

This is the distribution that corresponds to the shifted step function $H_a$ given by $H_a(x) = H(x - a)$.
Given $a \in \mathbb{R}$, define a shifted delta distribution $\delta_a$ such that, for $\varphi \in \mathcal{D}(\mathbb{R})$,

$$\delta_a(\varphi) = \varphi(a).$$

Continuing the notational conventions from earlier, we might denote these distributions by the pseudo-functions $H(x - a)$ and $\delta(x - a)$, and even write the identity $\int_{-\infty}^{\infty} \delta(x - a)\, \varphi(x)\, dx = \varphi(a)$.
Then, for $\varphi \in \mathcal{D}(\mathbb{R})$, we have

$$\left(\frac{\partial}{\partial a} T_{H_a}\right)(\varphi) = \frac{\partial}{\partial a} \int_{a}^{\infty} \varphi(x)\, dx = -\varphi(a) = -\delta_a(\varphi).$$

Conversely, we have

$$\left(\int \delta_a\, da\right)(\varphi) = \int \varphi(a)\, da = \int_{-\infty}^{a} \varphi(x)\, dx = -T_{H_a}(\varphi) + C(\varphi),$$

where $C$ such that $C(\varphi) = \int_{-\infty}^{\infty} \varphi(x)\, dx$ is a constant of integration in $a$.
The negative sign was introduced in both relationships due to the way we defined the parameterised families.
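As a final numerical check (helper names my own), we can differentiate the shifted step family's response to a fixed test function by finite differences in the parameter, and confirm that the result is the negated shifted delta response:

```python
import numpy as np

def bump(x):
    """Unit bump function: exp(-1 / (1 - x^2)) for |x| < 1, else 0."""
    x = np.asarray(x, dtype=float)
    result = np.zeros_like(x)
    inside = np.abs(x) < 1.0
    result[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return result

def trapezoid(ys, xs):
    """Trapezoidal quadrature rule."""
    return float(np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2.0)

def shifted_step(phi, a, upper=2.0, n=20001):
    """Shifted step response: integral of phi over [a, infinity)
    (phi is assumed to vanish beyond `upper`)."""
    xs = np.linspace(a, upper, n)
    return trapezoid(phi(xs), xs)

a, h = 0.3, 1e-4
# centred finite difference of the map a -> (shifted step response at a)
dT_da = (shifted_step(bump, a + h) - shifted_step(bump, a - h)) / (2.0 * h)
print(dT_da, -bump(np.array([a]))[0])  # both ≈ -phi(a): the negated delta response
```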
Conclusion
The theory of Schwartz distributions is an elegant generalisation of the idea of calculus for smooth functions. I’m glad I put in the time to work through these examples. I found it an especially interesting challenge to develop the theory of differentiation and integration for Schwartz distributions (not the parametric versions)—these were not given in the book. I am proud to say I worked out these definitions myself (with some hints from Gemini, though it also sometimes sent me down the wrong track and I had to recover).
If I had more time, I am sure I would appreciate working through a more traditional introduction to the theory of Schwartz distributions. This mathoverflow thread has some recommendations, including the original article by Schwartz (in French). Alas, I have to make it through the rest of singular learning theory’s long list of prerequisites first.