Schwartz distributions

Sunday, August 17th, 2025

This is a technical note intended as an elementary introduction to Schwartz distributions and their calculus. Schwartz distributions are a concept from functional analysis used in singular learning theory to define the density of states, a key step in the bridge between algebraic geometry and Bayesian statistics.

I found it pretty difficult to follow Watanabe’s introduction to Schwartz distributions in section 4.1 of Algebraic Geometry and Statistical Learning Theory. The main problems I encountered were that the book is a speed-run through definitions that includes little motivation or hints as to the interpretation of those definitions, and uses notational conventions that emphasise concision (at the expense of clarity for beginners) and connections to deeper mathematical results (that were not familiar to someone like me and do not appear to be relevant for understanding the remainder of the book). I therefore spent a long time reconstructing the motivation to make sense of the definitions, and developing a careful understanding of the notational rules, so as to understand how they can be safely broken.

After all this, I think it’s possible to understand Schwartz distributions having only a fairly modest background in differential and integral calculus and similar topics from elsewhere in the early parts of an undergraduate mathematics curriculum (e.g., linearity of a function, compactness of a set). This note is my attempt at such an introduction. I focus on the special case of generalising single-variable, real functions, in thorough detail, for simplicity (the same understanding should transfer to Schwartz distributions with multivariate inputs and complex outputs, as used in the book).

Contents:

- Basic idea
- Test functions: generalised inputs
- Schwartz distributions: generalised functions
- Regular Schwartz distributions
- Non-regular Schwartz distributions and Dirac’s delta distribution
- Distributional derivatives
- Distributional integrals
- Calculus on families of distributions
- Conclusion

Basic idea

Differential and integral calculus offer powerful techniques for analysing the properties of smooth functions. However, there are certain limits to what they can handle—we can’t differentiate a function at a discontinuity, nor define a function that represents a point mass.

Schwartz distributions are a generalisation of functions allowing differentiation and integration. They include analogues of many familiar functions from differential and integral calculus, but also some new functions previously out of the reach of differentiation and integration, and even some things that don’t correspond to functions at all.

Perhaps the most familiar way of thinking about functions is that they are objects that map elements of one set (the domain, we’ll assume $\mathbb{R}$) to elements of another (the co-domain, we’ll assume $\mathbb{R}$ again). To define a function, it suffices to specify which output corresponds to each possible input. Likewise, to define derivatives, we analyse how the mapping changes for small changes in the input.

The theory of Schwartz distributions departs from this approach to defining functions. Instead of defining a function by its response to each individual input point, we define a generalised function by its response to a generalised notion of input points, namely compact sets of points weighted by smooth functions called test functions.

We’ll see that regular functions can be recovered by defining their response to each test function as a weighted combination of the responses of the original function to each of the individual inputs in the compact set. Since this “average response” changes smoothly as we vary the test function (even if the original function has a discontinuity or a point mass), we’ll be able to differentiate and anti-differentiate these generalised objects.

Test functions: generalised inputs

Thus, before defining Schwartz distributions, we need to define their input space. Formally, a test function is any function $\varphi : \mathbb{R} \to \mathbb{R}$ satisfying the following two properties.

  1. Smoothness: $\varphi$ is smooth, that is, it has an infinite number of continuous derivatives.

  2. Compact support: $\operatorname{supp}\varphi = \overline{\{\,x \in \mathbb{R} : \varphi(x) \neq 0\,\}}$ is compact, that is (for us), closed and bounded.

The set of all smooth, compactly-supported test functions is a subspace of the vector space of functions. Denote this space by $\mathcal{C}^\infty_0$.

An illustrative example of a non-trivial test function is the unit bump function, $$\varphi(x) = [\![\,|x| < 1\,]\!]\, \exp\!\left(-\frac{1}{1-|x|^2}\right).$$ Here $[\![\cdot]\!]$ denotes the Iverson bracket: $1$ if the enclosed condition holds, $0$ otherwise. This is a well-known example of a smooth function and has compact support $[-1,1]$. It’s enough to keep this example, plus shifted and scaled versions of it, in mind for the remainder of our discussion. The bump function is also a natural example of the idea of generalising an individual input (such as the origin) to a broader, weighted compact set of inputs.
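To make this concrete, here is a minimal numerical sketch of the unit bump function (the helper name `bump` is my own; `np.errstate` just silences the harmless floating-point warnings near the edge of the support):

```python
import numpy as np

def bump(x):
    """Unit bump function: smooth everywhere, zero outside (-1, 1)."""
    x = np.asarray(x, dtype=float)
    with np.errstate(divide="ignore", over="ignore"):
        val = np.exp(-1.0 / (1.0 - x * x))
    return np.where(np.abs(x) < 1.0, val, 0.0)

print(bump(np.linspace(-2.0, 2.0, 9)))  # zero outside [-1, 1], positive inside
```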

Beyond that, conditions (1) and (2) are quite restrictive as to what qualifies as a test function. In particular, non-zero analytic functions don’t qualify, as they have unbounded support. We will see that these strong restrictions actually turn in our favour when it comes to defining Schwartz distributions and their derivatives and integrals in terms of test functions.

The last thing to discuss about the space of test functions before we can define Schwartz distributions is our test function topology. Formally, given a sequence of test functions $\varphi_1, \varphi_2, \ldots \in \mathcal{C}^\infty_0$ and a target test function $\varphi \in \mathcal{C}^\infty_0$, we say that $\varphi_k$ converges to $\varphi$ (as test functions), denoted $\varphi_k \xrightarrow{\text{T}} \varphi$, if the following two conditions are met.

  1. The combined support of all $\varphi_k$, $\bigcup_{k=1}^\infty \operatorname{supp}\varphi_k$, is compact.

  2. For all orders $n \in \mathbb{N}$, $\frac{\mathrm{d}^n}{\mathrm{d}x^n}\varphi_k$ converges uniformly to $\frac{\mathrm{d}^n}{\mathrm{d}x^n}\varphi$, that is, $$\lim_{k \to \infty} \max_{x \in K} \left| \frac{\mathrm{d}^n}{\mathrm{d}x^n}\varphi_k(x) - \frac{\mathrm{d}^n}{\mathrm{d}x^n}\varphi(x) \right| = 0,$$ where $K = \operatorname{supp}\varphi \cup \bigcup_{k=1}^\infty \operatorname{supp}\varphi_k$.
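A small numerical illustration of the two conditions (a sketch, not a proof, reusing `bump` from the sketch above): scaling the bump down converges to zero in this topology, while translating it off to infinity does not.

```python
# (assumes `bump` from the earlier sketch)
xs = np.linspace(-1.5, 1.5, 3001)

# phi_k = bump / k: common support [-1, 1], and every derivative's sup
# norm is also scaled by 1/k, so phi_k -> 0 in the test topology.
for k in (1, 10, 100):
    print(k, np.max(np.abs(bump(xs) / k)))

# By contrast, phi_k(x) = bump(x - k) / k also has sup norms -> 0, but
# the supports [k - 1, k + 1] drift off to infinity: the combined
# support is not compact, so condition 1 fails and the sequence does
# not converge as test functions.
```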

This is another strong definition, and again, its strength will turn in our favour: since we define Schwartz distributions as continuous on the space of test functions only with respect to this strong notion of convergence, continuity becomes a mild condition that many functionals satisfy.

Schwartz distributions: generalised functions

Finally, we turn to defining Schwartz distributions. As mentioned, the approach will be to define these generalised functions by their response to each test function.

Without further ado, a Schwartz distribution from $\mathbb{R}$ to $\mathbb{R}$ is a functional $T : \mathcal{C}^\infty_0 \to \mathbb{R}$ that maps each test function $\varphi \in \mathcal{C}^\infty_0$ to some number $T[\varphi] \in \mathbb{R}$, satisfying the following two conditions.

  1. The functional is linear, that is, for all $a, b \in \mathbb{R}$ and all $\varphi, \psi \in \mathcal{C}^\infty_0$, $$T[a\varphi + b\psi] = a T[\varphi] + b T[\psi].$$

  2. The functional is continuous with respect to the test function topology, that is, if $\varphi_1, \varphi_2, \ldots \xrightarrow{\text{T}} \varphi$, then, in the usual topology on $\mathbb{R}$, we require $$T[\varphi_1], T[\varphi_2], \ldots \to T[\varphi].$$

The set of all such Schwartz distributions is denoted $\mathcal{D}$. It is a subspace of the vector space of functionals from $\mathcal{C}^\infty_0$ to $\mathbb{R}$. When we talk about convergence of Schwartz distributions, we usually refer to pointwise convergence (not uniform convergence, nor the stronger kind of convergence we defined for test functions). In the resulting topology, the space of Schwartz distributions is complete.

Regular Schwartz distributions

Notice that a Schwartz distribution is not a function that takes individual points $x \in \mathbb{R}$ as inputs. Instead, it’s a functional that takes a test function as input. However, there is an analogy between Schwartz distributions and our familiar functions (or at least sufficiently integrable functions), as follows.

Let $f : \mathbb{R} \to \mathbb{R}$ be a function. Say that $f$ is locally integrable if, for any compact set $K \subset \mathbb{R}$, the (Lebesgue) integral $\int_K f(x)\,\mathrm{d}x$ is defined and finite.

Given a locally integrable function $f : \mathbb{R} \to \mathbb{R}$, let’s define a functional $T_f : \mathcal{C}^\infty_0 \to \mathbb{R}$ that responds to each test function with the average response of the original function $f$ to each input, weighted by the test function. That is, put $$T_f[\varphi] = \int_{\mathbb{R}} f(x) \varphi(x)\,\mathrm{d}x.$$ This integral is always defined because we assumed $f$ is locally integrable and each test function $\varphi$ is continuous with compact support. This functional $T_f$ satisfies the properties of a Schwartz distribution. Therefore, we have furnished a Schwartz distribution for each locally integrable function. We can think of this Schwartz distribution as analogous to the original function. (Two functions give rise to the same Schwartz distribution if and only if they are equal almost everywhere.)
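Here is a quick numerical sketch of this construction (the helper name `T` is mine; `scipy.integrate.quad` stands in for the Lebesgue integral, and `bump` is the test function from the earlier sketch), for the locally integrable choice $f(x) = |x|$:

```python
from scipy.integrate import quad  # (assumes `bump` from the earlier sketch)

def T(f, phi, support=(-1.0, 1.0)):
    """Regular distribution T_f[phi] = ∫ f(x) phi(x) dx.

    Only the test function's support contributes, so we integrate over
    an interval containing supp(phi)."""
    value, _err = quad(lambda x: f(x) * phi(x), *support)
    return value

print(T(abs, bump))  # the bump-weighted average response of f(x) = |x|
```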

In fact, it is common to abuse notation by denoting the Schwartz distribution $T_f$ by the more familiar $f$ (or even $f(x)$ where $x$ is understood as a free variable).

A Schwartz distribution that can be constructed in this way for some locally integrable function is called a regular Schwartz distribution. Not all Schwartz distributions can be constructed in this way (see the next section). However, regular Schwartz distributions are dense in the space of all Schwartz distributions.

Note that all continuous functions are locally integrable, but there are also many non-continuous functions that are locally integrable. An example is the (Heaviside) step function $\theta : \mathbb{R} \to \mathbb{R}$ with $\theta(x) = [\![x > 0]\!]$. The corresponding distribution is the step distribution, $T_\theta : \mathcal{C}^\infty_0 \to \mathbb{R}$, given by $$T_\theta[\varphi] = \int_{\mathbb{R}} \theta(x) \varphi(x)\,\mathrm{d}x = \int_{0}^{\infty} \varphi(x)\,\mathrm{d}x.$$ Under the theory of Schwartz distributions, we’ll be able to extend differentiation to non-continuous functions like the step function.

Non-regular Schwartz distributions and Dirac’s delta distribution

We don’t have to define Schwartz distributions by starting from a locally integrable function. We can define them directly in terms of their response to each test function.

For example, define a functional $T_\delta : \mathcal{C}^\infty_0 \to \mathbb{R}$ that responds to each test function by evaluating the test function at the origin. That is, for all $\varphi \in \mathcal{C}^\infty_0$, put $$T_\delta[\varphi] = \varphi(0).$$ This functional $T_\delta$ also satisfies the properties of Schwartz distributions. Therefore, $T_\delta \in \mathcal{D}$. This distribution is called (Dirac’s) delta distribution.

However, there exists no function $\delta : \mathbb{R} \to \mathbb{R}$ such that $$\int_{\mathbb{R}} \delta(x) \varphi(x)\,\mathrm{d}x = \varphi(0)$$ for all $\varphi \in \mathcal{C}^\infty_0$ (intuitively, such a function would have to vanish almost everywhere away from the origin, and then the integral could never recover the value $\varphi(0)$). Therefore, $T_\delta$ is not a regular Schwartz distribution.

Nevertheless, it is common to abuse notation and denote by $\delta$ the non-function object that would satisfy the above equation. To further obscure things, we sometimes notationally conflate the carefully-defined formal object we have called $T_\delta$ with the non-function object $\delta$ (or even $\delta(x)$ where $x$ is understood as a free variable).

Distributional derivatives

So far, we have defined a new class of function-like objects, the space of Schwartz distributions $\mathcal{D}$. Next, we want to define differential calculus in this space in a way that agrees with our familiar calculus for regular distributions arising from differentiable functions, but also extends naturally to all distributions.

Given a distribution $T : \mathcal{C}^\infty_0 \to \mathbb{R}$, define the distributional derivative to be the Schwartz distribution $\frac{\mathrm{d}T}{\mathrm{d}x} : \mathcal{C}^\infty_0 \to \mathbb{R}$ such that for $\varphi \in \mathcal{C}^\infty_0$, $$\frac{\mathrm{d}T}{\mathrm{d}x}[\varphi] = T\!\left[-\varphi'\right].$$ Here, $\varphi' \in \mathcal{C}^\infty_0$ is a test function representing the derivative of the original test function $\varphi$—recall that test functions are infinitely differentiable, and note that their derivatives have compact support since outside of $\operatorname{supp}\varphi$, the derivative of the test function is zero.

Why this definition? Intuitively, recall that distributions encode functions by their average output weighted by each test function. Ordinary derivatives measure increases in function outputs as we increase individual inputs. The natural extension is to measure increases in the average output as we shift the entire test function in the positive direction. This corresponds to decreasing the weight of each point in proportion to the derivative of the test function. The difference in the average will be the aggregate change from each point, hence an average weighted by the negative derivative of the test function.

Formally, we can show that this definition of differentiation lines up with the usual definition for regular distributions arising from differentiable functions. Given a differentiable (hence continuous, hence locally integrable) function $f : \mathbb{R} \to \mathbb{R}$ whose derivative $f' : \mathbb{R} \to \mathbb{R}$ is also locally integrable, we have for all $\varphi \in \mathcal{C}^\infty_0$, $$\begin{align*} T_{f'}[\varphi] &= \int_{\mathbb{R}} f'(x) \varphi(x)\,\mathrm{d}x \\ &= \Big[f(x)\varphi(x)\Big]_{-\infty}^{\infty} - \int_{\mathbb{R}} f(x)\varphi'(x)\,\mathrm{d}x && \text{(integration by parts)} \\ &= \int_{\mathbb{R}} f(x)(-\varphi')(x)\,\mathrm{d}x && \text{($\varphi$ has compact support)} \\ &= T_f[-\varphi'] = \frac{\mathrm{d}T_f}{\mathrm{d}x}[\varphi]. \end{align*}$$ In other words, $\frac{\mathrm{d}T_f}{\mathrm{d}x} = T_{f'}$, as desired.
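As a numerical spot-check of this identity (a sketch reusing `bump` and `quad` from the earlier sketches; `dbump` is my name for the closed-form derivative of the bump), take $f = \sin$, so $f' = \cos$:

```python
import numpy as np
from scipy.integrate import quad  # (assumes `bump` from the earlier sketch)

def dbump(x):
    """Closed-form derivative of the bump function (also a test function)."""
    x = np.asarray(x, dtype=float)
    with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
        val = -2.0 * x / (1.0 - x * x) ** 2 * np.exp(-1.0 / (1.0 - x * x))
    return np.where(np.abs(x) < 1.0, val, 0.0)

# T_{f'}[phi] versus T_f[-phi'] for f = sin (so f' = cos):
lhs, _ = quad(lambda x: np.cos(x) * bump(x), -1, 1)
rhs, _ = quad(lambda x: np.sin(x) * -dbump(x), -1, 1)
print(lhs, rhs)  # agree up to quadrature error
```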

Moreover, notice that the definition of the distributional derivative makes no mention of any differentiable function ff. It only relies on the differentiability of the test functions. So, we can apply this definition even for distributions arising from non-differentiable functions, or distributions without any corresponding functions at all.

For an example, recall the (non-differentiable) step function $\theta(x) = [\![x > 0]\!]$, and its corresponding distribution $T_\theta[\varphi] = \int_0^\infty \varphi(x)\,\mathrm{d}x$. Let’s compute the distributional derivative $\frac{\mathrm{d}T_\theta}{\mathrm{d}x}$. For $\varphi \in \mathcal{C}^\infty_0$, $$\frac{\mathrm{d}T_\theta}{\mathrm{d}x}[\varphi] = T_\theta\!\left[-\varphi'\right] = \int_0^\infty (-\varphi')(x)\,\mathrm{d}x = \Big[{-}\varphi(x)\Big]_0^\infty = \varphi(0) = T_\delta[\varphi].$$ Interestingly, we recover the distributional definition of the (non-regular) delta distribution, $T_\delta$. Intuitively, this is quite a fitting derivative for $T_\theta$: the step function changes not at all away from zero, and then changes infinitely rapidly at its discontinuity.
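We can spot-check this numerically too, continuing the previous sketch: $T_\theta[-\varphi'] = \int_0^\infty -\varphi'(x)\,\mathrm{d}x$ should equal $\varphi(0)$.

```python
# (continues the previous sketch: `bump`, `dbump`, and `quad`)
lhs, _ = quad(lambda x: -dbump(x), 0, 1)  # supp(bump) ends at 1
print(lhs, bump(0.0))                     # both ≈ exp(-1) ≈ 0.36788
```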

Distributional integrals

We now turn to indefinite integration, or anti-differentiation, of distributions. The definition of the distributional indefinite integral is more involved than that of the distributional derivative, so first we need to define some machinery.

First, like in ordinary calculus, note that the anti-derivative of a distribution will be unique only up to the addition of some constant of integration, defined as a Schwartz distribution $C \in \mathcal{D}$ with distributional derivative $\frac{\mathrm{d}C}{\mathrm{d}x} = 0$. In our single-variable setting, such a $C$ is a regular distribution corresponding to a constant function (though this does not make it a constant functional).

Second, since we want our anti-differentiation operation to be inverse to distributional differentiation, and (recall) differentiating a distribution involved taking the derivatives of input test functions, we’re going to need to transform input test functions to their anti-derivatives. Unfortunately, not all test functions have an anti-derivative that is a test function (test functions always have anti-derivatives and these are always smooth, but there may not be one with compact support). Fortunately, a unique anti-derivative test function exists for any test function that integrates to zero. Moreover, we can transform any test function into one that satisfies this condition using the linear map $$\varphi \mapsto \varphi - \left(\int_{-\infty}^{\infty} \varphi(x)\,\mathrm{d}x\right)\psi$$ where $\psi \in \mathcal{C}^\infty_0$ is a fixed reference test function with $\int_{-\infty}^{\infty} \psi(x)\,\mathrm{d}x = 1$ (e.g., a normalised bump function). Thus, for each test function $\varphi \in \mathcal{C}^\infty_0$, we can define a unique anti-derivative test function $\Phi_\psi[\varphi] \in \mathcal{C}^\infty_0$ such that $$\frac{\mathrm{d}}{\mathrm{d}x}\big(\Phi_\psi[\varphi]\big) = \varphi - \left(\int_{-\infty}^{\infty} \varphi(x)\,\mathrm{d}x\right)\psi.$$ For $\varphi'$ known to be the derivative of a test function $\varphi$, we have $\int_{-\infty}^{\infty} \varphi'(x)\,\mathrm{d}x = 0$ and thus $\Phi_\psi[\varphi'] = \varphi$.
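Here is a numerical sketch of this machinery (helper names mine, reusing `bump` and `quad` from the earlier sketches): $\psi$ is the normalised bump, $\varphi$ is a bump shifted to $[1, 3]$, and $\Phi_\psi[\varphi]$ is computed by integrating up from the left edge of the combined support.

```python
# (assumes `bump` and `quad` from the earlier sketches)
Z, _ = quad(bump, -1, 1)           # normalising constant
psi = lambda x: bump(x) / Z        # reference test function, ∫ psi dx = 1
phi = lambda x: bump(x - 2.0)      # a test function supported on [1, 3]
total, _ = quad(phi, 1, 3)         # ∫ phi dx

def Phi(x):
    """Phi_psi[phi](x): the compactly supported anti-derivative of
    phi - total * psi, integrated up from the left edge of support."""
    val, _ = quad(lambda u: phi(u) - total * psi(u), -1.0, x, limit=200)
    return val

for x in (0.0, 2.0, 4.0):
    print(x, Phi(x))               # vanishes again once x >= 3
```

Starting the integral at the left edge is what pins down the one anti-derivative with compact support; any point to the left of the combined support would do.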

Finally, we can define indefinite integration of Schwartz distributions. Given a distribution $T : \mathcal{C}^\infty_0 \to \mathbb{R}$, a constant of integration $C : \mathcal{C}^\infty_0 \to \mathbb{R}$, and a reference test function $\psi \in \mathcal{C}^\infty_0$, define the distributional anti-derivative of $T$ to be the Schwartz distribution $\int T\,\mathrm{d}x : \mathcal{C}^\infty_0 \to \mathbb{R}$ such that for $\varphi \in \mathcal{C}^\infty_0$, $$\left(\int T\,\mathrm{d}x\right)\![\varphi] = T\big[-\Phi_\psi[\varphi]\big] + C[\varphi].$$

We can verify that this operation is reversed by differentiation, that is, $\frac{\mathrm{d}}{\mathrm{d}x}\int T\,\mathrm{d}x = T$. For $\varphi \in \mathcal{C}^\infty_0$, $$\begin{align*} \left(\frac{\mathrm{d}}{\mathrm{d}x}\int T\,\mathrm{d}x\right)\![\varphi] &= \left(\int T\,\mathrm{d}x\right)\![-\varphi'] \\ &= T\big[-\Phi_\psi[-\varphi']\big] + C[-\varphi'] \\ &= T\big[\Phi_\psi[\varphi']\big] + \frac{\mathrm{d}C}{\mathrm{d}x}[\varphi] && \text{($\Phi_\psi$ is linear)} \\ &= T[\varphi] + 0. && \text{($\varphi'$ is a derivative)} \end{align*}$$

As an example, take the one-dimensional delta distribution, $T_\delta : \mathcal{C}^\infty_0 \to \mathbb{R}$ with $T_\delta[\varphi] = \varphi(0)$. Since we previously showed that $T_\delta = \frac{\mathrm{d}}{\mathrm{d}x} T_\theta$ where $T_\theta$ is the distribution corresponding to the step function, we expect to find that $\int T_\delta\,\mathrm{d}x = T_\theta + D$ for some constant of integration $D$ with $\frac{\mathrm{d}D}{\mathrm{d}x} = 0$. Indeed: $$\begin{align*} \left(\int T_\delta\,\mathrm{d}x\right)\![\varphi] &= T_\delta\big[-\Phi_\psi[\varphi]\big] + C[\varphi] \\ &= -\Phi_\psi[\varphi](0) + C[\varphi] \\ &= -\int_{-\infty}^{0} \left(\varphi(x) - \left(\int_{-\infty}^{\infty}\varphi(z)\,\mathrm{d}z\right)\psi(x)\right)\mathrm{d}x + C[\varphi] \\ &= \left(\int_{-\infty}^{\infty}\varphi(x)\,\mathrm{d}x - \int_{-\infty}^{0}\varphi(x)\,\mathrm{d}x\right) - \left(1 - \int_{-\infty}^{0}\psi(x)\,\mathrm{d}x\right)\cdot\left(\int_{-\infty}^{\infty}\varphi(x)\,\mathrm{d}x\right) + C[\varphi] \\ &= \int_{0}^{\infty}\varphi(x)\,\mathrm{d}x - \left(\int_{0}^{\infty}\psi(x)\,\mathrm{d}x\right)\cdot\left(\int_{-\infty}^{\infty}\varphi(x)\,\mathrm{d}x\right) + C[\varphi] \\ &= T_\theta[\varphi] + D[\varphi] \end{align*}$$ where $D[\varphi] = C[\varphi] - \int_{0}^{\infty}\psi(x)\,\mathrm{d}x \cdot \int_{-\infty}^{\infty}\varphi(x)\,\mathrm{d}x$. We can then show $D$ is a constant of integration: $$\frac{\mathrm{d}D}{\mathrm{d}x}[\varphi] = \frac{\mathrm{d}C}{\mathrm{d}x}[\varphi] - \int_{0}^{\infty}\psi(x)\,\mathrm{d}x \cdot \int_{-\infty}^{\infty}(-\varphi'(x))\,\mathrm{d}x = 0 - \int_{0}^{\infty}\psi(x)\,\mathrm{d}x \cdot 0 = 0$$ where we used that $\int_{-\infty}^{\infty}\varphi'(x)\,\mathrm{d}x = 0$ for any test function $\varphi \in \mathcal{C}^\infty_0$.
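Continuing the $\Phi_\psi$ sketch from above, we can check this worked example numerically, taking $C = 0$: the quantity $-\Phi_\psi[\varphi](0)$ should match $T_\theta[\varphi] + D[\varphi]$.

```python
# (continues the Phi_psi sketch, with C = 0)
lhs = -Phi(0.0)                       # (∫ T_delta dx)[phi] = T_delta[-Phi_psi[phi]]
T_theta_phi, _ = quad(phi, 0, 3)      # T_theta[phi] = ∫_0^∞ phi dx
psi_tail, _ = quad(psi, 0, 1)         # ∫_0^∞ psi dx (= 1/2 by symmetry)
rhs = T_theta_phi - psi_tail * total  # T_theta[phi] + D[phi]
print(lhs, rhs)                       # agree up to quadrature error
```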

Calculus on families of distributions

In Algebraic Geometry and Statistical Learning Theory, Watanabe actually defines a different, more general kind of calculus in the space of distributions—taking derivatives and integrals along any parameterised family of Schwartz distributions. Our last step is to understand these more general definitions of differentiation and integration and how they relate to the previous definitions.

Formally, let $\{\,T_t\,\}_{t \in \mathbb{R}}$ be a parameterised family of Schwartz distributions. We assume that the family is such that the necessary derivatives and integrals cited in the below definitions exist. We can then define two new parameterised families of Schwartz distributions as follows.

  1. Define the derivative family $\{\,\frac{\mathrm{d}}{\mathrm{d}t} T_t\,\}_{t \in \mathbb{R}}$ as a new parameterised family of Schwartz distributions such that for $t \in \mathbb{R}$ and $\varphi \in \mathcal{C}^\infty_0$, $$\left(\frac{\mathrm{d}}{\mathrm{d}t} T_t\right)\![\varphi] = \frac{\mathrm{d}}{\mathrm{d}t}\Big(T_t[\varphi]\Big).$$ Here, the operator on the LHS is the distributional derivative we are defining, and the one on the RHS is the usual derivative from calculus.

  2. Define the indefinite integral family $\{\,\int T_t\,\mathrm{d}t\,\}_{t \in \mathbb{R}}$ as a new parameterised family of Schwartz distributions such that for $t \in \mathbb{R}$ and $\varphi \in \mathcal{C}^\infty_0$, $$\left(\int T_t\,\mathrm{d}t\right)\![\varphi] = \int T_t[\varphi]\,\mathrm{d}t = \int_{-\infty}^{t} T_\tau[\varphi]\,\mathrm{d}\tau + C[\varphi]$$ where $C$ is a free constant of integration (any distribution that does not depend on $t$). Here, the operator on the LHS is the distributional indefinite integration operator we are defining, and the operator in the middle is the usual indefinite integral from calculus. For clarity, we also offer the expression on the RHS to emphasise that the middle expression is still parameterised by $t \in \mathbb{R}$.

These operations generalise the axis-based definitions from the previous sections. Given a single Schwartz distribution $T$, define a parameterised family of Schwartz distributions $\{\,T_t\,\}_{t \in \mathbb{R}}$ such that for $\varphi \in \mathcal{C}^\infty_0$, $T_t[\varphi] = T[\varphi \circ h_t]$ where $h_t : \mathbb{R} \to \mathbb{R}$ with $h_t(x) = x - t$ is used to shift each test function in the positive direction by $t$. Then, we have $$\begin{align*} \left(\frac{\mathrm{d}}{\mathrm{d}t} T_t\right)\![\varphi] &= \frac{\mathrm{d}}{\mathrm{d}t} T_t[\varphi] = \frac{\mathrm{d}}{\mathrm{d}t} T[\varphi \circ h_t] \\ &= T\!\left[\frac{\mathrm{d}}{\mathrm{d}t}(\varphi \circ h_t)\right] && \text{(by linearity and continuity of $T$)} \\ &= T[-\varphi' \circ h_t]. && \text{(chain rule)} \end{align*}$$ We then have $T_0 = T$ and $\frac{\mathrm{d}}{\mathrm{d}t} T_t\big|_{t=0} = \frac{\mathrm{d}}{\mathrm{d}x} T$. A similar connection holds for the integral definitions.
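Here is a finite-difference sanity check of that connection (a sketch reusing `bump` and `quad` from the earlier sketches): taking $T = T_\theta$, the shifted family is $T_t[\varphi] = \int_{-t}^{\infty}\varphi(x)\,\mathrm{d}x$, and the difference quotient at $t = 0$ should approach $\frac{\mathrm{d}T_\theta}{\mathrm{d}x}[\varphi] = \varphi(0)$.

```python
# (assumes `bump` and `quad` from the earlier sketches)
def T_t(t):
    """The shifted step family applied to the bump: ∫_{-t}^∞ bump(x) dx."""
    val, _ = quad(bump, -t, 1)     # supp(bump) ⊂ [-1, 1]
    return val

for h in (0.1, 0.01, 0.001):
    print(h, (T_t(h) - T_t(0.0)) / h)  # → bump(0) = exp(-1) ≈ 0.36788
```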

As an extended example, first define two families of distributions, generalising the step distribution and the delta distribution: for each $t \in \mathbb{R}$, let $T_{\theta,t}[\varphi] = \int_t^{\infty} \varphi(x)\,\mathrm{d}x$ (the step distribution translated to $t$) and $T_{\delta,t}[\varphi] = \varphi(t)$ (the delta distribution translated to $t$).

Then, for $\varphi \in \mathcal{C}^\infty_0$, we have $$\begin{align*} \left(\frac{\mathrm{d}}{\mathrm{d}t} T_{\theta,t}\right)\![\varphi] &= \frac{\mathrm{d}}{\mathrm{d}t}\Big(T_{\theta,t}[\varphi]\Big) \\ &= \frac{\mathrm{d}}{\mathrm{d}t}\left(\int_t^{\infty} \varphi(x)\,\mathrm{d}x\right) \\ &= -\varphi(t) \\ &= -T_{\delta,t}[\varphi]. \end{align*}$$ Conversely, we have $$\begin{align*} \left(\int T_{\delta,t}\,\mathrm{d}t\right)\![\varphi] &= \int_{-\infty}^{t} \Big(T_{\delta,\tau}[\varphi]\Big)\,\mathrm{d}\tau + C[\varphi] \\ &= \int_{-\infty}^{t} \varphi(\tau)\,\mathrm{d}\tau + C[\varphi] \\ &= -\int_{t}^{\infty} \varphi(x)\,\mathrm{d}x + \int_{-\infty}^{\infty} \varphi(x)\,\mathrm{d}x + C[\varphi] \\ &= -T_{\theta,t}[\varphi] + C'[\varphi] \end{align*}$$ where $C'[\varphi] = \int_{-\infty}^{\infty} \varphi(x)\,\mathrm{d}x + C[\varphi]$, such that $C'$ is a constant of integration in $t$.
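A central-difference check of the first relationship at $t = 0.5$ (again a sketch reusing `bump` and `quad`): the derivative of $T_{\theta,t}[\varphi]$ in $t$ should equal $-\varphi(t)$.

```python
# (assumes `bump` and `quad` from the earlier sketches)
def T_theta_t(t):
    """T_{theta,t}[phi] = ∫_t^∞ phi(x) dx, with phi = bump."""
    val, _ = quad(bump, t, 1)
    return val

t, h = 0.5, 1e-5
deriv = (T_theta_t(t + h) - T_theta_t(t - h)) / (2 * h)
print(deriv, -bump(0.5))           # both ≈ -0.2636 = -T_{delta,t}[bump]
```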

The negative sign was introduced in both relationships due to the way we defined the parameterised families: as $t$ increases, the step $\theta(x - t)$ turns off at more points, so $T_{\theta,t}[\varphi]$ loses mass at rate $\varphi(t)$.

Conclusion

The theory of Schwartz distributions is an elegant generalisation of the idea of calculus for smooth functions. I’m glad I put in the time to work through these examples. I found it an especially interesting challenge to develop the theory of differentiation and integration for Schwartz distributions (not the parametric versions)—these were not given in the book. I am proud to say I worked out these definitions myself (with some hints from Gemini, though also sometimes it sent me down the wrong track and I had to recover).

If I had more time, I am sure I would appreciate working through a more traditional introduction to the theory of Schwartz distributions. This mathoverflow thread has some recommendations, including the original article by Schwartz (in French). Alas, I have to make it through the rest of singular learning theory’s long list of prerequisites first.