Expectiles are simple to compute

Wednesday, June 11th, 2025

Expectiles are a class of summary statistics generalising the well-known expected value. They have been relatively neglected since their introduction. Perhaps this is because the expectiles of a sample lack a well-known, simple and efficient calculation procedure?

This note derives a simple and efficient algorithm for computing the expectiles of a discrete distribution, or “sample expectiles” of a finite sample from an arbitrary distribution.

Defining expectiles

Expectiles generalise the expected value by introducing an asymmetry parameter $\tau \in (0, 1)$, representing a degree of optimism: how much to weight better-than-expected data compared to worse-than-expected data.

Formally, given a scalar random variable $X$ with finite second moment, the $\tau$-expectile of $X$, written $\epsilon_X(\tau)$, is the minimiser of an asymmetric version of the expected squared distance, weighting squared positive distances by $\tau$ and squared negative distances by $1-\tau$:
$$\epsilon_X(\tau) = \mathop{\mathrm{arg\,min}}_\epsilon \mathbb{E}\left[ [\![X > \epsilon]\!]^{\tau}_{1-\tau} \cdot (\epsilon - X)^2 \right].$$
Here, $[\![P]\!]^a_b$ is a generalised Iverson bracket, evaluating to $a$ if $P$ is a true proposition, or to $b$ otherwise.
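
As a quick check on this definition: with $\tau = \frac12$, both weights are equal and the objective reduces to (half) the ordinary expected squared distance, which is minimised by the expected value:
$$\epsilon_X\left(\tfrac12\right) = \mathop{\mathrm{arg\,min}}_\epsilon \tfrac12\, \mathbb{E}\left[ (\epsilon - X)^2 \right] = \mathbb{E}[X].$$
So the $\frac12$-expectile recovers the mean, and other values of $\tau$ tilt the summary up or down from there.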

For a discrete random variable $X$ taking on a finite set of values $x_1, \ldots, x_N$ with respective probabilities $p_1, \ldots, p_N$, this definition simplifies to
$$\epsilon_X(\tau) = \mathop{\mathrm{arg\,min}}_\epsilon \sum_{i=1}^N p_i \cdot [\![x_i > \epsilon]\!]^{\tau}_{1-\tau} \cdot (\epsilon - x_i)^2.$$

Similarly, if we take a finite sample of size $N$, $\vec{x} = (x_1, \ldots, x_N)$, where $x_i \sim X$, then we can define the sample expectile $\epsilon_{\vec x}(\tau)$ as
$$\epsilon_{\vec x}(\tau) = \mathop{\mathrm{arg\,min}}_\epsilon \frac{1}{N} \sum_{i=1}^N [\![x_i > \epsilon]\!]^{\tau}_{1-\tau} \cdot (\epsilon - x_i)^2.$$
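
To make the definition concrete, here is a naive way to compute a sample expectile: minimise the objective directly with a generic numerical optimiser. This is a minimal sketch (the name expectile_bruteforce and the use of SciPy's minimize_scalar are my own choices, not part of the method derived below):

import numpy as np
from scipy.optimize import minimize_scalar


def expectile_bruteforce(sample: np.ndarray, tau: float) -> float:
    # the asymmetric squared-error objective from the definition above
    def objective(eps: float) -> float:
        weights = np.where(sample > eps, tau, 1 - tau)
        return np.mean(weights * (eps - sample) ** 2)

    # the minimiser always lies within the range of the sample
    result = minimize_scalar(
        objective, bounds=(sample.min(), sample.max()), method="bounded"
    )
    return result.x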

In the rest of this note, I derive a simple algorithm for computing sample expectiles. This method can easily be extended to computing the expectiles of an arbitrary discrete distribution with finite support.

The sample expectile objective

Computing the sample $\tau$-expectile given a sample $\vec x$ involves finding the value of $\epsilon$ that minimises the (rescaled) sample expectile objective
$$S_{\vec x}(\epsilon) = \frac{1}{2} \sum_{i=1}^N [\![x_i > \epsilon]\!]^{\tau}_{1-\tau} \cdot (\epsilon - x_i)^2.$$

The following method is based on three observations:

  1. As a sum of piecewise quadratic functions, the objective is piecewise quadratic. Therefore, its gradient is piecewise linear.

  2. As a sum of functions that are continuously differentiable with respect to $\epsilon$, the objective is continuously differentiable with respect to $\epsilon$.¹ That is, its gradient is a continuous function.

  3. As a sum of strictly convex functions, the objective is strictly convex. Therefore, its gradient is strictly increasing.

In summary, $S'_{\vec x}(\epsilon)$ is piecewise linear, continuous, and strictly increasing in $\epsilon$.
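
To see why observation 2 holds despite the discontinuous brackets, it helps to differentiate the contribution of a single data point $x_i$ (this is the calculation behind the footnote):
$$\frac{\mathrm{d}}{\mathrm{d}\epsilon} \left[ \frac12\, [\![x_i > \epsilon]\!]^{\tau}_{1-\tau} \cdot (\epsilon - x_i)^2 \right] = \begin{cases} \tau \cdot (\epsilon - x_i) & \text{if } \epsilon < x_i, \\ (1-\tau) \cdot (\epsilon - x_i) & \text{if } \epsilon \geq x_i. \end{cases}$$
Both linear pieces vanish at $\epsilon = x_i$, so they meet there and the gradient is continuous.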

Solving for the minimum

The above implies we can find the minimum of $S_{\vec x}(\epsilon)$ by finding the root of $S'_{\vec x}(\epsilon)$, and we can find this root in turn by checking each linear “piece” of $S'_{\vec x}(\epsilon)$ to find the one that crosses zero.

Let’s start by finding those linear pieces. Differentiating the objective with respect to $\epsilon$ yields
\begin{align*}
S'_{\vec x}(\epsilon)
  &= \sum_{i=1}^{N} [\![x_i > \epsilon]\!]^{\tau}_{1-\tau} \cdot (\epsilon - x_i)
\\&= (1-\tau) \sum_{i : x_i \leq \epsilon} (\epsilon - x_i)
    + \tau \sum_{i : x_i > \epsilon} (\epsilon - x_i)
\\&= \left( (1-\tau) \sum_{i : x_i \leq \epsilon} 1
    + \tau \sum_{i : x_i > \epsilon} 1 \right) \cdot \epsilon
    - \left( (1-\tau) \sum_{i : x_i \leq \epsilon} x_i
    + \tau \sum_{i : x_i > \epsilon} x_i \right)
\\&= A_{\vec x}(\epsilon) \cdot \epsilon - B_{\vec x}(\epsilon)
\end{align*}
where
\begin{align*}
A_{\vec x}(\epsilon) &= (1-\tau) \sum_{i : x_i \leq \epsilon} 1 + \tau \sum_{i : x_i > \epsilon} 1,
\\
B_{\vec x}(\epsilon) &= (1-\tau) \sum_{i : x_i \leq \epsilon} x_i + \tau \sum_{i : x_i > \epsilon} x_i.
\end{align*}

Observe that, while $A_{\vec x}(\epsilon)$ and $B_{\vec x}(\epsilon)$ depend on $\epsilon$, the dependence arises only through comparison with the $x_i$ in the summation bounds. Therefore, these functions are piecewise constant with boundaries at each distinct $x_i$.

This leads to the following efficient algorithm for finding the root of $S'_{\vec x}(\epsilon)$:

  1. Sort $\vec x$. Hence, assume that $x_1 \leq \ldots \leq x_N$.

  2. For $i = 1, \ldots, N$, compute $A_i = A_{\vec x}(x_i)$ and $B_i = B_{\vec x}(x_i)$. All values can be computed in linear time by sharing the partial sums $F_i = \sum_{j : x_j \leq x_i} 1$ and $M_i = \sum_{j : x_j \leq x_i} x_j$, since $A_i = (1-\tau) F_i + \tau (N - F_i)$ and $B_i = (1-\tau) M_i + \tau (M_N - M_i)$.

  3. For $i = 1, \ldots, N$, compute $S'_i = S'_{\vec x}(x_i) = A_i x_i - B_i$. Note that the $i$th linear segment of $S'_{\vec x}(\epsilon)$ runs from $(x_i, S'_i)$ up to $(x_{i+1}, S'_{i+1})$.

  4. Check each $i$ to identify one such that $S'_i \leq 0 < S'_{i+1}$. This linear segment crosses zero at $\epsilon_\star = B_i / A_i$.

The resulting $\epsilon_\star$ satisfies $S'_{\vec x}(\epsilon_\star) = 0$. Since $S'_{\vec x}(\epsilon)$ is strictly increasing, this stationary point is the global minimum of the objective, and therefore
$$\epsilon_{\vec x}(\tau) = \epsilon_\star.$$
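
As a tiny worked example (my own illustration), take the two-point sample $\vec x = (0, 1)$ and any $\tau$. Then $F = (1, 2)$ and $M = (0, 1)$, so $A_1 = (1-\tau) + \tau = 1$ and $B_1 = \tau$, while $A_2 = 2(1-\tau)$ and $B_2 = 1-\tau$. Hence $S'_1 = A_1 x_1 - B_1 = -\tau \leq 0$ and $S'_2 = A_2 x_2 - B_2 = 1 - \tau > 0$, so the first segment crosses zero, at $\epsilon_\star = B_1 / A_1 = \tau$. The $\tau$-expectile of this sample is simply $\tau$.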

Python/NumPy implementation

Here is a simplified NumPy implementation of the above algorithm.

import numpy as np


def expectile(sample: np.ndarray, tau: float) -> float:
    # step 1: pre-sort the sample
    x = np.sort(sample)
    
    # step 2: compute linear segment coefficients
    # i. precompute partial sums at key points
    N = x.size
    F = np.arange(N) + 1
    M = np.cumsum(x)
    
    # ii. compute the coefficients from these partial sums
    A = ((1 - tau) * F + tau * (F[-1] - F)) # slopes
    B = ((1 - tau) * M + tau * (M[-1] - M)) # offsets

    # step 3: compute starting point of each segment
    G = A * x - B
    
    # step 4: find the segment crossing zero and its root
    # i. find the segment with G_i <= 0 and G_i+1 > 0
    start_below_0 = (G <= 0)[:-1]
    stop_above_0  = (G > 0)[1:]
    i = np.nonzero(start_below_0 & stop_above_0)[0][0]
    
    # ii. interpolate to get the root
    eps = B[i]/A[i]

    # done!
    return eps

This implementation is meant for pedagogical purposes. A version that vectorises the tau input and is more careful about corner cases and potential numerical issues is available in this repository.
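
As a quick sanity check (using the hypothetical expectile_bruteforce helper sketched earlier), the following compares the fast implementation against direct numerical minimisation, and confirms that tau = 0.5 recovers the sample mean:

rng = np.random.default_rng(seed=0)
sample = rng.normal(size=1_000)

# tau = 1/2 recovers the ordinary sample mean
assert np.isclose(expectile(sample, tau=0.5), sample.mean())

# other values of tau agree with generic numerical minimisation
for tau in (0.1, 0.25, 0.75, 0.9):
    assert np.isclose(
        expectile(sample, tau),
        expectile_bruteforce(sample, tau),
        atol=1e-4,
    )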

Algorithmic efficiency

An advantage of the above method is that it doesn’t involve resorting to numerical methods for minimising the expectile objective. It’s not quite a closed-form solution as it still involves sorting the data and searching iteratively for the segment of the gradient that crosses zero.

The above implementation runs in $O(N \log N)$ time, dominated by the sort; the partial sums and the search for the zero-crossing segment take only linear time. It could nonetheless be made even more efficient.²
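
For instance (a sketch of one possible tweak, not necessarily the improvements alluded to in the footnote): since G is increasing, the vectorised scan in step 4 can be replaced by a binary search, though the sort still dominates the overall running time.

# drop-in replacement for step 4 of expectile(): since G is
# (weakly) increasing, the last index with G[i] <= 0 can be
# found by binary search instead of a scan
i = np.searchsorted(G, 0.0, side="right") - 1
eps = B[i] / A[i]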

I derived this algorithm in 2020 during a coursework research project. Back then, I remember being disappointed that the implementation I saw for computing sample expectiles worked by applying generic root-finding algorithms to the gradient, without taking advantage of its special structure. I also looked around and found that standard statistics libraries (SciPy, R) didn’t appear to have tools for computing sample expectiles.

Since then, there have been some developments in tools for computing sample expectiles. SciPy added a method for computing sample expectiles, though it too uses a generic root finder. More excitingly, Daouia, Stupfler, and Usseglio-Carleve published “An expectile computation cookbook” (submitted 2023), which discusses methods similar to mine for computing exact expectiles for discrete distributions as well as for other kinds of distributions. I’m glad to see expectiles getting some love!


  1. It’s not obvious that the objective is continuous or differentiable, let alone continuously differentiable, given the use of (discontinuous) Iverson brackets. However, these brackets always occur multiplied by a term of the form $(x - \epsilon)^2$, which is zero and has gradient zero at the discontinuity of the Iverson bracket.

  2. I have to thank Gemini 2.5 Pro (preview), which pointed out these directions for efficiency improvements when I showed it an earlier draft of this post that conjectured prematurely that the above method was optimal.