This is a short technical note written in the process of working out
some definitions for a paper about using an approximate minimax regret
training objective to mitigate goal misgeneralisation in advanced deep
reinforcement learning systems. I define three different approximate
relaxations of the minimax objective, and show that the definitions are
related under certain assumptions.
Thanks to my colleagues Karim Abdel Sadek and Michael Dennis for helping me
understand the necessary game theory and decision theory to figure this
out.
Background
This note is a sequel to *minimax, three ways*; see that note for additional background. This note references results and definitions from the previous note by number, and closely follows those definitions and propositions in order to define three approximate relaxations of the minimax objective and study their relationships.
What do I mean by an approximate relaxation of an objective?¹ First, let me formalise optimisation. Suppose we have a function $f : X \to \mathbb{R}$. To minimise this function is to find an argument $x \in X$ that achieves the minimum possible value, $f(x) = \min_{x' \in X} f(x')$, where the RHS is assumed to exist. We define the set of minimisers
$$\arg\min_{x' \in X} f(x') = \left\{ x \in X \;\middle|\; f(x) = \min_{x' \in X} f(x') \right\}.$$
This criterion is very strict, making it difficult to find such minimisers in practice. It's slightly more realistic to assume that we might be able to find some $x \in X$ that is within some finite approximation threshold $\varepsilon \ge 0$ of the minimum possible value. We thus define the set of approximate minimisers
$$\arg\min^{\varepsilon}_{x' \in X} f(x') = \left\{ x \in X \;\middle|\; f(x) \le \min_{x' \in X} f(x') + \varepsilon \right\}.$$
We recover exact minimisation with $\varepsilon = 0$. Similarly, we define maximisers
$$\arg\max_{x' \in X} f(x') = \left\{ x \in X \;\middle|\; f(x) = \max_{x' \in X} f(x') \right\},$$
and approximate maximisers given approximation threshold $\delta \ge 0$,
$$\arg\max^{\delta}_{x' \in X} f(x') = \left\{ x \in X \;\middle|\; f(x) \ge \max_{x' \in X} f(x') - \delta \right\}.$$
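For a finite domain, these definitions translate directly into code. Here is a minimal Python sketch of the approximate optimiser sets (the helper names `approx_argmin` and `approx_argmax` are my own, invented for illustration):

```python
# Approximate minimisers over a finite domain X, following the definition
# above: all x with f(x) <= min f + eps. (Helper names are invented.)
def approx_argmin(f, X, eps=0.0):
    best = min(f(x) for x in X)
    return {x for x in X if f(x) <= best + eps}

# Approximate maximisers: all x with f(x) >= max f - delta.
def approx_argmax(f, X, delta=0.0):
    best = max(f(x) for x in X)
    return {x for x in X if f(x) >= best - delta}

# Example: f(x) = x^2 on {-2, ..., 2}.
f = lambda x: x * x
X = range(-2, 3)
exact = approx_argmin(f, X)             # {0}: only x = 0 attains the minimum
relaxed = approx_argmin(f, X, eps=1.0)  # {-1, 0, 1}: within 1 of the minimum
```

With `eps=0` we recover exact minimisation, matching the definition above.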
Three approximate relaxations…
In the previous note, we explored three different definitions of the
minimax objective: minimax(1), minimax(2), and
minimax(3σ).
The idea was that each definition would suggest a different approximate
relaxation. Here I state the relaxed definitions.
As in the previous note, suppose we have a function $r : A \times S \to \mathbb{R}$. It doesn't matter what the function represents, or what the sets $A$ and $S$ are, but the notation is chosen to suggest that $a \in A$ is an action chosen by a decision maker, $s \in S$ is a state of the world, and $r(a, s)$ is the regret arising from taking action $a$ in state $s$. The only assumption I actually make on $r$, $A$, and $S$ is that all the maxima and minima I refer to exist.
Definition 1 defines minimax actions as those that achieve the minimum value of a function, namely the worst-case value over states for each action. There is no room to relax the internal maximisation, as we need a concrete value to minimise over. For now, we just ignore the inner optimisation and approximate only the outer one.
**Relaxation 1:** Let $\varepsilon \ge 0$ be an approximation threshold. An action $a \in A$ is approximinimax(1) at resolution $\varepsilon$ if
$$a \in \arg\min^{\varepsilon}_{a' \in A} \max_{s' \in S} r(a', s').$$
We denote the set of approximinimax(1) actions at resolution $\varepsilon$ by $A^{(1)}_{\varepsilon}$.
Definition 2 defines minimax actions as those that are part of a Nash
equilibrium of a two-player zero-sum game, where the agent minimises the
function and the adversary maximises it. We relax this definition by
replacing the Nash equilibrium condition with an approximate
Nash equilibrium.
**Relaxation 2:** Let $\varepsilon, \delta \ge 0$ be approximation thresholds. An action $a \in A$ is approximinimax(2) at resolution $(\varepsilon, \delta)$ if there exists $s \in S$ such that both
$$s \in \arg\max^{\delta}_{s' \in S} r(a, s') \quad\text{and}\quad a \in \arg\min^{\varepsilon}_{a' \in A} r(a', s).$$
We denote the set of approximinimax(2) actions at resolution $(\varepsilon, \delta)$ by $A^{(2)}_{\varepsilon, \delta}$.
This definition clearly allows us to relax the maximum as well as the minimum. However, at a given resolution, not all games have approximate Nash equilibria; in such cases, the set of approximinimax(2) actions is empty. In contrast, approximinimax(1) actions exist for any $r$, $A$, and $S$, as long as the maxima and minima exist. Intuitively, we shouldn't need there to be an equilibrium in order to talk about minimising the worst possible response.
Definition 3 is designed to overcome the shortcomings of each of definitions 1 and 2. There, we defined minimax actions as minimisers of a function involving a concrete max map, a deterministic function from actions to worst-case states. We can relax this definition by relaxing both the maximisation criterion in the max map and the minimisation criterion in the definition of minimax actions.
**Relaxation 3.1:** Let $\delta \ge 0$ be an approximation threshold. An approximate max map at resolution $\delta$ is any function $\sigma : A \to S$ such that for all $a \in A$,
$$\sigma(a) \in \arg\max^{\delta}_{s' \in S} r(a, s').$$
**Relaxation 3.2:** Let $\varepsilon, \delta \ge 0$ be approximation thresholds. Let $\sigma$ be any approximate max map at resolution $\delta$. An action $a \in A$ is approximinimax($3\sigma$) at resolution $\varepsilon$ if
$$a \in \arg\min^{\varepsilon}_{a' \in A} r(a', \sigma(a')).$$
We denote the set of approximinimax($3\sigma$) actions at resolution $\varepsilon$ by $A^{(3\sigma)}_{\varepsilon}$.
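To make the three relaxations concrete, here is a sketch in Python for a small finite game (the payoff table `r` and all helper names are invented for illustration):

```python
# A toy finite game to illustrate the three relaxed solution sets.
# r[a][s] is the regret of action a in state s.
r = {
    "a1": {"s1": 0.0, "s2": 3.0},
    "a2": {"s1": 2.0, "s2": 1.0},
}
A = list(r)
S = ["s1", "s2"]

def approximinimax1(eps):
    # Actions whose worst-case regret is within eps of the minimax value.
    worst = {a: max(r[a][s] for s in S) for a in A}
    best = min(worst.values())
    return {a for a in A if worst[a] <= best + eps}

def approximinimax2(eps, delta):
    # Actions that take part in an approximate equilibrium (a, s).
    found = set()
    for a in A:
        for s in S:
            near_max = r[a][s] >= max(r[a][t] for t in S) - delta
            near_min = r[a][s] <= min(r[b][s] for b in A) + eps
            if near_max and near_min:
                found.add(a)
    return found

def approximinimax3(sigma, eps):
    # Actions that approximately minimise r(a, sigma(a)).
    vals = {a: r[a][sigma[a]] for a in A}
    best = min(vals.values())
    return {a for a in A if vals[a] <= best + eps}

# An exact max map (resolution 0), represented as a dict.
sigma = {a: max(S, key=lambda s: r[a][s]) for a in A}
```

In this game, `approximinimax1(0.0)` and `approximinimax3(sigma, 0.0)` both return `{"a2"}`, while `approximinimax2(0.0, 0.0)` is empty: the game has no exact equilibrium of the kind definition 2 requires. The duality gap here is $\Delta = 2 - 1 = 1$, and indeed `approximinimax2(1.0, 1.0)` recovers `{"a2"}`.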
… and their relations
Given $\varepsilon = \delta = 0$, we recover definitions 1, 2, and 3 from the previous note, and therefore the results of propositions 1 through 6. Suppose for simplicity that $A^{(2)}_{0,0}$ is non-empty (equivalently, that an equilibrium exists). Let $\sigma$ be any approximate max map at resolution 0 (that is, an exact max map). Then we have
$$A^{(1)}_{0} = A^{(2)}_{0,0} = A^{(3\sigma)}_{0}.$$
It remains to discover what relations hold between these sets in the case where $\varepsilon, \delta > 0$. In this section, we derive the following relations.
**Theorem 1:** Let $\varepsilon, \delta, \eta \ge 0$ be approximation thresholds. Let $\sigma : A \to S$ be any approximate max map at resolution $\eta$. Let
$$\Delta = \min_{a' \in A} \max_{s' \in S} r(a', s') - \max_{s' \in S} \min_{a' \in A} r(a', s').$$
Then we have the following relations.

1. $A^{(2)}_{\varepsilon,\delta} \subseteq A^{(1)}_{\varepsilon+\delta}$.
2. $A^{(1)}_{\varepsilon} \subseteq A^{(2)}_{\varepsilon+\Delta,\,\varepsilon+\Delta}$.
3. $A^{(1)}_{\varepsilon} \subseteq A^{(3\sigma)}_{\varepsilon+\eta}$.
4. $A^{(3\sigma)}_{\varepsilon} \subseteq A^{(1)}_{\varepsilon+\eta}$.
5. $A^{(3\sigma)}_{\varepsilon} \subseteq A^{(2)}_{\varepsilon+\eta+\Delta,\,\varepsilon+\eta+\Delta}$.
6. $A^{(2)}_{\varepsilon,\delta} \subseteq A^{(3\sigma)}_{\varepsilon+\delta+\eta}$.
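Before the proofs, here is a brute-force numeric spot-check of relation 1 on random finite games (a sketch; the enumeration below is my own, not part of the formal development):

```python
import random

# Spot-check of relation 1 on random finite games: every
# approximinimax(2) action at resolution (eps, delta) is
# approximinimax(1) at resolution eps + delta.
random.seed(1)
eps, delta = 0.1, 0.2
for trial in range(200):
    nA, nS = 3, 4
    r = [[random.random() for _ in range(nS)] for _ in range(nA)]
    worst = [max(row) for row in r]
    minimax = min(worst)
    # Approximinimax(1) actions at resolution eps + delta.
    A1 = {a for a in range(nA) if worst[a] <= minimax + eps + delta}
    # Approximinimax(2) actions at resolution (eps, delta).
    A2 = {
        a
        for a in range(nA)
        for s in range(nS)
        if r[a][s] >= worst[a] - delta
        and r[a][s] <= min(r[b][s] for b in range(nA)) + eps
    }
    assert A2 <= A1  # relation 1 holds on this game
```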
We prove each part of this theorem after a brief detour to explain the role of $\Delta$ in the above statement.
Aside: Approximate equilibria of zero-sum games
To generalise propositions 2 and 5, we need an appropriate generalisation of lemma 2. The following lemma shows that if $A^{(2)}_{\varepsilon,\delta}$ is non-empty, then the max–min inequality is approximately an equality (with approximation threshold $\varepsilon + \delta$). Note that this assumption is weaker than the assumption of lemma 2, namely that $A^{(2)}_{0,0}$ is non-empty.
**Lemma 3:** Suppose there exists $(a, s) \in A \times S$ such that $s \in \arg\max^{\delta}_{s' \in S} r(a, s')$ and $a \in \arg\min^{\varepsilon}_{a' \in A} r(a', s)$. Then
$$\max_{s' \in S} \min_{a' \in A} r(a', s') \ge \min_{a' \in A} \max_{s' \in S} r(a', s') - \delta - \varepsilon.$$

*Proof.*
$$\begin{aligned}
\max_{s' \in S} \min_{a' \in A} r(a', s')
&\ge \min_{a' \in A} r(a', s) && \text{(by definition of max)} \\
&\ge r(a, s) - \varepsilon && \text{(by definition of } a\text{)} \\
&\ge \max_{s' \in S} r(a, s') - \delta - \varepsilon && \text{(by definition of } s\text{)} \\
&\ge \min_{a' \in A} \max_{s' \in S} r(a', s') - \delta - \varepsilon. && \text{(by definition of min)}
\end{aligned}$$
□
Compare this proof to that of lemma 2 in the previous note. Observe that we used the same argument, except that instead of invoking the optimality of $a$ and $s$, we invoke their approximate optimality and introduce a compensatory $\varepsilon$ or $\delta$. The rest of the proofs in this note are generalisations of the proofs in the previous note along very similar lines.
We could proceed under the assumption that some set of approximate equilibria is non-empty. However, we'll get a more precise bound if we dig a little deeper into what's going on here. The proof of lemma 3 holds for every approximate equilibrium $(a, s)$ and every $\varepsilon$ and $\delta$. The tightest bound comes from using the approximate equilibrium that is closest to being an exact equilibrium among all approximate equilibria that are available. The following lemma shows that this 'tightest bound' is as tight as possible, because it captures the difference between $\max_{s' \in S} \min_{a' \in A} r(a', s')$ and $\min_{a' \in A} \max_{s' \in S} r(a', s')$.
**Lemma 4:** Let
$$\Delta = \min_{a' \in A} \max_{s' \in S} r(a', s') - \max_{s' \in S} \min_{a' \in A} r(a', s').$$
There exist approximation thresholds $\varepsilon, \delta \ge 0$, an action $a \in A$, and a state $s \in S$ such that (1) $a \in \arg\min^{\varepsilon}_{a' \in A} r(a', s)$; (2) $s \in \arg\max^{\delta}_{s' \in S} r(a, s')$; and (3) $\varepsilon + \delta = \Delta$.

*Proof.* Let $a \in \arg\min_{a' \in A} \max_{s' \in S} r(a', s')$ and $s \in \arg\max_{s' \in S} \min_{a' \in A} r(a', s')$. Let $\varepsilon = r(a, s) - \min_{a' \in A} r(a', s)$ and $\delta = \max_{s' \in S} r(a, s') - r(a, s)$. We immediately have (1) and (2). For (3), observe
$$\begin{aligned}
\varepsilon + \delta
&= \max_{s' \in S} r(a, s') - \min_{a' \in A} r(a', s) && \text{(by definition of } \varepsilon, \delta\text{)} \\
&= \min_{a' \in A} \max_{s' \in S} r(a', s') - \max_{s' \in S} \min_{a' \in A} r(a', s') && \text{(by definition of } a, s\text{)} \\
&= \Delta. && \text{(by definition of } \Delta\text{)}
\end{aligned}$$
□
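The construction in this proof is easy to check numerically. The following sketch (my own, for illustration) draws random finite games, picks $a$ and $s$ as in the proof, and confirms that the resulting $\varepsilon + \delta$ equals $\Delta$ up to floating point:

```python
import random

# Check the construction of lemma 4 on random finite games: with a
# minimax action a and a maximin state s, the eps and delta defined in
# the proof sum exactly to the duality gap Delta.
random.seed(0)
for trial in range(100):
    nA, nS = 4, 5
    r = [[random.random() for _ in range(nS)] for _ in range(nA)]
    minimax = min(max(row) for row in r)
    maximin = max(min(r[i][j] for i in range(nA)) for j in range(nS))
    Delta = minimax - maximin
    # a achieves the minimax value; s achieves the maximin value.
    a = min(range(nA), key=lambda i: max(r[i]))
    s = max(range(nS), key=lambda j: min(r[i][j] for i in range(nA)))
    eps = r[a][s] - min(r[i][s] for i in range(nA))
    delta = max(r[a]) - r[a][s]
    assert eps >= 0 and delta >= 0
    assert abs((eps + delta) - Delta) < 1e-9
```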
Intuitively, by expressing parts 2 and 5 of theorem 1 in terms of $\Delta$, we bypass the need to assume that a particular set of approximinimax(2) actions is non-empty. We instead connect our results to the pair of approximation thresholds with the smallest sum such that an approximate equilibrium exists. In particular, when $\varepsilon, \eta \ll \Delta$, we need to relax the approximation thresholds a lot to contain $A^{(1)}_{\varepsilon}$ or $A^{(3\sigma)}_{\varepsilon}$, since otherwise the approximinimax(2) set will be empty!
Proof of theorem 1 (six parts)
The proofs for each part of this theorem follow. They are very similar to the proofs of propositions 1 through 6 from the previous note. Each proof follows essentially the same structure, except for introducing $\varepsilon$, $\delta$, and $\eta$ terms when we make use of the assumption that an argument is an approximate optimiser rather than an exact optimiser.
*Proof (part 1).* Suppose $a \in A^{(2)}_{\varepsilon,\delta}$, that is, there exists $s \in S$ such that both $s \in \arg\max^{\delta}_{s' \in S} r(a, s')$ and $a \in \arg\min^{\varepsilon}_{a' \in A} r(a', s)$. We want to show that $a \in A^{(1)}_{\varepsilon+\delta}$, that is,
$$a \in \arg\min^{\varepsilon+\delta}_{a' \in A} \max_{s' \in S} r(a', s').$$
Observe:
$$\begin{aligned}
\max_{s' \in S} r(a, s')
&\le r(a, s) + \delta && (s \in \arg\max^{\delta}_{s' \in S} r(a, s')) \\
&\le \min_{a' \in A} r(a', s) + \varepsilon + \delta && (a \in \arg\min^{\varepsilon}_{a' \in A} r(a', s)) \\
&\le \max_{s' \in S} \min_{a' \in A} r(a', s') + \varepsilon + \delta && \text{(by definition of max)} \\
&\le \min_{a' \in A} \max_{s' \in S} r(a', s') + \varepsilon + \delta. && \text{(max–min inequality, lemma 1)}
\end{aligned}$$
□
*Proof (part 2).* Suppose $a \in A^{(1)}_{\varepsilon}$, that is, $a \in \arg\min^{\varepsilon}_{a' \in A} \max_{s' \in S} r(a', s')$. We want to show that $a \in A^{(2)}_{\varepsilon+\Delta,\,\varepsilon+\Delta}$, that is, there exists $s \in S$ such that both
$$a \in \arg\min^{\varepsilon+\Delta}_{a' \in A} r(a', s) \quad\text{and}\quad s \in \arg\max^{\varepsilon+\Delta}_{s' \in S} r(a, s').$$
Let's start by putting $s \in \arg\max_{s' \in S} \min_{a' \in A} r(a', s')$. Note that here we are proving the existence of such an $s$, so there is no need to furnish an approximate optimiser; that would just lead to a weaker result.

Unlike for proposition 2, let's prove the two conditions one at a time, so that we can more easily keep track of the approximations involved. First, the condition on $a$:
$$\begin{aligned}
r(a, s)
&\le \max_{s' \in S} r(a, s') && \text{(by definition of max)} \\
&\le \min_{a' \in A} \max_{s' \in S} r(a', s') + \varepsilon && (a \in \arg\min^{\varepsilon}_{a' \in A} \max_{s' \in S} r(a', s')) \\
&= \max_{s' \in S} \min_{a' \in A} r(a', s') + \Delta + \varepsilon && \text{(by definition of } \Delta\text{)} \\
&= \min_{a' \in A} r(a', s) + \Delta + \varepsilon. && (s \in \arg\max_{s' \in S} \min_{a' \in A} r(a', s'))
\end{aligned}$$
Now for the condition on $s$, we reason backwards through the same chain of terms:
$$\begin{aligned}
r(a, s)
&\ge \min_{a' \in A} r(a', s) && \text{(by definition of min)} \\
&= \max_{s' \in S} \min_{a' \in A} r(a', s') && (s \in \arg\max_{s' \in S} \min_{a' \in A} r(a', s')) \\
&= \min_{a' \in A} \max_{s' \in S} r(a', s') - \Delta && \text{(by definition of } \Delta\text{)} \\
&\ge \max_{s' \in S} r(a, s') - \varepsilon - \Delta. && (a \in \arg\min^{\varepsilon}_{a' \in A} \max_{s' \in S} r(a', s'))
\end{aligned}$$
□
For the remaining parts, recall that $\sigma$ is assumed to be an arbitrary approximate max map at resolution $\eta$, that is, for all $a \in A$, we have $\sigma(a) \in \arg\max^{\eta}_{s' \in S} r(a, s')$.
*Proof (part 3).* Suppose $a \in A^{(1)}_{\varepsilon}$, that is, $a \in \arg\min^{\varepsilon}_{a' \in A} \max_{s' \in S} r(a', s')$. We want to show that $a \in A^{(3\sigma)}_{\varepsilon+\eta}$, that is,
$$a \in \arg\min^{\varepsilon+\eta}_{a' \in A} r(a', \sigma(a')).$$
Observe
$$\begin{aligned}
r(a, \sigma(a))
&\le \max_{s' \in S} r(a, s') && \text{(by definition of max)} \\
&\le \min_{a' \in A} \max_{s' \in S} r(a', s') + \varepsilon && (a \in \arg\min^{\varepsilon}_{a' \in A} \max_{s' \in S} r(a', s')) \\
&\le \min_{a' \in A} \left( r(a', \sigma(a')) + \eta \right) + \varepsilon && \text{(by definition of } \sigma\text{, min)} \\
&= \min_{a' \in A} r(a', \sigma(a')) + \eta + \varepsilon. && (\eta \text{ constant wrt } a')
\end{aligned}$$
□
*Proof (part 4).* Suppose $a \in A^{(3\sigma)}_{\varepsilon}$, that is, $a \in \arg\min^{\varepsilon}_{a' \in A} r(a', \sigma(a'))$. We want to show that $a \in A^{(1)}_{\varepsilon+\eta}$, that is,
$$a \in \arg\min^{\varepsilon+\eta}_{a' \in A} \max_{s' \in S} r(a', s').$$
Observe
$$\begin{aligned}
\max_{s' \in S} r(a, s')
&\le r(a, \sigma(a)) + \eta && \text{(by definition of } \sigma\text{)} \\
&\le \min_{a' \in A} r(a', \sigma(a')) + \varepsilon + \eta && (a \in \arg\min^{\varepsilon}_{a' \in A} r(a', \sigma(a'))) \\
&\le \min_{a' \in A} \max_{s' \in S} r(a', s') + \varepsilon + \eta. && \text{(by definition of max, min)}
\end{aligned}$$
□
Parts 5 and 6 follow from the above bounds. I hypothesised in the
previous note that by following a direct proof, we might get a tighter
bound. This was not the case. The proofs of propositions 5 and 6 only
eliminated trivial redundancies that didn’t affect the bound. For
completeness, I include direct proofs of parts 5 and 6 below.
*Proof (part 5).* Suppose $a \in A^{(3\sigma)}_{\varepsilon}$, that is, $a \in \arg\min^{\varepsilon}_{a' \in A} r(a', \sigma(a'))$. We want to show that $a \in A^{(2)}_{\varepsilon+\eta+\Delta,\,\varepsilon+\eta+\Delta}$, that is, there exists $s \in S$ such that both
$$a \in \arg\min^{\varepsilon+\eta+\Delta}_{a' \in A} r(a', s) \quad\text{and}\quad s \in \arg\max^{\varepsilon+\eta+\Delta}_{s' \in S} r(a, s').$$
Like in part 2, let's start by putting $s \in \arg\max_{s' \in S} \min_{a' \in A} r(a', s')$. Unlike for proposition 5, let's prove the two conditions one at a time, so that we can more easily keep track of the approximations involved. First, the condition on $a$:
$$\begin{aligned}
r(a, s)
&\le \max_{s' \in S} r(a, s') && \text{(by definition of max)} \\
&\le r(a, \sigma(a)) + \eta && \text{(by definition of } \sigma\text{)} \\
&\le \min_{a' \in A} r(a', \sigma(a')) + \varepsilon + \eta && (a \in \arg\min^{\varepsilon}_{a' \in A} r(a', \sigma(a'))) \\
&\le \min_{a' \in A} \max_{s' \in S} r(a', s') + \varepsilon + \eta && \text{(by definition of max, min)} \\
&= \max_{s' \in S} \min_{a' \in A} r(a', s') + \Delta + \varepsilon + \eta && \text{(by definition of } \Delta\text{)} \\
&= \min_{a' \in A} r(a', s) + \Delta + \varepsilon + \eta. && (s \in \arg\max_{s' \in S} \min_{a' \in A} r(a', s'))
\end{aligned}$$
Now for the condition on $s$, we reason backwards through the same chain of terms:
$$\begin{aligned}
r(a, s)
&\ge \min_{a' \in A} r(a', s) && \text{(by definition of min)} \\
&= \max_{s' \in S} \min_{a' \in A} r(a', s') && (s \in \arg\max_{s' \in S} \min_{a' \in A} r(a', s')) \\
&= \min_{a' \in A} \max_{s' \in S} r(a', s') - \Delta && \text{(by definition of } \Delta\text{)} \\
&\ge \min_{a' \in A} r(a', \sigma(a')) - \Delta && \text{(by definition of max, min)} \\
&\ge r(a, \sigma(a)) - \varepsilon - \Delta && (a \in \arg\min^{\varepsilon}_{a' \in A} r(a', \sigma(a'))) \\
&\ge \max_{s' \in S} r(a, s') - \eta - \varepsilon - \Delta. && \text{(by definition of } \sigma\text{)}
\end{aligned}$$
□
*Proof (part 6).* Suppose $a \in A^{(2)}_{\varepsilon,\delta}$, that is, there exists $s \in S$ such that both $s \in \arg\max^{\delta}_{s' \in S} r(a, s')$ and $a \in \arg\min^{\varepsilon}_{a' \in A} r(a', s)$. We want to show $a \in A^{(3\sigma)}_{\varepsilon+\delta+\eta}$, that is,
$$a \in \arg\min^{\varepsilon+\delta+\eta}_{a' \in A} r(a', \sigma(a')).$$
Observe
$$\begin{aligned}
r(a, \sigma(a))
&\le \max_{s' \in S} r(a, s') && \text{(by definition of max)} \\
&\le r(a, s) + \delta && (s \in \arg\max^{\delta}_{s' \in S} r(a, s')) \\
&\le \min_{a' \in A} r(a', s) + \varepsilon + \delta && (a \in \arg\min^{\varepsilon}_{a' \in A} r(a', s)) \\
&\le \max_{s' \in S} \min_{a' \in A} r(a', s') + \varepsilon + \delta && \text{(by definition of max)} \\
&\le \min_{a' \in A} \max_{s' \in S} r(a', s') + \varepsilon + \delta && \text{(max–min inequality, lemma 1)} \\
&\le \min_{a' \in A} \left( r(a', \sigma(a')) + \eta \right) + \varepsilon + \delta && \text{(by definition of } \sigma\text{, min)} \\
&= \min_{a' \in A} r(a', \sigma(a')) + \eta + \varepsilon + \delta. && (\eta \text{ constant wrt } a')
\end{aligned}$$
□
Conclusion
There you have it: three different approximate relaxations of the minimax objective, and their approximate equivalence. These relations are not as strong as in the exact case, since the theorem does not establish equality except when $\varepsilon = \delta = \eta = 0$. However, we do recover an interesting network of relations establishing that we can convert between the relaxed objectives modulo a linear deterioration in approximation quality. This tells us that the families of solution sets (for varying approximation thresholds) are closely related, satisfying something like a bi-Lipschitz equivalence property. Importantly, in the limit as $\varepsilon, \delta, \eta \to 0$, the approximate solution sets get closer and closer to being actually equal.
I have not thought very much about whether these relations are optimally tight. However, I didn't see any places where the proofs could be streamlined and fewer terms introduced, without imposing further assumptions on $r$, $A$, and $S$.
That’s all for now—it’s time to get back to work using these
definitions to analyse the robustness of minimax regret training
methods!
Edmund Lau points out that there are other ways of 'relaxing' the notion of optimisation. For example, instead of taking the above approach of relaxation through discretisation, we could pursue a probabilistic relaxation whereby we cast exact optimisation as sampling from a probability distribution concentrated on the set of minimisers, with density given by
$$p^{\star}(x) \propto \left[\!\left[ x \in \arg\min_{x' \in X} f(x') \right]\!\right],$$
and we relax this procedure by instead sampling from a Boltzmann distribution with inverse temperature $\beta \ge 0$, with density given by
$$p_{\beta}(x) \propto \exp(-\beta f(x)).$$
This is the so-called softmin operation (if we were maximising, we'd have the better-known softmax). We recover exact optimisation in the limit $\beta \to \infty$. I haven't considered what a probabilistic relaxation of minimax might look like, but I suppose such a concept has probably been studied in game theory.↩︎