Let X and Y be random variables with $E|X| < \infty$ on some probability space. In fact, let us suppose that.
By the Doob–Dynkin Lemma (see, for example, Oksendal, 2010, Lemma 2.1.2),
$$E[X \mid Y] = g(Y)$$
for some Borel function $g : \mathbb{R}^n \to \mathbb{R}$, if Y takes values in $\mathbb{R}^n$.
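If Y takes countably many distinct values $y_1, y_2, \ldots$, then g can be described in more detail. The atoms of σ(Y), the σ-algebra generated by Y, are the disjoint sets
$$A_k = \{Y = y_k\}, \quad k = 1, 2, \ldots$$
This means that any element of σ(Y) is either ∅ or a countable union of atoms. The function g is characterized as follows: g is constant on each atom, and
$$\int_A g(Y)\,dP = \int_A X\,dP \tag{D.1}$$
for all A ∈ σ(Y). In fact, (D.1) is equivalent to the same identity holding for every atom A. This leads to
$$g_k\,P(A_k) = \int_{A_k} X\,dP \tag{D.2}$$
where $g_k = g(y_k)$. If $P(A_k) > 0$, then
$$g_k = \frac{1}{P(A_k)} \int_{A_k} X\,dP.$$
But if $P(A_k) = 0$, then (D.2) becomes
$$g_k \cdot 0 = 0,$$
as $\int_{A_k} X\,dP = 0$. This means that $g_k$ can be given any finite value. If we set $g_k = 0$, then (D.2) still holds.
So we can say that when $P(Y = y_k) = 0$, then $E[X \mid Y = y_k] = 0$.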
We could also note that
and we can set $E[X \mid Y] = 0$ on the sets $\{Y = y\}$ where $P(Y = y) = 0$, and obtain a random variable equal to the conditional expected value with probability 1. As many authors use this convention, we shall also adopt it.
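The construction above can be illustrated on a finite probability space. The following is a minimal sketch, not part of the original argument: the function name `conditional_expectation`, the outcome labels, and the numerical values are all hypothetical. It computes g on each atom {Y = y} as the average of X over that atom, and applies the convention g = 0 on atoms of probability zero.

```python
from fractions import Fraction


def conditional_expectation(probs, X, Y):
    """Return g as a dict y -> g(y), so that E[X | Y] = g(Y).

    probs: dict mapping each outcome to its probability (Fractions summing to 1)
    X, Y:  dicts mapping each outcome to the value of the random variable
    """
    mass = {}      # P(Y = y) for each value y
    weighted = {}  # integral of X over the atom {Y = y}
    for w, p in probs.items():
        y = Y[w]
        mass[y] = mass.get(y, Fraction(0)) + p
        weighted[y] = weighted.get(y, Fraction(0)) + X[w] * p
    # On an atom of probability zero any finite value is admissible;
    # the convention adopted in the text is to use 0.
    return {
        y: (weighted[y] / mass[y] if mass[y] > 0 else Fraction(0))
        for y in mass
    }


# Hypothetical example: outcome "d" has probability zero.
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
X = {"a": 2, "b": 4, "c": 8, "d": 100}
Y = {"a": 0, "b": 1, "c": 1, "d": 2}

g = conditional_expectation(probs, X, Y)
for y in sorted(g):
    print(f"g({y}) = {g[y]}")
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the test for a zero-probability atom is not affected by floating-point rounding. The value X takes on the null outcome "d" has no influence on g, which reflects the fact that g is only determined up to sets of probability zero.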
Now let W be another random variable which takes countably many distinct values. If, then Then
with probability 1. This is because
with probability 1, since
We could apply these ideas with, and and. If, then
with probability 1. In the application above, if
We could also discuss what is meant by P(A|B) when P(B) = 0. It should be the value of
$$E[1_A \mid 1_B]$$
on the set B. Here $1_A$ and $1_B$ denote the indicator functions of A and B. With our convention, we would set P(A|B) = 0.
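As a small illustration of this convention, here is a sketch on a finite probability space. The function name `conditional_probability` and the example events are hypothetical, chosen only to show the two cases P(B) > 0 and P(B) = 0.

```python
from fractions import Fraction


def conditional_probability(probs, A, B):
    """Return P(A | B), using the convention P(A | B) = 0 when P(B) = 0.

    probs: dict mapping each outcome to its probability
    A, B:  events, given as sets of outcomes
    """
    p_B = sum((probs[w] for w in B), Fraction(0))
    if p_B == 0:
        # Convention from the text: the conditional probability is set to 0.
        return Fraction(0)
    p_AB = sum((probs[w] for w in A & B), Fraction(0))
    return p_AB / p_B


# Hypothetical example: outcome "d" has probability zero.
probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4), "d": Fraction(0)}
A = {"a", "d"}

p_regular = conditional_probability(probs, A, {"a", "b"})  # P(B) = 3/4 > 0
p_null = conditional_probability(probs, A, {"d"})          # P(B) = 0
print(p_regular, p_null)
```

In the first call the usual ratio P(A ∩ B)/P(B) applies; in the second the ratio is undefined and the convention supplies the value.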