Probability Theory and Random Variables
This blog post briefly introduces the concepts from set theory and measure theory that are needed to define probability. It also covers measurable transformations and random variables.
Set
A set is a collection of elements, and a space is the collection of all elements under consideration. For example, \(A_1=\{1\}\) and \(A_2=\{1,5,10,12\}\) are both sets contained in the space of natural numbers, \(\mathbb{N}=\{0,1,…\}\). A point set or atomic set is a set containing a single element, such as \(\{1\}\) above. The entire space itself is always a valid set, as is the empty set or null set, \(\emptyset\), which contains no elements at all. Sets are often defined implicitly via an inclusion criterion. Such sets are written in set-builder notation. For example,
\[\mathbb{R}^{+} = \{x \in \mathbb{R} \mid x > 0\}\]
Algebra of sets
Let \(X\) be a set. A collection \(F\) of subsets of \(X\) (i.e. a subset of the power set of \(X\)) is an algebra over \(X\) if it contains the empty set and is closed under complements and under unions (hence also intersections) of pairs of elements of \(F\), i.e.
- \(\emptyset \in F\) (Includes null set)
- if \(A \in F\), then \(A^{c} \in F\) (closed under complement)
- if \(A \in F, B \in F\), then \(A \cup B \in F\) (closed under finite unions)
\(\sigma\)-algebra
A \(\sigma\)-algebra \(F\) over \(X\) is a set of subsets of \(X\) such that
- \(\emptyset \in F\) (Includes null set)
- if \(A \in F\), then \(A^{c} \in F\) (closed under complement)
- if \(A_1, A_2, \ldots, A_n, \ldots \in F\), then \(\bigcup_{i=1}^{\infty}A_i \in F\) (closed under countable unions)
Every \(\sigma\)-algebra is an algebra, but not vice versa: an algebra only has to be closed under pairwise (hence finite) unions, while a \(\sigma\)-algebra must also be closed under countably infinite unions. \(\sigma\)-algebras can be defined over the real line as well as over abstract sets.
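As a sketch of why the converse fails (this standard example is not from the original post), consider the family of finite and cofinite subsets of the natural numbers:
\[F = \{A \subseteq \mathbb{N} \mid A \text{ is finite or } A^{c} \text{ is finite}\}\]
\(F\) contains \(\emptyset\), is closed under complements, and is closed under pairwise unions, so it is an algebra. It is not a \(\sigma\)-algebra: every singleton \(\{2k\}\) belongs to \(F\), but the countable union
\[\bigcup_{k=0}^{\infty}\{2k\} = \{0, 2, 4, \ldots\}\]
is neither finite nor cofinite, so it does not belong to \(F\).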
Topological space
There are several equivalent definitions of a topological space, based on open sets, closed sets, neighborhoods, etc. Here we give the definition in terms of open sets.
Let \(X\) be a non-empty set. A set \(\tau\) of subsets of \(X\) is said to be a topology on \(X\) if
- \(\emptyset, X \in \tau\) (Includes null set and the set itself)
- \(\tau\) is closed under arbitrary unions (finite or infinite)
- \(\tau\) is closed under finite intersections.
The ordered pair \((X, \tau)\) is called a topological space. The members of \(\tau\) are called open sets. A given set \(X\) can carry many different topologies.
Example: Let \(X = \{1,2,3\}\), then
- \(\tau = \{\emptyset, \{1,2,3\}\}\) is the trivial topology on \(X\).
- \(\tau = \{\emptyset, \{1\}, \{1,2,3\}\}\) is another topology on \(X\).
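The two examples above can be verified with a small check. This is a sketch under the assumption that \(X\) is finite, in which case closure under pairwise unions and intersections is enough; the helper `is_topology` and the counterexample `not_a_topology` are hypothetical names introduced here.

```python
from itertools import combinations

def is_topology(X, tau):
    """Check the topology axioms for a family tau of subsets of a finite set X."""
    X = frozenset(X)
    tau = {frozenset(U) for U in tau}
    if frozenset() not in tau or X not in tau:
        return False
    # On a finite set, pairwise closure implies closure under
    # arbitrary unions and finite intersections.
    return all(U | V in tau and U & V in tau for U, V in combinations(tau, 2))

X = {1, 2, 3}
trivial = [set(), X]
another = [set(), {1}, X]
not_a_topology = [set(), {1}, {2}, X]   # the union {1, 2} is missing

print(is_topology(X, trivial))          # True
print(is_topology(X, another))          # True
print(is_topology(X, not_a_topology))   # False
```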
Borel \(\sigma\)-algebra
The Borel \(\sigma\)-algebra, or Borel field, on a topological space \((X, \tau)\) is the smallest \(\sigma\)-algebra containing all open sets (or, equivalently, all closed sets). It is a special case of a \(\sigma\)-algebra. On the real line, it is the \(\sigma\)-algebra generated by the collection of bounded open intervals. The elements of the Borel field are called Borel sets.
How is the Borel field on the real line, \(B_{\mathbb{R}}\), constructed? Informally: take all possible open intervals, take their complements, take countable unions, include \(\emptyset\) and \(\mathbb{R}\), and keep repeating these steps on the resulting sets. \(B_{\mathbb{R}}\) contains a wide range of intervals, including open, closed, and half-open intervals. It also contains unions of disjoint intervals such as \((2, 7] \cup (19, 32)\). It contains (nearly) every collection of intervals that one can imagine.
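The idea of a \(\sigma\)-algebra generated by a collection of sets can be illustrated on a finite set, where repeatedly adding complements and unions terminates after a few rounds. This is only a finite analogue, not the Borel field itself: on the real line no finite number of such rounds is enough. The function name `generate_sigma_algebra` is an assumption made for this sketch.

```python
def generate_sigma_algebra(X, generators):
    """Smallest family containing the generators that is closed under
    complement and union, for a finite set X."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(G) for G in generators}
    while True:
        new = {X - A for A in family}                       # complements
        new |= {A | B for A in family for B in family}      # pairwise unions
        if new <= family:                                   # nothing new: closed
            return family
        family |= new

X = {1, 2, 3, 4}
sigma = generate_sigma_algebra(X, [{1}, {2}])

# The generators split X into the blocks {1}, {2}, {3, 4},
# so the generated family has 2**3 = 8 members.
print(len(sigma))                         # 8
print(sorted(sorted(A) for A in sigma))
```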
Measurable Space
A pair \((X, \Sigma)\) is a measurable space if \(X\) is a set and \(\Sigma\) is a \(\sigma\)-algebra of subsets of \(X\). A measurable space allows us to define a function that assigns real values to the abstract elements of \(\Sigma\).
Measure (\(\mu\))
Let \((X, \Sigma)\) be a measurable space. A set function \(\mu\) defined on \(\Sigma\) is called a measure iff it has the following properties.
- \(0 \leq \mu(A) \leq \infty\) for all \(A \in \Sigma\) (the measure is a non-negative extended real number)
- \(\mu(\emptyset) = 0\) (measure of empty set is zero)
- For pairwise disjoint sets \(A_1, A_2, \ldots \in \Sigma\), \(\mu(\cup_{n=1}^{\infty}A_n) = \sum_{n=1}^{\infty} \mu(A_n)\) (countable additivity)
A measure on a set \(S\) is a systematic way to assign a non-negative number to each suitable subset of that set, intuitively interpreted as its size.
Examples of measures
- Counting measure: \(\mu(S)\) is the number of elements in \(S\), i.e. the cardinality for finite sets and \(\infty\) for infinite sets such as intervals.
- Lebesgue measure: the conventional length of \(S\), i.e. if \(S = [a, b]\), then \(\mu(S) = b-a\). For countable sets (in particular, discrete sets of points), the Lebesgue measure is zero.
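A rough sketch of the two measures follows, restricted to very simple inputs: finite sets of points for the counting measure, and finite unions of pairwise disjoint intervals given by their endpoints for the Lebesgue measure. The function names and the input representation are assumptions made for illustration.

```python
def counting_measure(points):
    """Counting measure of a finite set of points: its cardinality.
    (For an infinite set such as an interval the value would be infinity.)"""
    return len(set(points))

def lebesgue_measure(intervals):
    """Total length of a finite union of pairwise disjoint intervals,
    each given as a pair of endpoints (a, b) with a <= b.
    Whether the endpoints are included does not change the length."""
    return sum(b - a for a, b in intervals)

finite_set = {0.5, 1.0, 2.5}
print(counting_measure(finite_set))                      # 3
# A single point p is the degenerate interval [p, p], of length zero.
print(lebesgue_measure([(p, p) for p in finite_set]))    # 0.0
# (2, 7] together with (19, 32): lengths 5 and 13.
print(lebesgue_measure([(2, 7), (19, 32)]))              # 18
```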
A triplet \((X, \Sigma, \mu)\) is called a measure space if \((X, \Sigma)\) is a measurable space and \(\mu: \Sigma \rightarrow [0, \infty]\) is a measure.
Properties of Measure
- Monotonicity: if \(A, B \in \Sigma\) and \(A \subset B\), then \(\mu(A) \leq \mu(B)\)
- Subadditivity: if \(A_1, A_2, \ldots \in \Sigma\), then \(\mu(\cup_{i} A_i) \leq \sum_{i} \mu(A_i)\)
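Both properties follow from additivity and non-negativity. As a sketch of the argument: for monotonicity, split \(B\) into the disjoint pieces \(A\) and \(B \setminus A\), so that
\[\mu(B) = \mu(A) + \mu(B \setminus A) \geq \mu(A)\]
because \(\mu(B \setminus A) \geq 0\). For subadditivity, replace the \(A_i\) by the pairwise disjoint sets \(B_1 = A_1\) and \(B_n = A_n \setminus \cup_{i<n} A_i\). These have the same union as the \(A_i\), and \(B_n \subset A_n\), so
\[\mu(\cup_{i} A_i) = \mu(\cup_{n} B_n) = \sum_{n} \mu(B_n) \leq \sum_{n} \mu(A_n)\]
where the last step uses monotonicity.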
Probability Space
Let
- \(\Omega\) be the set of all possible outcomes of a random experiment. This is called the sample space.
- \(\Sigma\) be a \(\sigma\)-algebra over \(\Omega\). Elements of \(\Sigma\) are called events.
- \(P\) be a probability measure defined on the measurable space \((\Omega, \Sigma)\), with the following properties:
  - \(P(A) \geq 0\) for all \(A \in \Sigma\) (non-negativity)
  - \(P(\Omega) = 1\) (the probability assigned to the sample space is 1)
  - \(P(A_1 \cup A_2 \cup \ldots) = P(A_1)+P(A_2)+\ldots\) for pairwise disjoint sets \(A_1, A_2, \ldots \in \Sigma\), i.e. \(A_i \cap A_j = \emptyset\) for \(i \neq j\) (countable additivity)
These are called Kolmogorov's axioms of probability.
Then the triplet \((\Omega, \Sigma, P)\) is called a probability space. The construction of \(\Sigma\) avoids some pathological subsets, called non-measurable sets. Non-measurable sets are those for which a measure cannot be consistently defined: roughly speaking, the set can be cut up and rearranged in such a way that its measure would have to change. By restricting ourselves to the \(\sigma\)-algebra, we make sure that every event is assigned a specific, well-defined measure (a probability in this case). And because the \(\sigma\)-algebra is closed under countable unions and complements, events built from other events also have a definite measure. An important benefit of these closure properties is that any non-constructible sets that may be lurking within the power set do not find their way into the \(\sigma\)-algebra. Consequently, working with a suitably chosen \(\sigma\)-algebra removes many of the pathological behaviors that arise in uncountably large spaces.
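A finite probability space makes the definition concrete. The sketch below models a single roll of a fair die; the choice of experiment, the use of the full power set as \(\Sigma\), and the names `omega`, `sigma` and `P` are assumptions made for illustration. On a finite space, countable additivity reduces to additivity for pairs of disjoint events, which is what the code checks.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))               # sample space: faces of a die

def power_set(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    subsets = chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))
    return [frozenset(c) for c in subsets]

sigma = power_set(omega)                     # sigma-algebra: all 2**6 = 64 events

def P(event):
    """Uniform probability measure: each outcome has probability 1/6."""
    return Fraction(len(event), len(omega))

# Kolmogorov's axioms on this finite space
assert all(P(A) >= 0 for A in sigma)                          # non-negativity
assert P(omega) == 1                                          # total mass 1
assert all(P(A | B) == P(A) + P(B)
           for A in sigma for B in sigma if not (A & B))      # additivity

even = frozenset({2, 4, 6})
print(P(even))   # 1/2
```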
Measurable functions/transformations
Once we have defined a probability distribution on a space \(\Omega\), and a well-behaved collection of subsets \(\Sigma\), we can then consider how the probability distribution transforms when \(\Omega\) transforms. In particular, let \(f: \Omega \rightarrow Y\) be a transformation from \(\Omega\) to another space \(Y\). Can this transformation also transform our probability distribution on \(\Omega\) onto a probability distribution on \(Y\), and if so under what conditions?
Let \(T\) be a \(\sigma\)-algebra on \(Y\). In order for \(f\) to induce a probability distribution on \(Y\), we need the two \(\sigma\)-algebras to be compatible in some sense. In particular, we need the pre-image \(f^{-1}(B)\) of every subset \(B \in T\) to be an element of \(\Sigma\), so that it has a well-defined probability. If this holds for all subsets in \(T\), then we say that the transformation \(f\) is measurable, and we can define a measure on \(Y\), induced by \(f\), as
\[Q(B) = P(f^{-1}(B)) = P(\{\omega \in \Omega; f(\omega) \in B\})\]
According to Wikipedia, the definition of a measurable function is as follows: Let \((X, \Sigma), (Y, T)\) be measurable spaces, meaning that \(X, Y\) are equipped with respective \(\sigma\)-algebras \(\Sigma, T\). A function \(f:X \rightarrow Y\) is said to be measurable if, for every \(E \in T\), the pre-image of \(E\) under \(f\) is in \(\Sigma\), i.e. for all \(E \in T\)
\[f^{-1}(E) := \{x \in X \mid f(x) \in E\} \in \Sigma\]
If \(f\) is measurable from \((\Omega, \Sigma)\) to \((Y, T)\), then \(f^{-1}(T) = \{f^{-1}(E); E \in T\}\) is a sub-\(\sigma\)-field of \(\Sigma\). It is called the \(\sigma\)-field generated by \(f\) and is denoted \(\sigma(f)\).
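On finite spaces, measurability can be checked by computing every pre-image. The sketch below uses a die with a deliberately coarse \(\sigma\)-algebra that only distinguishes odd from even faces; the spaces, the two maps `parity` and `is_big`, and all helper names are hypothetical choices made for this illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
# A coarse sigma-algebra on omega: it can only tell odd from even.
Sigma = {frozenset(), frozenset({1, 3, 5}), frozenset({2, 4, 6}), frozenset(omega)}

Y = {0, 1}
T = {frozenset(), frozenset({0}), frozenset({1}), frozenset(Y)}

def preimage(f, domain, B):
    """f^{-1}(B) = {w in domain : f(w) in B}."""
    return frozenset(w for w in domain if f(w) in B)

def is_measurable(f, domain, Sigma, T):
    """f is measurable if every pre-image of a set in T lies in Sigma."""
    return all(preimage(f, domain, B) in Sigma for B in T)

def parity(w):
    return w % 2

def is_big(w):
    return int(w >= 5)        # pre-image of {1} is {5, 6}, which is not in Sigma

print(is_measurable(parity, omega, Sigma, T))   # True
print(is_measurable(is_big, omega, Sigma, T))   # False

def P(A):
    """Uniform probability, restricted to the events in Sigma."""
    return Fraction(len(A), len(omega))

# Induced measure Q(B) = P(f^{-1}(B)) and the sigma-field generated by parity
Q = {B: P(preimage(parity, omega, B)) for B in T}
sigma_f = {preimage(parity, omega, B) for B in T}

print(Q[frozenset({1})])      # 1/2
print(sigma_f == Sigma)       # True
```

Here `is_big` fails to be measurable only because \(\Sigma\) is too coarse; with the full power set as \(\Sigma\), every function on \(\Omega\) would be measurable.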
Random Variables
A probability measure is a set function, i.e. it assigns probabilities to the events in the \(\sigma\)-algebra. It is often easier to work with numbers, so we map the sample space onto the real line. A random variable is a convenient way to express the elements of \(\Omega\) as numbers rather than as abstract elements of sets.
A random variable \(X\) is a measurable function from the probability space \((\Omega, \Sigma, P)\) to another probability space on the real line, \((\mathcal{X}, B_{\mathcal{X}}, P_{\mathcal{X}})\), where \(\mathcal{X}\) is the range of \(X\) in \(\mathbb{R}\), \(B_{\mathcal{X}}\) is the Borel field of \(\mathcal{X}\) and \(P_{\mathcal{X}}\) is the probability measure on \(\mathcal{X}\) induced by \(X\). The induced measure on \(\mathcal{X}\), \(P_{\mathcal{X}} = P \circ X^{-1}\), is called the distribution of \(X\). In particular, the cumulative distribution function is defined as follows
\[ \begin{aligned} F(x) &= P(X \leq x) \newline &= P_{\mathcal{X}}((-\infty, x]) \newline &= P(\{\omega; X(\omega) \in (-\infty, x]\}) \newline &= P(\{\omega; -\infty < X(\omega) \leq x\}) \end{aligned} \]
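A small discrete example may help connect these definitions. The sketch below takes two tosses of a fair coin as the experiment and the number of heads as the random variable; both choices, and the names `omega`, `P`, `X`, `distribution` and `cdf`, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))     # ('H','H'), ('H','T'), ('T','H'), ('T','T')

def P(event):
    """Uniform probability on omega: each outcome has probability 1/4."""
    return Fraction(len(event), len(omega))

def X(w):
    """Random variable: number of heads in the outcome w."""
    return w.count("H")

# Distribution of X: P_X({x}) = P({w : X(w) = x})
distribution = {x: P([w for w in omega if X(w) == x])
                for x in sorted({X(w) for w in omega})}

def cdf(x):
    """F(x) = P({w : X(w) <= x})."""
    return P([w for w in omega if X(w) <= x])

for x, p in distribution.items():
    print(x, p)               # 0 1/4, then 1 1/2, then 2 1/4

for x in (-1, 0, 0.5, 1, 2):
    print(x, cdf(x))          # the CDF is a step function rising from 0 to 1
```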
\((\Omega, \Sigma, P)\) and \((\mathcal{X}, B_{\mathcal{X}}, P_{\mathcal{X}})\) are really just two different manifestations, or parameterizations, of the same abstract probability system. The two parameterizations, for example, might correspond to different choices of coordinate system, different choices of units etc.
We are not always interested only in calculating probabilities; sometimes the random variables defined on the original probability space turn out to be quite useful in themselves. Consider the following statistical experiment. Go to the road outside the college building and consider the first car that passes from left to right after your arrival. Since we cannot predict which car in the city it will be, this is a statistical experiment. The sample space is the set of all cars in your city (or in your country). Now consider the following questions:
- How many people are in that car?
- What is the amount of petrol in the fuel tank at that time?
- How many kilometers had the car travelled that day before you noticed it?
All of these are random variables on the same sample space. The answer to question 1 might be useful to a person who sells eatables on the roadside (more passengers means more business). The answer to question 2 might help decide whether it would be profitable to open a petrol-selling shop.
Various functions defined on random variables, such as expectation and variance, shed further light on these variables.
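As a minimal sketch of those two quantities, the example below computes the expectation and variance of the number of heads in two tosses of a fair coin, directly from the sample space. The experiment and the names `prob`, `X` and `expectation` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))
prob = {w: Fraction(1, len(omega)) for w in omega}    # fair coin: uniform weights

def X(w):
    """Random variable: number of heads in the outcome w."""
    return w.count("H")

def expectation(g):
    """E[g] = sum over outcomes of g(w) * P({w})."""
    return sum(g(w) * prob[w] for w in omega)

mean = expectation(X)
variance = expectation(lambda w: (X(w) - mean) ** 2)

print(mean)       # 1
print(variance)   # 1/2
```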
References
Even though this blog post is short, it took me some time to understand certain concepts and put these things together. The following sources were very helpful in understanding a few of them.
- Probability Theory: Introduction
- Probability Theory (For Scientists and Engineers)
- Why do we need sigma-algebras to define probability spaces?
- Why do we need random variables?
- What makes the elements of sigma algebra measurable (and measurable w.r.to which measure)?
- Does the sigma algebra over the real line contain the singleton sets?