Mathematics&Physics : 2018

Saturday, October 20, 2018

Linear Discriminant Analysis -1

Classification amounts to computing the class posterior probabilities $Pr(G|X)$, that is, the probability that the class is $G$ given the input $X$. If $f_k(x)$ is the class-conditional density of $X$ in class $G=k$ and $\pi_k$ is the prior probability of class $k$, then Bayes' theorem gives \[\begin{equation} Pr(G=k|X=x) = \frac{f_k(x)\pi_k}{\sum_{l=1}^K f_l(x)\pi_l} \end{equation}\]
Depending on the models for class density, different techniques emerge.

* Linear and quadratic discriminant analysis use Gaussian densities.
* Mixtures of Gaussians lead to non-linear decision boundaries.
* Naive Bayes models assume that each class density is a product of marginal densities.
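
As a rough sketch of the Bayes-rule formula above, assume two hypothetical classes with 1-D Gaussian class densities. The means, standard deviations, and priors below are made up for illustration, and the helper names `gaussian_pdf` and `posteriors` are not from any library.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posteriors(x, means, stds, priors):
    """Return [Pr(G=k | X=x) for each class k] using Bayes' theorem."""
    # Numerators f_k(x) * pi_k, one per class.
    joint = [gaussian_pdf(x, m, s) * p for m, s, p in zip(means, stds, priors)]
    total = sum(joint)  # the denominator: sum over l of f_l(x) * pi_l
    return [j / total for j in joint]

# Hypothetical two-class problem (illustrative numbers only).
means = [0.0, 2.0]
stds = [1.0, 1.0]
priors = [0.5, 0.5]

# x = 1.0 is halfway between the two means, so with equal priors and
# equal spreads both classes are equally likely there.
post = posteriors(1.0, means, stds, priors)
print(post)
```

Changing `priors` to unequal values shifts the point where the two posteriors cross, which is the role $\log(\pi_k/\pi_l)$ plays in the derivation that follows.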

Derivations:
Suppose that each class density is modelled as a multivariate Gaussian, \[\begin{equation} f_k(x)=\frac{1}{(2\pi)^{p/2} |\Sigma_k|^{1/2}} e^{-\frac{1}{2}(x-\mu_k)^T\Sigma_k^{-1}(x-\mu_k)} \end{equation}\] For any two classes $k$ and $l$ the denominators in Bayes' theorem are identical, so they cancel in the ratio of posteriors: \[\begin{equation} \begin{aligned} Pr(G=k|X=x) &= \frac{f_k(x)\pi_k}{\sum_{m=1}^K f_m(x)\pi_m} \\ Pr(G=l|X=x) &= \frac{f_l(x)\pi_l}{\sum_{m=1}^K f_m(x)\pi_m} \\ \frac{Pr(G=k|X=x)}{Pr(G=l|X=x)} &= \frac{f_k(x)\pi_k}{f_l(x)\pi_l} \\ \log\left(\frac{Pr(G=k|X=x)}{Pr(G=l|X=x)} \right) &= \log\frac{f_k(x)}{f_l(x)}+\log\frac{\pi_k}{\pi_l} \end{aligned} \end{equation}\] The term $\log\frac{f_k(x)}{f_l(x)}$ can be simplified as follows. Assume that $\Sigma_k=\Sigma$ is the same for all classes, so the normalizing constants cancel: \[\begin{equation} \log\left(\frac{f_k(x)}{f_l(x)}\right) = \left[ -\frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)+\frac{1}{2}(x-\mu_l)^T\Sigma^{-1}(x-\mu_l)\right] \end{equation}\] Expanding both quadratic forms, \[\begin{equation} = -\frac{1}{2} [x^T\Sigma^{-1}x-x^T\Sigma^{-1}\mu_k-\mu_k^T\Sigma^{-1}x+\mu_k^T\Sigma^{-1}\mu_k - x^T\Sigma^{-1}x + x^T\Sigma^{-1}\mu_l + \mu_l^T \Sigma^{-1} x - \mu_l^T\Sigma^{-1}\mu_l ] \end{equation}\] Distributing the factor $-1$ over the bracket, \[\begin{equation} = \frac{1}{2} [-x^T\Sigma^{-1}x + x^T\Sigma^{-1}\mu_k + \mu_k^T\Sigma^{-1}x -\mu_k^T\Sigma^{-1}\mu_k+x^T\Sigma^{-1}x-x^T\Sigma^{-1}\mu_l-\mu_l^T\Sigma^{-1}x + \mu_l^T\Sigma^{-1}\mu_l ] \end{equation}\]
Before simplifying, note the following:

* Since $\Sigma$ is a covariance matrix, and the covariance between $i$ and $j$ is the same as between $j$ and $i$, $\Sigma$ is symmetric: $\Sigma^T=\Sigma$. Its inverse is then symmetric as well: $(\Sigma^{-1})^T=\Sigma^{-1}$.
* The mean vectors $\mu_k$ for class $k$ and $\mu_l$ for class $l$ are $p \times 1$ column vectors.
* Terms such as $\mu_k^T \Sigma^{-1}\mu_l$ have dimensions $(1\times p)(p \times p)(p \times 1)=1\times 1$, so they are scalars (real numbers) and equal their own transpose.

The first and fifth terms cancel. The second and sixth terms, with leading $x^T$, combine to \[\begin{equation} \frac{1}{2}x^T\Sigma^{-1}(\mu_k-\mu_l) \end{equation}\] The third and seventh terms, with $x$ at the end, are scalars, so transposing leaves them unchanged: \[\begin{equation} \frac{1}{2}(\mu_k^T - \mu_l^T)\Sigma^{-1}x = \left[\frac{1}{2}(\mu_k^T - \mu_l^T)\Sigma^{-1}x\right]^T = \frac{1}{2}x^T\Sigma^{-1}(\mu_k-\mu_l) \end{equation}\] The remaining fourth and eighth terms, after adding and subtracting the scalar cross term $\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_l$, are \[\begin{equation*} \begin{aligned} &-\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k+\frac{1}{2}\mu_l^T\Sigma^{-1}\mu_l+\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_l-\frac{1}{2}(\mu_k^T\Sigma^{-1}\mu_l)^T \\ &= -\frac{1}{2}(\mu_k+\mu_l)^T\Sigma^{-1}(\mu_k-\mu_l) \end{aligned} \end{equation*}\] Combining all these simplifications, \[\begin{equation*} \begin{aligned} \log\left(\frac{Pr(G=k|X=x)}{Pr(G=l|X=x)}\right) &= \log\left(\frac{\pi_k}{\pi_l}\right) \\ &\quad -\frac{1}{2}(\mu_k+\mu_l)^T\Sigma^{-1}(\mu_k-\mu_l)+x^T\Sigma^{-1}(\mu_k-\mu_l) \end{aligned} \end{equation*}\] On the decision boundary between classes $k$ and $l$ the two posteriors are equal, $Pr(G=k|X=x)=Pr(G=l|X=x)$, so the log-ratio is zero: \[\begin{equation*} \log(1)=0=\log(\pi_k)-\log(\pi_l) - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k+x^T\Sigma^{-1}\mu_k + \frac{1}{2}\mu_l^T\Sigma^{-1}\mu_l-x^T\Sigma^{-1}\mu_l \end{equation*}\] The right-hand side is a difference of two expressions of the same form, one per class. The boundary is therefore where $\delta_k(x)=\delta_l(x)$, with the linear discriminant function \[\begin{equation} \delta_k(x)=\log(\pi_k)-\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k+x^T\Sigma^{-1}\mu_k \end{equation}\] An input $x$ is assigned to the class with the largest $\delta_k(x)$.
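
A quick numerical sanity check of the derivation, under assumed values: a hypothetical 2-D problem with a shared covariance matrix, two made-up class means, and unequal priors. The helpers `inv2`, `quad`, `delta`, and `log_gauss` are illustrative names, not library functions. If the algebra above holds, the difference of the two discriminants should match the log-ratio of posteriors computed directly from the Gaussian densities.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad(u, m, v):
    """Scalar u^T m v for 2-vectors u, v and a 2x2 matrix m."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))

def delta(x, mu, prior, sigma_inv):
    """Linear discriminant: log(pi) - 0.5 mu^T S^-1 mu + x^T S^-1 mu."""
    return math.log(prior) - 0.5 * quad(mu, sigma_inv, mu) + quad(x, sigma_inv, mu)

def log_gauss(x, mu, sigma, sigma_inv):
    """Log of the bivariate Gaussian density (p = 2, so (2 pi)^(p/2) = 2 pi)."""
    det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    d = [x[0] - mu[0], x[1] - mu[1]]
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad(d, sigma_inv, d)

# Assumed, illustrative parameters (not taken from any dataset).
sigma = [[2.0, 0.5], [0.5, 1.0]]      # shared, symmetric covariance
sigma_inv = inv2(sigma)
mu_k, mu_l = [1.0, 0.0], [-1.0, 2.0]
pi_k, pi_l = 0.3, 0.7
x = [0.4, 1.1]

# Left: difference of linear discriminants. Right: direct log posterior ratio.
lhs = delta(x, mu_k, pi_k, sigma_inv) - delta(x, mu_l, pi_l, sigma_inv)
rhs = (log_gauss(x, mu_k, sigma, sigma_inv) + math.log(pi_k)) - (
    log_gauss(x, mu_l, sigma, sigma_inv) + math.log(pi_l))
print(lhs, rhs)  # the two values should agree up to rounding
```

The quadratic term $x^T\Sigma^{-1}x$ appears in both log-densities and cancels in `rhs`, which is why a function that is only linear in $x$ is enough on the left.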

Thursday, August 9, 2018

Sets of Measure Zero.

Heard the saying - What happens in Las Vegas, stays in Las Vegas? Sets of measure zero are kind of like that.

For example, say $f(x),g(x)$ are functions on a measure space $E$ that are equal to each other except on a subset $N$. Say the given measure of the "disagreeable" set $N$ is zero. Now $N$ is like Las Vegas: we are given license to forget about what happens in $N$ and assert that $f(x)=g(x)$ a.e., where a.e. stands for almost everywhere. To be more precise (just to prevent an extra point from leaking out of your exam paper!), we need to state $f(x)=g(x)$ a.e. [$\mu$]. That is, we need to specify which measure.

Mathematically, \begin{equation} \mu\{x:f(x)\neq g(x)\} = 0 \end{equation}

Here the set in braces is exactly the disagreeable set $N$. We write $f \sim g$, and it is not too difficult to prove that this is an equivalence relation.

* Reflexive: Clearly $f \sim f$. The disagreeable set $N$ in this case is the empty set, whose measure is $0$, since $f=f$ on the whole set.
* Symmetric: If $f \sim g$ then $g \sim f$, because the disagreeable set stays the same when we swap $f$ and $g$.
* Transitive: Say $f \sim g$ with disagreeable set $N_1$, and $g \sim h$ with disagreeable set $N_2$. Then $f$ and $h$ can differ only on $N_1 \cup N_2$, and $\mu(N_1 \cup N_2) \leq \mu(N_1)+\mu(N_2)=0$. Hence $f \sim h$.

Note all the above statements were made based on a property that $f=g$. Generally, we don't have to be specific about a property. Above assertions and proofs hold in a more abstract sense, that is for any property $P$.
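
To see why the measure has to be named, here is a small sketch on a finite space, where a measure is just a weight for each point. Everything in it is an assumption made for illustration: the four-point space, the functions `f` and `g`, the two measures `mu` and `nu`, and the helper `equal_ae`.

```python
from fractions import Fraction

def equal_ae(f, g, weights):
    """True if f == g except on a set of measure zero.

    `weights` maps each point x of a finite space to the measure of {x}.
    """
    disagree = [x for x in weights if f(x) != g(x)]   # the set N
    return sum((weights[x] for x in disagree), Fraction(0)) == 0

points = [0, 1, 2, 3]
f = lambda x: x * x
g = lambda x: x * x if x != 3 else -1   # differs from f only at x = 3

# mu gives the point 3 no mass; nu is the counting measure.
mu = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(0)}
nu = {x: Fraction(1) for x in points}

print(equal_ae(f, g, mu))  # True:  f = g a.e. [mu]
print(equal_ae(f, g, nu))  # False: f = g fails a.e. [nu]
```

The same pair of functions is equal almost everywhere for one measure and not for the other, so "a.e." without the measure is ambiguous.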

Sunday, August 5, 2018

R&C - Lebesgue's Dominated Convergence Theorem

Dominated Convergence Theorem

If $f \in \mathcal{L}^1(\mu)$, then \[\begin{equation} \left|\int_X f d\mu \right| \leq \int_X|f| d\mu \end{equation}\]

The proof uses a previously proven fact:

If $f$ is a complex measurable function on $X$, there is a complex measurable function $\alpha$ on $X$ such that $|\alpha|=1$ and $f=\alpha|f|$. This is an extension of property of complex numbers.

Start off by setting $z=\int_X f d\mu$. Since $z$ is a complex number, there is a complex number $\alpha$ such that $|\alpha|=1$ and $\alpha z= |z|$.

Let $u$ be the real part of $\alpha f$. Then $u \leq |\alpha f|=|f|$. Hence,

\[\begin{align*} \left| \int_X f d\mu\right| = |z| &= \alpha z=\alpha \int_X f d\mu \\ &= \int_X \alpha f d\mu = \int_X u d\mu \leq \int_X |f| d\mu \end{align*}\] The equality $\int_X \alpha f d\mu = \int_X u d\mu$ holds because the left side equals $|z|$, which is real, so only the real part of $\alpha f$ contributes.

Suppose $\{f_n\}$ is a sequence of complex measurable functions on $X$ such that \[\begin{equation} f(x) = \lim_{n \rightarrow \infty} f_n(x) \end{equation}\] exists for every $x \in X$. If there is a function $g \in \mathcal{L}^1(\mu)$ such that \[\begin{equation} |f_n(x)| \leq g(x) \quad (n=1,2,3,\ldots;\ x \in X) \end{equation}\] then $f \in \mathcal{L}^1(\mu)$, \[\begin{equation} \lim_{n \rightarrow \infty}\int_X |f_n - f| d\mu = 0 \end{equation}\] and \[\begin{equation} \lim_{n \rightarrow \infty} \int_X f_n d\mu = \int_X f d\mu \end{equation}\]

Proof: Since $|f_n| \leq g$ for every $n$, letting $n \rightarrow \infty$ gives $|f| \leq g$. Since the $f_n$ are measurable, the limit $f$ is measurable, and therefore $f \in \mathcal{L}^1(\mu)$. Also \[\begin{align*} |f-f_n| \leq |f|+|f_n| \leq 2g \end{align*}\] This means $2g - |f_n - f|$ is a sequence of measurable functions whose range is in $[0,\infty]$, so the precondition of Fatou’s lemma is satisfied. Since $2g - |f_n - f| \rightarrow 2g$ pointwise, the lemma yields \[\begin{align*} \int_X 2g d\mu &\leq \liminf_{n \rightarrow \infty} \int_X(2g-|f_n-f|)d\mu \\ &=\int_X 2g d\mu + \liminf_{n \rightarrow \infty}\left( - \int_X |f_n - f| d\mu\right) \\ &= \int_X 2g d\mu - \limsup_{n \rightarrow \infty} \int_X |f_n - f| d\mu \end{align*}\] Since $\int_X 2g d\mu$ is finite, it can be subtracted from both sides: \[\begin{align*} \limsup_{n \rightarrow \infty}\int_X |f_n - f| d\mu \leq 0 \end{align*}\] If a sequence of nonnegative real numbers fails to converge to $0$, then its upper limit is positive. Hence the inequality above implies \[\begin{align*} \lim_{n \rightarrow \infty}\int_X |f_n - f| d\mu = 0. \end{align*}\] Applying the inequality $\left|\int_X h d\mu\right| \leq \int_X |h| d\mu$ to $h=f_n-f$ gives $\left|\int_X f_n d\mu - \int_X f d\mu\right| \leq \int_X |f_n - f| d\mu \rightarrow 0$. Hence, \[\begin{align*} \lim_{n \rightarrow \infty} \int_X f_n d\mu = \int_X f d\mu \end{align*}\]
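
A numerical illustration of what the domination hypothesis buys, as a sketch on $[0,1]$ with Lebesgue measure approximated by a midpoint rule. Both families of functions and the helper names are chosen here for illustration. The family $f_n(x)=x^n$ is dominated by $g=1$ and its integrals $1/(n+1)$ tend to $0$, the integral of the limit (which is $0$ except at the single point $x=1$). The family $f_n = n\cdot 1_{(0,1/n)}$ also tends to $0$ pointwise, but it has no integrable dominating function, and its integrals stay at $1$.

```python
def midpoint_integral(func, n_cells=100_000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    h = 1.0 / n_cells
    return h * sum(func((i + 0.5) * h) for i in range(n_cells))

def dominated(n):
    """f_n(x) = x^n, bounded by the integrable function g(x) = 1 on [0, 1]."""
    return lambda x: x ** n

def escaping(n):
    """f_n = n on (0, 1/n), 0 elsewhere: no integrable dominating function."""
    return lambda x: float(n) if x < 1.0 / n else 0.0

for n in (1, 10, 100):
    # First integral shrinks like 1/(n+1); the second stays near 1.
    print(n, midpoint_integral(dominated(n)), midpoint_integral(escaping(n)))
```

The second family shows that pointwise convergence alone does not let the limit pass through the integral, since the mass concentrates in an ever thinner spike near $0$.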
