Computing classification means computing class posterior probabilities Pr(G|X) - that is probability that class is G given input X. If fk(x) is the class conditional density of X in class Gk and if πk is prior probability, then Bayes theorem gives,
Pr(G=k|X=x)=fk(x)πk∑Kl=1fl(x)πl
Depending on the models for class density, different techniques emerge.
Suppose that each class density is modelled as multivariate Gaussian. fk(x)=1(2π)p/2|Σk|1/2e−12(x−μk)TΣ−1k(x−μk) Clear that Pr(G=k|X=x)=fk(x)πk∑Kl=1fl(x)πlPr(G=l|X=x)=fl(x)πl∑Kp=1fp(x)πpPr(G=k|X=x)Pr(G=l|X=x)=fk(x)πkfl(x)πllog(Pr(G=k|X=x)Pr(G=l|X=x))=logfk(x)fl(x)+logπk(x)πl(x) Now expression fk(x)fl(x) can be simplified as follows. Note Σ is assumed to be same for all the classes. log(fk(x)fl(x))=[−12(x−μk)TΣ−1(x−μk)+12(x−μl)TΣ−1(x−μl)] And =−12[xTΣ−1x−xTΣ−1μk−μTkΣ−1x+μTkΣ−1μk−xTΣ−1x+xTΣ−1μk+μlΣ−1x−μTlΣ−1μl] Next =12[−xTΣ−1x+xTΣ−1μk+μTkΣ−1x−μTkΣ−1μk+xTΣ−1x−xTΣ−1μl−μTlΣ−1x+μTlΣ−1μl]
Before simplying, we should note the following:
Depending on the models for class density, different techniques emerge.
* linear and quadratic discrimnant analysis use Gaussian densities
* Mixture of Gaussians lead to non-linear decision boundaries.
* Navie Bayes models assume that each class density is product of marginal densities.
Derivations:Suppose that each class density is modelled as multivariate Gaussian. fk(x)=1(2π)p/2|Σk|1/2e−12(x−μk)TΣ−1k(x−μk) Clear that Pr(G=k|X=x)=fk(x)πk∑Kl=1fl(x)πlPr(G=l|X=x)=fl(x)πl∑Kp=1fp(x)πpPr(G=k|X=x)Pr(G=l|X=x)=fk(x)πkfl(x)πllog(Pr(G=k|X=x)Pr(G=l|X=x))=logfk(x)fl(x)+logπk(x)πl(x) Now expression fk(x)fl(x) can be simplified as follows. Note Σ is assumed to be same for all the classes. log(fk(x)fl(x))=[−12(x−μk)TΣ−1(x−μk)+12(x−μl)TΣ−1(x−μl)] And =−12[xTΣ−1x−xTΣ−1μk−μTkΣ−1x+μTkΣ−1μk−xTΣ−1x+xTΣ−1μk+μlΣ−1x−μTlΣ−1μl] Next =12[−xTΣ−1x+xTΣ−1μk+μTkΣ−1x−μTkΣ−1μk+xTΣ−1x−xTΣ−1μl−μTlΣ−1x+μTlΣ−1μl]
Before simplying, we should note the following:
- Since Σ is a covariance matrix and since covariance between i and j is same as j and i, then ΣT=Σ=Σ−1.
- The mean matrices μk for class k and μl for class l are p×1 matrix.
- Terms such as μkΣ−1μl have dimensions 1×p∗p×p∗p×1=1×1 or scalars aka reals.