Computing a classification means computing the class posterior probabilities $\Pr(G \mid X)$, that is, the probability that the class is $G$ given the input $X$. If $f_k(x)$ is the class-conditional density of $X$ in class $G=k$ and $\pi_k$ is the prior probability of class $k$, then Bayes' theorem gives

$$\Pr(G=k \mid X=x) = \frac{f_k(x)\,\pi_k}{\sum_{l=1}^{K} f_l(x)\,\pi_l}$$
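As a quick illustration, here is a minimal sketch of this computation for two classes with one-dimensional Gaussian class densities; all parameter values are invented for the example.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-D example: two classes with Gaussian class-conditional
# densities f_k and priors pi_k (all numbers are made up for illustration).
priors = np.array([0.6, 0.4])                # pi_k
densities = [norm(loc=0.0, scale=1.0),       # f_1
             norm(loc=2.0, scale=1.0)]       # f_2

def posteriors(x):
    # Numerator of Bayes' theorem for each class: f_k(x) * pi_k
    num = np.array([d.pdf(x) for d in densities]) * priors
    # Dividing by the sum over all classes makes the posteriors sum to 1.
    return num / num.sum()

# At x = 1 the two densities are equal, so the posterior equals the prior.
print(posteriors(1.0))   # array([0.6, 0.4])
```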
Depending on the model used for the class densities, different techniques emerge:
* Linear and quadratic discriminant analysis use Gaussian densities (a short sketch follows this list).
* Mixtures of Gaussians lead to non-linear decision boundaries.
* Naive Bayes models assume that each class density is a product of its marginal densities.
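As an illustration of the first bullet, here is a minimal sketch, assuming synthetic two-class Gaussian data (all parameters invented), of fitting scikit-learn's LDA and QDA classifiers:

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.default_rng(0)
# Synthetic two-class Gaussian data (parameters are arbitrary choices).
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([2, 2], 1.5, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# LDA pools one covariance estimate across classes (linear boundary);
# QDA fits a separate covariance per class (quadratic boundary).
lda = LinearDiscriminantAnalysis().fit(X, y)
qda = QuadraticDiscriminantAnalysis().fit(X, y)
print(lda.predict([[1.0, 1.0]]), qda.predict([[1.0, 1.0]]))
```

The shared covariance assumption is exactly what forces LDA's boundary to be linear, as the derivation below makes explicit.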
Derivation: Suppose that each class density is modelled as a multivariate Gaussian:

$$f_k(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_k|^{1/2}}\, e^{-\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)}$$
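To make the formula concrete, here is a small check, with invented values for $\mu_k$ and $\Sigma_k$, that evaluates the density by this formula and compares it against scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

p = 2
mu = np.array([1.0, 2.0])                    # mu_k (made up)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])   # Sigma_k (made up)
x = np.array([0.5, 1.5])

# Density by the formula: normalizer * exp(-1/2 (x-mu)^T Sigma^{-1} (x-mu))
d = x - mu
quad = d @ np.linalg.inv(Sigma) @ d
f = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5)

# Same value from scipy's implementation.
assert np.isclose(f, multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```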
It is clear that

$$\Pr(G=k \mid X=x) = \frac{f_k(x)\,\pi_k}{\sum_{m=1}^{K} f_m(x)\,\pi_m}, \qquad \Pr(G=l \mid X=x) = \frac{f_l(x)\,\pi_l}{\sum_{m=1}^{K} f_m(x)\,\pi_m}$$

so the common denominator cancels in the ratio:

$$\frac{\Pr(G=k \mid X=x)}{\Pr(G=l \mid X=x)} = \frac{f_k(x)\,\pi_k}{f_l(x)\,\pi_l}$$

$$\log\left(\frac{\Pr(G=k \mid X=x)}{\Pr(G=l \mid X=x)}\right) = \log\frac{f_k(x)}{f_l(x)} + \log\frac{\pi_k}{\pi_l}$$
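A quick numerical sanity check of this identity, using an invented two-class setup with a shared covariance matrix:

```python
import numpy as np
from scipy.stats import multivariate_normal

pi = np.array([0.6, 0.4])
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
Sigma = np.array([[1.0, 0.2], [0.2, 1.0]])   # shared covariance
x = np.array([1.0, 0.5])

f = np.array([multivariate_normal(mean=m, cov=Sigma).pdf(x) for m in mus])
post = f * pi / (f * pi).sum()

# log posterior ratio == log density ratio + log prior ratio
lhs = np.log(post[0] / post[1])
rhs = np.log(f[0] / f[1]) + np.log(pi[0] / pi[1])
assert np.isclose(lhs, rhs)
```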
Now the expression $\log\frac{f_k(x)}{f_l(x)}$ can be simplified as follows. Note that $\Sigma$ is assumed to be the same for all classes (the LDA assumption), so the normalizing constants cancel and only the quadratic forms remain:
$$\log\frac{f_k(x)}{f_l(x)} = -\frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k) + \frac{1}{2}(x-\mu_l)^T\Sigma^{-1}(x-\mu_l)$$
Expanding both quadratic forms,

$$= -\frac{1}{2}\left[x^T\Sigma^{-1}x - x^T\Sigma^{-1}\mu_k - \mu_k^T\Sigma^{-1}x + \mu_k^T\Sigma^{-1}\mu_k - x^T\Sigma^{-1}x + x^T\Sigma^{-1}\mu_l + \mu_l^T\Sigma^{-1}x - \mu_l^T\Sigma^{-1}\mu_l\right]$$
Flipping the sign inside the bracket,

$$= \frac{1}{2}\left[-x^T\Sigma^{-1}x + x^T\Sigma^{-1}\mu_k + \mu_k^T\Sigma^{-1}x - \mu_k^T\Sigma^{-1}\mu_k + x^T\Sigma^{-1}x - x^T\Sigma^{-1}\mu_l - \mu_l^T\Sigma^{-1}x + \mu_l^T\Sigma^{-1}\mu_l\right]$$

The $x^T\Sigma^{-1}x$ terms (the first and fifth) cancel.
Before simplifying further, we should note the following:

- Since $\Sigma$ is a covariance matrix, and the covariance between components $i$ and $j$ is the same as between $j$ and $i$, $\Sigma$ is symmetric: $\Sigma^T = \Sigma$, and hence $(\Sigma^{-1})^T = \Sigma^{-1}$ as well.
- The mean vectors $\mu_k$ (for class $k$) and $\mu_l$ (for class $l$) are $p \times 1$ matrices, i.e. column vectors.
- Terms such as $\mu_k^T\Sigma^{-1}\mu_l$ have dimensions $(1 \times p)(p \times p)(p \times 1) = 1 \times 1$, i.e. they are scalars (reals), and a scalar equals its own transpose. A small numerical check follows this list.
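A minimal numerical check of these facts (the covariance matrix and mean vectors are invented):

```python
import numpy as np

Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric by construction
Sinv = np.linalg.inv(Sigma)
mu_k = np.array([[1.0], [2.0]])              # p x 1 column vectors
mu_l = np.array([[0.5], [-1.0]])

# The inverse of a symmetric matrix is symmetric.
assert np.allclose(Sinv, Sinv.T)

# mu_k^T Sigma^{-1} mu_l is 1x1 and equals its transpose mu_l^T Sigma^{-1} mu_k.
a = mu_k.T @ Sinv @ mu_l
b = mu_l.T @ Sinv @ mu_k
print(a.shape)                 # (1, 1)
assert np.allclose(a, b)
```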
Collecting the third and seventh terms (the two terms with $x$ at the end), and using the fact that a scalar equals its own transpose:

$$\left[\frac{1}{2}\left(\mu_k^T - \mu_l^T\right)\Sigma^{-1}x\right]^T = \frac{1}{2}x^T\Sigma^{-1}(\mu_k - \mu_l)$$

The second and sixth terms combine to the same quantity, $\frac{1}{2}x^T\Sigma^{-1}(\mu_k - \mu_l)$, so together these four terms contribute $x^T\Sigma^{-1}(\mu_k - \mu_l)$.
The remaining terms (the fourth and eighth), after adding and subtracting the cross term $\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_l$ (a net change of zero, since a scalar equals its transpose), factor as

$$-\frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + \frac{1}{2}\mu_l^T\Sigma^{-1}\mu_l + \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_l - \frac{1}{2}\left(\mu_k^T\Sigma^{-1}\mu_l\right)^T = -\frac{1}{2}(\mu_k + \mu_l)^T\Sigma^{-1}(\mu_k - \mu_l)$$
Combining all these simplifications,

$$\log\left(\frac{\Pr(G=k \mid X=x)}{\Pr(G=l \mid X=x)}\right) = \log\frac{\pi_k}{\pi_l} - \frac{1}{2}(\mu_k + \mu_l)^T\Sigma^{-1}(\mu_k - \mu_l) + x^T\Sigma^{-1}(\mu_k - \mu_l)$$

This is linear in $x$: under the equal-covariance Gaussian model, the log posterior odds are a linear function of the input.
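As a check, the sketch below compares this closed form against the directly computed log posterior ratio, reusing the invented two-class setup from above:

```python
import numpy as np
from scipy.stats import multivariate_normal

pi = np.array([0.6, 0.4])
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
Sigma = np.array([[1.0, 0.2], [0.2, 1.0]])
Sinv = np.linalg.inv(Sigma)
x = np.array([1.0, 0.5])

# Direct log posterior ratio (the denominators cancel, so use f_k * pi_k).
f = np.array([multivariate_normal(mean=m, cov=Sigma).pdf(x) for m in mus])
direct = np.log(f[0] * pi[0]) - np.log(f[1] * pi[1])

# Closed form: log(pi_k/pi_l) - 1/2 (mu_k+mu_l)^T S^-1 (mu_k-mu_l) + x^T S^-1 (mu_k-mu_l)
closed = (np.log(pi[0] / pi[1])
          - 0.5 * (mus[0] + mus[1]) @ Sinv @ (mus[0] - mus[1])
          + x @ Sinv @ (mus[0] - mus[1]))
assert np.isclose(direct, closed)
```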
On the decision boundary between classes $k$ and $l$, the posterior probabilities are equal, $\Pr(G=k \mid X=x) = \Pr(G=l \mid X=x)$, so the log ratio is zero:

$$\log(1) = 0 = \log\pi_k - \log\pi_l - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + x^T\Sigma^{-1}\mu_k + \frac{1}{2}\mu_l^T\Sigma^{-1}\mu_l - x^T\Sigma^{-1}\mu_l$$
From here, grouping the terms that involve class $k$, the linear discriminant function can be written as

$$\delta_k(x) = \log\pi_k - \frac{1}{2}\mu_k^T\Sigma^{-1}\mu_k + x^T\Sigma^{-1}\mu_k$$

The boundary between classes $k$ and $l$ is the set $\{x : \delta_k(x) = \delta_l(x)\}$, and classification assigns $x$ to the class with the largest $\delta_k(x)$.
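Finally, a minimal sketch of a classifier built directly from this discriminant function. The class parameters would normally be estimated from training data; here they are simply assumed:

```python
import numpy as np

def lda_discriminants(x, mus, Sigma, priors):
    """delta_k(x) = log pi_k - 1/2 mu_k^T S^-1 mu_k + x^T S^-1 mu_k."""
    Sinv = np.linalg.inv(Sigma)
    return np.array([np.log(pk) - 0.5 * mk @ Sinv @ mk + x @ Sinv @ mk
                     for mk, pk in zip(mus, priors)])

# Assumed (not estimated) class parameters, for illustration only.
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
Sigma = np.array([[1.0, 0.2], [0.2, 1.0]])
priors = [0.6, 0.4]

x = np.array([1.0, 0.5])
deltas = lda_discriminants(x, mus, Sigma, priors)
print(deltas.argmax())   # index of the class with the largest discriminant
```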