Wednesday, July 18, 2018

Machine Learning - Kernel Smoothing Methods

Chapter 5:
Kernel Smoothing Methods
The setup is as follows:
You have input data, collected in a matrix X.
The dimensions of this matrix are N × p, where N is the number of rows; each row corresponds to one sample (one set of inputs) from your experiment, and p is the number of features.
Your outputs are collected into a vector y, which most of the time is N × 1.
Typical linear regression is expressed as y = Xβ, where the β are called the coefficients of the regression.
For example, you might have the function y = β0 + β1x.
These functions are linear in β. For example, y = β0 + β1x + β2x² is still a linear model, because it is linear in the coefficients even though it is quadratic in x.
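As a quick illustration (my own sketch, not from the chapter), here is how that quadratic model can be fit by ordinary least squares in numpy; since the model is linear in β, the standard least-squares machinery applies directly:

import numpy as np

# Synthetic data from y = 1 + 2x - 0.5x^2 plus noise (illustrative values).
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.2, size=x.size)

# Design matrix with columns [1, x, x^2] -- nonlinear in x, linear in beta.
X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimates of (beta0, beta1, beta2)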
If the output is a real value, the problem is called regression. If the output is a categorical variable (e.g., Obese/NonObese), we use logistic regression and its variants.
The key term here is “localization”. The idea is to fit a simple model at each query point and from these local fits infer an overall function f(X).
This localization is achieved using “kernels”. The basic setup and terminology: a kernel is written K_λ(x0, x), where x0 is the query point, x is an arbitrary point, and λ sets the size of the neighborhood.
In these models λ is a parameter.
The simplest place to understand kernels is one dimension.
One simple approach is to select a neighborhood size λ and average the outputs yi of all points within distance λ of your query point x0 (a ball of radius λ around x0). Another is to fit a linear function within each neighborhood. Either way you lose continuity, because points enter and leave the neighborhood abruptly as x0 moves. This lack of continuity is resolved by the Nadaraya-Watson kernel-weighted average:
\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}
A common choice is the Epanechnikov kernel:
K_\lambda(x_0, x_i) = D\left( \frac{|x_0 - x_i|}{\lambda} \right), \qquad D(t) = \begin{cases} \frac{3}{4}\,(1 - t^2) & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}
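To make this concrete, here is a minimal Python sketch (my own, not from the chapter; the function names and the choice of λ below are illustrative) of the Nadaraya-Watson average with the Epanechnikov kernel:

import numpy as np

def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    """Kernel-weighted average: sum_i K(x0,xi) yi / sum_i K(x0,xi)."""
    w = epanechnikov(np.abs(x0 - x) / lam)
    s = w.sum()
    return w @ y / s if s > 0 else np.nan  # no points fall in the window

# Usage: smooth noisy samples of sin(x).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 2 * np.pi, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
fitted = np.array([nadaraya_watson(x0, x, y, lam=0.5) for x0 in x])

Because the Epanechnikov weights shrink smoothly to zero at the edge of the window, the fitted curve stays continuous as points enter and leave the neighborhood.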
These algorithms have issues at boundaries:
Near a boundary the neighborhood is asymmetric (the sample points lie mostly on one side of x0), so the locally averaged curve can take off in a direction different from the one the overall function takes.
This leads to poor predictions near the boundary.
We need to tackle this.
To tackle this, as a start we use locally weighted linear regression at these points.
A separate weighted linear regression is solved for each neighborhood, but the fitted line is evaluated only at x0, your query point at the boundary:
\min_{\alpha(x_0),\, \beta(x_0)} \; \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, \big[ y_i - \alpha(x_0) - \beta(x_0)\, x_i \big]^2
Then the boundary estimate is simply \hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\, x_0.
Define b(x)^T = (1, x), let B be the N × 2 matrix whose i-th row is b(x_i)^T, and let W(x_0) be the N × N diagonal matrix of weights with i-th diagonal element K_\lambda(x_0, x_i). Then in closed form:
\hat{f}(x_0) = b(x_0)^T \big( B^T W(x_0) B \big)^{-1} B^T W(x_0)\, y = \sum_{i=1}^{N} l_i(x_0)\, y_i
where the l_i(x_0) are weights that depend on x_0 and the x_i but not on the y_i, so the estimate is a linear combination of the observed outputs.
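Here is a matching Python sketch (again my own, with illustrative names) of local linear regression at a single query point, computed directly from the weighted least-squares formula above:

import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def local_linear(x0, x, y, lam):
    """Local linear fit at x0: minimize sum_i K(x0,xi)[yi - a - b*xi]^2,
    then evaluate the fitted line only at x0. Assumes at least two sample
    points fall inside the window, otherwise the 2x2 system is singular."""
    w = epanechnikov(np.abs(x0 - x) / lam)       # diagonal of W(x0)
    B = np.column_stack([np.ones_like(x), x])    # N x 2, i-th row b(xi)^T = (1, xi)
    BtW = B.T * w                                # B^T W(x0), shape 2 x N
    alpha, beta = np.linalg.solve(BtW @ B, BtW @ y)
    return alpha + beta * x0                     # f(x0) = alpha(x0) + beta(x0) x0

Compared with the Nadaraya-Watson average, this local linear fit corrects the first-order bias from the asymmetric neighborhood, which is exactly the boundary problem described above.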
