Wednesday, July 18, 2018

Machine Learning - Kernel Smoothing Methods

Chapter 5:
Kernel Smoothing Methods
The setup is as follows:
You have input data, collected in a matrix X.
The dimensions of this matrix are N × p, where N is the number of rows; each row corresponds to one sample (one set of inputs) from your experiment, and p is the number of features.
Your outputs are collected into a vector y, which most of the time is N × 1.
Typical linear regression is expressed as y = Xβ, where the β are called the coefficients of the regression.
For example, you might have the function y = β0 + β1x.
These functions are linear in β. For example, y = β0 + β1x + β2x² is still a linear model, because it is linear in the coefficients even though it is quadratic in x.
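As a quick illustration (my own sketch, not from the chapter), here is how that quadratic model can be fit by ordinary least squares in numpy; since the model is linear in β, the standard least-squares machinery applies directly:

import numpy as np

# Synthetic data from y = 1 + 2x - 0.5x^2 plus noise (illustrative values).
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.2, size=x.size)

# Design matrix with columns [1, x, x^2] -- nonlinear in x, linear in beta.
X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # estimates of (beta0, beta1, beta2)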
If the output is a real value, the problem is called regression. If the output is a categorical variable (e.g., Obese/NonObese), we use logistic regression and its variants.
The key term here is “localization”. The idea is to fit a simple model at each query point and from these local fits infer an overall function f(X).
This localization is achieved using “kernels”. The basic setup and terminology: a kernel is written K_λ(x0, x), where x0 is the query point, x is an arbitrary point, and λ sets the size of the neighborhood.
In these models λ is a parameter.
The simplest place to understand kernels is one dimension.
One simple approach is to select a neighborhood size λ and average the outputs yi of all points within distance λ of your query point x0 (a ball of radius λ around x0). Another is to fit a linear function within each neighborhood. Either way you lose continuity, because points enter and leave the neighborhood abruptly as x0 moves. This lack of continuity is resolved by the Nadaraya-Watson kernel-weighted average:
\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}
A common choice is the Epanechnikov kernel:
K_\lambda(x_0, x_i) = D\left( \frac{|x_0 - x_i|}{\lambda} \right), \qquad D(t) = \begin{cases} \frac{3}{4}\,(1 - t^2) & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}
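To make this concrete, here is a minimal Python sketch (my own, not from the chapter; the function names and the choice of λ below are illustrative) of the Nadaraya-Watson average with the Epanechnikov kernel:

import numpy as np

def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    """Kernel-weighted average: sum_i K(x0,xi) yi / sum_i K(x0,xi)."""
    w = epanechnikov(np.abs(x0 - x) / lam)
    s = w.sum()
    return w @ y / s if s > 0 else np.nan  # no points fall in the window

# Usage: smooth noisy samples of sin(x).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 2 * np.pi, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
fitted = np.array([nadaraya_watson(x0, x, y, lam=0.5) for x0 in x])

Because the Epanechnikov weights shrink smoothly to zero at the edge of the window, the fitted curve stays continuous as points enter and leave the neighborhood.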
These algorithms have issues at boundaries:
Near a boundary the neighborhood is asymmetric (the sample points lie mostly on one side of x0), so the locally averaged curve can take off in a direction different from the one the overall function takes.
This leads to poor predictions near the boundary.
We need to tackle this.
To tackle this, as a start we use locally weighted linear regression at these points.
A separate weighted linear regression is solved for each neighborhood, but the fitted line is evaluated only at x0, your query point at the boundary:
\min_{\alpha(x_0),\, \beta(x_0)} \; \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, \big[ y_i - \alpha(x_0) - \beta(x_0)\, x_i \big]^2
Then the boundary estimate is simply \hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\, x_0.
Define b(x)^T = (1, x), let B be the N × 2 matrix whose i-th row is b(x_i)^T, and let W(x_0) be the N × N diagonal matrix of weights with i-th diagonal element K_\lambda(x_0, x_i). Then in closed form:
\hat{f}(x_0) = b(x_0)^T \big( B^T W(x_0) B \big)^{-1} B^T W(x_0)\, y = \sum_{i=1}^{N} l_i(x_0)\, y_i
where the l_i(x_0) are weights that depend on x_0 and the x_i but not on the y_i, so the estimate is a linear combination of the observed outputs.
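Here is a matching Python sketch (again my own, with illustrative names) of local linear regression at a single query point, computed directly from the weighted least-squares formula above:

import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def local_linear(x0, x, y, lam):
    """Local linear fit at x0: minimize sum_i K(x0,xi)[yi - a - b*xi]^2,
    then evaluate the fitted line only at x0. Assumes at least two sample
    points fall inside the window, otherwise the 2x2 system is singular."""
    w = epanechnikov(np.abs(x0 - x) / lam)       # diagonal of W(x0)
    B = np.column_stack([np.ones_like(x), x])    # N x 2, i-th row b(xi)^T = (1, xi)
    BtW = B.T * w                                # B^T W(x0), shape 2 x N
    alpha, beta = np.linalg.solve(BtW @ B, BtW @ y)
    return alpha + beta * x0                     # f(x0) = alpha(x0) + beta(x0) x0

Compared with the Nadaraya-Watson average, this local linear fit corrects the first-order bias from the asymmetric neighborhood, which is exactly the boundary problem described above.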
