GLMs provide a unified framework for modeling data originating from the exponential family of distributions, which includes the Gaussian, binomial, and Poisson, among others. A GLM is defined in [1] as:
"A generalised linear model (or GLM) consists of three components:

1. A random component, specifying the conditional distribution of the response variable, Yi (for the ith of n independently sampled observations).
2. The linear predictor, that is, a linear function of regressors:
ηi = α + β1Xi1 + β2Xi2 + ...
3. A smooth and invertible link function, g, which transforms the expectation of the response variable, μi ≡ E(Yi), to the linear predictor" [1]. So:
g(μi) = g(E[Yi]) = ηi
Note that the linear predictor allows for interaction effects, where the effect of one regressor depends on the value of another, and for curvilinear effects (i.e. powers of the x terms). Note also that the right-hand side is a linear combination of the explanatory variables: "linear" means linear in the coefficients, not necessarily in the regressors.
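To make that concrete, here's a small sketch (all the coefficient names are illustrative) showing that a predictor with an interaction term and a squared term is still a linear combination, just over extra columns:

```python
# A linear predictor with interaction and curvilinear terms is still "linear",
# because it is linear in the coefficients alpha, beta1, ..., beta4.

def eta(x1, x2, alpha, beta1, beta2, beta3, beta4):
    # The interaction (x1 * x2) and curvilinear (x1 ** 2) regressors are just
    # additional columns; the predictor remains a linear combination of them.
    return alpha + beta1 * x1 + beta2 * x2 + beta3 * (x1 * x2) + beta4 * x1 ** 2

print(eta(2.0, 3.0, alpha=1.0, beta1=0.5, beta2=-1.0, beta3=0.25, beta4=0.1))
```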
The key to understanding this is to ask: what is an exponential family? Its probability mass function looks like this:
f(x|β) = h(x) exp(T(x)g(β) - A(β))
where x and β are the data and parameters respectively, and h, g, T, and A are known functions. Now, the thing is that you can shoe-horn quite a few distributions into this form. Take the binomial distribution you were taught in high school:
f(k|n, p) = nCk p^k (1 - p)^(n-k)
with a bit of basic algebra (try it!), you can get it to look like:
f(k|p) = nCk exp(k log[p/(1-p)] + n log(1-p))
Hey, that's the form of the exponential family! Reading off the pieces: h(k) = nCk, the sufficient statistic is T(k) = k, A(p) = -n log(1-p), and
g(p) = log[p/(1-p)]
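We can sanity-check this rewrite numerically: the exponential-family form, with sufficient statistic k and the n log(1 - p) term, should reproduce the textbook binomial pmf exactly. A small sketch:

```python
import math

def binom_pmf(k, n, p):
    # Textbook form: nCk * p^k * (1-p)^(n-k)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_expfam(k, n, p):
    # Exponential-family form: nCk * exp(k*log(p/(1-p)) + n*log(1-p))
    return math.comb(n, k) * math.exp(k * math.log(p / (1 - p)) + n * math.log(1 - p))

# The two forms agree for every k, and the pmf still sums to 1.
for k in range(11):
    assert math.isclose(binom_pmf(k, 10, 0.3), binom_expfam(k, 10, 0.3))
```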
Interestingly, point 3 (above) says that this function g, applied to the mean, equals our linear predictor in point 2, and by inverting it you can derive the sigmoid/logistic function (try it!).
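That inversion can be checked numerically: setting log[μ/(1-μ)] = η and solving for μ gives μ = 1/(1 + e^(-η)), the sigmoid, which is exactly the inverse of the logit link. A quick sketch:

```python
import math

def logit(mu):
    # The link: g(mu) = log(mu / (1 - mu)), mapping (0, 1) onto (-inf, inf)
    return math.log(mu / (1 - mu))

def sigmoid(eta):
    # Its inverse: maps the linear predictor back to a probability
    return 1.0 / (1.0 + math.exp(-eta))

# Round-tripping through the link recovers the original probability.
for mu in (0.01, 0.25, 0.5, 0.9):
    assert math.isclose(sigmoid(logit(mu)), mu)
```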
Note that this is not why we use the logit function in logistic regression. Often, the argument is that the link must map the linear predictor, which ranges over ±∞, to [0, 1], as we're interested in probabilities. Although the logit does this, there are infinitely many functions that do so too. So, here is a link that details why the logit is the only suitable function.
Now, remember that a Bernoulli distribution is just a binomial with n = 1, so this completely describes the case for logistic regression. This is why, when we're looking at a model with a binary response, we tell the GLM to use the Binomial family (see, for instance, here in the Spark docs). What we do for other use cases I shall deal with in another post.
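To tie it together, here is a toy sketch (not the Spark API; names, the seed, and the true coefficients are all illustrative) of fitting a Binomial-family GLM with the logit link via iteratively reweighted least squares, the standard fitting algorithm for GLMs:

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25):
    """Toy Binomial-family GLM fit via iteratively reweighted least squares.

    X: (n, p) design matrix (include a column of ones for the intercept).
    y: (n,) array of 0/1 responses.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = np.clip(X @ beta, -30, 30)    # linear predictor (clipped for safety)
        mu = 1.0 / (1.0 + np.exp(-eta))     # inverse logit link
        W = mu * (1.0 - mu)                 # Bernoulli variance weights
        z = eta + (y - mu) / W              # working response
        beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    return beta

# Toy data: y is Bernoulli with log-odds -0.5 + 2.0 * x.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 2.0 * x)))
y = rng.binomial(1, p_true).astype(float)
X = np.column_stack([np.ones_like(x), x])
print(fit_logistic_irls(X, y))  # roughly [-0.5, 2.0]
```

With enough data, the fitted coefficients land near the true intercept and slope, which is exactly what a library GLM routine would return for the same model.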
[1] Applied Regression Analysis & Generalized Linear Models, Fox, 3rd edition.