What are GLMs?
Generalized linear models (GLMs) are a class of regression models in which the response variable, Y, and the random error term follow a distribution from the exponential family, such as the binomial, Gamma, normal, Poisson, or inverse Gaussian distribution. This is in contrast to conventional linear models, in which the response variable Y and the random error term must follow a normal distribution. GLMs are described in terms of the expected value (mean) of the response variable.
The link function g is chosen according to the assumed distribution of the response variable. It maps the mean μ = E(Y) to g(μ), which is then modeled as a linear combination of the predictors. If the response variable has a normal distribution, the link function is the identity function and the model reduces to ordinary linear regression: g(E(Y)) = E(Y) = w0 + w1x1 + ... + wpxp, where E(Y) is the expected value of Y, the w terms are the weights, and the x terms are the predictors.
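The role of the link function can be sketched in a few lines of Python. This is a minimal illustration rather than a library implementation: the LINKS table and the predict_mean helper are hypothetical names introduced here, and the pairing of links with distributions in the comments reflects common practice, not a fixed rule.

```python
import math

# Each entry pairs a link g(mu) with its inverse, which maps the linear
# predictor back to the mean of the response.
LINKS = {
    # Identity link: typical for a normally distributed response.
    "identity": (lambda mu: mu, lambda eta: eta),
    # Log link: commonly used with Poisson and Gamma responses.
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
    # Logit link: commonly used with a binary (binomial) response.
    "logit": (
        lambda mu: math.log(mu / (1.0 - mu)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
}


def predict_mean(link_name, weights, intercept, x):
    """Return E(Y) for one observation x by inverting the link."""
    _, inverse_link = LINKS[link_name]
    # Linear predictor: intercept plus the weighted sum of the predictors.
    eta = intercept + sum(w * xi for w, xi in zip(weights, x))
    return inverse_link(eta)
```

For instance, predict_mean("logit", [1.0], 0.0, [0.0]) evaluates the inverse logit at a linear predictor of zero, which corresponds to a probability of one half, while the identity link returns the linear predictor unchanged.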
- General linear models are used to predict the value of a response variable when the response variable, Y, and the error term (ϵ) both follow a normal distribution, given the values of the predictors X.
The parameters of this normal distribution are the mean, expressed as a linear combination of weights (W) and predictors (X), and the standard deviation. Linear regression and ANOVA models are examples of general linear models.
In the general linear model, the link function is the identity function. More generally, a link function maps the mean of the response variable, for example the probability of a category of a categorical response, onto an unbounded continuous scale. Once the transformation is applied, a linear model can be used to describe the relationship between the predictors and the response.
When training regression models, it is important to recognize that what is being modeled is the mean of the dependent variable rather than its individual values. Because the response variable Y has a normal distribution, the weighted sum of the predictor variables can be equated directly to the expected value of Y.
The identity function is the link function used in the linear regression model to connect the mean of the response variable, Y, to the weighted sum of the predictor variables. As a result, g(E(Y)) becomes E(Y), which is denoted as Ypredicted.
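The identity-link case can be illustrated with ordinary least squares in NumPy. This is a sketch under stated assumptions: the data are synthetic, and the true intercept (1.0), slope (2.0), and noise level are made-up values chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic data (assumed for illustration): E(Y) = 1.0 + 2.0 * x,
# with normally distributed noise around that mean.
x = rng.uniform(0.0, 10.0, size=500)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=500)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Ordinary least squares, which is the maximum-likelihood fit for a
# normal response with an identity link.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)

# With the identity link, g(E(Y)) = E(Y), so the weighted sum of the
# predictors is itself the predicted mean, Ypredicted.
y_predicted = X @ weights
```

The fitted weights estimate the intercept and slope of the mean, and y_predicted tracks the mean of Y at each x rather than the noisy individual observations.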
When to utilize GLMs?
Different types of generalized linear models are used depending on the assumed distribution of the response variable.
- Logistic regression can be used if the response variable is binary. Sklearn's LogisticRegression may be used to model a binary response variable.
- Poisson regression with a log link is used when the response variable represents counts (non-negative integers) or relative frequencies (non-negative).
- GammaRegressor with a log link can be used if the response variable is positive-valued and skewed.
- If the response variable appears to be heavier-tailed than a Gamma distribution, an inverse Gaussian regressor may be used.
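The choices in the list above can be sketched with scikit-learn's estimators. The choose_glm helper and its response_kind labels are hypothetical names introduced for this example, the data are synthetic, and alpha=0.0 (no regularization) is an assumption made so that the fit approximates plain maximum likelihood. The inverse Gaussian case is expressed here as TweedieRegressor with power=3.

```python
import numpy as np
from sklearn.linear_model import (
    GammaRegressor,
    LogisticRegression,
    PoissonRegressor,
    TweedieRegressor,
)


def choose_glm(response_kind):
    """Hypothetical helper: pick an estimator from the kind of response."""
    if response_kind == "binary":
        return LogisticRegression()
    if response_kind == "count":
        return PoissonRegressor(alpha=0.0)  # log link
    if response_kind == "positive_skewed":
        return GammaRegressor(alpha=0.0)  # log link
    if response_kind == "heavy_tailed":
        # power=3 selects the inverse Gaussian distribution.
        return TweedieRegressor(power=3, link="log", alpha=0.0)
    raise ValueError(f"unknown response kind: {response_kind!r}")


# Synthetic count data (assumed): log E(Y) = 0.5 + 1.0 * x.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(2000, 1))
y = rng.poisson(np.exp(0.5 + 1.0 * X[:, 0]))

model = choose_glm("count").fit(X, y)
predicted_mean = model.predict(X)  # always positive because of the log link
```

After fitting, model.intercept_ and model.coef_ hold the estimated weights on the log scale, so they can be compared with the values used to generate the data.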
Key takeaways
- GLM can represent response variables using distributions like Gamma, binomial, Tweedie, and so on.
- Python's Sklearn library includes classes for training GLMs, chosen according to the probability distribution of the response variable.
- In generalized linear modeling, a link function of the mean of the response variable is modeled as a linear combination of weights and predictor variables, provided the response variable and the error term follow a distribution from the exponential family.