Poisson Model
Author: Kevin Navarrete-Parra
I am writing quick and easy R guides for my didactic purposes and to provide useful starting places for my peers in grad school. If you see that I have made a mistake or would like to suggest some way to make the post better or more accurate, please feel free to email me. I am always happy to learn from others' experiences!
Model Formula
Poisson models are useful for running regressions on count response variables (i.e., nonnegative integers that follow a Poisson distribution). You can represent this model as

$$\log(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$

where $\mu$ represents the expected (average) number of observed events in the dependent variable, $\beta_0$ is the intercept, and the $\beta_1, \dots, \beta_k$ values are the coefficients. As you can see, the model uses the log link.
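Importantly, the Poisson model assumes that the mean of the count response variable is equal to the variable's variance, such that $E(Y) = \mathrm{Var}(Y) = \mu$. We get the predicted mean for the count response by exponentiating both sides of the model equation, which gives

$$\mu = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k)$$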
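A quick simulation can make the equal mean and variance assumption concrete. This is only a sketch: the coefficient values `b0` and `b1` and the predictor value `x1` are made up for illustration and do not come from any real data.

```r
set.seed(123)  # for reproducibility

# Hypothetical coefficients and a single predictor value (illustrative assumptions)
b0 <- 0.5
b1 <- 0.3
x1 <- 2

# Exponentiate the linear predictor to get the expected count
mu <- exp(b0 + b1 * x1)

# Draw many Poisson counts with that mean; the sample mean and
# sample variance should both land close to mu
y <- rpois(100000, lambda = mu)
c(mu = mu, mean = mean(y), variance = var(y))
```

If the sample variance of your real response is much larger than its mean, that is a hint the Poisson assumption may not hold.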
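We can also specify the model so that the response variable represents a count value within a given period of time, which is called the incidence rate. This model can be specified as

$$\log\left(\frac{\mu}{t}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$

where $t$ is a period of time and $\mu / t$ indicates the incidence rate. You can also represent this equation as

$$\log(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \log(t)$$

where $\log(t)$ is the offset in the model equation.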
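In `glm`, one way to fit the incidence rate version of the model is to put the log of the time variable in an `offset()` term. The sketch below uses simulated data. The names `exposure`, `sim`, and `rate.mod` and the true coefficients (0.2 and 0.5) are assumptions made up for this example.

```r
set.seed(42)

n <- 500
x1 <- rnorm(n)
exposure <- runif(n, min = 1, max = 10)  # hypothetical observation time per unit

# Simulate counts from the rate model: log(mu / exposure) = 0.2 + 0.5 * x1
y <- rpois(n, lambda = exposure * exp(0.2 + 0.5 * x1))
sim <- data.frame(y, x1, exposure)

# offset(log(exposure)) enters the linear predictor with its coefficient fixed at 1
rate.mod <- glm(y ~ x1 + offset(log(exposure)), family = poisson, data = sim)
coef(rate.mod)
```

The estimated coefficients should be close to the values used in the simulation, and they are interpreted on the rate scale rather than the raw count scale.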
The Poisson model assumes the response variable follows the Poisson probability distribution, which can be expressed as

$$P(Y = y) = \frac{e^{-\mu} \mu^{y}}{y!}$$

where $y$ is the count value of the response variable, $\mu$ is the expected or average number of events, and $y!$ is the factorial of the response. Note that $\mu$ is often represented as $\lambda$ as well.
In the Poisson distribution, the count variable's mean is equal to the variable's variance.
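To check the probability formula numerically, you can compare a hand-coded version against R's built-in `dpois` function. The mean `mu <- 3` and the range of counts are arbitrary values picked for illustration.

```r
mu <- 3    # hypothetical expected count
y <- 0:5   # count values at which to evaluate the probability

# Poisson probability written out by hand
manual <- exp(-mu) * mu^y / factorial(y)

# The same probabilities from base R
builtin <- dpois(y, lambda = mu)

all.equal(manual, builtin)
```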
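The log-likelihood function for the Poisson distribution can be expressed as

$$\mathcal{L}(\mu; y) = \sum_{i=1}^{n} \left[ y_i \log(\mu_i) - \mu_i - \log(y_i!) \right]$$

where $\mathcal{L}(\mu; y)$ is the log-likelihood function of $\mu$ given the values of the count variable.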
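As a rough check on the log-likelihood, you can compute it by hand from a fitted model's predicted means and compare it to what `logLik` reports. The data here are simulated, and the object names (`fit`, `mu.hat`, `ll.manual`) are just placeholders for this sketch.

```r
set.seed(1)

# Simulated data with made-up coefficients
x1 <- rnorm(200)
y <- rpois(200, lambda = exp(0.3 + 0.4 * x1))

fit <- glm(y ~ x1, family = poisson)
mu.hat <- fitted(fit)  # predicted means on the count scale

# Sum of y * log(mu) - mu - log(y!) over all observations
ll.manual <- sum(y * log(mu.hat) - mu.hat - lfactorial(y))

c(manual = ll.manual, logLik = as.numeric(logLik(fit)))
```

The two numbers should match up to floating-point error.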
Incidence Ratios
Returning to the incidence rate from above, the Poisson model estimates the log of the expected counts of an event, given the predictor variables. We can get the expected counts of a given event by exponentiating both sides of the equation, so that

$$\mu = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k}$$

Once you have the incidence rate calculated, you can find the incidence ratio (also called the incidence rate ratio, or IRR), $e^{\beta_j}$, which tells you the factor by which the expected count changes with a one-unit increase in the independent variable $x_j$, holding the other predictors constant. You can use the incidence ratio to calculate the percent change in the response by doing the following:

$$\%\,\text{change} = 100 \times \left(e^{\beta_j} - 1\right)$$
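Here is the arithmetic for a single coefficient. The value `b1 <- 0.25` is a made-up number for illustration, not an estimate from any model.

```r
b1 <- 0.25  # hypothetical Poisson coefficient (log scale)

irr <- exp(b1)                 # incidence ratio for a one-unit increase
pct.change <- 100 * (irr - 1)  # percent change in the expected count

round(c(irr = irr, pct.change = pct.change), 2)
```

With this coefficient, a one-unit increase in the predictor multiplies the expected count by about 1.28, which is roughly a 28.4 percent increase.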
Running it in R
You can run a Poisson regression in R by using either the `glm` function from the `stats` package or the `vglm` function from the `VGAM` package. The easiest way of doing this, though, is by using the `glm` function since that comes with base R. Because of that, I'll be focusing on the `glm` function, but the code should not be too different for the `vglm` function.
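```r
# Fit the Poisson regression (the log link is the default for family = poisson)
poi.mod <- glm(y ~ x1 + x2 + x3, family = poisson, data = data)
summary(poi.mod)
# Incidence ratios and their delta-method standard errors
poi.irr <- exp(coef(poi.mod))
poi.irr.se <- poi.irr * sqrt(diag(vcov(poi.mod)))
```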
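If you want to try the model call without your own data, the sketch below simulates a data frame with the same variable names (`y`, `x1`, `x2`, `x3`) and then reports exponentiated coefficients with Wald 95% confidence intervals. The simulated coefficients and the `irr.table` object are assumptions made up for this example.

```r
set.seed(2024)

# Simulated predictors and a count response (all values are illustrative)
n <- 1000
data <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rbinom(n, 1, 0.5))
data$y <- rpois(n, lambda = exp(0.1 + 0.3 * data$x1 - 0.2 * data$x2 + 0.5 * data$x3))

poi.mod <- glm(y ~ x1 + x2 + x3, family = poisson, data = data)

# Exponentiate the coefficients and their Wald interval bounds
# to move from the log scale to the incidence ratio scale
irr.table <- exp(cbind(IRR = coef(poi.mod), confint.default(poi.mod)))
round(irr.table, 3)
```

An interval that excludes 1 corresponds to a coefficient whose Wald interval excludes 0 on the log scale.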
Diagnostic Statistics
The diagnostic statistics are mostly the same as those for other models covered, so this section will be brief. The one thing worth pointing out is that you can run a likelihood ratio test by fitting a null model.
```r
# Likelihood ratio test of the fitted model against an intercept-only model
anova(poi.mod, update(poi.mod, . ~ 1), test = "Chisq")
```
The above code will compare the fitted model to a null model using a Chi-squared test. A significant result indicates that the fitted model fits your data better than the null model.
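To see what the likelihood ratio test is doing, you can compute the statistic by hand from the two log-likelihoods. This sketch uses simulated data, and the names `full.mod`, `null.mod`, `lr.stat`, `df.diff`, and `p.value` are placeholders chosen for the example.

```r
set.seed(7)

# Simulated data with a made-up effect of x1
x1 <- rnorm(300)
y <- rpois(300, lambda = exp(0.2 + 0.4 * x1))

full.mod <- glm(y ~ x1, family = poisson)
null.mod <- update(full.mod, . ~ 1)  # intercept-only model

# Twice the difference in log-likelihoods
lr.stat <- 2 * (as.numeric(logLik(full.mod)) - as.numeric(logLik(null.mod)))

# Degrees of freedom: difference in the number of estimated parameters
df.diff <- attr(logLik(full.mod), "df") - attr(logLik(null.mod), "df")

p.value <- pchisq(lr.stat, df = df.diff, lower.tail = FALSE)
c(lr.stat = lr.stat, df = df.diff, p.value = p.value)
```

The statistic equals the drop in deviance between the two models, which is the quantity `anova` reports.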
You can also compute a pseudo-$R^2$, the AIC, and the BIC for this model.
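A minimal sketch of those three statistics follows. There are several pseudo-$R^2$ definitions. This example assumes McFadden's version, which compares the fitted model's log-likelihood to that of an intercept-only model. The data are simulated and the object names are made up for illustration.

```r
set.seed(99)

# Simulated data with a made-up effect of x1
x1 <- rnorm(300)
y <- rpois(300, lambda = exp(0.2 + 0.4 * x1))

full.mod <- glm(y ~ x1, family = poisson)
null.mod <- update(full.mod, . ~ 1)  # intercept-only model

# McFadden's pseudo-R-squared: 1 - logLik(full) / logLik(null)
mcfadden <- 1 - as.numeric(logLik(full.mod)) / as.numeric(logLik(null.mod))

c(McFadden = mcfadden, AIC = AIC(full.mod), BIC = BIC(full.mod))
```

Lower AIC and BIC values indicate a better trade-off between fit and complexity when you compare models fitted to the same data.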