Published on

Poisson Model

Authors
  • avatar
    Name
    Kevin Navarrete-Parra
    Twitter

I am writing quick and easy R guides for my didactic purposes and to provide useful starting places for my peers in grad school. If you see that I have made a mistake or would like to suggest some way to make the post better or more accurate, please feel free to email me. I am always happy to learn from others' experiences!

Table of contents

  1. Model Formula
  2. Incidence Ratios
  3. Running it in R
  4. Diagnostic Statistics

Model Formula

Poisson models are useful for running regressions on count response variables (i.e., nonnegative integers that follow a Poisson distribution). You can represent this model as

ln(μ)=α+β1X1+β2X2+...+βpXpln(\mu) = \alpha + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p

where μ\mu represents the average or expected average of observed events in the dependent variable, α\alpha is the intercept, and the β\beta values are the coefficients. As you can see, ln(μ)ln(\mu) uses the log link.

Importantly, the Poisson model assumes that the mean of the count response variable is equal to the variable's variance, such that E(Y)=Variance(Y)=μE(Y) = Variance (Y) = \mu. We get the predicted mean for the count response by exponentiating both sides of the equation.

μ=exp(α+β1X1+β2X2+...+βpXp)\mu = exp( \alpha + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p )

We can also specify the model so that the response variable represents a count value within a given set of times, which is called the incidence rate. This model can be specified as

ln(μ/t)=α+β1X1+β2X2+...+βpXpln(\mu /t) = \alpha + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p

where tt is a period of time and μ/t\mu /t indicates the incidence rate. You can also represent this equation as

ln(μ)=ln(t)+α+β1X1+β2X2+...+βpXpln(\mu) = ln(t) + \alpha + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p

where ln(t)ln(t) is the offset in the model equation.

The Poisson model assumes the response variable follows the Poisson probability distribution, which can be expressed as

P(y)=euuyy!P(y) = \frac{e^{-u} u^y}{y!}

where y is the count value of the response variable, μ\mu is the expected or average of events, and y!y! is a factorial of the response. Note that μ\mu is often represented as λ\lambda as well.

In the Poisson distribution, the count variable's mean is equal to the variable's variance.

μ=E(y)=Variance(y)\mu = E(y) = Variance(y)

The log-likelihood function for the Poisson distribution can be expressed as

l(μ;y)=i=1ny1lnμiμiln(yi!)l(\mu;y) = \sum^{n}_{i=1}{y_1 ln\mu_i - \mu_i - ln(y_i!)}

where l(μ;y)l(\mu;y) is the log likelihood function of μ\mu given the values of the count variable.

Incidence Ratios

Returning to the incidence rate from above, the Poisson model estimates the log of the expected counts of an event, given the predictor variables. We can get the expected counts of a given even by exponentiating both sides of the equation, so that

μ=exp(α+βX)\mu = exp(\alpha + \beta X)

Once you have the incidence rate calculated, you can find the incidence ratio, which will tell you how the count value will change with a one-unit increase in the given independent variable. You can take the incidence rate to calculate the percent change in the response by doing the following:

(IncidenceRate1)100(Incidence Rate - 1) * 100

Running it in R

You can run a Poisson regression in R by using either the glm function from the stats package or the vglm function from the VGAM package. The easiest way of doing this, though, is by using the glm function since that comes with base R. Because of that, I'll be focusing on the glm function, but the code should not be too different for the vglm function.


poi.mod <- glm(y ~ x1 + x2 + x3, family = poisson, data = data)
summary(poi.mod)

poi.irr <- exp(coef(poi.mod))*sqrt(diag(vcov(poi.mod)))

Diagnostic Statistics

The diagnostic statistics are mostly the same as those for other models covered, so this section will be brief. The one thing worth pointing out is that you can run a likelihood ratio test by fitting a null model.


anova(poi.mod, update(poi.mod, ~ 1), test = "Chisq")

The above code will compare the fitted model to a null model using a Chi-squared test. A significant result indicates that the fitted model fits your data better than the null model.

You can also run Pseudo-R2R^2, AIC, and BIC tests for this model.