The Poisson distribution

Last updated on 2024-03-12 | Edit this page

Overview

Questions

  • What is the Poisson distribution?
  • What kind of data is it used on?

Objectives

  • Explain how the Poisson distribution is derived from the binomial.
  • Learn to apply the Poisson distribution in R

A special case of the binomial distribution


There is an approximation for the binomial distribution which can often be convenient, specifically if we have

  • many trials (large \(n\)) and
  • a small success probability \(p\).
frog cartoon demonstrating how the Poisson is derived from the Binomial distribution
The Poisson distribution is a special case of the binomial

An example:

  • We now fill the net for 1h, which is for a fixed period of time.
  • We catch about 100 frogs per hour, which means after an hour we have about 100 frogs in the net, and \(n \approx 100\).
  • The fraction of light frogs is low, only 2 % (\(p=0.02\)).
  • After filling the net, we count only the light ones.

We can now ask how many light frogs we can expect to catch per hour. This expected number is called \(\lambda\), and it’s given by
\[\lambda = n*p = 100 \cdot 0.02 = 2.\]

For the possible outcomes, we again just look at the number of light frogs, we are not interested in the number of dark frogs, or how many frogs we caught in total. In this scenario we have reduced the parameters to just one, the rate lambda, which is the expected number of frogs per hour. And the probabilities of the outcomes can be approximated with a Poisson distribution.

What is the Poisson used for?


Even though the Poisson is derived as an approximation of the Binomial, we don’t necessarily need two categories of events to use it. We can also use it to count events of a single category.

lake with only one sort of frogs
Poisson example

For example, consider a two lakes with frogs of only one colour. We might want to compare the density of frogs in these lakes, which can be done by comparing the Poisson rates. For this, we count frogs within \(2 m^2\) regions in both lakes. For each individual lake, this counting process could be described by a Poisson, with the rate giving the average number of frogs per \(2 m^2\).

In general, the poisson describes counting events over a fixed domain, which can be a period of time, or a fixed space. We assume here that events have an underlying rate, called lambda.

Examples:

  • Counting frogs for an hour, or within a defined area of the lake.
  • counting cells or particles in microscopy images within a fixed volume
  • counting how many times something happens in the cell within a fixed period of time.
  • Also mutations in the genome can be approximated by Poisson, because the genome has many base pairs, and the fraction of mutated base pairs is low.

Properties of the Poisson distribution


The distribution has only one parameter, the rate.

The probability for counting \(k\) events over whatever fixed domain you have chosen is

\[\large P(X=k) = \frac{\color{purple}\lambda^k e^{-\color{purple}\lambda}}{k!} .\]

In the next plot, Poisson distributions for different rates are shown:

If we look at the shape of the distribution, we see that

  • for low rates, it is clinched towards zero and has long tails towards larger values.
  • for larger rates (for example a rate of \(10\) shown here in purple) the shape looks more and more like a Gaussian. Indeed, high-valued counts can often be well described with a Gaussian as well.

Another important feature of the Poisson is that its variance and mean are the same, they are both lambda. This means in turn that we can estimate lambda from the sample mean.

Challenge

We are in a diagnostic laboratory that gets blood samples from incoming hospital patients and tests them for some disease. Which of these experiments can be modeled with a Poisson distribution?

  1. Counting the number of positive samples out of 50 samples that get tested successively.
  2. Counting the number of samples that test positive within an hour.
  3. Counting the number of samples that get tested within an hour.

All of these scenarios can be modeled with a Poisson. Counting the number of positive samples out of 50 is more suitable for a binomial distribution, but can indeed be approximated by Poisson.