Probability and Probability Distribution

8 Probability

  • Probability is the measure of the likelihood of an event occurring. It is a key concept in determining the significance of biological observations.
  • Probability is represented by "p" and ranges from 0 to 1. A probability of 0 means an event is impossible, while a probability of 1 means it is certain.
  • The probability of an event not occurring is "q," and the relationship between "p" and "q" is p + q = 1.
  • Probability is estimated by dividing the number of times the event occurs by the total number of trials.
  • A strong understanding of probability is crucial for evaluating the significance of biological findings.
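
The relationship between p, q, and the count of trials can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `estimate_probability` and the counts (25 occurrences in 100 trials) are made up for the example.

```python
def estimate_probability(occurrences: int, trials: int) -> float:
    """Estimate p as (times the event occurred) / (total trials)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= occurrences <= trials:
        raise ValueError("occurrences must lie between 0 and trials")
    return occurrences / trials


# Hypothetical counts: the event is observed 25 times in 100 trials.
p = estimate_probability(25, 100)  # probability the event occurs
q = 1 - p                          # probability the event does not occur

print(p, q)  # 0.25 0.75
```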

8.1 Laws of probability

  • Mutually exclusive events: These are events where the occurrence of one eliminates the possibility of another happening at the same time (e.g., flipping a coin and getting heads prevents getting tails).
  • Addition law of probability: This law applies to mutually exclusive events. It states that the probability that any one of several mutually exclusive events occurs is the sum of their individual probabilities.
  • Formula: If there are ‘n’ mutually exclusive events, and event i has a probability of ‘pi’, then the probability (P) that any one of them occurs is: P = p1 + p2 + … + pn. The sum equals 1 only when the events also cover every possible outcome.
  • Keywords: The word "or" typically indicates the use of the addition law (e.g., "birth of Rh-negative or Rh-positive baby").
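
As a small worked illustration of the addition law, the sketch below uses a fair six-sided die, whose faces are mutually exclusive outcomes. The die and the helper name `addition_law` are assumptions chosen for the example. Exact fractions are used so the sums are not blurred by rounding.

```python
from fractions import Fraction


def addition_law(probabilities):
    """Sum the probabilities of mutually exclusive events."""
    total = sum(probabilities, Fraction(0))
    if total > 1:
        raise ValueError("probabilities of mutually exclusive events cannot exceed 1")
    return total


# Assumed example: a fair die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

p_one_or_two = addition_law([die[1], die[2]])  # "1 or 2"
p_any_face = addition_law(die.values())        # all outcomes together

print(p_one_or_two)  # 1/3
print(p_any_face)    # 1
```

The second result equals 1 only because the six faces cover every possible outcome.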

8.1.5 Probability of calculated values from tables

  • Probability from Tables: The probability of calculated values for "t" (t-statistic) and "χ²" (Chi-square) can be found using statistical tables.
  • Life Tables and Mortality: Probabilities of death or survival at different ages are obtained from life tables, which are based on mortality data from a large population sample.
  • Modified Life Tables: Modified life tables are used to determine the probability of survival after a medical intervention or procedure.
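
A table lookup for Chi-square can be reproduced numerically in the two simplest cases. The sketch below covers only 1 and 2 degrees of freedom, where the upper-tail probability has a closed form; other degrees of freedom, and the t-statistic, would need a statistics library or a printed table. The function name `chi2_upper_tail` is made up for this example.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(Chi-square > x) for df = 1 or df = 2 only (closed forms)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        # A Chi-square variable with 1 df is a squared standard normal variable.
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        # With 2 df the distribution is exponential with mean 2.
        return math.exp(-x / 2)
    raise ValueError("this sketch supports only df = 1 or df = 2")


# The familiar 5% critical values from a Chi-square table.
print(round(chi2_upper_tail(3.841, 1), 3))
print(round(chi2_upper_tail(5.991, 2), 3))
```

Both calls return a probability close to 0.05, matching the 5% column of a standard Chi-square table.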

6 Different types of distributions and their significance

  • What is a distribution? A distribution is a collection of data points associated with a variable, typically sorted from lowest to highest.
  • The Normal Distribution: The Gaussian (or Normal) distribution is a common and important distribution. It provides a formula for the probability density at each value of the variable.
  • Probability Density Function: This function shows where the data are clustered or dense; for a continuous variable, the area under it between two values gives the probability of falling in that range.
  • Cumulative Distribution Function: This function gives the probability that the variable takes a value less than or equal to a given value.
  • Example: Height Distribution: The text uses height as an example. The distribution function describes the relationship between different height readings.
  • Importance of Distributions: Knowing the distribution of a data set can help in calculating probabilities, defining connections within the data, and making predictions.
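
To make the density and cumulative functions concrete for the height example, the sketch below uses Python's standard-library `statistics.NormalDist`. The mean of 170 cm and standard deviation of 10 cm are hypothetical values chosen for illustration, not figures from the text.

```python
from statistics import NormalDist

# Hypothetical height distribution: mean 170 cm, standard deviation 10 cm.
heights = NormalDist(mu=170, sigma=10)

# Density: heights near the mean are more densely packed than heights far away.
density_at_mean = heights.pdf(170)
density_in_tail = heights.pdf(190)

# Cumulative probability: chance that a height is at most 180 cm.
p_at_most_180 = heights.cdf(180)

# Probability of a range is a difference of two cumulative values.
p_160_to_180 = heights.cdf(180) - heights.cdf(160)

print(density_at_mean > density_in_tail)
print(round(p_at_most_180, 4), round(p_160_to_180, 4))
```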

6.1 Probability distributions

  • Probability distributions represent the possible values of a random variable within a specific range, together with their probabilities.
  • They are described by parameters such as the mean and standard deviation, and by shape measures such as skewness and kurtosis.
  • The normal distribution (bell-shaped curve) is a common example, but many others exist.
  • A probability density function describes how probability is spread over the values of a continuous variable and so serves as a probabilistic model of the phenomenon.
  • Two types of probability distributions are used in data design:
    • Discrete probability function
    • Continuous probability function
  • Examples of probability distributions include:
    • Normal distribution
    • Chi-square distribution
    • Binomial distribution
    • Poisson distribution
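  • Binomial distribution calculates the likelihood of an event occurring a given number of times across a series of independent trials. It is discrete because every trial has an outcome of either 1 (success) or 0 (failure), so the number of successes is always a whole number.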

6.2.4 Exponential distribution

  • Exponential Distribution: This distribution models the time between events in a Poisson process, where events occur independently and at a constant rate.
  • Key Features:
    • It’s the continuous analogue of the geometric distribution.
    • It has "memorylessness", meaning the time already waited does not affect the probability of how much longer the wait will be.
  • Probability Density Function (PDF):
    • f(x; λ) = λe^(-λx) for x ≥ 0
    • f(x; λ) = 0 for x < 0
    • λ is the rate parameter (λ > 0)
  • Cumulative Distribution Function (CDF):
    • F(x; λ) = 1 - e^(-λx) for x ≥ 0
    • F(x; λ) = 0 for x < 0
  • Applications: Beyond Poisson processes, exponential distributions are used in various fields like reliability engineering, queuing theory, and survival analysis.
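
The PDF and CDF above translate directly into code, and memorylessness can be checked numerically: P(X > s + t | X > s) should equal P(X > t). In this sketch the function names and the values λ = 0.5, s = 2, and t = 3 are arbitrary choices for illustration.

```python
import math


def exp_pdf(x: float, lam: float) -> float:
    """f(x; lam) = lam * e^(-lam * x) for x >= 0, else 0."""
    if lam <= 0:
        raise ValueError("rate parameter must be positive")
    if x < 0:
        return 0.0
    return lam * math.exp(-lam * x)


def exp_cdf(x: float, lam: float) -> float:
    """F(x; lam) = 1 - e^(-lam * x) for x >= 0, else 0."""
    if lam <= 0:
        raise ValueError("rate parameter must be positive")
    if x < 0:
        return 0.0
    return 1 - math.exp(-lam * x)


lam = 0.5        # hypothetical rate: 0.5 events per unit time
s, t = 2.0, 3.0  # hypothetical waiting times

# Memorylessness: having already waited s does not change the
# probability of waiting at least t more.
conditional = (1 - exp_cdf(s + t, lam)) / (1 - exp_cdf(s, lam))
unconditional = 1 - exp_cdf(t, lam)

print(math.isclose(conditional, unconditional))  # True
```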

6.3 Discrete probability function

Key Concepts

  • Discrete Probability Function: This function describes the likelihood of each possible value for a discrete random variable. For example, the number of rotten eggs in a tray is a discrete random variable, and its probabilities are given by a discrete probability function.
  • Binomial Distribution: This distribution is used when there are two possible outcomes (success or failure) in a series of independent trials.
  • Binomial Distribution Formula:
    • f(k; n, p) = Pr(k; n, p) = Pr(X = k) = (n choose k) * p^k * (1 - p)^(n-k)
    • This formula calculates the probability of getting exactly ‘k’ successes in ‘n’ trials, where ‘p’ is the probability of success on a single trial.
  • Cumulative Distribution Function: This function calculates the probability of getting ‘k’ or fewer successes in ‘n’ trials.
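
The binomial formula and its cumulative form can be sketched with the standard library alone. The tray of 12 eggs and the 10% chance of any egg being rotten are hypothetical numbers; `binom_pmf` and `binom_cdf` are names chosen for this example.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Pr(X = k) = (n choose k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if not 0 <= k <= n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """Pr(X <= k): sum of the probabilities of 0, 1, ..., k successes."""
    return sum(binom_pmf(i, n, p) for i in range(0, min(k, n) + 1))


# Hypothetical tray: 12 eggs, each rotten with probability 0.1.
n_eggs, p_rotten = 12, 0.1

p_none = binom_pmf(0, n_eggs, p_rotten)         # no rotten eggs
p_at_most_two = binom_cdf(2, n_eggs, p_rotten)  # two or fewer rotten eggs

print(round(p_none, 4), round(p_at_most_two, 4))
```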

In simpler terms:

  • The text introduces the concept of a discrete probability function, which maps each possible value of a discrete random variable to its probability.
  • It then focuses on the binomial distribution, which is useful for scenarios with two outcomes (like success or failure) in multiple attempts.
  • The text provides the formula for calculating the probability of a specific number of successes in a given number of trials using the binomial distribution.
  • It also briefly mentions the cumulative distribution function, which gives the probability of getting up to a certain number of successes.

6.3.3 Poisson distribution

  • Poisson Distribution: This distribution is used to calculate the probability of a specific number of events occurring within a set time or space, given a constant rate of events.
  • Characteristics:
    • Discrete probability distribution
    • Positive skewness
    • Approaches a normal distribution as the mean increases
    • Skewness decreases as the average increases
  • Formula:
    • f(k; λ) = Pr(X = k) = (λ^k * e^(-λ)) / k!
    • Where:
      • k = number of events
      • λ = average rate of events
      • e = Euler’s number (≈ 2.71828)
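
The Poisson formula can be evaluated directly. In the sketch below the rate of 2 events per interval is an arbitrary illustrative value, and `poisson_pmf` is a name chosen for the example.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Pr(X = k) = lam^k * e^(-lam) / k!"""
    if lam <= 0:
        raise ValueError("rate must be positive")
    if k < 0:
        return 0.0
    return lam**k * math.exp(-lam) / math.factorial(k)


lam = 2.0  # hypothetical average of 2 events per interval

p_zero = poisson_pmf(0, lam)   # probability of no events
p_three = poisson_pmf(3, lam)  # probability of exactly three events

# The probabilities over all counts sum to 1 (50 terms is ample here),
# and the mean of the distribution equals the rate.
total = sum(poisson_pmf(k, lam) for k in range(50))
mean = sum(k * poisson_pmf(k, lam) for k in range(50))

print(round(p_zero, 4), round(p_three, 4))
```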

6.4 Normal distribution and normal curve

  • Normal Distribution: A symmetrical and balanced distribution where data points near the mean occur more frequently than those far away.
  • Bell Curve: A graphical representation of a normal distribution, resembling a bell shape.
  • Characteristics of a Normal Distribution:
    • Symmetry: The distribution is symmetrical around the mean.
    • Mean, Standard Deviation: These parameters fully describe the distribution.
    • Kurtosis: The distribution exhibits a specific level of kurtosis (a measure of how heavy the tails are, often loosely described as the peak’s sharpness).
  • Conditions for a Normal Distribution:
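    • Standard deviation equal to one and mean equal to zero for the standard normal distribution; a general normal distribution can have any mean and any positive standard deviation.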
    • Skewness of zero (no asymmetry).
    • Kurtosis of 3.

6.5 Normal curve

  • Normal Curve: A symmetrical frequency curve formed by a histogram with a high number of observations and a narrow class interval.
  • Characteristics:
    • Balanced frequency distribution around a single peak (mean, median, and mode coincide).
    • Rises progressively from low frequencies at the extremes to the highest frequency at the peak.
  • Significance:
    • Used to test the standard deviation (SD) and determine if it represents a large, random sample.
    • Facilitates statistical analysis and indicates the probability of occurrence of observations within a population.
  • "Normal" and "Abnormal": The term "normal" refers to a statistical concept, not a clinical one. A statistically "abnormal" finding lies far from the mean; it may carry a negative prognosis, but the statistical label by itself is not a clinical judgment.
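
The probability of observations falling within a given distance of the mean can be read off the normal curve. The sketch below uses the standard-library `statistics.NormalDist` with the standard normal distribution (mean 0, SD 1); the dictionary name `within` is just a label for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability that an observation lies within k standard deviations
# of the mean, for k = 1, 2, 3.
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, prob in within.items():
    print(f"within {k} SD: {prob:.4f}")
```

These are the familiar coverage figures of roughly 68%, 95%, and 99.7%, and they apply to any normal distribution once distances are measured in units of its SD.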

6.6 Asymmetrical distribution

  • Distribution Types: There are various types of distributions observed in nature, including the "normal distribution" and asymmetrical distributions.
  • Asymmetrical Distributions: These distributions have skews, meaning the long tail of the curve extends either to the left or right of the highest frequencies.
  • Skewness: A distribution can be skewed to the left or right, indicating the direction of the long tail.
  • Bimodal Distributions: These distributions have two peaks, suggesting the presence of two distinct groups within the population.
  • Diversity: Bimodal distributions often indicate a diverse sample, meaning the individuals in the sample belong to two different subgroups.
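
The direction of skew can be checked numerically. The sketch below computes a simple moment-based skewness (the average cubed standardized deviation); this is one common definition among several, and the three small data sets are invented for illustration.

```python
import statistics


def sample_skewness(values):
    """Average of cubed standardized deviations (population form)."""
    if len(values) < 3:
        raise ValueError("need at least three values")
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Invented data: a long tail to the right, its mirror image, and a
# symmetric set.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]
left_tailed = [-v for v in right_tailed]
symmetric = [1, 2, 3, 4, 5]

print(sample_skewness(right_tailed) > 0)  # True: tail extends to the right
print(sample_skewness(left_tailed) < 0)   # True: tail extends to the left
```

A positive value indicates a long right tail, a negative value a long left tail, and a value near zero a roughly symmetrical distribution.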