Probability and Probability Distribution
8 Probability
- Probability is the measure of the likelihood of an event occurring. It is a key concept in determining the significance of biological observations.
- Probability is represented by "p" and ranges from 0 to 1. A probability of 0 means an event is impossible, while a probability of 1 means it is certain.
- The probability of an event not occurring is "q", and the two are related by p + q = 1 (so q = 1 - p).
- In practice, probability is estimated by dividing the number of times the event occurs by the total number of trials (its relative frequency).
- A strong understanding of probability is crucial for evaluating the significance of biological findings.
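The calculation described above (events divided by trials, with q = 1 - p) can be sketched as follows. The function name `estimate_p_q` and the germination counts are hypothetical, used only for illustration.

```python
def estimate_p_q(events: int, trials: int):
    """Estimate p (event occurs) and q (event does not occur) from observed counts.

    Returns the pair (p, q).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= events <= trials:
        raise ValueError("events must lie between 0 and trials")
    p = events / trials  # relative frequency of the event
    q = 1 - p            # complement, so that p + q = 1
    return p, q


# Hypothetical counts: 18 of 60 seeds germinated.
p, q = estimate_p_q(18, 60)
print(f"p = {p:.2f}, q = {q:.2f}")  # p = 0.30, q = 0.70
```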
8.1 Laws of probability
- Mutually exclusive events: Events that cannot occur together in the same trial; the occurrence of one rules out the others (e.g., a single coin flip that lands heads cannot also land tails).
- Addition law of probability: This law applies to mutually exclusive events. It states that the probability that one or another of them occurs is the sum of their individual probabilities.
- Formula: If there are n mutually exclusive events with probabilities p1, p2, …, pn, the probability (P) that any one of them occurs is P = p1 + p2 + … + pn. Only when the events are also exhaustive (together they cover every possible outcome) does this sum equal 1.
- Keywords: The word "or" typically signals the addition law (e.g., the probability of the "birth of an Rh-negative or Rh-positive baby").
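A minimal sketch of the addition law using exact fractions, assuming a fair six-sided die (an illustration not taken from the source; `prob_any` is a hypothetical helper name):

```python
from fractions import Fraction


def prob_any(*probs: Fraction) -> Fraction:
    """Addition law: P(A or B or ...) for mutually exclusive events is the sum of their probabilities."""
    total = sum(probs, Fraction(0))
    if total > 1:
        raise ValueError("mutually exclusive events cannot have probabilities summing above 1")
    return total


face = Fraction(1, 6)  # probability of any one face of a fair die (assumed)

print(prob_any(face, face))   # 1/3: rolling a 1 or a 2
print(prob_any(*[face] * 6))  # 1: all six faces are mutually exclusive AND exhaustive
```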
8.1.5 Probability of calculated values from tables
- Probability from Tables: The probability (P value) associated with a calculated value of "t" (the t-statistic) or "χ²" (chi-square) is read from statistical tables for the relevant degrees of freedom.
- Life Tables and Mortality: Probabilities of death or survival at different ages are obtained from life tables, which are based on mortality data from a large population sample.
- Modified Life Tables: Modified life tables are used to determine the probability of survival after a medical intervention or procedure.
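Statistical tables simply tabulate precomputed probabilities. As a sketch of where a χ² table entry comes from, the upper-tail probability P(χ² ≥ x) with df degrees of freedom can be computed from the regularized lower incomplete gamma function. The power series below is one standard method, chosen so the example needs only the Python standard library; a statistics package such as SciPy provides this directly as `scipy.stats.chi2.sf`. The function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x: float, df: int, tol: float = 1e-12) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses P(X >= x) = 1 - P(df/2, x/2), where P(a, z) is the regularized lower
    incomplete gamma function, evaluated with its power series:
        P(a, z) = z^a e^(-z) / Gamma(a + 1) * sum_n z^n / ((a + 1)(a + 2)...(a + n))
    """
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0  # n = 0 term of the series
    n = 0
    while term > tol * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1)) * total
    return max(0.0, 1.0 - lower)


print(round(chi2_sf(3.841, 1), 3))  # 0.05: the familiar 5% critical value for df = 1
print(round(chi2_sf(5.991, 2), 3))  # 0.05: the 5% critical value for df = 2
```

The series converges for any x but needs more terms as x grows, and 1 - P loses relative precision when the tail probability is tiny; library routines handle those cases more carefully.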
6 Different types of distributions and their significance
- What is a distribution? A distribution describes the values a variable takes and how often (or how probably) each value occurs; the data are typically sorted from lowest to highest to display it.
- The Normal Distribution: The Gaussian (or normal) distribution is a common and important distribution. Its formula gives the probability density at each value, from which the probability of a value falling in any range can be calculated.
- Probability Density Function (PDF): Shows where the data are concentrated (dense). The probability that a value falls in an interval is the area under the PDF over that interval.
- Cumulative Distribution Function (CDF): Gives the probability that the variable takes a value less than or equal to x, F(x) = P(X ≤ x).
- Example: Height Distribution: For height, the PDF shows which heights are most common, while the CDF gives, for any height h, the proportion of individuals no taller than h.
- Importance of Distributions: Knowing the distribution of a data set helps in calculating probabilities, describing relationships within the data, and making predictions.
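To make the PDF and CDF concrete, here is a minimal sketch for a normal height distribution. The mean of 170 cm and SD of 10 cm are hypothetical values chosen for illustration, and the CDF uses the standard error-function identity Φ(z) = ½(1 + erf(z/√2)).

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Probability density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


MU, SIGMA = 170.0, 10.0  # hypothetical adult heights in cm

print(round(normal_pdf(170, MU, SIGMA), 4))  # 0.0399: density is highest at the mean
print(round(normal_cdf(180, MU, SIGMA), 3))  # 0.841: share no taller than 180 cm
print(round(normal_cdf(180, MU, SIGMA) - normal_cdf(160, MU, SIGMA), 3))  # 0.683: share between 160 and 180 cm
```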
6.1 Probability distributions
- Probability distributions describe the possible values of a random variable within a given range, together with their probabilities (or probability densities).
- They are commonly summarized by measures such as the mean, standard deviation, skewness, and kurtosis.
- The normal distribution (bell-shaped curve) is a common example, but many others exist.
- Probability density estimation is the process of inferring, from observed data, the probabilistic model that describes a phenomenon.
- Probability distributions used in data analysis are of two types:
  - Discrete probability function (for variables with countable values)
  - Continuous probability function (for variables that can take any value in a range)
- Examples of probability distributions include:
  - Normal distribution (continuous)
  - Chi-square distribution (continuous)
  - Binomial distribution (discrete)
  - Poisson distribution (discrete)
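- The binomial distribution gives the probability of observing a given number of successes in a fixed number of independent trials (experiments). It is discrete because each trial's outcome is either 1 (success) or 0 (failure), so the number of successes is a whole number.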
6.2.4 Exponential distribution
- Exponential Distribution: This distribution models the time between events in a Poisson process, where events occur independently and at a constant rate.
- Key Features:
  - It is the continuous counterpart of the geometric distribution.
  - It is "memoryless": the time already waited does not change the distribution of the remaining waiting time, i.e., P(X > s + t | X > s) = P(X > t).
- Probability Density Function (PDF):
  - f(x; λ) = λe^(-λx) for x ≥ 0
  - f(x; λ) = 0 for x < 0
  - λ is the rate parameter (λ > 0)
- Cumulative Distribution Function (CDF):
  - F(x; λ) = 1 - e^(-λx) for x ≥ 0
  - F(x; λ) = 0 for x < 0
- Applications: Beyond Poisson processes, exponential distributions are used in various fields like reliability engineering, queuing theory, and survival analysis.
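The PDF and CDF above translate directly into code. This sketch (function names hypothetical; the rate of 0.5 events per hour is an illustrative assumption) also checks memorylessness numerically using the survival function P(X > x) = e^(-λx):

```python
import math


def exp_pdf(x: float, lam: float) -> float:
    """f(x; λ) = λ e^(-λx) for x >= 0, and 0 for x < 0."""
    if lam <= 0:
        raise ValueError("rate λ must be positive")
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exp_cdf(x: float, lam: float) -> float:
    """F(x; λ) = 1 - e^(-λx) for x >= 0, and 0 for x < 0."""
    if lam <= 0:
        raise ValueError("rate λ must be positive")
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


def exp_survival(x: float, lam: float) -> float:
    """P(X > x) = 1 - F(x; λ), computed directly as e^(-λx) for x >= 0."""
    if lam <= 0:
        raise ValueError("rate λ must be positive")
    return math.exp(-lam * x) if x >= 0 else 1.0


lam = 0.5  # hypothetical rate: 0.5 events per hour, so mean waiting time 1/λ = 2 hours
s, t = 3.0, 1.0
# Memorylessness: having already waited s hours does not change the chance of waiting t more.
conditional = exp_survival(s + t, lam) / exp_survival(s, lam)
print(round(conditional, 6), round(exp_survival(t, lam), 6))  # 0.606531 0.606531
```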
6.3 Discrete probability function
Key Concepts
- Discrete Probability Function: This function gives the probability of each possible value of a discrete random variable. For example, the number of rotten eggs in a tray is a discrete random variable, and its probability function gives the probability of finding 0, 1, 2, … rotten eggs.
- Binomial Distribution: This distribution is used when each of a fixed number of independent trials has two possible outcomes (success or failure) and the same probability of success.
- Binomial Distribution Formula:
  f(k; n, p) = Pr(k; n, p) = Pr(X = k) = (n choose k) * p^k * (1 - p)^(n - k), where (n choose k) = n! / (k! (n - k)!)
- This formula calculates the probability of getting exactly k successes in n trials, where p is the probability of success on a single trial.
- Cumulative Distribution Function: F(k; n, p) = Pr(X ≤ k), the sum of (n choose i) * p^i * (1 - p)^(n - i) over i = 0, 1, …, k. It gives the probability of getting k or fewer successes in n trials.
In simpler terms:
- A discrete probability function maps each possible value of a discrete random variable to its probability.
- The binomial distribution applies to scenarios with two outcomes (such as success or failure) repeated over several independent attempts.
- Its formula gives the probability of a specific number of successes in a given number of trials.
- The cumulative distribution function gives the probability of getting up to a certain number of successes.
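The binomial formula and its cumulative form can be sketched with `math.comb` (Python 3.8+). The genetics scenario (four offspring, each with a 0.25 chance of showing a recessive trait, as when both parents are carriers) is an illustrative assumption, and the function names are hypothetical:

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Pr(X = k) = (n choose k) * p^k * (1 - p)^(n - k); zero outside 0..n."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """Pr(X <= k): sum of the PMF over 0, 1, ..., k successes."""
    return sum(binom_pmf(i, n, p) for i in range(0, min(k, n) + 1))


# Hypothetical: 4 offspring, each with probability 0.25 of showing a recessive trait.
print(binom_pmf(1, 4, 0.25))  # 0.421875: exactly one affected (4 * 0.25 * 0.75**3)
print(binom_cdf(1, 4, 0.25))  # 0.73828125: at most one affected (0.31640625 + 0.421875)
```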
6.3.3 Poisson distribution
- Poisson Distribution: This distribution gives the probability of a specific number of events occurring within a fixed interval of time or space, when events occur independently at a constant average rate.
- Characteristics:
  - Discrete probability distribution
  - Positively (right-) skewed
  - Approaches a normal distribution as the mean becomes large
  - Skewness decreases as the mean increases
- Formula:
  - f(k; λ) = Pr(X = k) = (λ^k * e^(-λ)) / k!
  - Where:
    - k = number of events (0, 1, 2, …)
    - λ = average number of events per interval (both the mean and the variance of the distribution)
    - e = Euler’s number (≈ 2.71828)
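A minimal sketch of the Poisson formula; the mean of 3 cases per week is a hypothetical value and `poisson_pmf` is an illustrative name. The last loop uses the standard result that a Poisson distribution's skewness equals 1/√λ, which shows the skew shrinking as the mean grows:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Pr(X = k) = λ^k e^(-λ) / k! for k = 0, 1, 2, ...

    Fine for moderate k and λ; very large values would need a log-space version.
    """
    if k < 0:
        return 0.0
    return lam**k * math.exp(-lam) / math.factorial(k)


lam = 3.0  # hypothetical: an average of 3 cases per week
for k in range(4):
    print(k, round(poisson_pmf(k, lam), 4))  # probability of exactly k cases in a week

# Skewness of a Poisson distribution is 1/sqrt(λ): it shrinks as the mean grows.
for mean in (1, 4, 25):
    print(mean, 1 / math.sqrt(mean))  # 1.0, 0.5, 0.2
```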
6.4 Normal distribution and normal curve
- Normal Distribution: A symmetric distribution in which values near the mean occur more frequently than values far from it.
- Bell Curve: A graphical representation of a normal distribution, resembling a bell shape.
- Characteristics of a Normal Distribution:
  - Symmetry: The distribution is symmetric about the mean.
  - Mean and Standard Deviation: These two parameters fully describe the distribution.
  - Kurtosis: The distribution has a characteristic kurtosis (a measure of tail weight, traditionally described as the sharpness of the peak).
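- Properties of a Normal Distribution:
  - Skewness of zero (no asymmetry).
  - Kurtosis of 3 (excess kurtosis of 0).
  - The standard normal distribution is the special case with mean 0 and standard deviation 1; any normal variable can be converted to it with z = (x - mean) / SD.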
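The skewness and kurtosis values listed above can be checked numerically. This sketch draws a large standard-normal sample (mean 0, SD 1) with a fixed seed and computes moment-based skewness and (non-excess) kurtosis; the sample size, seed, and helper name `moments` are arbitrary illustrative choices, so the results come out close to, not exactly, 0 and 3:

```python
import math
import random
import statistics


def moments(xs):
    """Return (skewness, kurtosis) from population moments; kurtosis here is NOT excess kurtosis."""
    mu = statistics.fmean(xs)
    sd = math.sqrt(statistics.fmean([(x - mu) ** 2 for x in xs]))
    skew = statistics.fmean([(x - mu) ** 3 for x in xs]) / sd**3
    kurt = statistics.fmean([(x - mu) ** 4 for x in xs]) / sd**4
    return skew, kurt


random.seed(1)  # fixed seed for a reproducible illustration
sample = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # standard normal: mean 0, SD 1
skew, kurt = moments(sample)
print(round(skew, 2), round(kurt, 2))  # close to 0 and 3; exact values depend on the sample
```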
6.5 Normal curve
- Normal Curve: The smooth, symmetrical frequency curve that a histogram approaches when the number of observations is very large and the class intervals are very narrow.
- Characteristics:
  - Symmetric frequency distribution around a single peak (mean, median, and mode coincide).
  - Rises progressively from low frequencies at the extremes to the highest frequency at the peak.
- Significance:
  - Used together with the standard deviation (SD) to judge whether the observations behave like a large, random sample.
  - Underlies many statistical analyses and gives the probability that an observation falls within a given range of the mean (about 68%, 95%, and 99.7% of observations lie within 1, 2, and 3 SD of the mean).
- ""Normal"" and ""Abnormal"": The term ""normal"" refers to a statistical concept, not a clinical one. ""Abnormal"" findings may have a negative prognosis but are not used in clinical settings.
6.6 Asymmetrical distribution
- Distribution Types: Distributions observed in nature include the "normal distribution" as well as asymmetrical distributions.
- Asymmetrical Distributions: These distributions are skewed: one tail of the curve is longer, extending to the left or right of the peak (the highest frequencies).
- Skewness: A distribution with a long right tail is positively (right-) skewed; one with a long left tail is negatively (left-) skewed. The mean is usually pulled toward the long tail, so it tends to lie above the median in a positively skewed distribution and below it in a negatively skewed one.
- Bimodal Distributions: These distributions have two peaks, suggesting the presence of two distinct groups within the population.
- Diversity: Bimodal distributions often indicate a diverse sample, meaning the individuals in the sample belong to two different subgroups.
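To illustrate the direction of skew, this sketch computes moment-based skewness for a small hypothetical data set with a long right tail and for its mirror image; the sign of the result matches the side of the long tail. The data and the helper name `skewness` are illustrative assumptions.

```python
import math
import statistics


def skewness(xs):
    """Moment-based (population) skewness: mean cubed deviation divided by SD cubed."""
    mu = statistics.fmean(xs)
    sd = math.sqrt(statistics.fmean([(x - mu) ** 2 for x in xs]))
    return statistics.fmean([(x - mu) ** 3 for x in xs]) / sd**3


right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]  # hypothetical data with one large value
left_tailed = [-x for x in right_tailed]         # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0)  # True: positive (right) skew
print(skewness(left_tailed) < 0)   # True: negative (left) skew
print(skewness(symmetric))         # 0.0: no skew
print(statistics.mean(right_tailed), statistics.median(right_tailed))  # 3.9 3.0: mean pulled toward the long tail
```

Skewness alone does not reveal bimodality: a symmetric two-peaked sample can also have skewness near 0, so a histogram is the simplest check for two subgroups.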