
Hypothesis Test

9 Comparing the means of two or more data variables or groups

  • T-test compares means: The t-test is used to compare the average (mean) of a variable in one group to the average of the same variable in one or more other groups.
  • Null hypothesis: The null hypothesis assumes no difference in the population means between the groups.
  • One-tailed vs. two-tailed tests: A one-tailed test checks if the difference is specifically lower or higher than zero, while a two-tailed test checks for any difference (either higher or lower).
  • Common methods for comparing means: Four common methods are covered in the sections that follow: the independent samples t-test, the one-sample t-test, the paired samples t-test, and ANOVA.

9.1 Independent samples t-test

  • Independent Samples t-test (Unpaired Samples t-test): This test compares the means of two independent groups of data.
  • Applications:
    • Determining if means are equivalent: Used to see if two samples drawn from the same demographic have similar means.
    • Inferring about population means: Helps determine if the means of two different populations are significantly different.
    • Example: Comparing average test scores of men and women to see if the difference is due to chance or a real effect.
  • When to use:
    • When the population mean and standard deviation are unknown.
    • When the two samples are distinct and unrelated.
  • Assumptions:
    • Independence: The two groups being compared must be independent (e.g., males and females).
    • Normality: The data for the dependent variable should be normally distributed.
    • Homogeneity of Variance: The variances of the two groups should be approximately equal.
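
Under the assumptions above, the core computation is simple. Below is a minimal sketch of the pooled-variance (equal-variance) independent samples t statistic in plain Python; the test-score data are made-up illustrative numbers.

```python
import math
from statistics import mean, variance

def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic for two independent samples."""
    na, nb = len(sample_a), len(sample_b)
    # Pooling the variances relies on the homogeneity-of-variance assumption.
    sp2 = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2          # t statistic, degrees of freedom

# Hypothetical test scores for two independent groups:
scores_men = [72.0, 69.0, 75.0, 71.0, 68.0]
scores_women = [74.0, 78.0, 73.0, 76.0, 79.0]
t, df = independent_t(scores_men, scores_women)
```

For a two-tailed test at the 0.05 level with df = 8, |t| would be compared against the table value 2.306.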

9.2 One sample t-test

  • One-sample t-test: A statistical test to check whether the mean of a single sample deviates from a pre-determined value (a value chosen in advance, not computed from the sample itself).
  • Purpose: Used to see if there’s enough evidence to reject the null hypothesis that the population mean is equal to the given value.
  • Hypotheses:
    • Null hypothesis (H0): Population mean equals the given value.
    • Alternative hypothesis (H1): Population mean is different from the given value.
  • Distinction: Unlike most statistical tests, the one-sample t-test doesn’t assess relationships between variables or compare groups. It directly compares a single variable’s data to a researcher-chosen value.
  • Types: Can be one-tailed (testing for deviation in one direction) or two-tailed (testing for deviation in either direction).
  • Assumptions:
    • Sample data is independent.
    • Data is randomly sampled.
    • Data has a normal distribution.
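
As a sketch of the computation (plain Python; the sample values are invented for illustration):

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t statistic for H0: the population mean equals mu0."""
    n = len(sample)
    # Standard error of the mean uses the sample standard deviation.
    t = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t, n - 1                # t statistic, degrees of freedom

# Does this sample deviate from a researcher-chosen value of 100?
sample = [102.0, 98.0, 101.0, 104.0, 100.0]
t, df = one_sample_t(sample, 100.0)
```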

9.3 Paired samples t-test

  • Purpose: The paired sample t-test (also called the dependent sample t-test) determines if the average difference between two sets of data is statistically significant.
  • Data Structure: Each object or entity is measured twice, resulting in two sets of related observations.
  • Applications: Commonly used in case-control studies and designs involving repeated measurements (e.g., evaluating the effectiveness of a training program before and after completion).
  • Hypotheses:
    • Null Hypothesis (H0): The true mean difference between the paired datasets is zero (no difference).
    • Alternative Hypothesis (H1):
      • Two-tailed: The true mean difference is not equal to zero (could be positive or negative).
      • Upper-tailed: The true mean difference is greater than zero (positive difference).
      • Lower-tailed: The true mean difference is less than zero (negative difference).
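
The test reduces to a one-sample t-test on the per-subject differences. A minimal sketch with invented before/after training scores:

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic on per-subject differences (H0: true mean difference is zero)."""
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1                # t statistic, degrees of freedom

# Hypothetical scores before and after a training programme (same five subjects):
before = [60.0, 55.0, 70.0, 64.0, 66.0]
after = [66.0, 60.0, 71.0, 70.0, 68.0]
t, df = paired_t(before, after)
```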

9.4 ANOVA

  • ANOVA (Analysis of Variance): A statistical method used to determine if there are significant differences between means of two or more populations.
  • How it works: ANOVA uses the law of total variance to partition observed variance into components associated with different sources of variation.
  • Purpose: It expands on the t-test, allowing for comparisons of more than two means.
  • Assumptions:
    • Independent Samples: Subjects in different groups cannot be the same individuals.
    • Equal Sample Sizes: Groups should have similar numbers of participants; roughly equal sizes make ANOVA more robust to violations of the equal-variance assumption.
    • Normal Distribution: The dependent variable should follow a normal distribution.
    • Homoscedasticity: Population variances should be equal across groups.
  • Types of ANOVA:
    • One-Way ANOVA: One categorical independent variable (factor) and one continuous dependent variable. Example: Testing the effect of a treatment on anxiety.
    • Two-Way ANOVA: Two or more categorical independent variables and one continuous dependent variable. Example: Testing the joint effect of two factors, such as gender and employment status, on an outcome variable.
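
A one-way ANOVA reduces to an F statistic: between-group variance divided by within-group variance. A minimal sketch in plain Python, with made-up group data:

```python
from statistics import mean

def one_way_anova(*groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)

f, df = one_way_anova([4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0])
```

A large F means the group means differ by more than the within-group noise would suggest.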

9.5 The Chi-square tests

  • Purpose: The Chi-square test is a statistical method used to compare two datasets without relying on assumptions about their distribution.
  • Data Type: It’s most commonly used when data consists of frequencies, like the number of responses in different groups.
  • Applications: It’s particularly useful in biomedical statistics for:
    • Proportion: Comparing proportions between groups.
    • Interrelation: Examining the relationship between two categorical variables.
    • Goodness of Fit: Assessing how well observed data fits a theoretical model.
  • Formula: The Chi-square statistic (χ²) measures the difference between observed (O) and expected (E) frequencies: χ² = Σ (O − E)² / E
  • Advantages:
    • Can compare binomial samples, even with small sample sizes (less than 30).
    • Can analyze multinomial samples, comparing frequencies across multiple categories (e.g., diabetic/non-diabetic in different weight groups).
  • Usefulness: It’s a valuable tool when other parametric tests are not suitable.
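
The formula translates directly into code. A minimal sketch in plain Python; the coin-flip counts are invented for illustration:

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 60 heads and 40 tails in 100 flips, against a fair-coin expectation of 50/50:
stat = chi_square([60, 40], [50, 50])
```

With 1 degree of freedom, the 0.05 critical value is 3.841, so a statistic of 4.0 would reject the fair-coin model.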

9.6 Test of independence

  • Test of Independence: This statistical test helps determine if there’s a relationship between two or more features in a dataset.
  • How it Works:
    • Data is classified based on the features being analyzed.
    • The test aims to see if the features are associated (positively, negatively, or not at all).
    • A null hypothesis of no association is tested.
  • Example Applications:
    • Examining the relationship between punctuality and passing rates.
    • Determining if paracetamol is effective in reducing fever.
  • Interpreting Results:
    • The calculated chi-square value is compared to a critical value based on degrees of freedom and significance level.
    • If the calculated chi-square is below the critical value, the null hypothesis (no association) is not rejected; there is insufficient evidence of an association.
    • If the calculated chi-square is above the critical value, the null hypothesis is rejected, suggesting an association between the features.
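
For a contingency table, the expected frequency of each cell comes from the row and column totals. A sketch of the punctuality-versus-passing example, with invented counts:

```python
def chi2_independence(table):
    """Chi-square test of independence for a 2-D contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: (row total * column total) / grand total.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

table = [[30, 10],   # punctual students: 30 passed, 10 failed
         [15, 25]]   # late students:     15 passed, 25 failed
stat, df = chi2_independence(table)
```

Here df = 1 and the statistic exceeds the 0.05 critical value of 3.841, so the null hypothesis of no association would be rejected.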

9.7 Test of goodness of fit

  • Purpose of the Chi-Square Test: The most valuable application of the Chi-Square test is assessing "goodness of fit." This means determining how well an observed frequency distribution aligns with a theoretical distribution.
  • Process:
    • Hypotheses: Formulate a null hypothesis (no difference between observed and expected) and an alternative hypothesis (there is a difference). Choose a significance level for rejecting the null hypothesis.
    • Sampling: Draw a random sample from the population.
    • Expected Frequencies: Calculate expected frequencies based on the assumed probability distribution.
    • Comparison: Compare observed frequencies to expected frequencies.
  • Interpreting Results:
    • Good Fit: If the calculated chi-square value is less than the table value (at the chosen significance level and degrees of freedom), the fit is considered good. The discrepancy is likely due to random sampling.
    • Poor Fit: If the calculated chi-square value is greater than the table value, the fit is poor. The discrepancy is likely due to the theory not adequately explaining the observed data.
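
The process above, applied to a hypothetical die-fairness check (120 rolls, invented counts, uniform model):

```python
def chi_square(observed, expected):
    """Chi-square statistic comparing observed to expected frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [22, 17, 21, 18, 19, 23]   # invented counts for die faces 1-6
expected = [120 / 6] * 6              # uniform model: 20 rolls expected per face
stat = chi_square(observed, expected)
# Table value for df = 5 at the 0.05 significance level is 11.070:
good_fit = stat < 11.070
```

Since the statistic falls below the table value, the discrepancy is attributable to random sampling and the uniform model is a good fit.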

9.8 Correlation and regression

  • Correlation and Regression Analyze Relationships: These two statistical methods are used to examine the connection between two quantitative variables.
  • Correlation Focuses on Presence of a Link: Correlation seeks to determine if a relationship exists between two variables ("x" and "y").
  • Spearman’s and Pearson’s Correlation Coefficients: The most widely recognized correlation coefficients are Spearman’s rho and Pearson’s product-moment correlation coefficient.
  • Regression Predicts Dependent Variable: Regression uses the relationship between variables to predict the value of the dependent variable based on the independent variable’s value.
  • Understanding the Relationship is Key: Interpreting what a relationship between variables actually means requires care; in particular, correlation alone does not establish causation.
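
As a sketch, Pearson's product-moment coefficient is the covariance of the two variables scaled by their spreads (plain Python, invented data):

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]   # perfectly linear in x, so r = 1
r = pearson_r(x, y)
```

Spearman's rho is the same computation applied to the ranks of the data rather than the raw values.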

9.9 A look into correlation and regression

  • Correlation: Measures the strength and direction of the relationship between two variables.
    • Positive Correlation: Variables move in the same direction (e.g., profit and investment).
    • Negative Correlation: Variables move in opposite directions (e.g., price and demand).
  • Linear Regression: Predicts the change in a dependent variable based on the change in one or more independent variables.
    • Used to establish the relationship between variables and make predictions about future outcomes.
    • Equation for a simple linear regression: y = a + bx (where y is the dependent variable, x is the independent variable, a is a constant, and b is the regression coefficient).
  • Examples of Correlation and Regression:
    • Food quantity and weight
    • Medicine dose and blood pressure
    • Air temperature and metabolic rate
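
The coefficients a and b in y = a + bx can be estimated by ordinary least squares. A sketch echoing the medicine-dose example above, using invented dose and blood-pressure numbers:

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares estimates for the simple linear model y = a + b*x."""
    mx, my = mean(x), mean(y)
    # Slope: covariance of x and y divided by the variance of x.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx                 # intercept: the fitted line passes through the means
    return a, b

# Hypothetical medicine dose (mg) vs. drop in blood pressure (mmHg):
dose = [10.0, 20.0, 30.0, 40.0, 50.0]
drop = [3.0, 6.0, 8.0, 11.0, 12.0]
a, b = fit_line(dose, drop)
prediction = a + b * 35.0           # predicted drop for an unseen dose of 35 mg
```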