9 Comparing the means of two or more data variables or groups
T-test compares means: The t-test is used to compare the average (mean) of a variable in one group to the average of the same variable in one or more other groups.
Null hypothesis: The null hypothesis assumes no difference in the population means between the groups.
One-tailed vs. two-tailed tests: A one-tailed test checks if the difference is specifically lower or higher than zero, while a two-tailed test checks for any difference (either higher or lower).
Common methods for comparing means: Four common methods are covered in this chapter: the independent samples t-test, the one-sample t-test, the paired samples t-test, and ANOVA.
9.1 Independent samples t-test
Independent Samples t-test (Unpaired Samples t-test): This test compares the means of two independent groups of data.
Applications:
Determining if means are equivalent: Used to see if two samples drawn from the same demographic have similar means.
Inferring about population means: Helps determine if the means of two different populations are significantly different.
Example: Comparing average test scores of men and women to see if the difference is due to chance or a real effect.
When to use:
When the population mean and standard deviation are unknown.
When the two samples are distinct and unrelated.
Assumptions:
Independence: The two groups being compared must be independent (e.g., males and females).
Normality: The data for the dependent variable should be normally distributed.
Homogeneity of Variance: The variances of the two groups should be approximately equal.
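The test above can be sketched with SciPy's `ttest_ind`. The data here are hypothetical (simulated test scores for two independent groups); in practice you would pass in your own samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical example: test scores for two independent groups.
group_a = rng.normal(loc=70, scale=10, size=50)
group_b = rng.normal(loc=75, scale=10, size=50)

# Two-sided independent samples t-test. equal_var=True assumes
# homogeneity of variance; use equal_var=False (Welch's t-test)
# when the group variances differ.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the group means differ significantly.")
else:
    print("Fail to reject H0: no significant difference detected.")
```

The `equal_var` flag directly reflects the homogeneity-of-variance assumption listed above.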
9.2 One sample t-test
One-sample t-test: A statistical test to check if the mean of a single sample deviates from a pre-determined value (a value chosen in advance, not derived from the sample itself).
Purpose: Used to see if there’s enough evidence to reject the null hypothesis that the population mean is equal to the given value.
Hypotheses:
Null hypothesis (H0): Population mean equals the given value.
Alternative hypothesis (H1): Population mean is different from the given value.
Distinction: Unlike most statistical tests, the one-sample t-test doesn’t assess relationships between variables or compare groups. It directly compares a single variable’s data to a researcher-chosen value.
Types: Can be one-tailed (testing for deviation in one direction) or two-tailed (testing for deviation in either direction).
Assumptions:
Sample data is independent.
Data is randomly sampled.
Data has a normal distribution.
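A minimal sketch of this test using SciPy's `ttest_1samp`, with hypothetical blood-pressure readings compared against a researcher-chosen reference value of 120:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical sample: 40 blood-pressure readings.
sample = rng.normal(loc=123, scale=8, size=40)

# H0: the population mean equals 120 (the pre-determined value).
# Default is two-tailed; pass alternative='greater' or 'less'
# for a one-tailed version.
t_stat, p_value = stats.ttest_1samp(sample, popmean=120)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```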
9.3 Paired samples t-test
Purpose: The paired sample t-test (also called the dependent sample t-test) determines if the average difference between two sets of data is statistically significant.
Data Structure: Each object or entity is measured twice, resulting in two sets of related observations.
Applications: Commonly used in case-control studies and designs involving repeated measurements (e.g., evaluating the effectiveness of a training program before and after completion).
Hypotheses:
Null Hypothesis (H0): The true mean difference between the paired datasets is zero (no difference).
Alternative Hypothesis (H1):
Two-tailed: The true mean difference is not equal to zero (could be positive or negative).
Upper-tailed: The true mean difference is greater than zero (positive difference).
Lower-tailed: The true mean difference is less than zero (negative difference).
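The before/after design above can be sketched with SciPy's `ttest_rel`. The training-program data here are simulated for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical before/after scores for the same 30 trainees.
before = rng.normal(loc=60, scale=5, size=30)
after = before + rng.normal(loc=4, scale=3, size=30)  # avg gain ~4

# Two-tailed paired samples t-test on the per-subject differences.
t_stat, p_value = stats.ttest_rel(after, before)

# Upper-tailed version (H1: true mean difference > 0).
t_upper, p_upper = stats.ttest_rel(after, before,
                                   alternative='greater')

print(f"two-tailed p = {p_value:.4f}, upper-tailed p = {p_upper:.4f}")
```

Note that the pairing matters: each `after[i]` must belong to the same subject as `before[i]`.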
9.4 ANOVA
ANOVA (Analysis of Variance): A statistical method used to determine if there are significant differences between means of two or more populations.
How it works: ANOVA uses the law of total variance to partition observed variance into components associated with different sources of variation.
Purpose: It expands on the t-test, allowing for comparisons of more than two means.
Assumptions:
Independent Samples: Subjects in different groups cannot be the same individuals.
Equal Sample Sizes: Groups should have similar numbers of participants.
Normal Distribution: The dependent variable should follow a normal distribution.
Homoscedasticity: Population variances should be equal across groups.
Types of ANOVA:
One-Way ANOVA: One categorical independent variable (factor) and one continuous dependent variable. Example: Testing the effect of a treatment on anxiety.
Two-Way ANOVA: Two or more categorical independent variables and one continuous dependent variable. Example: Testing the joint effect of two factors, such as treatment and employment status, on a dependent variable.
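A one-way ANOVA can be sketched with SciPy's `f_oneway`, which accepts one sample per group. The anxiety-score data below are hypothetical, echoing the one-way example above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Hypothetical anxiety scores under three treatment conditions.
control = rng.normal(loc=50, scale=8, size=30)
drug = rng.normal(loc=45, scale=8, size=30)
therapy = rng.normal(loc=42, scale=8, size=30)

# H0: all three group means are equal.
f_stat, p_value = stats.f_oneway(control, drug, therapy)

print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```

A significant result says at least one mean differs, but not which one; a post-hoc test (e.g., Tukey's HSD) is needed to locate the difference.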
9.5 The Chi-square tests
Purpose: The Chi-square test is a statistical method used to compare two datasets without relying on assumptions about their distribution.
Data Type: It’s most commonly used when data consists of frequencies, like the number of responses in different groups.
Applications: It’s particularly useful in biomedical statistics for:
Proportion: Comparing proportions between groups.
Interrelation: Examining the relationship between two categorical variables.
Goodness of Fit: Assessing how well observed data fits a theoretical model.
Formula: The Chi-square statistic (χ²) measures the difference between observed (O) and expected (E) frequencies: χ² = Σ (O − E)² / E
Advantages:
Can compare binomial samples, even with small sample sizes (less than 30).
Can analyze multinomial samples, comparing frequencies across multiple categories (e.g., diabetic/non-diabetic in different weight groups).
Usefulness: It’s a valuable tool when other parametric tests are not suitable.
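The formula above is a direct sum over categories, so it can be computed in a couple of lines. The frequency counts here are made up for illustration:

```python
import numpy as np

# Hypothetical observed and expected frequency counts.
observed = np.array([48, 35, 17])
expected = np.array([50, 30, 20])

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi2 = np.sum((observed - expected) ** 2 / expected)

print(f"chi2 = {chi2:.3f}")  # 0.08 + 0.833 + 0.45
```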
9.6 Test of independence
Test of Independence: This statistical test helps determine if there’s a relationship between two or more features in a dataset.
How it Works:
Data is classified based on the features being analyzed.
The test aims to see if the features are associated (positively, negatively, or not at all).
A null hypothesis of no association is tested.
Example Applications:
Examining the relationship between punctuality and passing rates.
Determining if paracetamol is effective in reducing fever.
Interpreting Results:
The calculated chi-square value is compared to a critical value based on degrees of freedom and significance level.
If the calculated chi-square is below the critical value, the null hypothesis (no association) is not rejected.
If the calculated chi-square is above the critical value, the null hypothesis is rejected, suggesting an association between the features.
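The test of independence can be sketched with SciPy's `chi2_contingency`, which takes a contingency table and computes the expected frequencies, statistic, and p-value. The 2×2 table below is a hypothetical punctuality-vs-passing cross-tabulation:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table.
#                   passed  failed
table = np.array([[40,     10],    # punctual students
                  [25,     25]])   # late students

chi2, p_value, dof, expected = stats.chi2_contingency(table)

print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: punctuality and passing appear associated.")
```

Degrees of freedom are (rows − 1) × (columns − 1), so a 2×2 table has 1.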
9.7 Test of goodness of fit
Purpose of the Chi-Square Test: The most valuable application of the Chi-Square test is assessing "goodness of fit." This means determining how well an observed frequency distribution aligns with a theoretical distribution.
Process:
Hypotheses: Formulate a null hypothesis (no difference between observed and expected) and an alternative hypothesis (there is a difference). Choose a significance level for rejecting the null hypothesis.
Sampling: Draw a random sample from the population.
Expected Frequencies: Calculate expected frequencies based on the assumed probability distribution.
Comparison: Compare observed frequencies to expected frequencies.
Interpreting Results:
Good Fit: If the calculated chi-square value is less than the table value (at the chosen significance level and degrees of freedom), the fit is considered good. The discrepancy is likely due to random sampling.
Poor Fit: If the calculated chi-square value is greater than the table value, the fit is poor. The discrepancy is likely due to the theory not adequately explaining the observed data.
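The four steps above can be sketched with SciPy's `chisquare`. As a hypothetical example, suppose 60 die rolls are tested against the fair-die distribution (10 expected per face):

```python
from scipy import stats

# Step 2-3: observed counts from 60 hypothetical die rolls, and the
# expected counts under the theoretical (fair-die) distribution.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

# Step 4: compare observed to expected frequencies.
chi2, p_value = stats.chisquare(observed, f_exp=expected)

print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
# chi2 = 4.2 here, well below the 5% critical value of 11.07
# for 5 degrees of freedom, so the fit is considered good.
```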
9.8 Correlation and regression
Correlation and Regression Analyze Relationships: These two statistical methods are used to examine the connection between two quantitative variables.
Correlation Focuses on Presence of a Link: Correlation seeks to determine if a relationship exists between two variables ("x" and "y").
Spearman’s and Pearson’s Correlation Coefficients: The most widely recognized correlation coefficients are Spearman’s rho and Pearson’s product-moment correlation coefficient.
Regression Predicts Dependent Variable: Regression uses the relationship between variables to predict the value of the dependent variable based on the independent variable’s value.
Understanding the Relationship is Key: There is complexity in discerning the true meaning of the relationship between variables.
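Both coefficients named above are available in SciPy; the paired measurements below are hypothetical. Pearson's r measures linear association, while Spearman's rho works on ranks and captures any monotonic relationship:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements of two quantitative variables.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])

pearson_r, p_pearson = stats.pearsonr(x, y)
spearman_rho, p_spearman = stats.spearmanr(x, y)

print(f"Pearson r = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")
```

Because y increases monotonically with x here, Spearman's rho is exactly 1, while Pearson's r is slightly below 1 unless the relationship is perfectly linear.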
9.9 A look into correlation and regression
Correlation: Measures the strength and direction of the relationship between two variables.
Positive Correlation: Variables move in the same direction (e.g., profit and investment).
Negative Correlation: Variables move in opposite directions (e.g., price and demand).
Linear Regression: Predicts the change in a dependent variable based on the change in one or more independent variables.
Used to establish the relationship between variables and make predictions about future outcomes.
Equation for a simple linear regression: y = a + bx (where y is the dependent variable, x is the independent variable, a is a constant, and b is the regression coefficient).
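The equation y = a + bx can be fitted with SciPy's `linregress`, which returns the intercept a and slope b. The data below are hypothetical points scattered around y = 1 + 2x:

```python
import numpy as np
from scipy import stats

# Hypothetical data roughly following y = 1 + 2x plus noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.0, 4.9, 7.2, 8.9])

# Fit the simple linear regression y = a + b*x.
result = stats.linregress(x, y)
a, b = result.intercept, result.slope

print(f"a = {a:.3f}, b = {b:.3f}")

# Use the fitted line to predict y for a new x value.
y_pred = a + b * 2.5
print(f"predicted y at x=2.5: {y_pred:.3f}")
```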