Intro to Statistical methods using RStudio
Page 1: Data handling and descriptive statistics,
Page 2: Probability,
Page 3: Intervals and sample size,
Page 4: Hypothesis Testing,
Page 5: Contingency tables,
Page 6: Linear Regression.
Page 5: Contingency tables
1. The test of independence:
In a test of independence, the hypotheses are stated in words. A contingency table cross-classifies counts by two factors. The null hypothesis states that the two factors are independent; the alternative hypothesis states that they are not. The chi-square test is run with the R function chisq.test.
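Construct a table or matrix: in the 2020 elections, men voted 45% for Biden and 53% for Trump; women voted 57% for Biden and 42% for Trump.
Source: https://ropercenter.cornell.edu/how-groups-voted-2020
Is voting for one candidate or the other independent of the voter's gender? Note that chisq.test expects counts, so the percentages below are treated as counts out of roughly 100 voters of each gender, for illustration only.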
men<-c(45,53)
women<-c(57,42)
table1<-rbind(men, women);table1
colnames(table1) <- c("Biden","Trump");table1
chisq.test(table1)
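As a sketch of what chisq.test computes for this table, the expected counts under independence and the Pearson statistic can be reproduced by hand. The names expected, stat, pval and res are illustrative and not part of the original page. For a 2x2 table chisq.test applies Yates' continuity correction by default, so correct = FALSE is set here to make the two results comparable.

```r
# Rebuild the 2x2 table from the example (percentages treated as counts).
table1 <- rbind(men = c(45, 53), women = c(57, 42))
colnames(table1) <- c("Biden", "Trump")

# Expected count in each cell under independence:
# (row total * column total) / grand total.
expected <- outer(rowSums(table1), colSums(table1)) / sum(table1)

# Pearson chi-square statistic and its p-value with (2-1)*(2-1) = 1 df.
stat <- sum((table1 - expected)^2 / expected)
pval <- pchisq(stat, df = 1, lower.tail = FALSE)

# chisq.test without the continuity correction should give the same numbers.
res <- chisq.test(table1, correct = FALSE)
c(by_hand = stat, chisq_test = unname(res$statistic))
c(by_hand = pval, chisq_test = res$p.value)
```

A small p-value would be evidence against independence of gender and candidate; with the default correct = TRUE the statistic is slightly smaller.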
2. Generate a table from a dataset:
library(MASS)
head(Melanoma,3) #https://stat.ethz.ch/R-manual/R-devel/library/boot/html/melanoma.html
str(Melanoma)
#ulcer: indicator of ulceration; 1=present, 0=absent.
#sex: the patient's sex; 1=male, 0=female.
table2 <- xtabs(~sex+ulcer, data=Melanoma);table2
rownames(table2) <- c("fem","male");table2
prop.table(table2)
chisq.test(table2)
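With a two-way table, row-wise proportions are often easier to read than the overall proportions, and the object returned by chisq.test can be stored and inspected piece by piece. A minimal sketch, assuming the MASS package is installed; the names tab, row_props and res are illustrative.

```r
library(MASS)  # provides the Melanoma data frame

# Cross-tabulate sex (0 = female, 1 = male) against ulcer (0 = absent, 1 = present).
tab <- xtabs(~ sex + ulcer, data = Melanoma)
dimnames(tab) <- list(sex = c("fem", "male"), ulcer = c("absent", "present"))

# margin = 1 makes each row sum to 1: share of ulceration within each sex.
row_props <- prop.table(tab, margin = 1)
round(row_props, 3)

# Store the test result and pull out individual components.
res <- chisq.test(tab)
res$statistic   # chi-square statistic (continuity-corrected for a 2x2 table)
res$parameter   # degrees of freedom
res$p.value     # p-value
res$expected    # expected counts under independence
```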
3. Goodness of fit test:
The chi-square goodness-of-fit test compares the observed distribution of a sample with an expected probability distribution.
#goodness of fit
#say that you roll a die 180 times and observe the following data.
#Does it show that it is a fair die?
#in the long run, the probability of each outcome of a fair die is 1/6
obs <- c(24,32,26,40,33,25) # number of ones, twos, threes, etc
p <- c(1/6,1/6,1/6,1/6,1/6,1/6)
chisq.test(obs, p=p)
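The goodness-of-fit statistic for the die example can also be computed step by step, which makes the role of the expected counts explicit. This is a sketch; the names expected, stat, df, pval and res are illustrative.

```r
obs <- c(24, 32, 26, 40, 33, 25)   # observed counts for faces 1..6
p   <- rep(1/6, 6)                 # fair-die probabilities

# Expected count per face: 180 rolls * 1/6 = 30.
expected <- sum(obs) * p

# Pearson statistic: sum of (observed - expected)^2 / expected.
stat <- sum((obs - expected)^2 / expected)

# Degrees of freedom: number of categories minus 1.
df <- length(obs) - 1
pval <- pchisq(stat, df = df, lower.tail = FALSE)

# The by-hand values should match chisq.test.
res <- chisq.test(obs, p = p)
c(by_hand = stat, chisq_test = unname(res$statistic))
c(by_hand = pval, chisq_test = res$p.value)
```

A large p-value means the observed counts are compatible with a fair die; it does not prove that the die is fair.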
4. Homogeneity test:
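The test of independence uses a contingency table to decide whether two factors are independent. The test of homogeneity asks whether two populations follow the same distribution of a categorical variable, even if that distribution is unknown. The computation with chisq.test is the same; only the question changes. In the example below the two populations are the female and male students in the survey data, and the variable compared is smoking habit (Smoke).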
library(MASS)
head(survey)
table3 <- xtabs(~Sex+Smoke, data=survey);table3
chisq.test(table3)
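For a homogeneity question it helps to compare the distribution of Smoke within each Sex group directly, and to look at the expected counts behind the chi-square approximation. A sketch, assuming the MASS package is installed; the names tab and res are illustrative, and the threshold of 5 is a common rule of thumb rather than something stated on this page.

```r
library(MASS)  # provides the survey data frame

tab <- xtabs(~ Sex + Smoke, data = survey)

# Distribution of smoking habit within each sex (each row sums to 1).
round(prop.table(tab, margin = 1), 3)

res <- chisq.test(tab)

# Expected counts under homogeneity; a common rule of thumb asks for >= 5 per cell.
res$expected
min(res$expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
(nrow(tab) - 1) * (ncol(tab) - 1)
```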