From: aaronshaw
Date: Thu, 9 May 2019 00:29:01 +0000 (-0500)
Subject: updated to match content of readings
X-Git-Url: https://code.communitydata.science/stats_class_2019.git/commitdiff_plain/15a75e0555c577856bf7d01910c2e9861ae4d640?ds=inline;hp=792aab610bb1281787880b132a74d329b380ffa6

updated to match content of readings
---

diff --git a/r_lectures/w07-R_lecture.Rmd b/r_lectures/w07-R_lecture.Rmd
index 1861cb6..ecc441c 100644
--- a/r_lectures/w07-R_lecture.Rmd
+++ b/r_lectures/w07-R_lecture.Rmd
@@ -61,13 +61,9 @@ chisq.test(table(iris$Species, iris$Sepal.Width > 3))
 
 The incredibly low p-value means that it is very unlikely that these came from the same distribution and that sepal width differs by species.
 
+## BONUS: Using simulation to test hypotheses and calculate "exact" p-values
 
-
-## Using Simulation
-
-When the assumptions of Chi-squared tests aren't met, we can use simulation to approximate how likely a given result is.
-
-The book uses the example of a medical practitioner who has 3 complications out of 62 procedures, while the typical rate is 10%.
+When the assumptions of $\chi^2$ tests aren't met, we can use simulation to approximate how likely a given result is. The material here comes from the final two sections of Chapter 6 of the *OpenIntro* textbook. The book uses the example of a medical practitioner who has 3 complications out of 62 procedures, while the typical rate is 10%.
 
 The null hypothesis is that this practitioner's true rate is also 10%, so we're trying to figure out how rare it would be to have 3 or fewer complications, if the true rate is 10%.
 
@@ -84,7 +80,6 @@ simulation <- function(rate = .1, n = 62){
 }
 
 # The replicate function runs a function many times
-
 simulated_complications <- replicate(5000, simulation())
 ```
 
@@ -92,12 +87,10 @@ simulated_complications <- replicate(5000, simulation())
 We can look at our simulated complications
 
 ```{r}
-
 hist(simulated_complications)
 ```
 
-And determine how many of them are as extreme or more extreme than the value we saw. This is the p-value.
-
+And determine how many of them are as extreme or more extreme than the value we saw. This is the "exact" p-value.
 
 ```{r}
 sum(simulated_complications <= 3)/length(simulated_complications)
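
Note on the patched section (not part of the commit): the simulated share of outcomes with 3 or fewer complications can be cross-checked against the exact binomial tail probability. The sketch below is a hedged illustration. The body of `simulation()` is not visible in the diff, so `rbinom()` stands in for it here as an assumption, and the names `n_procedures`, `null_rate`, `observed`, `simulated_p`, `exact_p`, and `test_p` are hypothetical.

```r
# Hypothetical cross-check, assuming each simulated practitioner is one
# binomial draw (the commit's own simulation() body is not shown in the diff).
set.seed(2019)  # arbitrary seed so the simulated value is reproducible

n_procedures <- 62   # procedures performed
null_rate    <- 0.1  # complication rate under the null hypothesis
observed     <- 3    # complications actually observed

# Simulated p-value: share of 5000 simulated counts that are <= 3
simulated_counts <- rbinom(5000, size = n_procedures, prob = null_rate)
simulated_p <- mean(simulated_counts <= observed)

# Exact lower-tail probability P(X <= 3) for X ~ Binomial(62, 0.1)
exact_p <- pbinom(observed, size = n_procedures, prob = null_rate)

# binom.test() with alternative = "less" reports the same one-sided p-value
test_p <- binom.test(observed, n_procedures, p = null_rate,
                     alternative = "less")$p.value

print(c(simulated = simulated_p, exact = exact_p, binom_test = test_p))
```

Under these assumptions `exact_p` comes out near 0.12, and the simulated share should agree with it up to simulation noise, which shrinks as the number of replicates grows.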