assessment/interactive_assessment.rmd

   1 ---
   2 title: "Interactive Self-Assessment"
   3 subtitle: "Fall 2020 MTS 525 / COMMST 395 Statistics and Statistical Programming"
   4 output: learnr::tutorial
   5 runtime: shiny_prerendered
   6 ---
   7
   8
   9 ```{r setup, include=FALSE}
  10 library(learnr)
  11 library(tidyverse)
  12
  13 knitr::opts_chunk$set(echo = FALSE, tidy=TRUE)
  14
  15 t <- format(Sys.time(), "%Y-%m-%d_%H-%M-%S")  # filesystem-safe timestamp: no spaces or colons in the file names
  16 question_filename <- paste("question_submission_", t, ".csv", sep="")
  17 code_filename <- paste("code_", t, ".csv", sep="")
  18
  19 #df <- data.frame(label=c('test'), question=c('asd'), answer=c('asd'), correct=c(TRUE), stringsAsFactors=FALSE)
  20 df <- data.frame()
  21
  22 tutorial_event_recorder <- function(tutorial_id, tutorial_version, user_id,
  23                                     event, data) {
  24   # quiz question answered
  25   if (event == "question_submission"){
  26     # nick exasperatedly believes this is the correct way to index the result of strsplit... [[1]][[1]]
  27     data$category <- strsplit(data$label, '_')[[1]][[1]]
  28
  29     df <<- rbind(df, data, stringsAsFactors=FALSE)
  30     #write.table(data, question_filename, append=TRUE, sep=",", row.names=TRUE, col.names=FALSE)
  31     write.table(df, question_filename, append=FALSE, sep=",", row.names=TRUE, col.names=TRUE)
  32
  33   }
  34   # code
  35   if (event == "exercise_submitted"){
  36     write.table(data, code_filename, append=TRUE, sep=",", row.names=TRUE, col.names=FALSE)
  37   }
  38
  39 }
  40 options(tutorial.event_recorder = tutorial_event_recorder)
  41 ```
  42
  43
  44
  45 ## Overview
  46
  47 This document contains R Markdown code for an *Interactive Self-Assessment*. By completing this assessment, both students and the teaching team can check in on learning progress.
  48
  49 The Self-Assessment is broken into seven sections (plus this overview), described below. You can navigate throughout the document using the left-hand column. In general, completing this assessment should take about 60 minutes.
  50
  51 * Overview: you are here.
  52 * Section 1, Warmup Exercises. Contains warm-ups to help you become familiar with the interactive `learnr` environment (learnr is the R package that this assessment relies on). 1 coding question, 2 multiple choice questions. Time estimate: 5 min.
  53 * Section 2, Debugging and Reading R Code. Contains a series of questions that will require you to work with existing R code. 3 coding questions, 4 multiple choice questions. Time estimate: 15 minutes.
  54 * Section 3, Statistics Concepts and Definitions. 12 multiple choice questions about statistics concepts and definitions. Time estimate: 15 minutes.
  55 * Section 4, Distributions. 3 multiple choice questions that involve some minor calculations. Time estimate: 5 minutes.
  56 * Section 5, Computing Probabilities. 6 multiple choice questions that involve calculating probabilities of events. These calculations are more involved than Section 4. Time estimate: 15 minutes.
  57 * Section 6, Helpful Formulas. Contains some helpful formulas that may be useful for Sections 3-5.
  58 * Section 7, Answer Report. Time estimate: 5 minutes. Here, you can use R code (some of which is prepopulated for you) to analyze (or visualize) your performance on the assessment. This provides a way for you to (1) practice exploratory analyses in R with data you created yourself (by answering questions) and (2) get immediate feedback about your performance.
  59
  60 Note that you can clear **all** your answers to *all* questions by clicking "Start Over" in the left-hand sidebar, but doing that basically erases all progress in the document and your answers to any questions will be deleted. *Use with caution* (if at all)!
  61
  62 ## Section 1, Warm-up Exercises
  63
  64 This section contains quick warm-up questions, so you can become familiar with how `learnr` works and what to expect from this activity.
  65
  66 ### Code Chunk Warm-up
  67
  68 To get familiar with how code chunks work in `learnr`, let's write R code required to add two numbers: 1234 and 5678 (and the answer is 6912).
  69
  70 The code chunk below is editable and is "pre-populated" with an unfinished function definition. The goal is to add arguments and fill in the body of the function. When finished, you can run the code chunk and it should produce the answer.
  71
  72 If you click "Run Code", you should see the answer below the chunk. That answer will persist as you navigate around this doc.
  73
  74 You can clear your answers by clicking "Start Over" in the top-left of the chunk. You can also clear **all** your answers by clicking "Start Over" in the left-hand sidebar, but doing that erases all progress in the document. *Use with caution!*
  75
  76 ```{r WarmUp_1, exercise=TRUE, exercise.lines=10}
  77 add <- function() {
  78
  79 }
  80 x = 1234
  81 y = 5678
  82 add(x,y)
  83 ```
  84
  85 ```{r WarmUp_1-solution}
  86 add <- function(value1, value2) {
  87   return(value1 + value2)
  88 }
  89
  90 x = 1234
  91 y = 5678
  92
  93 add(x,y)
  94 ```
  95
  96 ### Multiple Choice Question Warmup
  97 The question below shows how the multiple choice answering and feedback works.
  98 ```{r WarmUp_2}
  99 quiz(
 100   question("Select the answer choice that will return `TRUE` in R.",
 101     answer("1 == 1", message="Good work! Feedback appears here.", correct=TRUE),
 102     answer("1 == 0", message="Not quite! Feedback appears here."),
 103     allow_retry = TRUE
 104   )
 105 )
 106 ```
 107
 108
 109 ## Section 2: Writing and Debugging R Code
 110
 111 ### Debugging a Function
 112 Below, you'll see code to define a function that is *supposed* to perform a transformation on a vector. The problem is that it doesn't work right now.
 113
 114 In theory, the function will take a numeric vector as input (let's call it $x$) and scale the values so they lie between zero and one. (This is sometimes called min-max [feature scaling](https://en.wikipedia.org/wiki/Feature_scaling), and is sometimes used for machine learning.)
 115
 116 The way it *should* do this is by first subtracting the minimum value of $x$ from each element of $x$. Then, the function will divide each element by the difference between the maximum value of $x$ and the minimum value of $x$.
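
In symbols, the two steps above combine into a single expression. This is only a restatement of the prose; the notation $x'_i$ for the rescaled value is introduced here for illustration and does not appear in the exercise code.

$$x'_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}$$

For the test vector $(1, 2, 3, 4, 5)$, $\min(x) = 1$ and $\max(x) = 5$, so the element $3$ maps to $(3 - 1)/(5 - 1) = 0.5$.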
 117
 118 As written now, however, the function does not work! There are at least three issues you will need to fix to get it working. Once you fix them, you should be able to confirm that your function works with the pre-populated example (with the correct output provided). You might also be able to make this code more "elegant" (or alternatively, improve the comments and variable names as you see fit).
 119
 120 Bonus: how might we update this function to scale between any "floor" and "ceiling" value?
 121
 122 ```{r R_debug1, exercise=TRUE}
 123 zeroToOneRescaler <- function() {
 124   # the minimum value
 125   minval <- min(x)
 126   # let's "shift" our vector by subtracting the minimum value of x from each element
 127   shifted <- x - minval
 128
 129   # let's find the difference between max val and min val
 130   difference <- min(x) - max(x)
 131
 132   scaled <- shifted / difference
 133   scaled
 134 }
 135
 136 test_vector = c(1,2,3,4,5)
 137 # Should print c(0, 0.25, 0.5, 0.75, 1.00)
 138 zeroToOneRescaler(test_vector)
 139 ```
 140
 141 ```{r R_debug1-solution}
 142 zeroToOneRescaler <- function(x) {
 143   shifted <- x - min(x)
 144   difference = max(x) - min(x)
 145   return(shifted / difference)
 146 }
 147
 148 test_vector = c(1,2,3,4,5)
 149 # Should print c(0, 0.25, 0.5, 0.75, 1.00)
 150 zeroToOneRescaler(test_vector)
 151 ```
 152
 153 ```{r R_debug1-response}
 154 quiz(
 155   question("Were you able to solve the debugging question? (this question is for feedback purposes)",
 156     answer("Yes", message="Nice work!", correct = TRUE),
 157     answer("No", message="Good try! If there were specific aspects that were challenging, feel free to reach out to the teaching team.")
 158   )
 159 )
 160 ```
 161
 162
 163 The following commented chunk has at least five (annoying) bugs. Can you uncomment the code, fix all the bugs, and get this chunk to run? These are drawn from real experiences from your TA!
 164 ```{r R_debug2, exercise=TRUE}
 165 # ps2 <- readcsv(file = url(
 166 #   " https://communitydata.science/~ads/teaching/2020/stats/data/week_04/group_03.csv"), row.names = NULL
 167 # )
 168 #
 169 # ps2$y[is.na(ps2$y)] <- 0
 170 # "ps2$'My First New Column' <- ps2$y * -1"
 171 # ps2$'My Second New Column" <- ps2$y + ps2$'My First New Column'
 172 #
 173 # summary(ps2$'My Second New Column']
 174 ```
 175
 176 ```{r R_debug2-solution}
 177 ps2 <- read.csv(file = url("https://communitydata.science/~ads/teaching/2020/stats/data/week_04/group_03.csv"), row.names = NULL)
 178 ps2$y[is.na(ps2$y)] <- 0
 179 ps2$'My First New Column' <- ps2$y * -1
 180 ps2$'My Second New Column' <- ps2$y + ps2$'My First New Column'
 181 summary(ps2$'My Second New Column')
 182 ```
 183
 184 ```{r R_debug2-response}
 185 quiz(
 186   question("Were you able to solve the above debugging question? (this question is for feedback purposes)",
 187            answer("Yes", message="Nice work!", correct = TRUE),
 188            answer("No", message="Good try! If there were specific aspects that were challenging, feel free to reach out to the teaching team."),
 189            allow_retry = TRUE
 190   )
 191 )
 192 ```
 193
 194 ### Updating a visualization
 195 Imagine you've created a histogram to visualize some data from your research (below, we'll use R's built-in "PlantGrowth" dataset). You show your collaborator a histogram of these data made with base R, and they express some concerns about your plot's aesthetics. Replace the base-R histogram with a `ggplot2` histogram that also includes a density plot overlaid on it (maybe in a bright, contrasting color like red).
 196
 197 ```{r R_ggplot, exercise=TRUE}
 198 data("PlantGrowth")
 199 hist(PlantGrowth$weight)
 200 ```
 201
 202 ```{r R_ggplot-solution}
 203 library(ggplot2)
 204
 205 ggplot(PlantGrowth, aes(weight, after_stat(density))) + geom_histogram() + geom_density(color = "red")
 206 ```
 207
 208 Bonus: How would you find more information about the source of this dataset?
 209
 210 ```{r R_ggplot-response}
 211 quiz(
 212   question("Were you able to solve the above plotting question? (this question is for feedback purposes)",
 213            answer("Yes", message="Nice work!", correct = TRUE),
 214            answer("No", message="Good try! If there were specific aspects that were challenging, feel free to reach out to the teaching team."),
 215            allow_retry = TRUE
 216   )
 217 )
 218 ```
 219
 220 ### Interpret a dataframe
 221 ```{r R_columns-setup, exercise=TRUE}
 222 data <- mtcars
 223 data$mpgGreaterThan20 <- data$mpg > 20
 224 data$gear <- as.factor(data$gear)
 225 data$mpgRounded <- round(data$mpg)
 226 ```
 227
 228 The below questions relate to the `data` data.frame defined above, which is a modified version of the classic `mtcars`.
 229
 230 For all answers, assume the above code chunk *has completely run*, i.e. assume all modifications described above were made.
 231 ```{r R_columns}
 232 quiz(
 233   question("Which of the following best describes the `mpg` variable?",
 234     answer("Numeric, continuous", correct=TRUE),
 235     answer("Numeric, discrete"),
 236     answer("Categorical, dichotomous"),
 237     answer("Categorical, ordinal"),
 238     answer("Categorical")
 239   ),
 240   question("Which of the following best describes the `mpgGreaterThan20` variable?",
 241     answer("Numeric, continuous"),
 242     answer("Numeric, discrete"),
 243     answer("Categorical, dichotomous", correct=TRUE),
 244     answer("Categorical, ordinal"),
 245     answer("Categorical")
 246   ),
 247   question("Which of the following best describes the `mpgRounded` variable?",
 248     answer("Numeric, continuous"),
 249     answer("Numeric, discrete", correct=TRUE),
 250     answer("Categorical, dichotomous"),
 251     answer("Categorical, ordinal"),
 252     answer("Categorical")
 253   ),
 254   question("Which of the following best describes the `gear` variable?",
 255     answer("Numeric, continuous"),
 256     answer("Numeric, discrete"),
 257     answer("Categorical, dichotomous"),
 258     answer("Categorical, ordinal", correct=TRUE),
 259     answer("Categorical")
 260   )
 261 )
 262 ```
 263
 264 ## Section 3, Statistics Concepts and Definitions
 265 The following is a series of short multiple choice questions. These questions focus on definitions, and should not require performing any computations or writing any code.
 266 ```{r StatsConcepts_lightninground}
 267 m1 <- ""
 268 wolf <- "Think of the 'Boy who cried wolf', with a null hypothesis that no wolf exists. First the boy claims the alternative hypothesis: there is a wolf. The villagers believe this, and reject the correct null hypothesis. Second, the villagers make an error by not believing the boy when he presents a correct alternative hypothesis."
 269
 270 quiz(
 271   question("A hypothesis is typically concerned with a:",
 272     answer("population statistic.", correct = TRUE),
 273     answer("sample statistic.")
 274   ),
 275   question("A sampling distribution is:",
 276     answer("critical to report in your papers."),
 277     answer("theoretically helpful, but rarely available to researchers in practice.", correct = TRUE),
 278     answer("practically useful, but relies on assumptions that are rarely met.")
 279   ),
 280   question("Z-scores tell us about a value in terms of:",
 281     answer("mean and standard deviation.", correct = TRUE),
 282     answer("sample size and sampling strategy."),
 283     answer("if an effect is causal or not.")
 284   ),
 285   question("A distribution that is right-skewed has a long tail to the:",
 286     answer("right.", correct = TRUE),
 287     answer("left.")
 288   ),
 289   question("A normal distribution can be characterized with only this many parameters:",
 290     answer("1."),
 291     answer("2.", correct = TRUE),
 292     answer("3.")
 293   ),
 294   question("When we calculate standard error, we calculate",
 295     answer("it using a different formula for every type of variable."),
 296     answer("the sample standard error, which is an estimate of the population standard error.", correct = TRUE),
 297     answer("whether or not our result is causal.")
 298   ),
 299   question("When we calculate standard error, we calculate",
 300     answer("using a different formula for every type of variable."),
 301     answer("the sample standard error, which is an estimate of the population standard error.", correct = TRUE),
 302     answer("whether or not our result is causal.")
 303   ),
 304   question("P values tell us about",
 305     answer("the world in which our null hypothesis is true.", correct = TRUE),
 306     answer("the world in which our null hypothesis is false."),
 307     answer("the world in which our data describe a causal effect.")
 308   ),
 309   question("P values are",
 310     answer("a conditional probability.", correct = TRUE),
 311     answer("completely misleading."),
 312     answer("only useful when our data has a normal distribution.")
 313   ),
 314   question("A type 1 error occurs when",
 315     answer("we reject a correct null hypothesis (i.e. false positive).", correct = TRUE, message=wolf),
 316     answer("we accept a correct null hypothesis.", message=wolf),
 317     answer("we accept an incorrect null hypothesis (i.e. false negative).", message=wolf)
 318   ),
 319   question("Before we assume independence of two random samples, it is useful to check that",
 320     answer("both samples include over 90% of the population."),
 321     answer("both samples include less than 10% of the population.", correct = TRUE)
 322   )
 323 )
 324 ```
 325
 326 ```{r StatsConcepts_sampling}
 327 quiz(
 328   question("A political scientist is interested in the effect of teaching style on standardized test performance.
 329 She wants to use a sample of 30 classes evenly represented among the Communication, Computer Science, and Business to conduct her analysis. What type of study should she use to ensure that
 330 classes are selected from each region of the world? Assume a limited research budget.",
 331     answer("Observational - simple random sample"),
 332     answer("Observational - cluster"),
 333     answer("Observational - stratified", correct=TRUE),
 334     answer("Experimental")
 335   )
 336 )
 337 ```
 338
 339 ## Section 4: Distributions
 340 The following questions are in the style of pen-and-paper statistics class exam questions. This section includes three questions about distributions. These questions involve some minor calculations.
 341
 342 ### Percentiles and the Normal Distribution
 343 For the following question, you may want to use this "scratch paper" code chunk.
 344 ```{r Distributions_quartile-scratch, exercise=TRUE}
 345
 346 ```
 347
 348 ```{r Distributions_quartile}
 349 quiz(
 350   question("Heights of boys in a high school are approximately normally distributed with a mean of 175 cm and a
 351 standard deviation of 5 cm. What is the first quartile of heights?",
 352     answer("25 cm"),
 353     answer("167.3 cm"),
 354     answer("171.7 cm", correct=TRUE),
 355     answer("173.5 cm"),
 356     answer("178.3 cm")
 357   )
 358 )
 359 ```
 360
 361
 362 ### Outliers and Skew
 363 Suppose we are reading a paper which reports the following about a column of a dataset:
 364
 365 Minimum value is 0.00125 and Maximum Value is 2.1100.
 366
 367 Mean is 0.41100 and median is 0.27800.
 368
 369 1st quartile is 0.13000 and 3rd quartile is 0.56200.
 370
 371 ```{r Distributions_summary-scratch, exercise=TRUE}
 372
 373 ```
 374
 375 ```{r Distributions_summary}
 376 m1 <- "Under R's default setting, outliers are values that are either greater than the upper bound $Q_3 + 1.5\\times IQR$ OR less than the lower bound $Q_1 - 1.5\\times IQR$. Here, $IQR = 0.562-0.130=0.432$. The upper bound $= 0.562 + 1.5\\times (0.432) = 1.21$. The lower bound is $0.13 - 1.5\\times (0.432) = -0.518$. We see that the maximum value is 2.11, greater than the upper bound. Thus, there is at least one outlier in this sample."
 377
 378 m2 <- "There is at least one outlier on the right, whereas there is none on the left. $|Q_3-Q_2| > |Q_2-Q_1|$, so the whisker for this box plot would be longer on the right-hand side. The mean is larger than the median."
 379 quiz(
 380   question("Are there outliers (in terms of IQR) in this sample?",
 381     answer("Yes", correct = TRUE, message=m1),
 382     answer("No", message=m1)
 383   ),
 384   question("Based on these summary statistics, we might expect the skew of the distribution to be:",
 385     answer("left-skewed", message=m2),
 386     answer("right-skewed", message=m2, correct=TRUE),
 387     answer("symmetric", message=m2)
 388   )
 389 )
 390 ```
 391
 392
 393 ## Section 5, Computing Probabilities
 394 For each of the below questions, you will need to calculate some probabilities by hand.
 395 You may want to use this "scratch paper" code chunk (possibly in conjunction with actual paper).
 396
 397 ```{r Probabilities-scratch, exercise=TRUE}
 398
 399 ```
 400
 401 ```{r Probabilities_probs}
 402 m1 <- "$P(\\text{Coffee} \\cap \\text{Milk}) = P(\\text{Coffee})\\cdot P(\\text{Milk}) = 0.5 \\cdot (1-0.1)  = 0.45$"
 403
 404 m2 <- "Let H be the event of hypertension and M be the event of being male. We see here that $P(H) = 0.15$ whereas $P(H|M) = 0.18$. Since $P(H) \\neq P(H|M)$, hypertension is not independent of sex."
 405
 406 m3 <- "$P(HIV \\cap HCV) = P(HIV|HCV)\\cdot P(HCV) = 0.1\\cdot 0.02 = 0.002$"
 407
 408 quiz(
 409   question("Suppose in a population, half prefer coffee to tea, and assume that 10 percent of the population does not put milk in their coffee or tea. If coffee vs. tea preference and milk use are independent, what fraction of the population both prefers coffee and does put milk in their coffee?",
 410     answer("40%", message=m1),
 411     answer("45%", correct = TRUE, message=m1),
 412     answer("50%", message=m1),
 413     answer("55%", message=m1)
 414   ),
 415   question("In the general population, about 15 percent of adults between 25 and 40 years of age are hypertensive.  Suppose that among males of this age, hypertension occurs about 18 percent of the time. Is hypertension independent of sex? ",
 416     answer("Yes", message=m2),
 417     answer("No.", correct=TRUE, message=m2)
 418   ),
 419   question("Co-infection with HIV and hepatitis C (HCV) occurs when a patient has both diseases, and is on the rise in some countries. Assume that in a given country, only about 2% of the population has HCV,  but 25% of the population with HIV have HCV.  Assume as well that 10% of the population with HCV have HIV.  What is the probability that a randomly chosen member of the population has both HIV and HCV?",
 420     answer("0.001", message=m3),
 421     answer("0.01", message=m3),
 422     answer("0.002", correct=TRUE, message=m3),
 423     answer("0.02", message=m3)
 424   )
 425   #question("What might you search for (in Google, your notes, the OpenIntro PDF, etc.) to help with this question?",
 426   #  answer("t test"),
 427   #  answer("laws of probability", correct=TRUE),
 428   #  answer("linear regression"),
 429   #  answer("R debugging")
 430   #)
 431 )
 432 ```
 433
 434 ### Biostats Example
 435 This question is adapted from a biostats midterm exam.
 436 In the past (2015, to be specific), the US Preventive Services
 437 Task Force recommended that women under the age of 50 should
 438 not get routine mammogram screening for breast cancer.  The Task Force
 439 argued that for a woman with a positive mammogram (one suggesting the
 440 presence of breast cancer), the chance that she has breast cancer was
 441 too low to justify a surgical biopsy.
 442
 443 Suppose the data below describe a cohort of 100,000 women age 40 -
 444 49 in whom mammogram screening and breast cancer behave just like the
 445 larger population.  For instance, in this table, the 3,333 women with
 446 breast cancer represent a rate of 1 in 30 women with undiagnosed
 447 cancer. The numbers in the table are realistic for US women in this
 448 age category.
 449
 450 Has Breast Cancer: 3,296 Positive Test Results and 37 negative test results (3,333 total)
 451
 452 Does not Have Breast Cancer: 8,313 Positive Test Results and 88,354 negative test results (96,667 total)
 453
 454 First, compute the "margins" of the above contingency table.
 455 Row margins: How many total women have breast cancer? How many total women do not have breast cancer?
 456 Column margins: How many total positive tests? How many total negative tests?
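
If the idea of a margin is unfamiliar, the sketch below shows one way to compute row and column totals in R. The counts are made up for illustration (they are not the mammogram data), and the names `toy`, `row_margins`, and `col_margins` are hypothetical.

```r
# Hypothetical counts, NOT the mammogram data.
# Rows are the true status, columns are the test result.
toy <- matrix(c(40, 10,
                20, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(status = c("condition", "no_condition"),
                              test   = c("positive", "negative")))

row_margins <- rowSums(toy)  # total for each true status
col_margins <- colSums(toy)  # total for each test result

# The same table with a "Sum" row and column appended.
addmargins(toy)
```

The same pattern should carry over to the cohort table once its four cell counts are typed in.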
 457 ```{r Probabilities_mammogram-chunk, exercise=TRUE}
 458
 459 ```
 460
 461 ```{r Probabilities_mammogram}
 462 m1 <- "
 463 $\\Pr(\\textrm{Test}^+ \\cap \\textrm{Cancer}) = 3,296 / 100,000$
 464
 465 $\\Pr(\\textrm{Cancer}) = 3,333 / 100,000$
 466
 467 $\\Pr(\\textrm{Test}^+|\\textrm{Cancer}) =$ \
 468 $\\dfrac{\\Pr(\\textrm{Test}^+ \\cap \\textrm{Cancer})}{\\Pr(\\textrm{Cancer})} =$\
 469 $\\dfrac{3,296}{3,333} = 0.989$"
 470
 471 m2 <- "
 472 $Pr(\\textrm{Cancer}|\\textrm{Test}^+) =$
 473
 474 $\\dfrac{\\Pr(\\textrm{Cancer} \\cap \\textrm{Test}^+)}
 475      {\\Pr(\\textrm{Test}^+)}=$
 476
 477
 478  $\\dfrac{3,296}{11,609} = 0.284$"
 479
 480 quiz(
 481   question("Based on this data, what is the probability that a woman has a positive test given that she has cancer?",
 482     answer("98.9%", correct = TRUE, message=m1),
 483     answer("99.9%",message=m1),
 484     answer("89.9%",message=m1),
 485     answer("88.9%",message=m1)
 486   ),
 487   question("Based on this data, what is the probability that a woman who receives a positive test has cancer?",
 488     answer("28.4%", correct = TRUE,message=m2),
 489     answer("10.3%",message=m2),
 490     answer("50.7%",message=m2),
 491     answer("97.9%",message=m2)
 492   ),
 493   question("Is the Task Force correct to claim that there is a low probability that a woman between 40 and 49 who tests positive has breast cancer?",
 494     answer("Yes", correct=TRUE),
 495     answer("No")
 496   )
 497 )
 498 ```
 499
 500
 501
 502 ## Section 6, Helpful Formulas
 503 Sample Mean (sample statistic):
 504 $\bar{x}=\frac{\sum_{i=1}^n x_i}{n}$ |
 505 Standard deviation:
 506 $s=\sqrt{\frac{\sum_{i=1}^n (x_i-\bar{x})^2}{n-1}}$ |
 507 Variance:
 508 $var = s^2$
 509
 510 Useful probability rules:
 511 $\mbox{Pr}(A^c)=1-\mbox{Pr}(A)$ | Pr(A and B) = Pr(A) $\times$ Pr(B), if A and B are independent | Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
 512
 513 $\mbox{Pr}(A|B)=\frac{\mbox{Pr(A and B)}}{\mbox{Pr(B)}}$
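
As a quick numeric check of the rules above, here is a minimal sketch in R. The probabilities are hypothetical and chosen so that the two events happen to be independent; the variable names are illustrative only.

```r
# Hypothetical probabilities for two events A and B.
p_a       <- 0.25
p_b       <- 0.20
p_a_and_b <- 0.05

# Conditional probability: Pr(A | B) = Pr(A and B) / Pr(B)
p_a_given_b <- p_a_and_b / p_b

# Independence check: A and B are independent when Pr(A | B) equals Pr(A).
# all.equal() compares with a small tolerance, which avoids floating-point surprises.
independent <- isTRUE(all.equal(p_a_given_b, p_a))

# General addition rule: Pr(A or B) = Pr(A) + Pr(B) - Pr(A and B)
p_a_or_b <- p_a + p_b - p_a_and_b

# Complement rule: Pr(not A) = 1 - Pr(A)
p_not_a <- 1 - p_a
```

If `p_a_given_b` had differed from `p_a`, the product rule Pr(A) $\times$ Pr(B) would not have reproduced `p_a_and_b`.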
 514
 515 Population mean (population statistic):
 516 $\mu = \sum_{i=1}^{n}x_i\mbox{Pr}(x_i)$
 517
 518 Z-score:
 519 $z=\frac{x-\mu}{\sigma}$
 520
 521 $x=\mu + z\sigma$
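
The two z-score formulas above can be sketched in R as follows. The mean and standard deviation are hypothetical values picked for illustration, and `pnorm()` and `qnorm()` are the base R functions for normal probabilities and quantiles.

```r
# Hypothetical population values, chosen only for illustration.
mu    <- 100   # mean
sigma <- 15    # standard deviation

# From a raw value to a z-score: z = (x - mu) / sigma
x <- 130
z <- (x - mu) / sigma
p_below <- pnorm(z)          # proportion of a normal distribution below x

# From a percentile back to a raw value: x = mu + z * sigma
z90 <- qnorm(0.90)           # z-score that cuts off the lowest 90%
x90 <- mu + z90 * sigma

# qnorm() can also do both steps at once when given the mean and sd.
x90_direct <- qnorm(0.90, mean = mu, sd = sigma)
```

Both routes to the 90th percentile should agree, which is a handy sanity check when working a problem by hand.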
 522
 523 Binomial distribution: $\mbox{P}(x)=\frac{n!}{x!(n-x)!}p^x(1-p)^{n-x}$
 524 for $x=0,1,2,\ldots,n$
 525
 526 Binomial mean and standard deviation: $\mu=np$, $\sigma=\sqrt{np(1-p)}$
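
To see the binomial formula in action, the sketch below evaluates it by hand and compares the result with base R's `dbinom()`. The values of `n`, `p`, and `x` are hypothetical.

```r
# Hypothetical setup: 10 independent trials, success probability 0.3,
# and we want the probability of exactly 4 successes.
n <- 10
p <- 0.3
x <- 4

# The formula written out term by term.
by_hand <- factorial(n) / (factorial(x) * factorial(n - x)) * p^x * (1 - p)^(n - x)

# The same probability from R's built-in binomial density.
built_in <- dbinom(x, size = n, prob = p)

# Mean and standard deviation of the number of successes.
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))
```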
 527
 528 Standard error of the sample mean: $\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}$
 529
 530 Standard error of the sample proportion: $\sigma_{\hat{p}}=\sqrt{\frac{p(1-p)}{n}}$
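
A short numeric sketch of the two standard error formulas above, using hypothetical values for the standard deviation, the proportion, and the sample sizes:

```r
# Standard error of a sample mean: sigma / sqrt(n)
sigma  <- 12    # hypothetical standard deviation
n_mean <- 36    # hypothetical sample size
se_mean <- sigma / sqrt(n_mean)

# Standard error of a sample proportion: sqrt(p * (1 - p) / n)
p      <- 0.4   # hypothetical proportion
n_prop <- 150   # hypothetical sample size
se_prop <- sqrt(p * (1 - p) / n_prop)
```

In both cases, quadrupling the sample size halves the standard error, because `n` sits under a square root.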
 531
 532 Outlier bounds: $Q_1 - 1.5 \times IQR, \quad Q_3 + 1.5 \times IQR$
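
The bounds above can be applied to raw data as in the sketch below. The vector `values` is hypothetical, and the quartiles come from `quantile()` with its default settings, which can differ slightly from the hinges that `boxplot()` uses.

```r
# Hypothetical data with one suspiciously large value.
values <- c(2, 4, 4, 5, 6, 7, 8, 30)

# First and third quartiles (quantile()'s default method).
q   <- quantile(values, c(0.25, 0.75))
q1  <- unname(q[1])
q3  <- unname(q[2])
iqr <- q3 - q1

# Anything outside these bounds is flagged as an outlier.
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

outliers <- values[values < lower | values > upper]
```

When only summary statistics are reported, comparing the minimum and maximum against `lower` and `upper` is enough to tell whether at least one outlier exists.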
 533
 534
 535
 536 ## Section 7, Answer Report
 537 Finally, let's generate a report that summarizes your answers to this evaluation.
 538
 539 Answers are written to a file that looks like this: `question_submission_{CURRENT TIME}.csv`.
 540
 541 Take note of this csv file: this is what you will submit to Canvas.
 542
 543 They're also saved in RStudio's global environment as a variable called `df`. Run the code chunk below to see what `df` looks like.
 544
 545 ```{r report1, exercise=TRUE}
 546 df
 547 ```
 548
 549
 550 To check your proportion of correct answers (multiply by 100 for a percentage):
 551 ```{r report2, exercise=TRUE}
 552 mean(df$correct)
 553 ```
 554
 555 To check your proportion of correct answers by section:
 556
 557 ```{r report3, exercise=TRUE}
 558 df %>% group_by(category) %>% summarize(avg = mean(correct))
 559 ```
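
For a visual version of the per-section summary, here is a minimal base R sketch. It uses a hypothetical stand-in called `toy_df` with the same `category` and `correct` columns, so the numbers are not real results; swapping in `df` should work once some questions have been answered.

```r
# Hypothetical stand-in for df: one row per answered question.
toy_df <- data.frame(
  category = c("WarmUp", "R", "R", "StatsConcepts", "StatsConcepts"),
  correct  = c(TRUE, TRUE, FALSE, TRUE, TRUE)
)

# Proportion correct within each section (the mean of a logical is a proportion).
by_section <- aggregate(correct ~ category, data = toy_df, FUN = mean)

# Sort so that the weakest section comes first.
by_section <- by_section[order(by_section$correct), ]

# One bar per section, on a 0-to-1 scale.
barplot(by_section$correct,
        names.arg = by_section$category,
        ylim = c(0, 1),
        ylab = "Proportion correct")
```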